For preservation and long-term access, a data collection should be accompanied by proper documentation and associated metadata.
Files should include:
- Data itself (image, database, spreadsheet, etc.)
- Documentation file (Readme.txt file, lab notebook), including a description of the data, the file naming conventions, and the methodology used to collect the data
- Metadata (a key or reference to each data field), e.g. the format in which the data was collected
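As a minimal sketch of the documentation file described above, the Python snippet below assembles the text of a Readme.txt from a description, methodology, naming convention, and a key to each data field. The function name `build_readme`, the section headings, and all sample values are hypothetical illustrations, not a prescribed template.

```python
def build_readme(title, description, methodology, naming_convention, fields):
    """Return README text; `fields` maps each data field name to its meaning."""
    lines = [
        f"TITLE: {title}",
        "",
        "DESCRIPTION",
        description,
        "",
        "METHODOLOGY",
        methodology,
        "",
        "FILE NAMING CONVENTION",
        naming_convention,
        "",
        "DATA FIELDS",
    ]
    # One line per field so every column in the data has a documented key.
    for name, meaning in fields.items():
        lines.append(f"{name}: {meaning}")
    return "\n".join(lines) + "\n"


# Hypothetical sample values, for illustration only.
readme_text = build_readme(
    title="Solvent screening runs",
    description="Absorbance readings for a hypothetical solvent screen.",
    methodology="Readings taken on one spectrometer, three runs per sample.",
    naming_convention="PROJECT_YYYYMMDD_condition_runNN_vN.ext",
    fields={"sample_id": "Unique sample label", "abs_450": "Absorbance at 450 nm"},
)
```

Saving `readme_text` as Readme.txt alongside the data files keeps the description, methodology, and field key together with the data they document.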
Metadata is a standardized way of describing data: it explains the who, what, where, and when of data creation and the methods of use. Metadata provides the essential tools for discovery and reuse, such as a bibliographic citation.
For data sets to be interoperable, they must be organized in a standardized way. Many disciplines have a metadata standard for data collection interoperability. Examples of well-used metadata standards include:
- Social Sciences
- DDI: The Data Documentation Initiative is an effort to establish an international XML-based standard for the content, presentation, transport, and preservation of documentation (i.e., metadata) for datasets in the social and behavioral sciences. The metadata standard created by DDI is called DDI metadata specification, which is often shortened to DDI.
- List of Social Sciences metadata standards includes standards, tools, and use cases.
- Library Science
- Dublin Core: a general purpose metadata standard for describing networked resources.
- MODS: Metadata Object Description Schema, a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. METS (Metadata Encoding and Transmission Standard) is a related standard for packaging descriptive, administrative, and structural metadata.
- MARC (Machine Readable Cataloging) is a standard for cataloging books and other library materials.
- List of Repository metadata standards includes standards, tools, and use cases.
In addition to structured metadata standards, ontologies provide a common language that may be useful when applied in conjunction with other descriptive methods. Organized ontologies are available for a number of disciplines.
The Data Management Glossary has more detailed information on the various terminologies. Here are some examples of ontologies currently in use:
- Gene Ontology
- Medical Subject Headings (MeSH)
- WordNet: lexical database of English
- Web Ontology Language (OWL) used for the semantic web
Example Schema: Metadata Element Set
Dublin Core is a simple metadata standard that can be adapted. Its 15 core elements are:
- Contributor: An entity responsible for making contributions to the resource.
- Coverage: The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant.
- Creator: An entity primarily responsible for making the resource.
- Date: A point or period of time associated with an event in the lifecycle of the resource.
- Description: An account of the resource.
- Format: The file format, physical medium, or dimensions of the resource.
- Identifier: An unambiguous reference to the resource within a given context.
- Language: A language of the resource.
- Publisher: An entity responsible for making the resource available.
- Relation: A related resource.
- Rights: Information about rights held in and over the resource.
- Source: A related resource from which the described resource is derived.
- Subject: The topic of the resource.
- Title: A name given to the resource.
- Type: The nature or genre of the resource.
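To make the element set concrete, here is a rough sketch in Python that writes a few Dublin Core elements as XML using the standard `xml.etree.ElementTree` module. The namespace URI is the published Dublin Core elements namespace. The `metadata` wrapper element, the function name `dublin_core_record`, and every sample value are assumptions made for illustration; a real repository may expect a different container or more elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <metadata> element with one dc:* child per (element, value) pair."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Hypothetical sample values, for illustration only.
record = dublin_core_record([
    ("title", "Solvent screening runs"),
    ("creator", "Example Lab"),
    ("date", "2024-03-15"),
    ("format", "text/csv"),
    ("language", "en"),
    ("rights", "CC BY 4.0"),
])
xml_text = ET.tostring(record, encoding="unicode")
```

Passing a list of pairs rather than a dictionary allows an element to repeat, for example two `creator` entries, which Dublin Core permits.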
File Naming Conventions
- Be consistent.
- Have conventions for naming (1) Directory structure, (2) Folder names, (3) File names
- Always include the same information (e.g. date and time)
- Retain the order of information (e.g. YYYYMMDD, not MMDDYYYY)
- Be descriptive so others can understand your meaning. Include other relevant information such as:
- Unique identifier (e.g. Project Name or Grant # in folder name)
- Project or research data name
- Conditions (Lab instrument, Solvent, Temperature, etc.)
- Run of experiment (sequential)
- Date (in file properties too)
- Use the application-specific three-letter file extension: MOV, TIF, WRL
- Keep track of versions (version control)
- Use a sequential numbered system: v. 1, v. 2
- Don't use confusing labels: revision, final, final2, etc.
- Consider version control software, such as Subversion (SVN) with the TortoiseSVN client
- Record every change - no matter how small
- Discard obsolete versions (but not the raw copy)
- Use auto-backup instead of self-archiving multiple versions
File Name Example:
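One possible way to apply these conventions consistently is to generate file names from their parts instead of typing them by hand. The Python sketch below assumes a hypothetical pattern of project, date (YYYYMMDD), condition, run number, and sequential version; the function name `build_file_name`, the pattern itself, and the sample values are illustrative assumptions, not a required scheme.

```python
import re
from datetime import date


def build_file_name(project, condition, run, version, when, extension):
    """Compose PROJECT_YYYYMMDD_condition_runNN_vN.ext from its parts."""

    def clean(text):
        # Replace any run of characters other than letters and digits with a hyphen.
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")

    return (
        f"{clean(project)}_{when:%Y%m%d}_{clean(condition)}"
        f"_run{run:02d}_v{version}.{extension}"
    )


# Hypothetical sample values, for illustration only.
name = build_file_name("Solvent Screen", "EtOH 25C", 3, 2, date(2024, 3, 15), "tif")
print(name)  # Solvent-Screen_20240315_EtOH-25C_run03_v2.tif
```

Putting the date in YYYYMMDD form and zero-padding the run number means an ordinary alphabetical sort also orders the files chronologically and by run within a project.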
File Naming Applications
If you have many files already named, use a file renaming application such as: