Digital preservation glossary
This glossary defines terms and abbreviations used when discussing the management, preservation and access of digital materials. (Glossary terms were compiled from multiple sources.)
Access: The services and functions which make the archival information holdings and related services visible to Consumers and authorized users. This includes restricting access in some instances due to copyright, confidentiality, or statutory requirements.
Access Collection: A collection of Archival Information Packages (AIPs) that is defined by a Collection Description but for which there is no Packaging Information for the collection in Archival Storage. 
Access Copy: A copy/version made from a digital object that is intended for use, so that the original item can be preserved and protected from damage. This is typically a low-resolution image, such as a thumbnail or image preview, that allows users to see the item.
Access Format: A chosen format for the Access Copy of a digital object. 
Access Policy: A set of rules and guidelines that determine how the institution's collections, services, products and databases are accessed. 
Access Rights Information: The information that identifies the access restrictions pertaining to the Content Information, including the legal framework, licensing terms, and access control. It contains the access and distribution conditions stated within the Submission Agreement, related to both preservation (by the OAIS) and final usage (by the Consumer). It also includes the specifications for the application of rights enforcement measures. See also Permissions. 
Administration / Administration Functional Entity: The functional entity that contains the services and functions needed to control the operation of the other OAIS functional entities on a day-to-day basis.
AIC: See Archival Information Collection.
AIFF (.aif): Audio Interchange File Format. Most often an uncompressed audio file commonly used by Apple computers, but able to be read by other platforms/devices. File extension is .aif.
AIP: See Archival Information Package.
American Standard Code for Information Interchange (ASCII): A character-encoding scheme used by many computers. The ASCII standard uses 7 of the 8 bits in a byte to define the codes for 128 characters. Example: In ASCII, the number "7" is treated as a character and is encoded as: 00110111. Because a byte can have a total of 256 possible values, there are an additional 128 possible characters that can be encoded into a byte, but there is no formal ASCII standard for those additional 128 characters. Most IBM-compatible personal computers use an IBM "extended" character set that includes international characters, line and box drawing characters, Greek letters, and mathematical symbols. See also EBCDIC.
API: See Application Programming Interface.
Application Programming Interface (API): An application programming interface (API) is a set of definitions of the ways in which one piece of computer software communicates with another. 
Archival Information Collection (AIC): An Archival Information Package whose Content Information is an aggregation of other Archival Information Packages. 
Archival Information Package (AIP): An Information Package, consisting of the Content Information and the associated Preservation Description Information (PDI), which is preserved within a system. The AIP often consists of the original files deposited, processed versions of data files and documentation, normalized files, and associated metadata.
Archival Storage [Archival Storage Functional Entity]: The OAIS functional entity that contains the services and functions used for the storage and retrieval of Archival Information Packages. 
Archival Value: The ongoing usefulness or significance of records, based on the administrative, legal, fiscal, evidential, or historical information they contain, that justifies their continued preservation. 
Archive [noun]: 1. An organization that intends to preserve information for access and use by a Designated Community. 2. A data archive is a site where machine-readable materials are stored, preserved, and possibly redistributed to individuals interested in using the materials. 
Archive [verb]: To place or store in an Archive [noun]. 
Archive [data management]: A collection of data objects, perhaps with associated metadata, in a storage system whose primary purpose is the long-term preservation and retention of that data. 
Arrangement (and Description): The intellectual process of describing and putting objects into order in accordance with accepted archival principles, particularly those of provenance and original order. See Description for more information. 
ASCII: See American Standard Code for Information Interchange.
Associated Description: The information describing the content of an Information Package from the point of view of a particular Access Aid. 
Audit Trail: Data that allows the reconstruction of a previous activity, or which enables attributes of a change (such as date, time or operator) to be stored so that a sequence of events can be documented in the correct chronological order. It is usually in the form of a database or one or more lists of activity data. 
Authentication [User Authentication]: In the IT context, the process of establishing, to the required level of confidence, the identity of one or more parties to a transaction. It consists of identity management (establishing who you are) and login management (confirming who you are).
Authentication [Object]: A mechanism that attempts to establish the authenticity of digital materials at a particular point in time. Digital signatures and Hash Values are possible mechanisms. 
Authentication Key: A method used by an individual to authenticate his or her identity over the Internet. Examples of authentication keys include passwords, one-time passwords, software tokens, hardware tokens and biometrics. Authentication keys are also referred to as 'keys'. 
Authenticity: A mechanical characteristic of any digital object that reflects the degree of trustworthiness in the object, in that the supportive metadata accompanying the object makes it clear that the possessed object is what it purports to be. 
Authorization: 1. An "authorization" is a right or a permission that is granted to a system entity to access a system resource. 2. An "authorization process" is a procedure for granting such rights. 3. To "authorize" means to grant such a right or permission. 
AVI (.avi): Audio Video Interleave is a container format for video. There are both compressed and uncompressed codecs that can be used. The file extension is .avi.
Backup Copy: A copy of information maintained separate from the original as a safeguard against disaster or computer failure. Back-ups are usually copied to storage devices that can be removed from the computer and kept separately from the original. The essential attribute of a back-up copy is that the information it contains can be restored in the event that access to the master copy is lost. 
Best Practices: Procedures and guidelines that are widely accepted because experience and research have demonstrated that they are optimal and efficient means to produce a desired result.
Binary Format: Any file format in which information is encoded in some format other than a standard character-encoding scheme. A file written in binary format contains information that is not displayable as characters. Software capable of understanding the particular binary format method of encoding information must be used to interpret the information in a binary-formatted file. Binary formats are often used to store more information in less space than possible in a Character Format file. They can also be searched and analyzed more quickly by appropriate software. A file written in binary format could store the number "7" as a binary number (instead of as a character) in as little as 3 bits (i.e., 111), but would more typically use 4 bits (i.e., 0111). Binary formats are not normally portable, however. Software program files are written in binary format. Examples of numeric data files distributed in binary format include the IBM-binary versions of the Center for Research in Security Prices files and the U.S. Department of Commerce's National Trade Data Bank on CD-ROM. The International Monetary Fund distributes International Financial Statistics in a mixed-character format and binary (packed-decimal) format. SAS and SPSS store their system files in binary format. 
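As a rough illustration of the difference between storing a number as characters and storing it in a binary format, the sketch below uses Python's standard `struct` module. The values and the byte layouts (`"B"`, `"<I"`) are chosen for illustration only and are not taken from any of the data products named above.

```python
import struct

value = 7

# Character format: the digit is stored as its ASCII character code.
as_text = str(value).encode("ascii")      # one byte, 0x37

# Binary format: the value is stored as a number (unsigned 8-bit integer).
as_binary = struct.pack("B", value)       # one byte, 0x07

# A larger number shows the space difference more clearly.
big = 1234567890
big_text = str(big).encode("ascii")       # 10 bytes, one per digit
big_binary = struct.pack("<I", big)       # 4 bytes, little-endian unsigned 32-bit

print(as_text.hex(), as_binary.hex())
print(len(big_text), len(big_binary))

# Reading the binary form back requires knowing the layout used to write it,
# which is why binary formats are less portable than character formats.
(restored,) = struct.unpack("<I", big_binary)
print(restored == big)
```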
Binary Number: A number written using binary notation which only uses zeroes and ones. Example: Decimal number 7 in binary notation is: 111. 
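The conversion in the example can be checked with a few lines of Python; this is only a sketch using built-in functions, with variable names chosen for illustration.

```python
n = 7

bits = format(n, "b")       # binary notation without padding
padded = format(n, "08b")   # the same value padded to one full byte
back = int(bits, 2)         # parse the binary digits back into a number

print(bits)    # 111
print(padded)  # 00000111
print(back)    # 7
```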
Bit: A bit is the smallest unit of information that a computer can work with. Each bit is either a "1" or a "0". Often computers work with groups of bits rather than one bit at a time; the smallest group of bits a computer usually works with is a Byte, which is 8 bits. 
Bit-Level Preservation: A baseline preservation approach that ensures the integrity of digital objects and associated metadata over time, even as the physical storage media which houses them evolves and changes. 
Bit Loss: The corruption of the lowest level of information associated with digital data in transmission or during storage. 
Bitstream: Contiguous or non-contiguous data within a file that has meaningful common properties for preservation purposes. 
Born-digital: A descriptor for information that is created in digital form, as opposed to digitized from analog sources. 
Broadcast Wave (BWF or BWAV): A file format intended for the exchange of audio material between different broadcast environments and equipment based on different computer platforms. Based on the Microsoft WAVE audio file format, Broadcast Wave adds a required "Broadcast Audio Extension" (bext) chunk to hold the minimum information considered necessary for broadcast applications. File extensions include .wav, .bwf, and .bwav.
Business Continuity: Describes the processes and procedures an organization puts in place to ensure that essential functions can continue during and after a disaster. 
Byte: Eight Bits. A byte is simply a chunk of 8 ones and zeroes. For example: 01000001 is a byte. A computer often works with groups of bits rather than individual bits and the smallest group of bits that a computer usually works with is a byte. A byte is equal to one column in a file written in character format. 
Cataloging: The process of arrangement and description of collections to produce a structured list or catalog, which enables users to locate the data resources they need.
CD-ROM: Compact Disc Read-Only Memory (CD-ROM) is a storage medium. Data are "stamped" onto the disc during the burning/saving process. The disc is read-only. A variant has appeared that is rewritable (CD-RW), but this variant is not recommended for dissemination of archival data. 
CDO: See Content Data Object.
Chain of Custody: A process used to maintain and document the chronological history of the handling, including the transfer of ownership, of any arbitrary digital file from its creation to a final state version. See also Provenance [Information] and Audit Trail. 
Character-Encoding Scheme: A method of encoding characters including alphabetic characters (A-Z, uppercase and lowercase), numbers 0-9, punctuation and other marks (e.g., comma, period, space, &, *), and various "control characters" (e.g., tab, carriage return, linefeed) using binary numbers. For a computer to print a capital "A" or a number "7" on the computer screen, for instance, we must have a way of telling the computer that a particular group of bits represents an "A" or a "7". There are standards, commonly called "character sets," that establish that a particular byte stands for an "A" and a different byte stands for a "7". The two most common standards for representing characters in bytes are ASCII and EBCDIC. 
Character Format: Any file format in which information is encoded as characters using only a standard character-encoding scheme. A file written in "character format" contains only those bytes that are prescribed in the encoding scheme as corresponding to the characters in the scheme (e.g., alphabetic and numeric characters, punctuation marks, and spaces). A file written in the ASCII character format, for instance, would store the number "7" in eight bits (i.e., one byte): 00110111. A file written in EBCDIC would store the number "7" in eight bits as 11110111. Contrast with binary format.
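The bit patterns for the character "7" can be reproduced with Python's built-in codecs. This is a minimal sketch: the helper `to_bits` is a name introduced here for illustration, and `cp037` is assumed as a representative EBCDIC code page (EBCDIC exists in several variants).

```python
def to_bits(data: bytes) -> str:
    """Render each byte as an 8-digit binary string."""
    return " ".join(format(b, "08b") for b in data)

ascii_bits = to_bits("7".encode("ascii"))    # ASCII code 0x37
ebcdic_bits = to_bits("7".encode("cp037"))   # EBCDIC (code page 037) code 0xF7

print(ascii_bits)   # 00110111
print(ebcdic_bits)  # 11110111
```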
Checksum: An algorithmically-computed numeric value for a file or a set of files used to validate the state and content of the file for the purpose of detecting accidental errors that may have been introduced during its transmission or storage. The integrity of the data can be checked at any later time by re-computing the checksum and comparing it with the stored one. If the checksums match, the data was almost certainly not altered. See also Fixity Check. 
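The compute, store, and compare cycle described above might look like the following sketch. SHA-256 from Python's `hashlib` is assumed as the algorithm; the glossary does not prescribe one, and the sample data and the `checksum` helper are illustrative.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

original = b"archival content"
stored = checksum(original)          # recorded at ingest or transfer time

# Later: recompute and compare against the stored value.
unchanged = checksum(b"archival content") == stored
altered = checksum(b"archival c0ntent") == stored   # one character differs

print(unchanged)  # True
print(altered)    # False
```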
Client: An application component which requests services from a server. 
Codec: A codec is the means by which sound and video files are compressed for storage and transmission purposes. There are two forms of compression, 'lossy' and 'lossless', but most codecs perform lossy compression because of the much larger data reduction ratios that it achieves. Most codecs are software, although in some areas codecs are hardware components of image and sound systems. Codecs are necessary for playback, since they uncompress [or decompress] the moving image and sound files and allow them to be rendered.
Collection Description: A type of Package Description that is specialized to provide information about an Archival Information Collection for use by access aids.
Collection Policy: The official statement issued by an archive identifying types of data resources it will collect or acquire and the terms and conditions under which it will do so.
Complex Digital Object: A group of multiple digital entities that are managed and preserved as one or more groups. 
Compression: A method of reducing the size of computer files. There are several compression programs available, such as gzip and WinZip. 
Compression Ratio or Reduction Ratio: The ratio that is used to discuss the quantity of original data versus the quantity of data after compression.
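A compression ratio can be computed directly from the sizes before and after compression. The sketch below uses Python's `zlib` (a lossless algorithm) and assumes the ratio is expressed as original size divided by compressed size; the sample data is deliberately repetitive and purely illustrative.

```python
import zlib

original = b"digital preservation " * 200   # highly repetitive sample data
compressed = zlib.compress(original)

# Ratio of original size to compressed size (assumed convention).
ratio = len(original) / len(compressed)
print(f"{len(original)} bytes -> {len(compressed)} bytes, ratio {ratio:.1f}:1")

# Lossless compression: decompression restores the data bit for bit.
restored = zlib.decompress(compressed)
print(restored == original)
```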
Conceptual Data Model: A data model that represents an abstract view of the real world. (ISO 11179-3) A higher-level data artifact that is often used to explore domain concepts with project stakeholders. Logical data models are often derived from conceptual data models. At this level, the data modeler attempts to identify the highest-level relationships among the different entities. 
Consumer: The role played by those persons, or client systems, who interact with OAIS services to find preserved information of interest and to access that information in detail. Also called User.
Content Data Object: The Data Object, that together with associated Representation Information, comprises the Content Information. 
Content Information: A set of information that is the original target of preservation or that includes part or all of that information. In other words, it is an Information Object composed of its Content Data Object and its Representation Information. 
Context Information: The information that documents the relationships of the Content Information to its environment. This includes why the Content Information was created and how it relates to other Content Information objects. 
Conversion: The process of changing a record’s file format from one format to another, often to make the record software-independent and in a standard or open format. ‘Reformat’ is often used interchangeably with ‘conversion’.
CRC: Cyclical Redundancy Check 
Copy [noun]: A bitwise reproduction. A copy is identical to the original, bit for bit, except in some cases for the unique identifier (e.g., file name). 
Copy [verb]: The act of creating a bitwise reproduction. 
Copyright: A statutory right that grants creators (authors) certain exclusive rights in their creations for a legally established duration of time. See also: Rights Owner, Proprietary. 
DAT: Digital Audio Tape; a high-density storage medium. 
Data: A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, or the recording of sounds made by a person speaking. 
Data Dictionary: A formal repository of terms used to describe data. 
Data Management: Principles, processes and systems for the sharing and management of data. 
Data Management Functional Entity: The OAIS entity that contains the services and functions for populating, maintaining, and accessing a wide variety of information. Some examples of this information are catalogs and inventories on what may be retrieved from Archival Storage, processing algorithms that may be run on retrieved data, Consumer access statistics, Consumer billing, Event Based Orders, security controls, and OAIS schedules, policies, and procedures. 
Data Submission Session: A delivery of media or a single telecommunications session that provides Data to an OAIS. The Data Submission Session format/contents is based on a data model negotiated between the OAIS and the Producer in the Submission Agreement. This data model identifies the logical constructs used by the Producer and how they are represented on each media delivery or in the telecommunication session. 
DBMS: Database Management System 
Decompression: The process used to restore data to uncompressed form after compression. 
Deposit [noun]: One or several digital resources received by an archive for preservation. Alternative: The data resource(s) placed in the custody of an archive without transfer of legal title. Also referred to as submission and Submission Information Package (particularly in the OAIS reference model). 
Deposit [verb]: To place digital resources into an archive.
Depositor: An individual, group, or organization which offers a Deposit [noun] to an archive for preservation and dissemination. 
Derivative: A transformed version of an original source file, often called an access, delivery, viewing or output file, used to facilitate access to, preservation of, or additional use of the content. An Access Copy is one version of a derivative. Other types of derivatives might be created for long-term preservation purposes. 
Derived AIP: An Archival Information Package (AIP) generated by extracting or aggregating information from one or more source AIPs. 
Description: The process of recording information about the nature and content of the records in archival custody. The description identifies such features as provenance, arrangement, format and contents, and presents them in a standardized form. 
Descriptive Information: The set of information, consisting primarily of Package Descriptions, which is provided to Data Management to support the finding, ordering, and retrieving of OAIS information holdings by Consumers. 
Designated Community: An OAIS concept describing the constituency for which the archived information should be relevant and understandable. The Designated Community is often composed of multiple user communities and includes depositors, producers, and users/consumers. A Designated Community is defined by the Archive and this definition may change over time.
Digital Archive: A repository for the long-term maintenance of digital resources often with the purpose of making them available. 
Digital Content: Any arbitrary item created, published or distributed in a digital form, including, but not limited to, text, data, sound recordings, photographs and images, motion pictures and software. Used interchangeably with Digital Materials. 
Digital Curation: Digital curation is all about maintaining and adding value to a trusted body of digital information for future and current use; specifically, the active management and appraisal of data over the entire life cycle. Digital curation builds upon the underlying concepts of digital preservation whilst emphasizing opportunities for added value and knowledge through annotation and continuing resource management. Preservation is a curation activity, although both are concerned with managing digital resources with no significant (or only controlled) changes over time. 
Digital Fingerprint: A bit sequence generated from a digital document using an algorithm that uniquely identifies the original document. Often referred to as a Checksum value or an item's Fixity Information.
Digital Forensics: Digital forensics (sometimes known as digital forensic science) is a branch of forensic science encompassing the recovery and investigation of material found in digital devices. The technical aspect of an investigation is divided into several sub-branches, relating to the type of digital devices involved; computer forensics, network forensics, forensic data analysis and mobile device forensics. The typical forensic process encompasses the seizure, forensic imaging (acquisition) and analysis of digital media and the production of a report into collected evidence. Digital forensics can be used to identify sources (for example, in copyright cases), or authenticate documents. 
Digital Materials: Any arbitrary item created, published or distributed in a digital form, including, but not limited to, text, data, sound recordings, photographs and images, motion pictures and software. Used interchangeably with Digital Content. 
Digital Migration: The transfer of digital information, while intending to preserve it, within the OAIS. It is distinguished from transfers in general by three attributes: a focus on the preservation of the full information content that needs preservation; a perspective that the new archival implementation of the information is a replacement for the old; and an understanding that full control and responsibility over all aspects of the transfer resides with the OAIS. 
Digital Object: An object composed of a set of bit sequences. 
Digital Preservation: A term that encompasses all of the activities, policies, strategies and actions required to ensure that the digital content designated for long-term preservation is maintained in usable formats, for as long as access to that content is needed or desired, and can be made available in meaningful ways to current and future users, for as long as necessary regardless of the challenges of media failure and technological change. Digital preservation goals include ensuring enduring usability, authenticity, discoverability, and accessibility of content over the very long term.
Digital Rights Management: An umbrella term referring to any of several technical methods used to control or restrict the use of digital content. 
Digital Signature: Data which, when appended to a digital document, enable the user of the document to authenticate its origin and integrity.
Digitization: The process of converting an analogue document (paper, microform, film, analogue audio or audiovisual tapes) to digital format for the purpose of preservation or access.
DIP: See Dissemination Information Package.
Dissemination Format: A format used to present a digital resource to a user who has requested it. This may or may not be the same format as the original. See also Access Format, Preservation Format. 
Dissemination Information Package (DIP): The Information Package, derived from one or more AIPs, received by the Consumer in response to a request to the OAIS. An Archive works with Consumers over time to ensure that DIPs remain useful.
Document Type Definition (DTD): A set of rules that applies SGML (Standard Generalized Markup Language) or XML (eXtensible Markup Language) to the markup of documents of a particular type. A DTD provides a list of the elements, attributes, comments, notes, and entities that may be used in the document, as well as their relationships to one another. 
DRM: Digital Rights Management 
DTD: See Document Type Definition.
Dublin Core: Dublin Core is a 15-element metadata element set intended to facilitate discovery of electronic resources. Dublin Core can also refer to the Dublin Core Metadata Initiative as an organization, or the wider set of properties and vocabularies maintained by DCMI. For more details see http://www.dublincore.org. 
EBCDIC: EBCDIC stands for Extended Binary Coded Decimal Interchange Code and is a character-encoding scheme used by IBM mainframe computers and some other computers. Unlike ASCII, the EBCDIC standard specifies use of the entire 8 bits of each byte. Example: In EBCDIC the number "7" is treated as a character and is encoded as: 11110111. 
Electronic Records: Records created digitally in the day-to-day business of the organization and assigned formal status by the organization. For example, they may include word processing documents, emails, databases, or intranet web pages.
Element (XML): See XML: Element.
Emulation: A means of overcoming technological obsolescence of hardware and software by developing techniques for imitating obsolete systems on future generations of computers. 
Encryption: The process of encoding messages, including electronic data, for security purposes. 
Event Based Order: A request that is generated by a Consumer for information that is to be delivered periodically on the basis of some event or events. 
Events Log: Documentation which records audit trail data related to the system operations.
eXtensible Markup Language: See XML.
File: A named and ordered sequence of bytes that is known by an operating system. 
File Compression: The process for reducing the file size of digital objects for storage, processing and transmission purposes.
File Format: 1. An attribute of a file which describes its encoding. 2. The organization of information for storage, printing, or displaying. 3. The specification of how the bits stored in a file should be interpreted. Examples include: .msg, .pdf, TIFF v6, JPEG 2000, Microsoft Word 7.0. See also: File, Access Format, Dissemination Format, Preservation Format. 
Finding Aid: A type of access aid or reference material such as a catalog, list or index providing information on the data resources held by an archive allowing users to search for and identify Archival Information Packages of interest.
Fixity Check: A mechanism to verify that a digital object (file or bitstream) has not changed or been altered in an undocumented manner. Checksums, message digests and Digital Signatures are examples of methods for running fixity checks. Fixity Information, the information created by these fixity checks, provides evidence for the integrity and authenticity of the digital objects and is essential to enabling trust.
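A fixity check over a set of files could be sketched as follows. Everything here is an assumption made for illustration: the helper names `file_digest` and `fixity_check`, the use of SHA-256, the in-memory manifest (a real repository would persist it as Fixity Information), and the temporary sample files.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Compute a SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def fixity_check(manifest: dict[str, str], root: Path) -> dict[str, bool]:
    """Compare each file's current digest with the digest recorded earlier."""
    return {name: file_digest(root / name) == digest
            for name, digest in manifest.items()}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_bytes(b"first object")
    (root / "b.txt").write_bytes(b"second object")

    # Record fixity information at ingest time.
    manifest = {name: file_digest(root / name) for name in ("a.txt", "b.txt")}
    before = fixity_check(manifest, root)

    # Simulate an undocumented change to one file, then check again.
    (root / "b.txt").write_bytes(b"second 0bject")
    after = fixity_check(manifest, root)

print(before)  # {'a.txt': True, 'b.txt': True}
print(after)   # {'a.txt': True, 'b.txt': False}
```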
Fixity Information: The information which documents the authentication mechanisms and provides Authentication Keys to ensure that the data resource (Content Information) has not been altered in an undocumented manner. An example is a Cyclical Redundancy Check (CRC) code for a file. See also: Checksum, Digital Signature.
Format: Specific, pre-established structure for the organization of a file or bitstream. See File Format. 
Format Migration: A means of overcoming technical obsolescence by preserving digital content in a succession of current formats or by transforming the original format into the current best practice format for presentation. The purpose of format migration is to preserve the digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. 
Format Registry: An accessible compilation of information on file formats. It may provide identifiers for formats, definitive names, methods of identification, descriptions and other information useful for identifying preservation needs. An example is PRONOM. 
Format Verification: The process of checking that a file in a given format is complete and conforms with the format's technical specification. Example: DROID (Digital Record Object IDentification).
FTP: File Transfer Protocol (FTP) is a reliable method of transferring files electronically over the Internet. 
GIF: A file format that supports up to 256 colors and compresses file size without loss of image quality. GIF format works best on line drawings (such as Clip Art) that contain few colors, or on pictures that use large blocks of solid color.
Hardcopy: Documentation and data resources in physical formats (e.g., paper) capable of being read without the assistance of a technical device. 
HFMS: Hierarchical File Management System 
Holdings: The whole of the archival material and collections found in an archives. See also: Collection, Data Resource. 
HTML: HyperText Markup Language (HTML) is a hypertext document format based on SGML used on the Web. Tags are embedded in the text to control display and presentation of a document. 
Identifier: An identifier is a language-independent label, sign or token that distinguishes one object from another. See Unique Identifier, Persistent Identifier.
IE: See Intellectual Entity.
Information: Any type of knowledge that can be exchanged. In an exchange, information is represented by data. An example is a string of bits (the data) accompanied by a description of how to interpret the string of bits as numbers representing temperature observations measured in degrees Celsius (the Representation Information). 
Information Object: A Data Object together with its Representation Information. 
Information Package: A logical container composed of optional Content Information and optional associated Preservation Description Information. Associated with this Information Package is Packaging Information used to delimit and identify the Content Information and Package Description information used to facilitate searches for the Content Information. 
Information Property: That part of the Content Information as described by the Information Property Description. The detailed expression, or value, of that part of the information content is conveyed by the appropriate parts of the Content Data Object and its Representation Information.
Ingest: (1) The OAIS functional entity that contains the services and functions that accept Submission Information Packages from Producers, prepare Archival Information Packages for storage, and ensure that Archival Information Packages and their supporting Descriptive Information become established within the OAIS. (2) To accept one or many Submission Information Packages (SIPs) into an Open Archival Information System (OAIS).
Integrity: Internal consistency or lack of corruption in electronic data. See Checksum and Fixity Check. 
Intellectual Entity (IE): A coherent set of digital objects or a singular digital object that is described as a unit, for example, a book, a map, a photograph, or a serial. 
ISO: International Organization for Standardization 
Java Servlet: Technology that provides Web developers with a simple, consistent mechanism for extending the functionality of a Web server and for accessing existing business systems. A servlet can be thought of as an applet that runs on the server side -- without a face. 
JPEG (.jpg): An abbreviation for the Joint Photographic Expert Group. JPEG is a method of lossy compression used in digital images. JPEG is also an image file format, most often seen with the extension .jpg.
JPEG2000 (.jp2): A standard method for image compression that supports both lossless and lossy modes, so images can be compressed without compromising image quality. JPEG2000 is also a format with the file extension .jp2 or .jpx.
Keyword: Keywords are used to retrieve documents in an information system, for instance in a catalog or when using a search engine. 
Knowledge Base: A set of information, incorporated by a person or system, that allows that person or system to understand received information. 
Legacy System: A previous generation or version of a system (information technology architecture), together with its contents (legacy data), that needs special treatment to be usable in a current IT environment.
Life Cycle: A set of iterative, modular processes that govern the creation, acquisition, selection, description, sustainability, access and preservation of digital content over time. 
Logical Record: All the data for a given unit of analysis. It is distinguished from a physical record because it may take several physical records to store all the data for a given unit of analysis. For instance, in card image data, a "card" is a physical record and it usually takes several "cards" to store all the information for a single case or unit of analysis. 
Long-Term: A period of time long enough for there to be concern about the impacts of changing technologies, including support for new media and data formats, and of a changing Designated Community, on the information being held in an OAIS. This period extends into the indefinite future. 
Long-Term Preservation: The act of maintaining information, independently understandable by a Designated Community, and with evidence supporting its Authenticity, over the Long-Term. 
Lossless Compression: The use of a compression algorithm which causes no loss of original information during compression. Resulting files are generally larger than those compressed using lossy compression algorithms.
Lossy Compression: The use of a compression algorithm which causes the loss of some of the original information during compression. Resulting files are generally smaller than those compressed using lossless compression algorithms.
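The difference between the two kinds of compression can be sketched as follows. The lossless part uses Python's standard `zlib` module. The lossy part is a deliberately simplified toy (dropping the low four bits of each sample), chosen here as an assumption for illustration; it is not how JPEG or any real lossy codec works.

```python
import zlib

# Toy sample values standing in for image or audio data.
samples = bytes([12, 13, 12, 200, 201, 199, 12, 13] * 32)

# Lossless: decompressing returns exactly the original bytes.
packed = zlib.compress(samples)
restored = zlib.decompress(packed)

# Lossy (toy example): discard the low 4 bits of every sample before
# compressing, so the fine detail cannot be recovered afterwards.
quantized = bytes(b & 0xF0 for b in samples)
packed_lossy = zlib.compress(quantized)
restored_lossy = zlib.decompress(packed_lossy)
```

Here `restored` is identical to `samples`, while `restored_lossy` matches only the quantized values.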
Management: The role played by those who set overall OAIS policy as one component in a broader policy domain.
Markup: The characters and codes that change a text document into an XML or other Markup Language document. This includes the < and > characters as well as the elements and attributes of a document.
Metadata: Structured information that describes the context, content and structure of a document and its management over time, allowing users to find, manage, control, understand or preserve information. See Metadata: Administrative, Metadata: Descriptive, Metadata: Event, Metadata: Preservation, Metadata: Rights Management, Metadata: Structural, and Metadata: Technical.
Metadata: Administrative: Information needed to help manage the digital object. Often included in administrative metadata is rights management, technical, and preservation information. 
Metadata: Descriptive: Metadata that identifies a resource and describes its intellectual content for purposes such as discovery, identification, and use. 
Metadata: Event: Metadata which provides an audit trail of actions by an agent on an object. Sometimes considered a specific type of Preservation Metadata. 
Metadata: Preservation: The contextual information necessary to carry out, document, and evaluate the processes that support the long-term retention and accessibility of digital content. Preservation metadata documents the technical processes associated with preservation (Migration/Refreshing), specifies rights management information, establishes the authenticity of digital content, and records the chain of custody and provenance for a digital object. 
Metadata: Rights Management: Administrative metadata that indicates the copyrights, user restrictions, and license agreements that might constrain the end-use of digital content (including metadata files). 
Metadata: Structural: Information on how the digital object is organized or how compound objects are put together or related. This may include the page or chapter order of a book, its table of contents or indexes. Structural metadata is often used by software programs.
Metadata: Technical: Information about aspects of the object often closely related either to its file format or the original software used to create the file. This may include things like the scanning equipment used to create a digital object and the settings used to create/modify it. 
Metadata Schema: A metadata schema defines a framework for representing metadata. In general it includes definitions of terms used in the schema, structural constraints and data structure definitions, and bindings to physical description syntax. For more information please see "A Metadata Schema Registry as a Tool to Enhance Metadata Interoperability". 
Metadata Encoding and Transmission Standard (METS): The Metadata Encoding and Transmission Standard (METS) is a standard for encoding descriptive, administrative, and structural metadata about objects within a digital library, expressed using XML. See METS. 
Metadata Object Description Schema (MODS): The Metadata Object Description Schema (MODS) is a schema for a bibliographic element set that may be used for a variety of purposes and particularly for library applications. The standard is maintained by the Network Development and MARC Standards Office of the Library of Congress with input from users. See MODS.
METS: See Metadata Encoding and Transmission Standard.
Migration: Set of organized tasks designed to achieve the periodic transfer of digital materials from one hardware or software configuration to another, from one generation of computer technology or system to a subsequent generation, or from one format to another. 
MPEG: MPEG stands for the Moving Picture Experts Group, which is a working group of the ISO/IEC with the mission to develop standards for coded representation of digital audio and video (http://mpeg.chiariglione.org/). Standards created often are named MPEG-(x) and file extensions are based on the standard. Some common extensions are .mp3, .mp4, and .m4a.
MODS: See Metadata Object Description Schema.
MOV (.mov): Associated with QuickTime (Apple) environment, this is a multimedia container file for video. File extension is .mov. See QuickTime.
Native Format: The format in which the record was created or in which the originating application stores records. 
Network: A number of computers connected together to share information and hardware. A Local Area Network (LAN) is small, usually confined to a single building or group of buildings. A Wide Area Network (WAN) is a large system of LANs with many computers linked together.
Network File System (NFS): A Network File System is a process for mounting magnetic disks on a network so that disks not physically attached to a computer can be accessed as if they were physically attached.
Non-Reversible Transformation: A Transformation which cannot be guaranteed to be a Reversible Transformation. 
Normalization: In a preservation context, normalization refers to a preservation strategy that involves the imposition of "standard" formats and rules to create preservable file formats. Normalization has specific connotations within the database (e.g., normalized tables), the Web (e.g., normalized URLs), and other communities, but the essence of the term is to standardize for more effective processing and exchange of information. 
OAI-PMH: The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a lightweight harvesting protocol for sharing metadata between services. In the OAI context, harvesting refers specifically to the gathering together of metadata from a number of distributed repositories into a combined data store. See Open Archives Initiative. 
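A harvesting request can be sketched as follows. The `verb` and `metadataPrefix` parameters, the `ListRecords` verb, the `oai_dc` prefix and the response namespace are defined by the protocol. The endpoint URL, the identifiers and the hand-written sample response are hypothetical and serve only as an illustration. No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai"

# XML namespace of OAI-PMH 2.0 responses.
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Abbreviated, hand-written sample response (not from a real repository).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for the given metadata format."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{BASE_URL}?{query}"


def harvested_identifiers(xml_text: str) -> list[str]:
    """Collect the record identifiers found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//oai:identifier", NS)]


url = list_records_url()
identifiers = harvested_identifiers(SAMPLE_RESPONSE)
```

A real harvester would repeat this across many repositories and merge the results into a combined data store, as the definition describes.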
OAIS: See Open Archival Information System.
Offline (storage): See Storage: Offline.
Offsite (storage): See Storage: Offsite.
Online (storage): See Storage: Online.
Open Archival Information System (OAIS): The Open Archival Information System (OAIS) Reference Model is an ISO standard that formally expresses the roles (producer, management, consumer, and implicitly archives), functions (common services, ingest, archival storage, data management, administration, preservation planning, and access), and content (submission information package, archival information collection, archival information package, and dissemination information package) of an archive. It was approved as an ISO standard in 2003 and updated in 2012 as ISO 14721:2012.
Open Format: In a computer environment, an open format is a data format that is not considered proprietary and is free of commercial ownership or patents. Typically the technical specifications for the format are also publicly available, allowing users to alter and develop the format to suit their specific needs. 
Open Source: Open source refers to software in which the source code is available to the general public for use and/or modification from its original design. Open source code is typically created as a collaborative effort in which programmers improve upon the code and share the changes within the community. 
Open Standard: Recognized national or international platform-independent standards. They are often developed collaboratively through due process, are often vendor-neutral and do not rely on commercial intellectual property. 
Open Systems: Systems (usually operating systems) that are not tied to a particular computer system or hardware manufacturer. An example is the UNIX operating system, with versions available for a wide variety of hardware platforms. 
Operating Environment: All the hardware and software that is needed to run a digital resource. 
Operating System: The special software required to make a computer work. It provides the link between the user and the hardware. 
Order Agreement: An agreement between the Archive and the Consumer in which the physical details of the delivery, such as media type and format of Data, are specified. 
Organizational Unit: A department, division, program, sector or other group working to curate and preserve a digital collection. 
Original Version: The original deposited data resource that is preserved without any changes or alterations to the content. 
Package [noun]: Any arbitrary container of digital data. 
Package [verb]: The act of creating an arbitrary container of digital data. 
Packaging Information: The information that is used to bind and identify the components of an Information Package. For example, it may be the ISO 9660 volume and directory information used on a CD-ROM to provide the content of several files containing Content Information and Preservation Description Information. 
PDF: See Portable Document Format.
PDI: See Preservation Description Information.
Permissions: The access available to system users attached to specific roles in a computing environment, as well as the mechanism for administering access to a specific object on a computer system. Depending on the system or application, permissions can be defined for a specific user, specific groups of users, or all users; or for a role, or groups of roles; or based on one or more user attributes. Access Rights are a specific type of permission. 
Persistent Identifier: A persistent identifier is a language-independent label, sign or token that distinguishes one object from another and that does not change over time. See Identifier and Unique Identifier.
PI: See Principal Investigator (PI).
Plain Text File (.txt): A file format consisting of text with limited or no formatting. The file extension is .txt.
Portable File: In computer usage, a file or program is "portable" if it can be used by a variety of software on a variety of hardware platforms. SPSS portable files can be produced using the "export" command. 
Portable Document Format (PDF): A universal file format that retains the page layout, typography, and graphics of the original document and can be viewed, printed, and searched with viewer software such as Adobe Acrobat. The file extension is .pdf. 
Portable Document Format - Archival (PDF/A): An ISO-standardized version of the Portable Document Format (PDF) specialized for the digital preservation of electronic documents. There are three such standards. PDF/A-1 is based on the PDF Reference Version 1.4 from Adobe Systems Inc. (implemented in Adobe Acrobat 5 and later versions) and is defined by ISO 19005-1:2005. PDF/A-2 is based on ISO 32000-1 (PDF 1.7) and is defined by ISO 19005-2:2011, published on June 20, 2011 under the formal name Document management – Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A-2). PDF/A-3 is also based on ISO 32000-1 (PDF 1.7) and is defined by ISO 19005-3:2012, published on October 15, 2012 under the formal name Document management – Electronic document file format for long-term preservation – Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3).
PREMIS: See Preservation Metadata Implementation Strategies.
Preservation: The processes and operations in ensuring the technical and intellectual survival of digital objects through time. 
Preservation Copy: A copy made and used to preserve the intellectual content of a digital resource. 
Preservation Description Information (PDI): The information which is necessary for adequate preservation of the Content Information and which can be categorized as Provenance, Reference, Fixity, Context, and Access Rights Information. 
Preservation Format: A format chosen for preservation purposes based on standards and best practices. One resource for choosing a preservation format is the Sustainability Factors section of the Library of Congress Sustainability of Digital Formats page. Other formats may be chosen for different purposes. See also: Access Format, Dissemination Format. 
Preservation Metadata Implementation Strategies (PREMIS): The Preservation Metadata Implementation Strategies (PREMIS) working group has established a data model and data dictionary for preservation metadata. The PREMIS Data Dictionary for Preservation Metadata is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. For more information on the Preservation Metadata Standard see PREMIS. 
Preservation Repository: A repository that intends to preserve and manage content in perpetuity, or for as long as needed. The repository may also enable access to the digital content. 
Preservation Planning: The OAIS functional entity that provides the services and functions for monitoring the environment of the OAIS and providing recommendations to ensure that the information stored in the OAIS remains accessible to the Designated Community over the long term, even if the original computing environment becomes obsolete.
Preservation Strategy: Coherent set of objectives and methods for maintaining digital components and related information over time, and for reproducing the related authentic data resources. See also: Digital Preservation, Migration, and Copy. 
Principal Investigator (PI): The person or organization responsible for a study; equivalent to "author" in bibliographic citations. 
Producer: The role played by those persons, organizations, or client systems that provide the information to be preserved.
Proprietary Format: A file format that is privately owned and controlled and whose specifications are generally not open.
Provenance Information: The information that documents the history of the Content Information. This information tells the origin or source of the Content Information, any changes that may have taken place since it was originated, and who has had custody of it since it was originated. The Archive is responsible for creating and preserving Provenance Information from the point of Ingest; however, earlier Provenance Information should be provided by the Producer. Provenance Information adds to the evidence to support Authenticity. 
QA: Quality Assurance 
QuickTime Format (.mov): A file format, associated with Apple computers but also viewable on the Windows platform, that wraps video, audio, and other bit streams. See also MOV.
RDF: See Resource Description Framework.
Reference Information: The information that is used as an identifier for the Content Information. It also includes identifiers that allow outside systems to refer unambiguously to a particular Content Information. An example of Reference Information is an ISBN number. 
Reference Model: A framework for understanding significant relationships among the entities of some environment, and for the development of consistent standards or specifications supporting that environment. A reference model is based on a small number of unifying concepts and may be used as a basis for education and explaining standards to a non-specialist. 
Reformatting: Copying information content from one storage medium to a different storage medium (media reformatting) or converting from one file format to a different file format (file re-formatting).
Refresh / Refreshing: The process of copying digital resources from one storage medium to another of the same type. This is generally done to minimize the risk of information loss caused by media degradation.
Render: To process a digital object (generally with a software application) in order to view, listen to, or interact with the content. This is usually done in a fashion consistent with the format encoding of the file. 
Replication: A Digital Migration where there is no change to the Packaging Information, the Content Information, and the PDI. The bits used to represent these Information Objects are preserved in the transfer to the same or new media instance. 
Representation Information: The information that maps a Data Object / Intellectual Entity into more meaningful concepts. For example, Representation Information for a bit sequence that is a FITS file might consist of the FITS standard, which defines the format, plus a dictionary that defines the meaning in the file of keywords that are not part of the standard. Another example is JPEG software used to render a JPEG file: viewing the JPEG file as raw bits is not very meaningful to humans, but the software, which embodies an understanding of the JPEG standard, maps the bits into pixels that can then be rendered as an image for human viewing. A third example is the ASCII definition, which describes how a sequence of bits (i.e., a Data Object) is mapped into a symbol.
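The ASCII case can be sketched in a few lines. The bit sequence below is an arbitrary sample chosen for illustration. Python's built-in `ascii` codec stands in for the ASCII definition that acts as Representation Information.

```python
# A Data Object: a raw sequence of bits, written here as 8-bit groups.
bits = "01001111 01000001 01001001 01010011"

# Turn each group of bits into one byte value.
data = bytes(int(group, 2) for group in bits.split())

# Representation Information (here, the ASCII definition) maps each
# byte value to a symbol; bytes.decode("ascii") applies that mapping.
text = data.decode("ascii")
```

Without the ASCII definition the four bytes are only numbers (79, 65, 73, 83). With it they read as the string "OAIS".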
Resolution: The clarity or fineness of detail in an image produced by a monitor or printer.
Restricted Use: A category of digital content restricted for any number of reasons including copyright restrictions, donor agreements, security clearance, presence of personally identifying information (PII), or simply that the content is intended for internal use only. 
Resource Description Framework (RDF): A family of specifications for a metadata model. The RDF family of specifications is maintained by the World Wide Web Consortium (W3C). The RDF metadata model is based upon the idea of making statements about resources in the form of a subject-predicate-object expression. It is a major component in what is proposed by the W3C's Semantic Web activity: an evolutionary stage of the World Wide Web in which automated software can store, exchange and utilize metadata about the vast resources of the Web, in turn enabling users to deal with those resources with greater efficiency and certainty. RDF's simple data model and ability to model disparate, abstract concepts have also led to its increasing use in knowledge management applications unrelated to Semantic Web activity. See Semantic Web.
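The subject-predicate-object form can be sketched with plain Python tuples. This is only an illustration of the data model, not an RDF library. The resource URI, the literal values, and the names `Triple` and `objects` are hypothetical. The predicate namespace is the Dublin Core element set.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement about a resource."""
    subject: str
    predicate: str
    obj: str


# Hypothetical resource URI; the predicates are Dublin Core element URIs.
BOOK = "http://example.org/book/1"
DC = "http://purl.org/dc/elements/1.1/"

graph = [
    Triple(BOOK, DC + "title", "A Sample Title"),
    Triple(BOOK, DC + "creator", "A. Author"),
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


titles = objects(graph, BOOK, DC + "title")
```

Each tuple is one statement about a resource. A collection of such statements forms a graph that software can query.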
Reversible Transformation: A Transformation in which the new representation defines a set (or a subset) of resulting entities that are equivalent to the resulting entities defined by the original representation. This means that there is a one-to-one mapping back to the original representation and its set of base entities. 
Rights Owner: An individual, group, or organization which holds intellectual property rights to specific digital resource(s). See also: Copyright. 
SAML: See Security Assertion Markup Language.
Schema: A formal description of a data structure. For XML, a common way of defining the structure, elements, and attributes that are available for use in an XML document that complies with the schema. 
Security Assertion Markup Language (SAML): An XML-based standard that defines messages for communicating a range of security-related statements about individual parties, including their authentication. 
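Semantic Information: The Representation Information that further describes the meaning beyond that provided by the Structure Information.
Semantic Web: An evolutionary stage of the World Wide Web, proposed by the World Wide Web Consortium (W3C), in which automated software can store, exchange and utilize metadata about the resources of the Web. See Resource Description Framework.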
Server: An application which responds to requests from a client. 
SGML: See Standard Generalized Markup Language.
Simple Digital Object: A single digital entity (i.e. file). 
SIP: See Submission Information Package.
Standard Generalized Markup Language (SGML): A generic language for document representation. SGML is an international standard that describes the relationship between a document's content and its structure.
Standards: Rules typically developed, adopted, and promoted by large organizations that can advocate for their broad usage. Data standards enable the exchange of data while technology standards enable the delivery of data between systems. 
Storage: Archival: The category of digital storage that provides the services and functions for the long-term storage, maintenance and retrieval of digital objects. 
Storage: Nearline: A term used in computer science to describe an intermediate type of data storage that represents a compromise between online storage (supporting frequent, very rapid access to data) and offline storage/archiving (used for backups or long-term storage, with infrequent access to data). Nearline is a contraction of near-online. See also Storage: Offline and Storage: Online. 
Storage: Offline: Any digital storage medium that must first be attached to a computing device before being made accessible to the computing system. Offline storage may be in the form of tape drives, fixed media (CDs, DVDs, flash drives) or hard drives that are not continuously network accessible. Also called removable storage. See also Storage: Nearline and Storage: Online.
Storage: Offsite: Storage that is located a sufficient distance from the location in which the main data is stored. Often the goal is to separate backup copies, and place them in locations in which they are unlikely to be affected by the same [natural or other] disaster. 
Storage: Online: Local or network-accessible storage utilized for data that is immediately accessible to an application without the need to stage it in from a lower tier of storage. See also Storage: Nearline and Storage: Offline.
Storage Migration: The process of copying content from one generation or configuration of digital data storage onto an updated generation or configuration. 
Structure Information: The Representation Information that imparts meaning about how other information is organized. For example, it maps bit streams to common computer types such as characters, numbers, and pixels and aggregations of those types such as character strings and arrays. 
Structured Data: A record created from data that has been collated and managed in a structured environment, often in a database-type business information system. The captured data is highly structured, predictable and repetitive.
Submission Agreement: The agreement reached between an OAIS and the Producer that specifies a data model, and any other arrangements needed, for the Data Submission Session. This data model identifies format/contents and the logical constructs used by the Producer and how they are represented on each media delivery or in a telecommunication session. 
Submission Information Package (SIP): An Information Package that is delivered by the Producer to the OAIS for use in the construction or update of one or more AIPs and/or the associated Descriptive Information.
Succession Plan [data]: The plan of how and when the management, ownership and/or control of the OAIS holdings will be transferred to a subsequent OAIS in order to ensure the continued effective preservation of those holdings. 
System File: A generic term for the native or internal storage format used by statistical software. When statistical software reads a raw character format data file consisting of ASCII or EBCDIC characters, it must read each byte in sequence. It can be more efficient in its storage, retrieval, and calculations by storing a data file in a special binary format called a system file. Typically, a system file for one brand of software cannot be read by another brand of software or by the same brand on another hardware platform. Some software is capable of creating a portable file that can then be read by other software or on other platforms. 
Tag Library: A collection of documents explaining the correct way to tag documents in XML for a particular Document Type Definition (DTD). A Tag Library goes beyond the basic rules of the DTD in that it provides pointers on what is considered best practice. 
Tag: Fragments of text used to organize content, usually delimited in a set format. Example of XML tags: <book> <title>This is the Title of The Book</title> <intro>This is the book introduction...</intro> </book> In the example above, book, title, and intro are tags. They do not convey content, but rather the context of the content. The < and > are used to signify what is a tag and what is content.
Text File: In computer usage, any file written in pure character format. Sometimes called a Plain Text File. 
Thumbnail: A miniature version of an image that is generally used to allow quick browsing through multiple images.
TIFF (.tif): Abbreviation for Tagged Image File Format. The TIFF file format is an image format with the file extension .tif or .tiff.
Transformation: A Digital Migration in which there is an alteration to the Content Information or PDI of an Archival Information Package. For example, changing ASCII codes to UNICODE in a text document being preserved is a Transformation. 
Trusted Digital Repository (TDR): A trusted digital repository is one whose mission is to provide long-term access to managed digital resources to its designated community, now and into the future; that accepts responsibility for the long-term maintenance of digital resources on behalf of its depositors and for the benefit of current and future users; that designs its system(s) in accordance with commonly accepted conventions and standards to ensure the ongoing management, access, and security of materials deposited within it; that establishes methodologies for system evaluation that meet community expectations of trustworthiness; that can be depended upon to carry out its long-term responsibilities to depositors and users openly and explicitly; and whose policies, practices, and performance can be audited and measured. 
.txt: See Plain Text File.
UML: Unified Modeling Language 
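UNICODE: A universal character encoding standard, maintained by the Unicode Consortium, that assigns a unique code point to each character across the world's writing systems.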
Unique Identifier: A unique identifier is a language-independent label, sign or token that uniquely identifies an object from another object. See Identifier. 
URI or URL: Uniform Resource Identifier or Uniform Resource Locator. A URL is a unique web address that includes the protocol, server name, path and document name.
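The parts of a web address named above can be pulled apart with Python's standard `urllib.parse` module. The address itself is hypothetical and used only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical address, used for illustration only.
url = "https://archive.example.org/collections/maps/index.html"

parts = urlparse(url)
protocol = parts.scheme             # "https"
server = parts.netloc               # "archive.example.org"
path = parts.path                   # "/collections/maps/index.html"
document = path.rsplit("/", 1)[-1]  # "index.html"
```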
User: Anyone who needs, uses or benefits from the data resources held by an archive. The role played by those persons or client systems that interact with an archive to find preserved data resources of interest. In the OAIS terminology User is the same as Consumer. 
User Friendly: Computer software or hardware that is simple to set up, run and use. 
Validation: The process of making sure that data is correct and useful when checked against a set of data validation rules. These might include rules for package or file structure or specific file format profiles. 
Virus: A computer program that is transferred to one or more computers with the intention of corrupting or wiping out information in the recipient computer. 
WAVE (.wav): A Windows based format for storing uncompressed audio files. The file extension is .wav.
Workflow: The tasks, procedural steps, organizations or people, required input and output information and tools needed for each step in a business process. A workflow approach to analyzing and managing a business process can be combined with an object-oriented programming approach, which tends to focus on documents, data, and databases. 
Workflow Analysis: The examination and evaluation of the tasks, procedural steps, staff involved, required input and output information, and tools needed for each step in a business process. 
XFDU: XML Formatted Data Unit 
XML: XML is an abbreviation for eXtensible Markup Language, a computer language for enriching data with information about structure and meaning. It is an open standard, defined by the World Wide Web Consortium and is platform independent. 
XML Attributes: XML elements can have attributes that further describe them, such as the following: <Price currency="Euro">25.43</Price> In the example above, "currency" is an attribute of "Price", and the attribute's value is "Euro".
XML Document: A storage unit (i.e. a file) containing XML markup and content. 
XML Element: An XML element is everything from (including) the element's start tag to (including) the element's end tag. A sample paragraph element would be "<p>This is the text of the paragraph.</p>".
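Using the Price example from the XML Attributes entry, the following sketch shows how an element's name, attribute and text content can be read with Python's standard `xml.etree.ElementTree` module. The variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Parse a single element from its start tag to its end tag.
element = ET.fromstring('<Price currency="Euro">25.43</Price>')

name = element.tag                  # the element name
currency = element.get("currency")  # the value of the "currency" attribute
content = element.text              # the text between start and end tag
```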
XML Schema: Defines the vocabulary (elements and attributes), the content model (structure, element nesting and text content) and data types (value constraints) of a class of XML documents. 
XML: Valid: An XML document that is verified correct against a DTD or schema. The process of checking that the document is valid is called validation. Note this is more stringent than simply verifying that the document is well-formed.
XML: Well-Formed: An XML document that follows the rules set forth by the XML specification: comments are correctly formed, all tags are closed, all attribute values are quoted, and the document has exactly one "container" (root) element. An XML declaration is optional in XML 1.0. Note that a well-formed document is syntactically correct but does not necessarily follow the rules specified by a DTD or schema.
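A well-formedness check can be sketched with Python's standard `xml.etree.ElementTree` parser, which rejects text that breaks the syntax rules. The function name `is_well_formed` is an assumption for illustration. This parser does not validate against a DTD or schema, so it tests well-formedness only, not validity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


good = is_well_formed("<book><title>Title</title></book>")
unclosed = is_well_formed("<book><title>Title</book>")  # title never closed
two_roots = is_well_formed("<a/><b/>")                  # more than one root
```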
XQuery: XML Query (XQuery) is a query language with some programming language features designed to query collections of XML data. 
XSL/XSLT: XSLT (Extensible Stylesheet Language Transformations) is an official recommendation of the World Wide Web Consortium (W3C). XSLT is a language used for transforming XML into other formats, most commonly HTML, PDF, or different forms of XML. If XML is all about content, then XSLT is about display.
The majority of the terms in this glossary were compiled or modified from the following:
 ICPSR's glossary, originally prepared by James Jacobs, formerly at the University of California, San Diego, and called the Glossary of Selected Social Science Computing Terms and Social Science Data Terms
 The Reference Model for an Open Archival Information System (OAIS)
 Archives New Zealand's Glossary Digital Continuity Definitions
 AHDS Digital Preservation Glossary
 The National Digital Stewardship Alliance (NDSA) Glossary
 A glossary compiled by the Education Subcommittee of the State Electronic Records Initiative of the Council of State Archivists. 
Terms whose sources are not specified were combined and modified from multiple resources.