Digital preservation practices

The University of Minnesota Libraries uses a multi-threaded approach to digital preservation to support the Libraries’ Digital Preservation Framework.

Preservation practices

This section describes preservation strategies that may be taken to help preserve digital content held by the Libraries. These include encouraging the use of preservation-friendly file formats, format migration or normalization, bitstream copying, fixity checking, and documenting and monitoring file formats over time.

Technological support levels distinguish the broad levels of support effort the Libraries will use to address its stated objectives. Support effort and preservation strategy are guided by a number of appraisal criteria, including uniqueness, relative risk of loss, and feasibility/cost of preservation. Digital preservation staff make these decisions together with content experts, who understand the enduring value of the content, and in consultation with record analysts and system administrators. These are also documented in the Preservation Framework document.

The implementation activities describe the actions the Libraries take with digital materials around storage, security, file integrity, interoperability over time, and chain of custody of materials. Each activity is done to help mitigate a specific risk.

The level of support given depends on the confidence level associated with a file format (high, medium, low) and the tools available to address specific preservation issues. Current confidence levels by generic file type are provided below and represent the preferred file formats of the Libraries.

For questions, please contact the Digital Preservation Repository Technology department at dprt@umn.edu.

Preservation strategies

The following strategies are actions that may be taken to help preserve digital content held by the Libraries. 

Preservation-friendly file formats: The Libraries is committed to the use of file formats that support long-term sustainability. In general, the considerations for selecting file formats include the “openness” of the file format, its level of support as a preservation format in the academic/scholarly community, and how well the format is suited for later format migration.

Format migration: When the Libraries perceive that a file format is at risk of obsolescence, a new version of this content will be created in a format more suited for long-term preservation and use. This transformation may consist of migration to a newer version of the content’s existing format, or transformation to a different format altogether. In all cases, preservation of the object’s intellectual content will be prioritized over the preservation of a specific presentation style.

Normalization: Upon ingest, materials not conforming to the Libraries’ accepted standards may be converted to one of the preferred formats. To the extent possible, the Libraries will attempt to preserve the essential characteristics of the object. In cases requiring compromise, transformations that maintain the content of the object will be prioritized over those that preserve the presentation. This process is done on an as-needed basis.

Bitstream copying: The Libraries creates multiple copies of all information contained in the Libraries’ digital repositories, for use in the event of data loss. In combination with regular fixity checks, which identify potentially damaged content, this process assists with ensuring the integrity of content, and provides a foundation for its disaster recovery plans.

Fixity checking: If not provided with the content, an initial fixity value is generated for all materials subject to preservation. These values are recalculated and compared at certain points in time to verify that the content has not changed. This activity, when combined with bitstream copying, enables the repository managers to identify damaged or corrupted content and to revert to a valid version of the object from a previous point in time.
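
As an illustration only, the generate-then-recheck cycle described above can be sketched in Python. The function names, the choice of SHA-256, and the sample file are assumptions made for this example; the Libraries' actual tools and checksum algorithm are not specified on this page.

```python
import hashlib
import tempfile
from pathlib import Path


def compute_fixity(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, expected: str, algorithm: str = "sha256") -> bool:
    """Recalculate the fixity value and compare it with the stored one."""
    return compute_fixity(path, algorithm) == expected.lower()


# Demonstration with a throwaway file (hypothetical content).
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "sample.txt"
    sample.write_bytes(b"preserved content")

    # Initial fixity value, generated because none came with the content.
    stored_value = compute_fixity(sample)
    unchanged = verify_fixity(sample, stored_value)

    # Simulate an accidental change, then recheck against the stored value.
    sample.write_bytes(b"preserved c0ntent")
    change_detected = not verify_fixity(sample, stored_value)
```

In this sketch the stored value would normally be kept in an inventory or metadata record separate from the file itself, so that damage to the file does not also damage the value it is checked against.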

Documentation of file formats: Inventories of content identify file formats and other characteristics, including the PRONOM identifier in the UK National Archives’ online file format registry. This association ensures that information on the internal structure of the file is always available, and it can be further used to determine when format migration should take place in order to mitigate the risks posed by obsolete file formats.

 

Adapted from McMaster University Preservation Activities.

Technological support levels

Technological support levels distinguish the broad levels of support effort the Libraries will use to address its stated objectives. Support effort and preservation strategy are guided by a number of appraisal criteria, including uniqueness, relative risk of loss, and feasibility/cost of preservation. Digital preservation staff make these decisions together with content experts, who understand the enduring value of the content, and in consultation with record analysts and system administrators. These are also documented in the Preservation Framework document.

Fundamental support level: Reasonable effort will be made to ensure long-term preservation for digital objects under this stewardship level.  A moderate level of available resources (staff, technologies, funding) will be considered for use.  Treatment strategies will be selected from widely available best practices and may include fixity, validation, geographic replication, and others as developed.  

Advanced support level: All effort will be made to ensure long-term preservation for digital objects identified at this level. A high level of available resources (staff, technologies, funding) will be considered for use. In addition to the strategies under the fundamental support level, strategies here may also include migration, emulation, normalization, and the development of material-specific solutions.

 

Implementation activities

Associated with the Digital Preservation Framework's Technological Support Levels.

The University of Minnesota Libraries uses a variety of preservation actions, based on general preservation strategies, to preserve digital materials for the long term. The Libraries’ Digital Preservation Framework document describes two technological support levels. These levels assume a continuum of actions, from less to more, as the support level increases. This page discusses the continuum of preservation actions performed by the Libraries.

Fundamental preservation actions

All materials receive the following: 

Storage

  • One local copy (secondary copy) of the data is stored separately from the Libraries’ primary copy
  • One remote copy stored at a separate geographical location, preferably on a different system (such as tape)

Risk mitigated: Loss of data, including loss of use across all local systems.

Integrity

  • Fixity values are created at time of receipt (if not received with the objects)
  • Fixity is checked when content is moved from one location to another
  • Fixity is checked periodically on at-rest items
  • Corrupted items will be corrected from non-corrupt copies
  • Provenance of objects will be documented and traced as appropriate

Risks mitigated: Objects protected against bit-rot (media corruption) and accidental changes. Objects will be traceable back to their source, to verify authenticity.
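
The following is a minimal sketch, not the Libraries' implementation, of how a corrupted copy might be corrected from a non-corrupt copy. It assumes three copies of one object, a known-good SHA-256 value recorded at receipt, and hypothetical file and function names.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected: str) -> bool:
    """A copy is intact if it exists and matches the recorded fixity value."""
    return path.is_file() and sha256_of(path) == expected


def repair_from_copies(copies: list[Path], expected: str) -> list[Path]:
    """Overwrite every damaged copy from an intact one; return the repaired paths."""
    intact = [c for c in copies if is_intact(c, expected)]
    if not intact:
        raise RuntimeError("no intact copy available; manual recovery needed")
    repaired = []
    for target in copies:
        if target in intact:
            continue
        shutil.copyfile(intact[0], target)
        # Fixity is rechecked after the move, as with any relocation of content.
        if not is_intact(target, expected):
            raise RuntimeError(f"repair of {target} failed verification")
        repaired.append(target)
    return repaired


# Demonstration with throwaway files standing in for the three storage copies.
with tempfile.TemporaryDirectory() as workdir:
    copies = [Path(workdir) / name for name in ("primary.bin", "local.bin", "remote.bin")]
    for item in copies:
        item.write_bytes(b"original bitstream")
    expected_digest = sha256_of(copies[0])

    copies[1].write_bytes(b"0riginal bitstream")  # simulate bit-rot in one copy
    repaired = repair_from_copies(copies, expected_digest)
    repaired_names = [p.name for p in repaired]
    all_intact = all(is_intact(c, expected_digest) for c in copies)
```

A production workflow would also log each corruption and remediation event so that the provenance of the object remains traceable.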

Security

  • Virus checking on all received materials
  • Limited access to main preservation storage area
  • Monitor, audit, and document who has read and read/write access to the primary and secondary storage locations.
  • Workflows will use authorizations and document the chain-of-custody of people/roles allowed to perform tasks such as ingest, storage, and edits.

Risks mitigated: Prevent corruption of data via viruses and unauthorized changes or unauthorized access to the data. Provide a chain-of-custody and a history of actions as proof of due diligence.

Interpretability over time

  • At minimum, guarantee that the bits deposited will be the bits that are returned upon request. In addition, file formats will be identified, recorded, and tracked where possible.
  • Identification: format(s) of items will be identified if possible
  • Validation: the conformance of the identifiable item to the format specification will be validated and recorded, if possible
  • Metadata: technical metadata about the items will be generated and preserved alongside the items
  • Formats at risk: the data owners will be warned when their data is preserved in an at-risk format, if possible

Risk mitigated: Attempt to maintain the ability to recover and understand the preserved items at a later date.  Prevent loss of comprehensibility caused by changing standards and technologies over time.

Succession

  • Materials transferred to the University Libraries for preservation must either come from within the Libraries or be covered by a written agreement stating that:
      • The owners have the right to grant the Libraries permission to preserve and/or provide access to the materials.
      • The Libraries has permission to preserve the materials over time, including through file format migration.

Risks mitigated: Ensure chain-of-custody and management of preserved assets over long periods of time.

Advanced preservation actions

In addition, some materials receive: 

Storage

  • A third copy stored offsite (most likely on tape)

Integrity

  • Audit logs of corruption and remediation events will be gathered and made available in reports, if possible

Interoperability

  • Format migrations may be completed on a case-by-case basis
  • Software/system environment may be preserved to aid in recoverability of data

[The general approach of this section was adapted from a document created by the University of Wisconsin-Madison.]

Preferred file formats

The level of support given depends on the confidence level associated with a file format (high, medium, low) and the tools available to address specific preservation issues. Current confidence levels by generic file type are provided below and represent the preferred file formats of the Libraries. File formats are listed in alphabetical order under each confidence level.

Text

Highest confidence

  • PDF/A-1 - ISO 19005-1 (.pdf)
  • Plain Text - with encoding: US-ASCII, UTF-8 (.txt)
  • XML - with included schema (.xml)

Medium confidence

  • HTML - including a DOCTYPE declaration (.htm, .html)
  • LaTeX with referenced files (.latex, .tex, .ltx)
  • Microsoft Word 2007 or newer (.docx)
  • PDF - with embedded fonts (.pdf)
  • Rich Text Format 1.x (.rtf)
  • SGML (.sgml)

Lowest confidence

  • Microsoft Word 2003 or older (.doc)
  • PDF - encrypted (.pdf)
  • WordPerfect (.wpd)
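
As a rough sketch, the Text lists above could be expressed as a lookup table for triaging incoming files. The table and function names are hypothetical. Note that a file extension alone is ambiguous: .pdf appears at all three levels depending on whether the file is PDF/A-1, has embedded fonts, or is encrypted, so the sketch returns every possible level rather than a single answer.

```python
from pathlib import Path

# Extensions taken from the Text lists above, grouped by confidence level.
TEXT_FORMATS = {
    "highest": {".pdf", ".txt", ".xml"},
    "medium": {".htm", ".html", ".latex", ".tex", ".ltx", ".docx", ".pdf", ".rtf", ".sgml"},
    "lowest": {".doc", ".pdf", ".wpd"},
}


def possible_confidence_levels(filename: str) -> list[str]:
    """Return every confidence level whose Text list includes this extension."""
    extension = Path(filename).suffix.lower()
    return [level for level, extensions in TEXT_FORMATS.items() if extension in extensions]


plain_text = possible_confidence_levels("notes.txt")
legacy_word = possible_confidence_levels("memo.doc")
any_pdf = possible_confidence_levels("report.PDF")
unlisted = possible_confidence_levels("data.bin")
```

Because of this ambiguity, an extension lookup is only a first pass; settling on a level requires inspecting the file itself with a format identification tool.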

Raster images/graphics

Highest confidence

  • PNG - 24-bit (.png)
  • TIFF - uncompressed (.tif, .tiff)

Medium confidence

  • Digital Negative DNG (.dng)
  • GIF (.gif)
  • JPEG 2000 - lossless (.jp2)
  • JPEG/JFIF (.jpg)
  • PNG - 8-bit (.png)
  • TIFF - compressed (.tif, .tiff)

Lowest confidence

  • JPEG 2000 - lossy (.jp2)
  • Photoshop document (.psd)
  • RAW formats (.raw, etc.)

Vector graphics

Highest confidence

  • SVG - no JavaScript binding (.svg)

Medium confidence

  • Computer Graphics Metafile (.cgm)

Lowest confidence

  • Encapsulated PostScript (.eps)

Spreadsheet/database

Highest confidence

  • Comma- or tab-separated values (.csv, .tsv, .txt)
  • Delimited text (.txt, .csv)
  • SIARD: Software Independent Archiving of Relational Databases (.siard)

Medium confidence

  • Excel 2007 or newer (.xlsx)
  • Open Document Spreadsheet (.ods)
  • XML (.xml)

Lowest confidence

  • Excel 2003 or older (.xls)

Audio

Highest confidence

  • AIFF - uncompressed (.aif, .aiff)
  • Free Lossless Audio Codec (.flac)
  • WAV - uncompressed (.wav)

Medium confidence

  • Advanced Audio Coding (.mp4)
  • Apple Lossless Audio Codec (ALAC) (.m4a)
  • MP3 (.mp3)
  • SUN audio - uncompressed (.au, .snd)

Lowest confidence

  • AIFC - compressed AIFF (.aifc)
  • RealAudio (.ra, .rm)
  • WAV - compressed (.wav)
  • Windows Media Audio (.wma)

Video

Highest confidence

  • AVI - uncompressed (.avi)
  • QuickTime - uncompressed, motion JPEG (.mov)

Medium confidence

  • Material Exchange Format - uncompressed (.mxf)
  • Motion JPEG 2000 (.jp2)
  • MPEG-1, MPEG-2 (.mp1, .mp2)
  • MPEG-4 - preferably H.264 (.mp4)

Lowest confidence

  • RealVideo (.rv, .rm)
  • QuickTime - compressed (.mov)
  • Windows Media Video (.wmv)

PowerPoint or similar

Highest confidence

  • [none provided]

Medium confidence

  • OpenOffice (.odp)
  • PowerPoint 2007 or newer (.pptx)
  • Portable Document Format (.pdf)

Lowest confidence

  • PowerPoint 2003 or older (.ppt)

Websites

Highest confidence

  • WebARChive (.warc)

Medium confidence

  • Internet Archive ARC (.arc)

Lowest confidence

  • Files from content management system

Email

Highest confidence

  • MBOX (.mbox)
  • Exports as plain text with message headers and content (.txt)

Medium confidence

  • EML (.eml)
  • MSG (.msg)
  • PST (.pst)

Lowest confidence

  • [none provided]

Geospatial

Highest confidence

  • GeoJSON (.json)
  • GeoTIFF (.tif)

Medium confidence

  • ESRI Database file
  • ESRI shapefile
  • Geography Markup Language (GML)
  • GeoPackage
  • Keyhole Markup Language (KML)

Lowest confidence

  • Other geospatial files

CAD (Computer Aided Design)

Highest confidence

  • PDF/E - ISO 24517-1:2008 (.pdf)

Medium confidence

  • AutoCAD (.dwg)
  • Autodesk’s Drawing Interchange Format/Drawing Exchange Format (.dxf)

Lowest confidence

  • [none provided]
