Archival Ingest

Scope of submission

What is a SIP?

“An Information Package that is delivered by the Producer to the OAIS for use in the construction or update of one or more AIPs and/or the associated Descriptive Information.” — OAIS, 1-15

Shape of SIPs to come

  • File
  • Folder(-structure)
  • ZIP, TAR, etc.
  • BagIt Bag
  • Digital objects + database entry
  • A video tape, film, image, tape, …
  • …anything! :D

Mom… Where do SIPs come from?

  • Creators
  • Digitization vendors
  • Staff
  • Passionate collectors
  • All of the above!
  • and more…

How to decide…

what is accepted/required?

  • Source (e.g., film, video)
  • Agreements with producers
  • Internal capabilities
  • Metadata only available from producer
  • Policies: collection, format
  • etc…

What is…?

Ingest activities

Typical ones

  • Prepare & record (analogue) source
  • Generate unique ID
  • Validate or generate fixity/hash data
  • Format policy checks
  • Format normalization
  • Create derivatives
  • Create metadata
  • Logging
  • Virus check
  • etc…
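
For the "validate or generate fixity/hash data" step, a minimal sketch, assuming SHA-256 as the default algorithm. The helper names `file_checksum` and `verify_fixity` are hypothetical and only serve as illustration:

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected, algorithm="sha256"):
    """Validate a producer-supplied checksum (case and surrounding whitespace ignored)."""
    return file_checksum(path, algorithm) == expected.strip().lower()
```

If the producer delivers a checksum, `verify_fixity` validates it; if not, `file_checksum` generates one at ingest so that later checks have a reference value.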

The “unique” Identifier

A must-have!

Also known as:

  • ID
  • Object ID
  • Item ID
  • Archive signature
  • UID, UUID

The “unique” Identifier

Examples

  • V-00815
  • W/S #00034
  • FBW002984
  • 38AF2EC1A13494B9DF6FD6E75960307
  • 111-ADC-4319
  • VHS-0317
  • adBDwKf_aSE
  • Q83697636

Identifier: Considerations

  • Distinguish which type of object/media?
  • How many objects to expect (per time/year)?
  • Human-readable/handleable? (vs. as unique as possible)
  • Print on stickers on physical objects?
  • Print as bar codes?
  • How to “ingest” external collections into that schema?
  • Does it scale enough?
  • Valid for which duration?
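
Several of these considerations can be made concrete in a few lines. The sketch below assumes one possible scheme, a media-type prefix plus a zero-padded running number (in the style of "VHS-0317"), next to a random UUID as the "as unique as possible" alternative. All function names and the 4-digit default width are hypothetical:

```python
import re
import uuid

ID_PATTERN = re.compile(r"^([A-Z]+)-(\d+)$")


def make_identifier(prefix, number, width=4):
    """Build a human-readable ID such as 'VHS-0317'; fail loudly when the scheme overflows."""
    if number < 1 or number >= 10 ** width:
        raise ValueError(f"{number} does not fit into {width} digits")
    return f"{prefix.upper()}-{number:0{width}d}"


def parse_identifier(identifier):
    """Split an ID back into (prefix, number); raise ValueError if it is malformed."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a valid identifier: {identifier!r}")
    return match.group(1), int(match.group(2))


def make_opaque_identifier():
    """Alternative: a random UUID, very unlikely to collide but not human friendly."""
    return str(uuid.uuid4())
```

The overflow check is where "does it scale enough?" shows up: with four digits, this scheme is exhausted after 9999 objects per prefix.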

Format Normalization

  • Popular SIP to AIP use case
  • Improve preservation properties
  • By switching to a “better” format
  • Cleaning/normalizing data (dialects)

Format Normalization

Examples

  • Rewrap container (e.g., MKV, MOV, MXF)
  • Audio to PCM
  • Convert video to FFV1, V210, H.264, etc.
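
As an illustration of the first and last examples, the sketch below builds FFmpeg command lines for a rewrap and for a transcode to FFV1 with PCM audio. It assumes FFmpeg is the tool in use (the slides do not name one), and the function names, the 24-bit PCM choice, and the FFV1 settings are assumptions for illustration. The functions only build the argument lists; nothing is executed:

```python
def rewrap_command(source, target):
    """Rewrap only: copy the selected streams without re-encoding into the
    container implied by the extension of `target`."""
    return ["ffmpeg", "-i", source, "-c", "copy", target]


def ffv1_pcm_command(source, target):
    """Transcode video to FFV1 version 3 and audio to 24-bit PCM."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "ffv1", "-level", "3",  # FFV1 version 3
        "-g", "1",                       # GOP size 1: every frame is a keyframe
        "-slicecrc", "1",                # per-slice CRCs for error detection
        "-c:a", "pcm_s24le",             # uncompressed 24-bit little-endian PCM
        target,
    ]
```

A list such as `ffv1_pcm_command("tape.mov", "tape.mkv")` can be passed to `subprocess.run`. By default FFmpeg selects only one video and one audio stream, so sources with several tracks would need explicit stream mapping.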

Format Policy Checks

  • Define conditions for technical metadata (tech-MD) properties
  • “whitelist” formats
  • Spot irregularities
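
The idea of a format policy can be sketched as a set of allowed values per technical metadata property. The example below is a simplified stand-in, not the policy format of any particular tool; the `POLICY` dictionary, its field names, and `check_policy` are hypothetical:

```python
# Hypothetical policy: allowed values per technical metadata property.
POLICY = {
    "container": {"Matroska"},
    "video_codec": {"FFV1"},
    "audio_codec": {"PCM"},
}


def check_policy(tech_md, policy=POLICY):
    """Return a list of violations; an empty list means the file passes."""
    violations = []
    for field, allowed in policy.items():
        value = tech_md.get(field)
        if value is None:
            violations.append(f"{field}: missing")
        elif value not in allowed:
            violations.append(f"{field}: {value!r} not in {sorted(allowed)}")
    return violations
```

Returning all violations, rather than stopping at the first, makes it easier to spot irregularities across a whole batch of files.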

MediaConch

Policy checking

Possible Bottleneck

“Ingest can be a dangerous bottleneck. Don’t let the perfect be the enemy of the good”

Comments?

Questions?