File formats:
Making choices / Best practices

Peter Bubestinger-Steindl

2022-04-26

Summary: Preservation Format

  • Can be used to generate all other versions.
  • Depicts the “original” source as accurately as possible.
  • No artificial restrictions for using it.
    Now and under future (=unknown) conditions.
  • Well documented, no secrets, FOSS implementation exists.
  • Bit error resilience would be nice.
  • Consider GOP=1 (=Intraframe only).
  • Audio format: Normalize to uncompressed PCM/WAV.
  • For video container formats, consider using MKV or MOV.
    MXF only if really necessary because:
  • As simple as possible, as complicated as necessary.

Summarizes as: “preserves well”
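
As a rough illustration of the summary above, the following sketch builds an ffmpeg argument list for one possible preservation target: FFV1 video with GOP=1 and slice checksums, uncompressed PCM audio, MKV container. The function name, the output folder, the slice count and the 24-bit audio depth are assumptions made for this example, not recommendations from the slides. The command is only constructed, not executed.

```python
from pathlib import Path


def preservation_command(source: str, target_dir: str = "preservation") -> list[str]:
    """Build (not run) an ffmpeg call for an FFV1 / PCM / MKV preservation copy."""
    # Hypothetical naming scheme: same base name, .mkv extension.
    target = Path(target_dir) / (Path(source).stem + ".mkv")
    return [
        "ffmpeg",
        "-i", source,
        "-c:v", "ffv1",        # lossless video codec with a FOSS implementation
        "-level", "3",         # FFV1 version 3, which supports slice checksums
        "-g", "1",             # GOP=1: intraframe only
        "-slices", "16",       # assumed slice count for this sketch
        "-slicecrc", "1",      # per-slice CRC, helps detect bit errors
        "-c:a", "pcm_s24le",   # uncompressed PCM audio (bit depth is an assumption)
        str(target),           # .mkv extension selects the Matroska container
    ]


if __name__ == "__main__":
    print(" ".join(preservation_command("tape01.mov")))
```

ffmpeg's default stream selection applies here (one video and one audio stream). Files with several audio tracks or other streams would need explicit `-map` options.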

Best practices for ingest/digitization

  • Capture analog video without adding generation loss.
    Uncompressed (v210) or lossless (FFV1, J2K).

  • Fallback option: high-quality lossy, at the highest quality (bitrate) you can store and manage well over time.

  • Capture digital tape as “natively” as possible. (MiniDV, DAT, DigiBeta, etc.)

  • Store already-digital files “as original” as possible.
    Transcode only if codec does not satisfy “sustainability” checklist. Rewrap/rewrite container. Always. Even if identical.
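
Two of the points above lend themselves to a small sketch: estimating what uncompressed v210 capture costs in storage, and rewrapping a file without touching its encoded streams. The function names are made up for this example. The v210 arithmetic assumes the usual packing of 6 pixels into 16 bytes, with lines padded to a multiple of 48 pixels. The rewrap command is only built, not executed.

```python
def v210_bytes_per_second(width: int, height: int, fps: float) -> float:
    """Video-only data rate of uncompressed v210 (10-bit 4:2:2)."""
    # 48 pixels occupy 128 bytes; each line is padded up to a full 48-pixel group.
    bytes_per_line = ((width + 47) // 48) * 128
    return bytes_per_line * height * fps


def rewrap_command(source: str, target: str) -> list[str]:
    """Build (not run) an ffmpeg call that rewrites the container only."""
    # "-c copy" copies the encoded streams as they are, without re-encoding;
    # "-map 0" asks for all streams of the first input.
    return ["ffmpeg", "-i", source, "-map", "0", "-c", "copy", target]


if __name__ == "__main__":
    rate = v210_bytes_per_second(720, 576, 25)
    print(f"{rate / 1e6:.1f} MB/s, {rate * 3600 / 1e9:.1f} GB per hour")
    print(" ".join(rewrap_command("original.mov", "rewrapped.mkv")))
```

For PAL SD (720x576 at 25 fps) this gives about 27.6 MB/s, or roughly 100 GB per hour of video before audio is added. With `-map 0`, some stream types in the source may not be accepted by the target container, so the mapping may need adjusting per file.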

Tempting…

  • Hey, it’s a standard!
  • Hey, everyone’s using it!
  • Hey, the “big ones” are using it!
  • Hey, it’s from a major company!
  • Hey, it can do everything!
  • Hey, it’s so easy to use!
  • Hey, it’s gratis!

Rather…

  • Ask, ask, ask.
  • Get documentation.
  • Get sample files.
  • Try handling/opening them outside their “usual” bubble,
    with at least 1 open implementation,
    before you commit to a format.
  • Try transcoding it “losslessly” to uncompressed.
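
One way to test whether a transcode really was lossless is to compare per-frame checksums of the decoded frames, for example as written by `ffmpeg -i FILE -an -f framemd5 FILE.framemd5`. The sketch below compares two such reports. It assumes both reports were generated with the same pixel format and the same stream order, and the function names are hypothetical.

```python
from collections import defaultdict


def frame_hashes(report: str) -> dict[str, list[str]]:
    """Collect the frame checksums of a framemd5 report, grouped by stream index."""
    per_stream: dict[str, list[str]] = defaultdict(list)
    for line in report.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = [f.strip() for f in line.split(",")]
        # First field is the stream index, last field is the checksum.
        per_stream[fields[0]].append(fields[-1])
    return dict(per_stream)


def same_decoded_content(report_a: str, report_b: str) -> bool:
    """True if both reports list identical frame checksums for every stream."""
    a, b = frame_hashes(report_a), frame_hashes(report_b)
    return bool(a) and a == b
```

Timestamps and durations are deliberately ignored, since they can differ between containers even when the decoded frames are identical.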

Wrapping up with our checklist:

Sustainability:

  1. Disclosure?
  2. Open reference implementation/libs?
  3. Adoption/popularity?
  4. Complexity?
  5. Independence vs external contingencies?
  6. Artificial restrictions?
  7. Self descriptive?

Quality and functionality:

  1. Preserve “original”?
  2. Image/sound quality?
  3. Interoperability?
  4. Editing?
  5. Support for (additional/expected) properties?
  6. Performance & data size?

⭐Which translates to:

Sustainability:

  1. Documentation openly accessible?
  2. Open reference implementation?
  3. How likely is it to be supported in tools/devices, and for which userbase?
  4. Which features are implemented/tested/stable?
  5. Which choices/requirements do I have for handling it beyond its “shelf life”?
  6. Is it legal/possible to handle it in future/different situations?
  7. Can it contain proper metadata?

Quality and functionality:

  1. Preserve significant properties?
  2. Sufficient image/sound quality and robustness to multi-generation copies?
  3. Interoperability / ease of usage & access?
  4. Direct use for editing?
  5. How many different formats will I need (pile up)?
  6. Handle performance / data size requirements?
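
To keep track of the answers while evaluating a candidate format, the two checklists can be held in a small data structure. This is only a sketch: the class name, the short question labels and the free-text answers are assumptions made for this example.

```python
from dataclasses import dataclass, field

SUSTAINABILITY = [
    "Disclosure", "Open reference implementation", "Adoption/popularity",
    "Complexity", "Independence", "Artificial restrictions", "Self descriptive",
]
QUALITY = [
    "Preserve original", "Image/sound quality", "Interoperability",
    "Editing", "Support for properties", "Performance & data size",
]


@dataclass
class FormatAssessment:
    """Free-text answers to the checklist questions for one candidate format."""
    name: str
    answers: dict[str, str] = field(default_factory=dict)

    def open_questions(self) -> list[str]:
        # Questions that have no answer yet (missing or blank).
        return [q for q in SUSTAINABILITY + QUALITY
                if not self.answers.get(q, "").strip()]


if __name__ == "__main__":
    candidate = FormatAssessment("candidate format", {"Disclosure": "documentation obtained"})
    print(candidate.open_questions())
```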

Comments?

Questions?

p.bubestinger@av-rd.com