Format Normalization:
A Whitelist Approach

Peter Bubestinger-Steindl
(peter @ ArkThis.com)

2022-10

Format Normalization

Convert the source data format to one that is “more suitable”.

More suitable = …?

  • Preserves better.
  • Is easier to handle.
  • Fits your environment better.

AV Format Normalization

PROs: 🤩️
  • Reduce number of different formats/variants.
  • Even out differences.
  • Detect issues (early).
  • Improved choice of tools.
  • Simplify workflows.

CONs: 😒️
  • Modified copy of “The Original”.
  • Adds extra effort and runtime to ingest.
  • Not all properties from source may be depicted in target format.

Why a Whitelist?

  • Blacklist: ❌️
    Define which formats to avoid.
  • Whitelist: ✅️
    Define which formats are okay for DLTP.

For digital ingest, it is more practical to define which formats are “okay” and convert the rest.
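The whitelist rule can be sketched in a few lines of Python. This is only an illustration: the function name `ingest_decision` is hypothetical, and the two sets simply reuse format names mentioned in this talk, not a recommendation for any particular archive.

```python
# Illustrative whitelist (assumption): entries are format names used as
# examples in this talk. A real whitelist depends on your environment.
CONTAINER_WHITELIST = {"AVI", "MOV", "MKV", "MP4", "MXF"}
CODEC_WHITELIST = {"DV", "MPEG-1", "MPEG-2", "MPEG-4", "FFV1"}


def ingest_decision(container: str, codec: str) -> str:
    """Return 'keep' if container and codec are both whitelisted, else 'convert'."""
    container_ok = container.upper() in CONTAINER_WHITELIST
    codec_ok = codec.upper() in CODEC_WHITELIST
    return "keep" if container_ok and codec_ok else "convert"


print(ingest_decision("mkv", "ffv1"))   # both whitelisted
print(ingest_decision("WMV", "WMV3"))   # neither whitelisted
```

Note that nothing needs to be known about the rejected formats: anything not listed is simply converted.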

What to whitelist?

  • Formats that can be considered to “preserve well”.
  • Formats that are easier to handle:
    Over time, and in different environments.

Examples:
Containers: AVI, MOV, MKV, MP4, MXF
Codecs: DV, MPEG-1/2/4/*, FFV1

Preserves well

  • Technical specification available?
  • Without artificial restrictions?
  • Open Standard?
  • FOSS implementation available?
  • Widely adopted?
  • Minimal complexity?

Simple and Short

  • Don’t include more formats than necessary in your Whitelist.
  • Focus on your environment, resources and workflows.
  • Define a preferred format if there is more than one similar option (e.g. MOV vs. MP4?)

Uncompressed?

  • Preserves quasi-well, but: huge!
  • Uncompressed ≠ Uncompressed (e.g. RGB ≠ BGR)
  • No error resilience built-in (in bitstream)
  • Uncompressed = also lossless.
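To put a number on “huge” and to illustrate “RGB ≠ BGR”, here is a small worked example. The video parameters (1920×1080, 8 bits per channel, 3 channels, 25 frames per second) are assumptions chosen for illustration, and audio and container overhead are ignored.

```python
def uncompressed_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Raw video data rate: every pixel of every frame is stored as-is."""
    return width * height * bytes_per_pixel * fps


# Assumed example: Full HD, 3 bytes per pixel (8-bit RGB), 25 fps.
rate = uncompressed_bytes_per_second(1920, 1080, 3, 25)
gb_per_hour = rate * 3600 / 1e9
print(rate, "bytes per second")
print(gb_per_hour, "GB per hour")

# The same pixel stored in two "uncompressed" layouts yields different bytes.
red, green, blue = 255, 128, 0
rgb = bytes([red, green, blue])
bgr = bytes([blue, green, red])
print(rgb.hex(), bgr.hex(), rgb == bgr)
```

Both layouts are uncompressed and lossless, yet a reader that assumes the wrong channel order will show wrong colors.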

Open Lossless Target Format

  • Only reason: smaller filesize than Uncompressed.
  • But: creates larger files for lossy sources.
  • Yet: Preserves/handles better than proprietary/mixed formats.
  • Plus: No generation loss.
  • Lossless-to-lossless: Fine.

Conversion Options

  • Rewrapping:
    • Modifying only container format.
    • AV streams inside stay as-is.
    • No transcoding necessary.
  • Transcoding:
    Re-encoding the actual AV data streams from one format to another (e.g. converting from codec A to codec B).
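The difference between the two options can be sketched as FFmpeg argument lists. This is a minimal sketch, not a complete ingest command: the function name `build_ffmpeg_args` and the file names are hypothetical, and FFV1 as the transcoding target is an assumption for illustration. The function only builds the arguments and does not run FFmpeg.

```python
def build_ffmpeg_args(src: str, dst: str, transcode_video: bool) -> list[str]:
    """Build an FFmpeg argument list for rewrapping or transcoding."""
    args = ["ffmpeg", "-i", src]
    if transcode_video:
        # Transcoding: re-encode video (here to FFV1, an assumed target),
        # leave the audio stream untouched.
        args += ["-c:v", "ffv1", "-c:a", "copy"]
    else:
        # Rewrapping: copy all streams as-is, only the container changes.
        args += ["-c", "copy"]
    args.append(dst)
    return args


print(build_ffmpeg_args("input.mov", "output.mkv", transcode_video=False))
print(build_ffmpeg_args("input.mov", "output.mkv", transcode_video=True))
```

By default FFmpeg selects only one stream per type, so a real workflow would also need to decide which streams and metadata to carry over.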

Audio to PCM?

Suggestion:

Convert all audio encodings to uncompressed LPCM (Linear Pulse Code Modulation).

Audio to PCM

  • Eliminates different source-format behavior (throughout later lifecycle)
  • Most common for (professional) audio recording.
  • The standard for audio preservation.
  • Widely supported, well known.
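As a sketch of what this suggestion could look like in practice, the following builds an FFmpeg argument list that keeps the video stream as-is and converts audio to LPCM. The function name `pcm_args`, the file names and the choice between 16 and 24 bit are assumptions for illustration; the appropriate bit depth depends on the source.

```python
# FFmpeg encoder names for signed little-endian LPCM.
PCM_ENCODERS = {16: "pcm_s16le", 24: "pcm_s24le"}


def pcm_args(src: str, dst: str, bit_depth: int = 24) -> list[str]:
    """Build FFmpeg arguments: copy video, convert audio to LPCM."""
    if bit_depth not in PCM_ENCODERS:
        raise ValueError(f"unsupported bit depth in this sketch: {bit_depth}")
    return ["ffmpeg", "-i", src,
            "-c:v", "copy",                      # video stays untouched
            "-c:a", PCM_ENCODERS[bit_depth],     # audio becomes LPCM
            dst]


print(pcm_args("input.mp4", "output.mkv"))
```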

Practical Whitelist Example

Containers: AVI, MOV, MKV, MP4, MXF (depends…), WAV
Codecs: DV, MPEG-1/2/4/*, FFV1, PCM
Image formats: DPX (most flavors), TIFF (some flavors), PNG, JPG
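One possible way to check a file against such a whitelist is to parse the JSON report of `ffprobe -v error -show_format -show_streams -of json FILE`. The sketch below works on an already captured report and does not run ffprobe itself. The mapping from the whitelist to ffprobe's short names, and the function name `is_whitelisted`, are assumptions; image formats are left out. Also note that ffprobe reports MOV and MP4 (and some related formats) under one shared format name, so this check cannot tell them apart.

```python
import json

# Assumed mapping of the whitelist above to ffprobe short names.
CONTAINER_OK = {"avi", "mov", "mp4", "matroska", "mxf", "wav"}
CODEC_OK = {"dvvideo", "mpeg1video", "mpeg2video", "mpeg4", "ffv1"}


def is_whitelisted(probe_json: str) -> bool:
    """Check container and AV stream codecs of one ffprobe JSON report."""
    info = json.loads(probe_json)
    # format_name may list several names, e.g. "matroska,webm".
    names = set(info["format"]["format_name"].split(","))
    if not names & CONTAINER_OK:
        return False
    for stream in info["streams"]:
        if stream.get("codec_type") not in ("video", "audio"):
            continue  # ignore subtitles, data streams etc. in this sketch
        codec = stream.get("codec_name", "")
        if codec not in CODEC_OK and not codec.startswith("pcm_"):
            return False
    return True


sample = json.dumps({
    "format": {"format_name": "matroska,webm"},
    "streams": [
        {"codec_type": "video", "codec_name": "ffv1"},
        {"codec_type": "audio", "codec_name": "pcm_s24le"},
    ],
})
print(is_whitelisted(sample))
```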

Examples / Discussion

  • WMV/WMA
  • ProRes
  • JPEG-2k
  • (Proprietary) MXF extensions/variants.
  • Same for DPX: some yes, some no?
  • 3GP, FLV, MXV
  • M2TS
  • …?

Carved in Stone?

Please don’t. 😉️
Things change.

Embrace revisiting, and possibly adapting, your whitelist (and the reasons behind it) over time.

Questions?

Comments?