Data Integrity

Peter Bubestinger-Steindl
(p.bubestinger@av-rd.com)

March 2019

Data Integrity

What is that?

Different levels

  • Filesystem
  • File (=data)
  • Content (=payload)

Level 1

$ ls -la --time-style=full-iso
Filesystem

Level 2

File data
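Why the file level matters can be shown with a minimal sketch. It assumes GNU coreutils (`md5sum`, `wc`), and the file names `a.bin` and `b.bin` are made up for illustration. Two files of the same size look alike in a size comparison at the filesystem level, but their hashes differ.

```shell
# Work in a throwaway directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical files: same length, only the last byte differs.
printf 'AAAA' > a.bin
printf 'AAAB' > b.bin

# Level 1 (filesystem): both files report the same size.
size_a=$(wc -c < a.bin | tr -d ' ')
size_b=$(wc -c < b.bin | tr -d ' ')
echo "size: a.bin=$size_a b.bin=$size_b"

# Level 2 (file data): the hashes are different.
hash_a=$(md5sum a.bin | cut -d ' ' -f 1)
hash_b=$(md5sum b.bin | cut -d ' ' -f 1)
echo "md5:  a.bin=$hash_a"
echo "md5:  b.bin=$hash_b"
```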

Level 3

Content payload
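At the content level, the hash is taken over the decoded audio and video instead of the file's bytes. The following is only a sketch: it assumes `ffmpeg` is installed, and `content_md5` and `video.mkv` are hypothetical names. It uses FFmpeg's `md5` output format, which prints a single `MD5=...` line for the decoded streams.

```shell
# content_md5 FILE
# Hypothetical helper: prints one MD5 over the decoded audio/video of FILE.
# "-f md5 -" selects FFmpeg's md5 muxer and writes the result to stdout.
# "-nostdin" keeps ffmpeg from reading the terminal when used in scripts.
content_md5() {
    ffmpeg -nostdin -loglevel error -i "$1" -f md5 -
}

# Usage (video.mkv is a placeholder name):
#   md5sum video.mkv        # Level 2: hash of the file's bytes
#   content_md5 video.mkv   # Level 3: hash of the decoded content
```

Two files whose bytes differ, for example because only container metadata was edited, should then still report the same content hash.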

Different tools

  • Filesystem
    • ls
    • dir
  • File
    • md5sum (=any MD5 tool)
    • Bagger
    • Exactly
  • Content
    • ffmpeg

Different algorithms

  • CRC
  • MD5
  • SHA: SHA-1? SHA-2? SHA-256? SHA-512?
  • WTF

Hashcodes

CRC =
4294967295

MD5 =
d41d8cd98f00b204e9800998ecf8427e

SHA256 =
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
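The three values above appear to be what the common command-line tools print for empty input (zero bytes). A minimal sketch to reproduce them, assuming the GNU coreutils tools `cksum`, `md5sum` and `sha256sum` are available:

```shell
# Hash zero bytes of input with three different tools.
# cut keeps only the first field, which is the hashcode itself.
crc=$(printf '' | cksum | cut -d ' ' -f 1)
md5=$(printf '' | md5sum | cut -d ' ' -f 1)
sha256=$(printf '' | sha256sum | cut -d ' ' -f 1)

echo "CRC    = $crc"
echo "MD5    = $md5"
echo "SHA256 = $sha256"
```

The length of each hashcode follows from the algorithm: an MD5 is 128 bits (32 hex characters), a SHA256 is 256 bits (64 hex characters).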

md5sum

Basic. Reliable. Shell.

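# Show the hash of a file:
$ md5sum FILENAME

# Store hashes in a manifest:
$ md5sum FILENAME > FILENAME.md5    # one file, one manifest
$ md5sum *.* > MD5SUMS.md5          # every name containing a dot; ">" overwrites the manifest
$ md5sum *.mkv >> MD5SUMS.md5       # only .mkv files; ">>" appends to the manifest

# Validate the files listed in a manifest:
$ md5sum -c MD5SUMS.md5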

Exercise 1

Create hash manifest for:

  • a file
  • a folder
  • a filemask
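One possible approach to the three cases, as a sketch only: it assumes GNU `md5sum` and `find`, and the folder `media` and all file names are invented for illustration.

```shell
# Build a small hypothetical test folder in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir media
printf 'one' > media/clip1.mkv
printf 'two' > media/clip2.mkv
printf 'notes' > media/readme.txt

# A file: one manifest for one file.
md5sum media/clip1.mkv > clip1.mkv.md5

# A folder, including any subfolders. The manifest is written outside
# the folder so that it does not end up listing itself.
find media -type f -exec md5sum {} + > folder.md5

# A filemask: only the .mkv files.
md5sum media/*.mkv > mkv-only.md5

# Number of entries per manifest.
wc -l clip1.mkv.md5 folder.md5 mkv-only.md5
```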

Exercise 2

Validate hash manifest with:

  • No change.
  • Added space.
  • Changed filename.
  • Edited manifest.
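The first three cases can be sketched as a script. This assumes GNU `md5sum`, whose `-c` mode exits with a non-zero status when any file fails; the file names are made up. Editing the manifest itself is left to try by hand.

```shell
# Create two hypothetical files and a manifest in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello\n' > a.txt
printf 'world\n' > b.txt
md5sum a.txt b.txt > MD5SUMS.md5

# No change: every file is reported as OK.
md5sum -c MD5SUMS.md5 && unchanged=ok || unchanged=failed

# Added space: a single extra byte changes the hash of a.txt.
printf ' ' >> a.txt
md5sum -c MD5SUMS.md5 && modified=ok || modified=failed

# Restore a.txt so the next case is tested on its own.
printf 'hello\n' > a.txt

# Changed filename: the manifest still lists b.txt, which is now missing.
mv b.txt renamed.txt
md5sum -c MD5SUMS.md5 && renamed=ok || renamed=failed

echo "unchanged=$unchanged modified=$modified renamed=$renamed"
```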

HashCheck

GUI to handle hashcodes (Windows only).
Website: code.kliu.org/hashcheck

BagIt "Bags"

"Bags have built-in inventory checking, to help ensure that content transferred intact."
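The "built-in inventory" is a hash manifest stored inside the bag. A hand-made minimal bag can be sketched as follows, assuming GNU `md5sum`; the bag name `mybag` and the payload file are invented. Tools such as Bagger write additional files (for example metadata) and validate more than this.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The bag is a folder; the payload lives in its data/ subfolder.
mkdir -p mybag/data
printf 'payload\n' > mybag/data/example.txt
cd mybag || exit 1

# Bag declaration: the two lines required by the BagIt specification.
printf 'BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n' > bagit.txt

# Payload manifest: one hash per payload file, paths relative to the bag root.
md5sum data/* > manifest-md5.txt

# Inventory check: the same validation as for any md5sum manifest.
md5sum -c manifest-md5.txt && bag_check=ok || bag_check=failed
echo "bag check: $bag_check"
```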

Bagger

A GUI for handling BagIt bags.

Exercise 1

  • Prepare bag: Add metadata
  • Create bag.
  • Validate bag with:
    • No change.
    • Changed filename.
    • Edited manifest.

Did the test results match your expectations?

Exercise 2

(This requires an even number of groups)

  • Repeat creating a new bag + metadata.
  • Mangle the bag in any way (or choose not to).
  • Exchange with neighbour group.
  • Find out what was changed (if anything) and present your findings.

Exactly

Exactly is another GUI for BagIt bags, but with:

  • (S)FTP transfer capabilities built-in.
  • better support for large files (AV).
  • email notifications.

Website: weareavp.com/products/exactly

Questions?
