Topic 5 - Archival Storage

Reality check

  • "Just storing files" is not preservation
  • Storage market focus may deviate from your use case
  • There's more than just one right solution...
  • Mixing storage media is usually a good idea.

Physical Media Types

Optical Disks

Optical Disks

HDD: Hard Disk Drive

Classic "spinning platters" hard drive

Data Tape

LTO: Linear Tape-Open cartridge

Flash Memory

SSD: Solid State Disk (no cover)
SD cards

Media Types: Overview

  • Classic "spinning platters" hard drive
  • LTO: Linear Tape-Open cartridge
  • Optical Disks
  • SSD: Solid State Disk (no cover)
  • SD cards

HDD vs SSD

Storage Types

External Hard Disks

USB/FireWire chassis + HDD

External Hard Disks

Ideal for:

  • Collections ≤ [size of largest HDD]
  • Accessed by only 1 computer at a time
  • Moving data
  • Quick access

Advantages:

  • Relatively low cost (100-300 EUR)
  • Portable

Disadvantages:

  • Risky: drives may fail
  • Backup is manual and easily gets out of sync
  • Access is limited

Network Attached Storage (NAS)

Microserver: Can function as NAS

Network Attached Storage (NAS)

Ideal for:

  • Collections < ca. 40 TB (if standalone); larger is possible if clustered
  • Quick access (incl. multiple networked users)
  • Networks with ≥ 1 GbE for large files (10 GbE for film)
  • Organizations with some IT support

Advantages:

  • Multiple users can access in parallel
  • Relatively affordable (200-1000+ EUR)

Disadvantages:

  • Requires heating / cooling
  • Potentially less secure (always on)
  • Less portable
  • Requires IT skills for problem solving

Data Tape


Ideal for:

  • Collections from 10 TB up to multiple PB
  • Collections that don't need to be accessed within seconds
  • Backup scenarios

Advantages:

  • Relatively low cost for tape stock
  • Scalable
  • Portable
  • Low failure rates

Disadvantages:

  • Offline or nearline
  • Management & migration can be challenging
  • Proprietary tape filesystems

Data Tape Library (Robot)

The Cloud


Ideal for:

  • Collections of any size
  • Institutions with limited IT support
  • Collections that don't need to be accessed immediately
  • Fast access to lower-resolution files (streaming)

Advantages:

  • Only pay for what you use
  • Scalable
  • Reduces the day-to-day management needs
  • AV: Low-resolution access scenarios

Disadvantages:

  • HTTP access can be very slow for large files
  • Requires careful planning to ensure the correct services are being purchased
  • Depends on Internet connection
  • Vendor migration/lock-in
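To put "very slow for large files" and the bandwidth recommendations for NAS into rough numbers, here is a small back-of-the-envelope sketch. The function name `transfer_hours` and the default `efficiency` of 0.8 (the share of nominal link speed that is actually usable) are assumptions for illustration, not measured values.

```python
def transfer_hours(size_tb: float, link_gbit_s: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move size_tb terabytes over a network link.

    efficiency is an ASSUMED fraction of the nominal link speed that is
    really available (protocol overhead, disk speed, shared network).
    """
    size_bits = size_tb * 1e12 * 8                     # decimal terabytes -> bits
    usable_bits_per_s = link_gbit_s * 1e9 * efficiency
    return size_bits / usable_bits_per_s / 3600        # seconds -> hours


# Hypothetical links for comparison; the collection size (10 TB) is an example.
for label, gbit in [("100 Mbit/s Internet uplink", 0.1), ("1 GbE", 1.0), ("10 GbE", 10.0)]:
    print(f"{label:>28}: {transfer_hours(10, gbit):7.1f} h for 10 TB")
```

Under these assumptions, 10 TB takes more than a day over 1 GbE and well over a week over a 100 Mbit/s uplink, which is worth knowing before planning a migration or a restore.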

Networks

Network Basics

Consider: Layers!

Data storage has layers...

The File System

File and folder names, size, timestamps, access rights

The File System?

(*) Open formats

LTFS: Linear Tape File System

  • Open specification = vendor neutral
  • Better for preservation, but may not support "comfort" features.
  • All implementations must:
    • Correctly read media that was compliant with any prior version.
    • Write media that is compliant with the version they claim compliance with.

Errors? Backup!

Production Backup

The 3-2-1 Backup Rule

  • Keep at least three copies of your data.
  • Store two backup copies on different devices or storage media.
  • Keep at least one backup copy offsite.
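The rule can be checked against a simple inventory of copies. The following is a minimal sketch: the `Copy` record, its `medium` and `location` labels, and the function `check_3_2_1` are hypothetical names, and "different devices or storage media" is simplified to "at least two distinct medium labels".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    medium: str      # hypothetical label, e.g. "hdd", "lto", "cloud"
    location: str    # hypothetical label, e.g. "office", "offsite-vault"


def check_3_2_1(copies: list[Copy], primary_location: str) -> dict[str, bool]:
    """Report which parts of the 3-2-1 rule an inventory of copies satisfies."""
    return {
        # 3: at least three copies in total
        "three_copies": len(copies) >= 3,
        # 2: copies spread over at least two different media (simplification)
        "two_media": len({c.medium for c in copies}) >= 2,
        # 1: at least one copy stored away from the primary location
        "one_offsite": any(c.location != primary_location for c in copies),
    }


inventory = [Copy("hdd", "office"), Copy("hdd", "office"), Copy("lto", "offsite-vault")]
print(check_3_2_1(inventory, primary_location="office"))
```

A check like this only verifies what the inventory claims. Whether the copies are actually readable is a separate question.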

Statistics of HDD failure

https://www.backblaze.com/blog/how-long-do-disk-drives-last/

The Story of Toy Story 2

https://www.youtube.com/watch?v=8dhp_20j0Ys

Data errors

Digibeta Dropouts

Digibeta Dropouts

Audio Bit-Errors

Broken Sample Bits

Small Bit, Big Problem

JPEG header "signature" = FF D8 FF DB

  • 0xFF = 0b11111111
  • 0xBF = 0b10111111
  • One flipped bit turns FF into BF, and the file no longer starts with the expected signature.
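The effect of a single flipped bit on the signature can be reproduced in a few lines. This is an illustrative sketch: `flip_bit` is a hypothetical helper, and the choice of byte 0, bit 6 is made only because it yields the 0xFF to 0xBF change shown above.

```python
JPEG_SIGNATURE = bytes.fromhex("FFD8FFDB")   # the signature quoted above


def flip_bit(data: bytes, byte_index: int, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (bit 0 = least significant)."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit_index
    return bytes(damaged)


damaged = flip_bit(JPEG_SIGNATURE, byte_index=0, bit_index=6)

print(JPEG_SIGNATURE.hex(" ").upper())        # FF D8 FF DB
print(damaged.hex(" ").upper())               # BF D8 FF DB
print(damaged.startswith(JPEG_SIGNATURE))     # False
```

Software that identifies files by their leading bytes would no longer recognize such a file as a JPEG, although only 1 of its bits differs.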

Corruption / Bit rot

"Bit rot can be caused by a number of sources but the result is always the same – one or more bits in the file have changed, causing silent data corruption. The ‘silent’ part of the data corruption means that you don’t know it happened – all you know is that the data has changed (in essence it is now corrupt)." — Jeffrey B. Layton, Linux Magazine, June 2011

Data scrubbing, Fixity checking

How do you know your data is intact?

A check a day keeps the bitrot away...
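A fixity check boils down to recording a hash per file once and recomputing it later. The sketch below assumes SHA-256 and a manifest with one "hash, two spaces, relative path" entry per line; the function names and the manifest layout are illustrative choices, not a prescribed standard. The manifest is assumed to be stored outside the folder it describes.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large AV files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(root: Path, manifest: Path) -> None:
    """Record one 'hash  relative/path' line for every file below root."""
    lines = [
        f"{sha256_of(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(root: Path, manifest: Path) -> list[str]:
    """Return relative paths that are missing or whose hash has changed."""
    failed = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue                      # skip empty lines
        expected, rel_path = line.split("  ", 1)
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failed.append(rel_path)
    return failed
```

Running `write_manifest` once at ingest and `verify_manifest` on a schedule (and on every backup copy) turns silent corruption into a reported list of affected files.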

Example GUI for integrity validation

Headcount

Hashcode manifest files can also be used to check whether all expected files are present, and whether additional, unaccounted-for files exist.
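Such a headcount is a set comparison between the paths listed in the manifest and the files found on disk. The sketch below assumes manifest lines of the form "hash, two spaces, relative path"; `paths_in_manifest` and `headcount` are hypothetical helper names.

```python
from __future__ import annotations

from pathlib import Path


def paths_in_manifest(manifest_text: str) -> set[str]:
    """Extract the relative paths from 'hash  path' lines (assumed layout)."""
    return {
        line.split("  ", 1)[1]
        for line in manifest_text.splitlines()
        if "  " in line
    }


def headcount(root: Path, expected: set[str]) -> tuple[set[str], set[str]]:
    """Compare manifest paths with files on disk.

    Returns (missing, unexpected): paths listed but not found, and
    files found but not listed.
    """
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }
    return expected - on_disk, on_disk - expected
```

Both result sets deserve attention: a missing file points to loss, while an unexpected file may be an undocumented addition or a misplaced copy.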

Storage: Challenges / Risks

  • Storage media failure
  • Obsolescence
  • Humans
  • Catastrophes / War

More Storage Terms

  • S.M.A.R.T.: Self-Monitoring, Analysis and Reporting Technology
  • NAS: Network Attached Storage
  • SAN: Storage Area Network
  • RAID: Redundant Array of Inexpensive Disks
  • Object Storage

Good practice

  • At least 1 backup (= 2 copies)
  • Preferably 2 backups (= 3 copies)
  • Geographically separate locations (with different threat profiles)
  • Mix storage media
  • Migrate in good time and with a plan (roughly every 5-7 years)
  • Use the right tech for your needs
  • Periodically check fixity of content and backups
  • Work with IT to implement and maintain technology

Comments?

Questions?