What’s this all about?

Metadata- and File-Wrangling.
Professional and Private.

My “AHA” Effect…

Simply making use of “name=value” tags;
Moving metadata to the same level as a file or foldername;
Keeping the META where the DATA is.
Using the filesystem directly as a database.
All FOSS and standard and awesome, of course.
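A minimal sketch of what “name=value” tags at the same level as a filename could look like today. It assumes Linux extended attributes (xattrs) in the `user.` namespace as the carrier; the slides do not name a mechanism, so this is just one possible choice. The helper names `xattrs_supported`, `set_tag` and `get_tags` are made up for illustration.

```python
import os
import tempfile

PREFIX = "user."  # Linux namespace for user-defined extended attributes


def xattrs_supported(path: str) -> bool:
    """Return True if this platform and filesystem accept user xattrs."""
    if not hasattr(os, "setxattr"):  # os.setxattr only exists on Linux
        return False
    try:
        os.setxattr(path, PREFIX + "probe", b"1")
        os.removexattr(path, PREFIX + "probe")
        return True
    except OSError:
        return False


def set_tag(path: str, name: str, value: str) -> None:
    """Attach one name=value tag to a file or folder, stored as UTF-8."""
    os.setxattr(path, PREFIX + name, value.encode("utf-8"))


def get_tags(path: str) -> dict[str, str]:
    """Read all user tags of a file or folder back as a plain dict."""
    tags = {}
    for attr in os.listxattr(path):
        if attr.startswith(PREFIX):
            tags[attr[len(PREFIX):]] = os.getxattr(path, attr).decode("utf-8")
    return tags


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        if xattrs_supported(f.name):
            set_tag(f.name, "title", "Holiday in Vienna")
            set_tag(f.name, "year", "2022")
            print(get_tags(f.name))
        else:
            print("user xattrs are not available on this filesystem")
```

Whether the tags survive copying, archiving or syncing depends on the tools involved, which is exactly the kind of thing that needs to be used/tested/tuned.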

The idea first came to me on April 21, 2022, while I was preparing two presentations for the National Archives of Singapore (by INA):

Large scale archival data storage.
Digital asset management systems.
Including workflows and reality checks.

That was my first “AHA” moment: this may be a key building block towards data-“enjoyment” (rather than management), simply by dissolving the distance between the database and the objects it describes.

The way to get there: convert both of them into filesystem “Data Objects”, then work with that new box of old options, which now become “the new default”. Enjoy.

Why bother at all? I assume most of us here are already “wrestling” with an increasing number of “Digital Objects” in digital workflows and interconnected systems - hopefully well-configured and working smoothly.

If you’re nodding now, you may feel happy to hear that there may be some change coming up: Object Storages.

What’s with the “Holodeck”?

Yes, I’m trying to be funny.
Yes, I’m very serious about this idea.

A “Holodeck” is a fictional device from Star Trek that can generate any computer-generated image or object, which can then be interacted with “as if it were real”.

Short: A virtual holographic “anything”-simulator.

I’d already be happy if I could tag my personal photo collection - without having to worry about a lock-in commitment to any application that annotates my pictures. Or media files. Or documents.

Damn. We all need Digital Asset Management systems. Like glasses, when you get older: If you’re older than 25, you probably want/need one.

A cloud storage with “search by metadata/tags” is doing that already, btw.

By adopting “Data Objects” (meta+data) as the regular paradigm for handling our computing needs, we might seriously get closer to something as generic (and stable) as Star Trek technology 😇️.

My Perception:

Current Project Desires:

“Make things all digital and awesome! And easy. In no-time.”

My Perception:

Disclaimer

What I’m proposing is not a new technology.

I’m suggesting a different usage of what already exists.

And I apologize: I may be an ad for FOSS.

Existing Components

“Object Storages” already support payload with metadata.
(Jonáš Svatoš has practical insights on object storages)
Local “search-tools” already deal with indexing and UIs for existing meta+data.
(Marion Jaks has experience with in-house DAM)
Other base components already exist too.
Things need to be orchestrated.
And used/tested/tuned.

https://min.io/resources/img/subnet/subnet-instantaneous.gif

Where to begin?

Terms

Files/Folders?
Metadata?
Data “payload”?
Data Objects?
Filesystem?
All clear?

Status quo: Assigning “a plain title”?

See: “Keywords and Text Strings” (PNG Specification, W3.org)
And: “What software can I use to read png metadata?”

The Unix Philosophy

“Everything is a file.” See: https://en.wikipedia.org/wiki/Unix_philosophy

If we translate files to Objects, it becomes…?

The AHA-Holodeck Philosophy

“Everything is an Object.”

Anything can have its metadata with its payload content.
As simple as Lego, and taken for granted like filenames.
The filesystem serves as (semantic-graph-) database.

What if you could simply…?

store (and use) any metadata reliably and persistently with its payload content?
drag-n-drop, copy/paste, convert, view, edit any metadata:
In any file manager or tool?
import/export catalog (database) entries, like copy/pasting files?

https://arstechnica.com/gaming/2016/09/how-time-travel-works-in-star-trek/3/

What if…?

that was as easy as using file/foldernames?
but without classical restrictions?
over its whole lifecycle?
you could store hash codes with any file format?
Even for .txt or .bin?
By default?
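A small sketch of the hash idea, assuming the hash simply becomes one more “name=value” tag next to the payload. Because the tag lives outside the payload, it works the same for `.txt`, `.bin` or anything else. The tag name `checksum.sha256` and both function names are hypothetical; where the tag is stored is left open here.

```python
import hashlib
from pathlib import Path


def checksum_tag(path: Path, algorithm: str = "sha256") -> tuple[str, str]:
    """Return a (name, value) pair describing the file's content hash."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large payloads do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return f"checksum.{algorithm}", digest.hexdigest()


def verify(path: Path, tags: dict[str, str], algorithm: str = "sha256") -> bool:
    """Check the payload against the hash stored in its tags."""
    name, value = checksum_tag(path, algorithm)
    return tags.get(name) == value
```

Usage would be along the lines of `tags = dict([checksum_tag(path)])` when the object is created, and `verify(path, tags)` at any later point in its lifecycle.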

https://en.wikipedia.org/wiki/Back_to_the_Future_Part_II

What if…?

you didn’t have to worry/think about “naming a file” anymore? (You could even keep multiple filenames in parallel, if you like.)
the same was true for folders?
you could keep as much (nested) “name=value” information as you’d like?
this would work from local/small (USB-Stick) to network/large scale storage?
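One way nested “name=value” information (including several filenames in parallel) could be kept in a flat tag store is to join the levels into dotted names. This is only a sketch of that convention; the dotted-key scheme and the function name `flatten` are my assumptions for illustration, not an existing standard.

```python
def flatten(value, prefix: str = "") -> dict[str, str]:
    """Turn nested dicts/lists into flat 'a.b.c' = 'text' pairs."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)  # list positions become name parts: 0, 1, ...
    else:
        return {prefix: str(value)}  # leaf: store everything as text

    flat = {}
    for name, child in items:
        key = f"{prefix}.{name}" if prefix else str(name)
        flat.update(flatten(child, key))
    return flat


if __name__ == "__main__":
    photo = {
        "filename": ["IMG_0001.jpg", "grandma_birthday.jpg"],
        "camera": {"make": "Acme", "lens": {"mm": 50}},
    }
    for name, text in flatten(photo).items():
        print(f"{name}={text}")
```

Flat pairs like these fit any store that only knows names and text values, from a USB stick to an object storage.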

https://min.io/docs/minio/linux/reference/minio-mc/mc-sql.html

Metadata-only Objects = Catalog entries

Right-click “New Object”:
Annotate as desired.
No payload required.
Relate two Data Objects:
Drag-n-drop, then describe Relationship(-Object).
Any “Catalog Object” could also have a payload. (eg preview image/icon)
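The catalog idea above can be sketched as a tiny data model: an object is tags plus an optional payload, and a relationship is itself just another metadata-only object that points at two others. The class `DataObject`, the function `relate` and the tag names are hypothetical, chosen only to illustrate the structure.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataObject:
    tags: dict[str, str] = field(default_factory=dict)
    payload: Optional[bytes] = None  # None: metadata-only, e.g. a catalog entry
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def has_payload(self) -> bool:
        return self.payload is not None


def relate(source: DataObject, target: DataObject, relationship: str) -> DataObject:
    """Describe a relationship as its own metadata-only object."""
    return DataObject(tags={
        "type": "relationship",
        "relationship": relationship,
        "source": source.object_id,
        "target": target.object_id,
    })


if __name__ == "__main__":
    photo = DataObject(tags={"title": "Harbour at dawn"}, payload=b"<image bytes>")
    record = DataObject(tags={"type": "catalog-entry", "title": "Harbour series, no. 3"})
    link = relate(record, photo, "describes")
    print(link.tags)
```

Since `payload` is optional in both directions, a catalog entry can later gain a preview image without changing its type.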

Interoperability of Features

Any music/photo/collection application becomes (even (more)) compatible.
Metadata is transformed on access/demand.
Metadata exchange and usage between systems is facilitated.
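“Transformed on access” could be as simple as a per-application view that renames generic tags on the fly, leaving the stored tags untouched. In this sketch the generic tag names on the left are my assumption; the targets are a few ID3v2 frame names and Dublin Core element names, and the mapping tables are illustrative, not complete.

```python
# Generic tag name -> name expected by a particular application or schema.
ID3_VIEW = {"title": "TIT2", "artist": "TPE1", "album": "TALB"}
DUBLIN_CORE_VIEW = {"title": "dc:title", "artist": "dc:creator", "year": "dc:date"}


def view_as(tags: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Return the tags under the names of the target schema.

    Tags without a mapping are left out of the view, but they are
    not deleted: the stored object is never modified.
    """
    return {mapping[name]: value for name, value in tags.items() if name in mapping}


if __name__ == "__main__":
    song = {"title": "Blue Danube", "artist": "J. Strauss II", "year": "1866"}
    print(view_as(song, ID3_VIEW))
    print(view_as(song, DUBLIN_CORE_VIEW))
```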

Embedded metadata?

What is the use-case for embedding any metadata?

So the meta stays with the data!

Media container formats?

What is the use-case for (media) container formats?

So related data stays together!

Performance?!

Size?
Speed?
Interoperability?
MacGyver-able?

Size: I don’t think that’s a problem.

It’s metadata. With today’s “default” storage capacities: trivial. The payload is already being stored, so that has to work anyway.
The metadata is also already being stored (in databases, files, etc.).
Even if you keep and accumulate metadata over a longer period: it’s still metadata(-sized).
Okay… I propose storing metadata as binary-safe, UTF-8 encoded text. Even numbers. For starters. That might increase storage demands by “blowing up” fixed-size binary numbers (16-, 32- or 64-bit words, for example) into “way larger” strings. Yet if we can “afford” this “waste” of digital storage space, imagine being able to read any information as plaintext by default.
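To put a number on that “waste”, here is a quick worst-case comparison between a fixed-size binary integer and the same value written as UTF-8 decimal text. The function names are just for this sketch.

```python
def binary_size(number: int, bits: int) -> int:
    """Bytes needed to store the number as a fixed-size unsigned integer."""
    return len(number.to_bytes(bits // 8, byteorder="big"))


def text_size(number: int) -> int:
    """Bytes needed to store the same number as UTF-8 encoded decimal text."""
    return len(str(number).encode("utf-8"))


def compare(bits: int) -> tuple[int, int]:
    """Return (binary bytes, text bytes) for the largest value of a word size."""
    largest = 2 ** bits - 1  # worst case: the longest decimal string
    return binary_size(largest, bits), text_size(largest)


if __name__ == "__main__":
    for bits in (16, 32, 64):
        as_binary, as_text = compare(bits)
        print(f"{bits}-bit: {as_binary} bytes binary, up to {as_text} bytes as text")
```

So the worst case is a factor of 2.5: 2 vs 5, 4 vs 10 and 8 vs 20 bytes. A million 64-bit numbers would take 8 MB as binary and at most 20 MB as text, and small values take less as text than as a fixed 64-bit word.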

Or maybe the metadata layout could indeed be declared like an object in program code (instead of an SQL table design). Like “Class” definitions per Data Object type? Then numbers could be integer/long/etc. data types. And strings could even be active methods. ;)
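A rough sketch of such a “Class” definition per Data Object type: the stored tags stay plain text, and the class declares which of them are numbers and converts them on load. `PhotoObject`, its fields and `from_tags` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, fields


@dataclass
class PhotoObject:
    title: str
    width: int
    height: int
    rating: float = 0.0

    @classmethod
    def from_tags(cls, tags: dict[str, str]) -> "PhotoObject":
        """Build a typed object from plain-text name=value tags."""
        converted = {}
        for f in fields(cls):
            if f.name in tags:
                # f.type is the declared type (str, int, float): use it to convert.
                converted[f.name] = f.type(tags[f.name])
        return cls(**converted)

    def megapixels(self) -> float:
        """An 'active' value: derived from other metadata, not stored itself."""
        return self.width * self.height / 1_000_000


if __name__ == "__main__":
    photo = PhotoObject.from_tags({"title": "Sunset", "width": "4000", "height": "3000"})
    print(photo, photo.megapixels())
```

Unknown tags are simply ignored by this class, so the same object can still carry any other “name=value” information alongside.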

Speed: Speed could indeed be a major factor.

However, I think any performance losses compared to today are worth investigating, to find out how things can be made faster.

For now, I propose the same technology that is used for indexing websites (and even locally stored files - in different formats). All this already exists, and is widely in use - even by “smaller” websites. So it can’t be that hard. And hardware is still cheaper than manpower and life- and braintime.

Also, any application that keeps its own kind of “library” database/config somewhere is already performing these tasks well enough. Initial indexing of existing content takes a while, but after that only changes are monitored - and that doesn’t even trouble today’s “average” computer systems.
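The “index once, then only follow changes” pattern can be sketched with a small in-memory inverted index over “name=value” pairs. This is a toy under my own assumptions (the class `TagIndex` and its methods are invented here); real search indexers do much more, but the principle is the same.

```python
from collections import defaultdict


class TagIndex:
    def __init__(self) -> None:
        self._by_tag = defaultdict(set)  # (name, value) -> ids of matching objects
        self._by_object = {}             # object id -> tags as last indexed

    def update(self, object_id: str, tags: dict[str, str]) -> None:
        """(Re-)index one object; only needed when that object has changed."""
        self.remove(object_id)
        for pair in tags.items():
            self._by_tag[pair].add(object_id)
        self._by_object[object_id] = dict(tags)

    def remove(self, object_id: str) -> None:
        """Forget an object, e.g. after it was deleted."""
        for pair in self._by_object.pop(object_id, {}).items():
            self._by_tag[pair].discard(object_id)

    def find(self, **wanted: str) -> set[str]:
        """Return ids of all objects carrying every given name=value pair."""
        matches = [self._by_tag.get(pair, set()) for pair in wanted.items()]
        return set.intersection(*matches) if matches else set(self._by_object)


if __name__ == "__main__":
    index = TagIndex()
    index.update("a.jpg", {"year": "2022", "place": "Vienna"})
    index.update("b.jpg", {"year": "2022", "place": "Singapore"})
    print(index.find(year="2022"))
    print(index.find(year="2022", place="Vienna"))
```

A lookup only touches the sets for the requested pairs, and a changed object costs one `update` call instead of a full re-scan.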

Maybe it’d drain a bit more on battery-powered systems. But hey, I’ve heard they’re working on “batterifying” everything by 2020. Or some other horizon.

On the other hand: with Data Objects, fewer individual code libraries would be needed (to access metadata, provide library functionality, etc.). And since the filesystem would provide this basic functionality by default, less code would need to run (and use resources).

How large this impact could be, and how far in the future the coverage and support will be good enough to speak of “real-world tested”, is unclear at the moment.

Interoperability:

Would be greatly improved. I’m quite serious about this. Most of my work in the last 20+ years has been making things interoperable and improving upon existing technologies and possibilities.

Files become Objects, and metadata “accumulates” alongside its payload. Objects become well-annotated (semi-automatically) and therefore (more) self-sustaining and easier to (re-)use.

Imagine all “future” applications simply providing tagging and filtering options, with a UI optimized for different use cases: Music Browser, Video & Film Browser, Document Browser, etc. - IIIF-design by default.

By proposing serious usage of Metadata on any kind of Digital Object (Collection) - from personal to professional, small to large scale: We archivists can now pimp the computing world, by applying our decades of real “META-EXPERIENCE FU” knowledge and skills.

Where to begin (implementing this)?

MinIO?
NoSQL / MongoDB?
Search Indexers?
…More ideas?

What to feed it with?

Real-world collection (web-)access copies.
Corresponding data (catalog) entries. (XML, JSON, CSV, etc)
…More ideas?

The rabbit hole goes deeper.

But that’s a story for another time…

If such an implementation/prototype is not awesome…

…then it’s not what I suggest here 😎️
(Or simply not finished yet)

Oh, btw:

IMO, we all will sooner or later use Object storages, because hierarchical filesystems don’t scale well enough anymore (with today’s use cases and sizes)
So you’ll have the underbelly to support metadata-with-payload Objects anyways 🤩️

Ideas? Input? Questions?

Please! Go ahead. Now and later :)

Peter Bubestinger-Steindl
Peter@ArkThis.com

https://github.com/ArkThis/AHA_ObjectWorld/
https://diode.av-rd.com/nextcloud/index.php/s/z2M4JZY8RFt8Nnd

CC-BY-SA

Keep your META where your DATA is. In the filesystem?
