Decoding The Importance Of Metadata In Digitization And Preservation Of Content

Introduction

Digital media has come a long way over the past decade. The shift from single-screen viewing to multi-screen, multi-device viewing, and from subscription-based models to OTT service providers, has been apparent over the years. In line with this demand, broadcasters are also broadening their distribution channels.

With audiences able to consume video across platforms at their preferred time, broadcasters are leaving no stone unturned to digitize video content, including material dating back decades.

Broadcasters are now focused on aggregation and distribution of highly-targeted content that reaches narrow-interest audiences. As broadcasters develop and store digital content to use and reuse across devices and platforms, the value of good shareable content is increasing.

However, the problem lies elsewhere. An estimated 98% of archived media is not available for digital distribution [1].

Why?

Migrating hours of media content from tape to digital storage is time-consuming. Though automated migration systems convert tapes to multiple digital formats simultaneously, tagging these files to make them searchable is a challenge.

Have you ever wondered why some videos top the results when you search on Google? With an average of 300 hours of video uploaded to YouTube alone every minute [2], content producers and owners work hard to optimize their content for search.

The Solution

The key to ensuring that your content doesn’t get lost in the crowd is tagging it with relevant keywords. While search engines have evolved over the years, they are still not human and cannot read or watch your content the way a person does. They need a hint (metadata) to understand the content and apply analytics to rank it. While filtering, the search engine considers the title, the description, and the tags, in that order. If you optimize these three, half of the battle is won.

In this paper, we will explore:

  • What is metadata?
  • Types of metadata
  • Metadata Schema Models
  • The importance of metadata in content digitization
  • Optimizing metadata for content digitization

What Is Metadata?

Metadata is “data about data” [3]. It is a detailed description of an object’s underlying data: its title, date and time of creation, format, length, language, year of reference, a narration describing the object’s identity and purpose, and so on.

In long-term digital archiving, metadata also records the preservation techniques applied to the digital objects in the archive. Metadata does the following:

  • Helps in easy identification, location, and retrieval of information by the end-users
  • Provides information about quality aspects or issues of the created object along with its access privileges/rights
  • Ensures smooth data management
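
As a rough illustration of the description above, a metadata record can be sketched as a small data structure that sits alongside the media file. The class name, field names, and sample values below are hypothetical and do not follow any particular standard.

```python
from dataclasses import dataclass, asdict
from datetime import datetime


@dataclass
class AssetMetadata:
    """Hypothetical descriptive record for one archived media asset."""
    title: str
    created: datetime      # date and time of creation
    file_format: str       # container or codec name
    length_seconds: int    # running time
    language: str
    description: str       # narration of the asset's identity and purpose
    access_rights: str     # who may use the asset


# Sample values are invented for illustration.
record = AssetMetadata(
    title="Evening News Bulletin",
    created=datetime(1994, 3, 12, 19, 0),
    file_format="MXF",
    length_seconds=1800,
    language="en",
    description="Half-hour evening news programme migrated from tape.",
    access_rights="internal-archive-only",
)

# The record describes the asset without containing any of its audio or video.
metadata_dict = asdict(record)
print(metadata_dict["title"], metadata_dict["length_seconds"])
```

Keeping the record separate from the media itself is what lets an archive identify, locate, and manage an asset without opening the file.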

Types Of Metadata

Depending on the nature of data and usability in a real-world scenario, metadata can be categorized as:

  • Descriptive: Helps to identify, locate, and retrieve information related to an object through indexing and navigation to related links. It includes elements such as title, creator, identity, and description
  • Structural: Describes the structure of a complex object: the role of individual data files, the ordering of pages to form a chapter, file names and their organization, and so on
  • Administrative: Helps to manage resources in terms of their creation, methods, access rights, associated copyright, and the techniques required for preserving them
  • Rights: Defines access permissions and constraints on the stored objects and the information contained in them at different levels
  • Preservation: Records the activities or methods adopted by the archive for preserving digital data
  • Technical: Provides technical information embedded in the digital object (content files), describing the attributes of the digital image (not the analog source of the picture), its capture process, and any transformations, which helps to ensure that the image is rendered accurately
  • Provenance: Records an object’s origin and the changes made to it, such as changes to its resolution, format, or perspective
  • Tracking: Keeps track of the data at different stages of the workflow (data automation processes, digital capturing, transformation, processing filters and toolsets, enhancement, quality control and management, and data archival and deliverables)

For long-term digital preservation, two types of metadata play a crucial role:

  1. Packaging Metadata

Defines three kinds of information packages, which are as follows:

  • Submission Information Package (SIP) – Contains information delivered to the archive from the content provider
  • Archival Information Package (AIP) – Related content information stored in the archive
  • Dissemination Information Package (DIP) – On request delivery of information to the user

  2. Preservation Metadata

Records the process that supports the preservation of digital data

Metadata Schema Models

According to ISO 23081 [4], a schema is “a logical plan showing the relationships between metadata elements, normally through establishing rules for the use and management of metadata specifically as regards the semantics, the syntax and the optionality (obligation level) of values.”

The amount of metadata that needs to be stored for an object depends on its functional use and significance. With a large amount of metadata already in existence, and more being published regularly by different communities for different purposes, schema designers need experience with the Semantic Web to evaluate and choose a metadata schema.

For long-term preservation of data, various metadata schema models have been developed, including the following:

  • MARC: Machine Readable Cataloguing
  • MARCXML: XML version of MARC 21
  • METS: Metadata Encoding & Transmission Standard
  • MODS: Metadata Object Description Schema
  • DCMI: Dublin Core Metadata Initiative
  • CDWA: Categories for the Description of Works of Art
  • CRM: CIDOC Conceptual Reference Model
  • MPEG-7: Multimedia Content Description Interface, from the Moving Picture Experts Group
  • EAD: Encoded Archival Description
  • RDF: Resource Description Framework
  • VRA CORE: Visual Resources Association
  • DDI: Data Documentation Initiative
  • MIX: Metadata for Images in XML Standard
  • IEEE LOM: IEEE Learning Object Metadata, a standard for the description of “learning objects”
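
To give a feel for what a schema-based record looks like, here is a minimal sketch that writes a few Dublin Core elements as XML using Python's standard library. The namespace URI is the Dublin Core element set namespace; the wrapper element name `metadata`, the helper `to_dublin_core`, and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields: dict) -> str:
    """Serialize a flat dict of Dublin Core element names to an XML string."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration.
xml_record = to_dublin_core({
    "title": "Evening News Bulletin",
    "creator": "Example Broadcasting",
    "date": "1994-03-12",
    "format": "application/mxf",
    "language": "en",
})
print(xml_record)
```

A production archive would more likely wrap such elements in a container format such as METS, but the principle of agreed element names with agreed meanings is the same.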

The Importance Of Metadata In Content Digitization

Metadata plays a key role in processing, managing, accessing, and preserving digital content, be it audio, video, or image collections. Metadata has the following key functions:

  • Search: To find content by data associated with a file, such as author, date published, and keywords
  • Distribute: To determine when and where the content will be distributed
  • Access: To determine delivery of targeted content based upon preset rules matching metadata values
  • Retain: To determine which records to archive
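
The functions listed above can be sketched as simple filters over a catalogue of metadata records. This is only an illustration: the catalogue, its field names, and the rules (keyword match, region allow-list, date cutoff) are hypothetical stand-ins for whatever a real media asset management system would use.

```python
from datetime import date

# Hypothetical catalogue; field names and values are illustrative only.
catalogue = [
    {"title": "Evening News Bulletin", "keywords": {"news", "politics"},
     "published": date(1994, 3, 12), "regions": {"IN", "UK"}},
    {"title": "Cup Final Highlights", "keywords": {"sports", "football"},
     "published": date(2015, 6, 6), "regions": {"UK"}},
]


def search(records, keyword):
    """Search: return titles whose keyword set contains the query term."""
    return [r["title"] for r in records if keyword.lower() in r["keywords"]]


def accessible_in(records, region):
    """Access: return titles whose preset region rule allows delivery."""
    return [r["title"] for r in records if region in r["regions"]]


def to_archive(records, cutoff):
    """Retain: return titles published before the cutoff date."""
    return [r["title"] for r in records if r["published"] < cutoff]


print(search(catalogue, "Sports"))
print(accessible_in(catalogue, "IN"))
print(to_archive(catalogue, date(2000, 1, 1)))
```

None of these functions ever opens a media file. Every decision is made from the metadata alone, which is why untagged archives are so hard to reuse.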

Optimizing Metadata For Content Digitization

Metadata matters because it makes content searchable, both online and offline. As noted earlier, a search engine filters on the title, the description, and the tags, in that order. Some key points to remember while using metadata for content digitization are:

Optimize The Title

Grab attention with a catchy and compelling title. To make a title search-engine (and mobile) friendly, limit it to 120 characters and include your top keywords. Think about what the audience would relate to, and make the title informative and relevant.

Optimize The Description

Include the keywords here as well, and explain what the content is about. Place the most critical information within the first 22 words of your description, since that is the portion the search engine displays in the listing before the reader clicks the ‘see more’ button.

Optimize The Tags

A few things to keep in mind while tagging a digital asset are:

  1. Assign keywords that cover the 5 W’s – what, when, who, why, and where – to make it a well-captured asset
  2. Avoid grammatical errors while assigning keywords
  3. Avoid ambiguous words or words with multiple meanings
  4. Be consistent with abbreviations and acronyms
  5. Use at least 8 to 12 tags per asset
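
The title, description, and tag guidelines above lend themselves to an automated check before an asset is published. The sketch below is one possible interpretation: the function name `check_listing`, its parameters, and the reading of "critical information" as "at least one top keyword" are assumptions, not an established tool or rule.

```python
def check_listing(title, description, tags, top_keywords):
    """Return a list of warnings based on the guidelines in this paper."""
    warnings = []
    keywords = [k.lower() for k in top_keywords]

    # Title: at most 120 characters, and it should carry a top keyword.
    if len(title) > 120:
        warnings.append("title exceeds 120 characters")
    if not any(k in title.lower() for k in keywords):
        warnings.append("title contains none of the top keywords")

    # Description: assume a top keyword should appear in the first 22 words.
    lead = " ".join(description.split()[:22]).lower()
    if not any(k in lead for k in keywords):
        warnings.append("no top keyword in the first 22 words of the description")

    # Tags: the paper recommends at least 8 per asset.
    if len(tags) < 8:
        warnings.append("fewer than 8 tags")
    return warnings


# Sample listing, invented for illustration.
issues = check_listing(
    title="Cup Final Highlights 2015",
    description="Football highlights from the 2015 cup final, with all goals.",
    tags=["football", "cup final", "2015", "highlights"],
    top_keywords=["football", "cup final"],
)
print(issues)
```

A check like this only catches mechanical problems. Whether the keywords are unambiguous and cover the 5 W’s still needs human review.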

Conclusion

Metadata plays a crucial role in keeping track of content, from its inception through its processing to its accessibility. It provides a complete description of the purpose and functionality of the data, making it easier for end-users to locate and retrieve the data. It is therefore crucial that all content has metadata embedded in it.

[1] https://www.recode.net/2014/4/8/11625358/modernizing-the-entertainment-industry-supply-chain-in-the-age-of

[2] https://merchdope.com/youtube-stats/

[3] https://www.techopedia.com/definition/1938/metadata

[4] https://committee.iso.org/sites/tc46sc11/home/projects/published/iso-23081-metadata-for-records.html