digital archiving

HOW CULTURAL INSTITUTIONS ARE BENEFITING FROM DIGITIZATION OF PHOTO ARCHIVES


“Today digital technology is pervasive. It is mandatory that museums, libraries, and archives join with educational institutions in embracing it.”

  • Wayne Clough, Author, Best of Both Worlds

Museums and cultural institutions are leaving no stone unturned to digitize history. Archiving photos forms an integral part of documenting history. Continuing from our previous post on how cultural institutions are leveraging photo archiving, this post details why museums and cultural institutions should leverage photo archiving.

Easy Sharing and Distribution

Unlike physical copies, scanned photos can be easily shared across multiple locations with multiple users. Digital copies are easier to track, and they are cost-effective for researchers and curators because they eliminate the need for physical reproduction and mailing.

Prepare for Disasters

Museums and cultural institutions are not free from the risk of losing valuable content. Natural calamities such as earthquakes, floods, heavy rains, hurricanes, and tsunamis have destroyed museums and libraries over the centuries, resulting in the loss of valuable content. Digitization reduces the risk of losing valuable photographs.

Save Cost and Clutter

Maintaining physical copies of photo prints requires physical storage space and involves cost. Digitizing photos can reduce the cost of keeping physical copies and makes the photos easier to share and reproduce.

Source of Revenue

Owners of photos of rare events and occurrences can generate a revenue stream through royalties or licensing fees. Different models can be adopted, such as selling prints through your own website or third-party portals, or exhibiting in galleries.

Tip for Successful Photo Digitization – Prioritizing Which Items to Digitize

Depending on the priority and goals, every institution shortlists the photos that need to be digitized. Some questions that organizations need to ask before selecting the images for digitization are:

  1. Are the records unique?
  2. Do the photos appeal visually?
  3. Who will be the prospective consumer of the digitized images?
  4. Does the demand justify the cost that will be incurred to digitize the photos?
  5. Will digitization add any value to the picture?
  6. How will the institution control access to the digitized images? Will there be any restrictions, or can they be accessed openly?
  7. Does the institution have the legal right to scan?
  8. What is the long-term preservation strategy of the photos being digitized?
  9. What is the metadata that will be required?

Once institutions have selected items that need to be digitized, here are some critical considerations while scanning photos.

  1. Once you have a flatbed scanner ready, set the scanner, Photoshop, and the printer to the same color space – CMYK or RGB.
  2. To capture fine detail and many shades of gray (which is essential especially for black and white photos), choose the right DPI. Depending on the size of the picture, pick a DPI that yields around 3000 – 4000 pixels along the length of the image.
  3. Choose the format of preservation carefully. For the master file, the recommended format is TIFF.
  4. Save a JPEG copy for easy distribution among researchers.
  5. To avoid damage and file loss, keep the Master copy separate from the distributed copy.
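
The resolution guideline above comes down to simple arithmetic: pixels along an edge equal the edge length in inches multiplied by the DPI. The following is a minimal sketch of that calculation, assuming the 3000 – 4000 pixel target mentioned above; the function name and the sample print sizes are illustrative and do not come from any scanner software.

```python
import math


def required_dpi(long_edge_inches: float, target_pixels: int = 4000) -> int:
    """Return the scan DPI needed for the long edge to reach target_pixels."""
    if long_edge_inches <= 0:
        raise ValueError("long_edge_inches must be positive")
    # pixels = inches * DPI, so DPI = pixels / inches (rounded up)
    return math.ceil(target_pixels / long_edge_inches)


# Hypothetical print sizes, long edge in inches
for edge in (6, 10):
    print(f"{edge} in print: scan at {required_dpi(edge)} DPI or higher")
```

Smaller originals need a higher DPI to reach the same pixel count, which is why a single fixed DPI setting rarely suits a mixed collection.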

Photo and image archivists should prioritize digitizing susceptible items such as color photos and cellulose nitrate film. The context of each photo should also be documented, and each item needs metatags to make it easily accessible in time of need. To know about the top six mistakes to avoid while digitizing photos, read this blog.

Decoding the Importance of Metadata in Digitization and Preservation of Content


Introduction

Digital media has come a long way over the past decade. The shift from single-screen to multi-screen and multi-device viewing, and from the subscription-based model to OTT service providers, has been apparent over the years. Keeping in line with the demand, broadcasters are also broadening their distribution channels.

With audiences having a wide variety of choices to consume video across platforms at their preferred time, broadcasters are leaving no stone unturned to digitize video content, even content dating back decades.

Broadcasters are now focused on aggregation and distribution of highly-targeted content that reaches narrow-interest audiences. As broadcasters develop and store digital content to use and reuse across devices and platforms, the value of good shareable content is increasing.

However, the problem lies elsewhere. An estimated 98% of archived media is not available for digital distribution.[1]

Why?

Migrating hours of media content from tape to digital storage is time-consuming. Though automated migration systems convert tapes to multiple digital formats simultaneously, tagging these files to make them searchable is a challenge.

Have you ever wondered how – when you Google – some videos top the search results? With an average of 300 hours[2] of video content being uploaded to YouTube alone every minute, content producers and owners sweat over making their content optimized for search results.

The solution

The key to ensuring that your content doesn’t get lost in the crowd is tagging it with relevant keywords. While search engines have evolved over the years, they are still not human and hence can’t read or watch your content. They need a hint (or metadata) to understand the content and apply analytics to rank it. While filtering, the search engine considers the title, description, and tags, in that order. If you optimize these three, half of the battle is won.

In this paper, we will explore:

  • What is metadata?
  • Types of metadata
  • Metadata Schema Models
  • The importance of metadata in content digitization
  • Optimizing metadata for content digitization

What is metadata?

Metadata refers to “data about data.”[3] It represents a detailed description of the underlying data within an object concerning its title, date & time of creation, format, length, language, year of reference, narration describing the object’s identity & purpose, etc.

For long-term digital archiving, metadata also covers the preservation techniques that are applied to the digital objects in the archives. Metadata does the following:

  • Helps in easy identification, location, and retrieval of information by the end-users
  • Provides information about quality aspects or issues of the created object along with its access privileges/rights
  • Ensures smooth data management

Types of metadata

Depending on the nature of data and usability in a real-world scenario, metadata can be categorized as:

  • Descriptive: Helps to identify, locate, and retrieve information related to an object through indexing and navigation to related links. It includes elements such as title, creator, identity, and description
  • Structural: Defines the complexity of an object along with the role of individual data files, ordering of pages to form a chapter, file names, and their organization, etc.
  • Administrative: Helps to manage the resources in terms of its creation, methods, access rights, associated copyright, and the techniques required for preserving it
  • Rights: Defines access permissions and constraint over the stored objects and information contained in them at different levels
  • Preservation: Records the activities or methodology adopted in the archive for preserving digital data
  • Technical: Provides technical information embedded with the digital object (content files). It describes attributes of the digital image (not the analog source of the picture), its capture process, and any transformations, and helps to ensure that the image will be rendered accurately
  • Provenance: Records the object’s origin and the changes made to its resolution, format, perspective, etc.
  • Tracking: Keeps track of the data at different stages of the workflow (data automation processes, digital capturing, transformation, processing filters and toolsets, enhancement, quality control and management, and data archival and deliverables)

For long-term digital preservation, two types of metadata play a crucial role:

  1. Packaging Metadata: Defines three kinds of information packages, which are as follows:
       • Submission Information Package (SIP) – Contains information delivered to the archive from the content provider
       • Archival Information Package (AIP) – Related content information stored in the archive
       • Dissemination Information Package (DIP) – On request delivery of information to the user
  2. Preservation Metadata: Records the process that supports the preservation of digital data
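
The flow between the three packages can be pictured as a small pipeline: a provider submits a SIP, the archive stores it as an AIP, and a user receives a DIP on request. The sketch below is only an illustration of that flow. The class, the function names, and the sample file names are hypothetical and do not represent any archive system's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class InformationPackage:
    kind: str                      # "SIP", "AIP", or "DIP"
    content_files: list[str]
    metadata: dict[str, str] = field(default_factory=dict)


def ingest(sip: InformationPackage, archive_id: str) -> InformationPackage:
    """Turn a submitted package into an archival package (hypothetical step)."""
    if sip.kind != "SIP":
        raise ValueError("only a SIP can be ingested")
    # Copy the provider's metadata and add an identifier assigned by the archive
    archived = dict(sip.metadata, archive_id=archive_id)
    return InformationPackage("AIP", list(sip.content_files), archived)


def disseminate(aip: InformationPackage, wanted: list[str]) -> InformationPackage:
    """Build a DIP holding only the files a user requested."""
    if aip.kind != "AIP":
        raise ValueError("only an AIP can be disseminated")
    files = [name for name in aip.content_files if name in wanted]
    return InformationPackage("DIP", files, dict(aip.metadata))


sip = InformationPackage("SIP", ["a.tif", "b.tif"], {"title": "Harbor, 1921"})
aip = ingest(sip, "ARC-0001")
dip = disseminate(aip, ["b.tif"])
print(dip.kind, dip.content_files, dip.metadata)
```

Keeping each package as a separate object means the archived copy is never handed out directly, which mirrors the earlier advice to keep the master copy apart from the distributed copy.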

Metadata Schema Models

According to ISO 23081[4], a schema is “a logical plan showing the relationships between metadata elements, normally through establishing rules for the use and management of metadata specifically as regards the semantics, the syntax and the optionality (obligation level) of values.”

The amount of metadata that needs to be stored for an object depends on its functional usage and significance. With a large amount of metadata already available, and more being published regularly for different purposes by different communities, metadata schema designers need experience with the Semantic Web when designing a metadata schema.

For long-term preservation of data, various metadata schema models have been developed, including the following:

  • MARC: Machine Readable Cataloguing
  • MARCXML: XML version of MARC 21
  • METS: Metadata Encoding & Transmission Standard
  • MODS: Metadata Object Description Schema
  • DCMI: Dublin Core Metadata Initiative
  • CDWA: Categories for the Description of Works of Art
  • CRM: CIDOC Conceptual Reference Model
  • MPEG-7: Multimedia Content Description Interface (Moving Picture Experts Group)
  • EAD: Encoded Archival Description
  • RDF: Resource Description Framework
  • VRA CORE: Visual Resources Association
  • DDI: Data Documentation Initiative
  • MIX: Metadata for Images in XML Standard
  • IEEE LOM: Learning Object Metadata, a standard from the Institute of Electrical and Electronics Engineers Standards Association for the description of “learning objects”
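
To make one of these schemas concrete, the sketch below writes a small descriptive record using Dublin Core element names (title, creator, date, format, rights) with Python's standard library. The `record` wrapper element and the sample values are assumptions made for illustration; a real repository would follow its own application profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize a flat dict of Dublin Core fields to an XML string."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration
xml_text = dublin_core_record({
    "title": "Harbor at dawn",
    "creator": "Unknown photographer",
    "date": "1921",
    "format": "image/tiff",
    "rights": "Public domain",
})
print(xml_text)
```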

The importance of metadata in content digitization

Metadata plays a key role in processing, managing, accessing, and preserving digital content, be it audio, video, or image collections. Metadata has the following key functionalities:

  • Search: To search for data associated with a file like Author, Date Published, Key Words, etc.
  • Distribute: To determine when and where the content will be distributed
  • Access: To determine delivery of targeted content based upon preset rules matching metadata values
  • Retain: To determine which records to archive

Optimizing metadata for content digitization

The importance of metadata lies in the fact that it makes the content searchable, both online and offline. While filtering, the search engine considers the title, description, and tags, in that order. Some key points to remember while using metadata for content digitization are:

Optimize the title

Grab attention with a catchy and compelling title. To make a title search engine (and mobile) friendly, limit it to 120 characters and include your top keywords. Think about what the audience would relate to, and make the title informative and relevant.

Optimize the description

Include your keywords, and detail what the content is all about. Keep the most critical information within the first 22 words of your description, as the search engine displays only that portion in the results list before you click the ‘see more’ button.

Optimize the tags

A couple of things to keep in mind while tagging a digital asset are:

  1. Assign keywords that cover the 5 W’s – what, when, who, why, and where – to make it a well-captured asset
  2. Avoid grammatical errors while assigning keywords
  3. Avoid ambiguous words or words with multiple meanings
  4. Be consistent with abbreviations and acronyms
  5. Use 8 – 12 tags per asset
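
The numeric guidelines in this section (a title of at most 120 characters, the key message within the first 22 words of the description, and 8 to 12 tags) are easy to check automatically before publishing. The following is a minimal sketch under those assumptions; it reads the tag guideline as a range of 8 to 12, and the function names are illustrative.

```python
def check_metadata(title: str, tags: list[str]) -> list[str]:
    """Return a list of guideline violations (empty when all checks pass)."""
    problems = []
    if len(title) > 120:
        problems.append("title exceeds 120 characters")
    if not 8 <= len(tags) <= 12:
        problems.append("use 8 to 12 tags")
    return problems


def description_preview(description: str, words: int = 22) -> str:
    """Return the opening words that are shown before 'see more'."""
    return " ".join(description.split()[:words])


# Invented sample asset
tags = ["harbor", "1921", "ships", "dawn", "port", "archive", "photo", "history"]
print(check_metadata("Harbor at dawn, 1921", tags))
print(description_preview("A restored photograph of the harbor at dawn.", 5))
```

Reading the preview back is a quick way to confirm that the top keywords actually appear in the part of the description a viewer sees first.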

Conclusion

Metadata plays a crucial role in keeping track of content right from its inception to its processing and accessibility. It provides a complete description of the purpose and functionality of the data, making it easier for end-users to locate and retrieve the data. Therefore, it is crucial that all content carries embedded metadata.

[1] https://www.recode.net/2014/4/8/11625358/modernizing-the-entertainment-industry-supply-chain-in-the-age-of

[2] https://merchdope.com/youtube-stats/

[3] https://www.techopedia.com/definition/1938/metadata

[4] https://committee.iso.org/sites/tc46sc11/home/projects/published/iso-23081-metadata-for-records.html

Leveraging Artificial Intelligence in Digitization


Digitization is a necessity today, both for restoring content and for making it searchable. Be it physical libraries or digital media, media organizations and content owners are investing in digitization and archiving of legacy content. Organizations often spend hours recreating or searching for content that already exists. Aged and untreated content, neglected metadata, and a poor choice of storage solution often take a toll on broadcasters.

You may not have noticed it, but artificial intelligence (AI) is changing this scenario. Think of personalized playlists on YouTube or Spotify or recommendations on Netflix and Amazon Prime; broadcasters are using AI to curate a selection of tailor-made content.

A few weeks after Donald Trump was elected, the Internet Archive’s TV News Archive aggregated more than 520 hours of televised Trump speeches, debates, interviews, and other broadcasts dating back to 2009. Thanks to the Trump Archive, the footage doesn’t get lost in the crowd of news, giving journalists, scholars, and citizens a chance to track and analyze Trump’s statements on public policy issues.

Netflix claims to save about US$1bn annually due to AI technology’s ability to automate workflows and reduce customer churn.

After Wimbledon 2017, IBM Watson used a cognitive algorithm to produce highlight reels of what it believed were the best shots of the tournament. By automatically analyzing audio and video from the footage to identify highlight-worthy shots and points, artificial intelligence saved editors hundreds of hours of work.

Here are five ways in which artificial intelligence is revolutionizing the way we archive, process, and store documents and extract information from them.

Automated processing

Optical character recognition can recognize text. AI can additionally read and classify that text and automate workflows based on it in minutes. Initially fed with a set of rules, AI uses machine learning to improve its identification and processing capabilities.

Data extraction

Data extraction reaches a whole new level with AI-powered document management systems, which can accurately read the information and understand the context.

Document clustering

AI can also group unclassified documents based on topics, which can help organizations understand the documents within a larger context, find resemblances, and draw conclusions that would otherwise be time-consuming or impossible.
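
The idea of grouping documents by topic can be illustrated with a deliberately simple stand-in for a trained model: documents that share enough words land in the same group. The sketch below uses word overlap (Jaccard similarity) and a greedy pass. The threshold, the function names, and the sample texts are all assumptions, and production systems rely on far richer language models than this.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Greedily group document indexes whose vocabularies overlap enough."""
    clusters: list[tuple[set[str], list[int]]] = []
    for index, text in enumerate(docs):
        words = set(text.lower().split())
        for vocab, members in clusters:
            if jaccard(words, vocab) >= threshold:
                members.append(index)
                vocab.update(words)  # grow the cluster's vocabulary in place
                break
        else:
            # No existing cluster was similar enough, so start a new one
            clusters.append((words, [index]))
    return [members for _, members in clusters]


# Invented sample documents
docs = [
    "invoice payment due amount",
    "payment invoice amount overdue",
    "museum photo archive scan",
    "photo archive scan negative",
]
print(cluster_documents(docs))
```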

Advanced security

A document management system powered by AI can help enforce access controls. By using secure biometric techniques like facial recognition to identify employees who can access the data, it can prevent unauthorized viewing or alteration of documents.

Data analytics

Cognitive platforms as a service (PaaS) like Microsoft Azure Cognitive Services and IBM Watson apply techniques like predictive analytics, machine learning, and data visualization to analyze the collected data to improve decision making.

The way ahead…

At IBC2017, for the first time, AI was one of the main themes, which speaks volumes about its adoption. Recently, a company named Ripcord has patented and built robots that scan and sort boxes full of paper, from business cards to legal documents, and enter the contents into a searchable database in the cloud. As AI adoption across industries increases, we can hope to see better and faster analysis and improved decision making across the broadcast industry.

The Comeback of the Vinyl Records, Thanks to Digital Archiving


As sunshine peeped through the window this morning, Larry smiled. A long-lost smile for the 70-year-old gentleman, whose children have moved to greener pastures to pursue their dreams. What remains now is an eternal wait – the wait for Christmas and Easter holidays, when his big mansion turns into a home.

However, today’s sunshine has a different story to share. A story of a different comeback is waiting for Larry. No, not of his children. But of his passion. Today is Saturday. And Larry is excited to watch Vinyl, which will air on television in some time.

A series that made Larry nostalgic and excited like a child simultaneously. That made him travel back 40 years. And he suddenly started living it all in flashback.

A banker in the prime of youth, Larry was passionate about music. Every day, before numbers and finances took control of his day, he had 30 minutes for himself. 30 minutes of absolute bliss, when, as sunshine peeped through the window, he would close his eyes and enjoy Earl Grey tea as Beethoven played on his record player. Those 30 minutes of uninterrupted solitude.

Then one day, the record player broke. Leaving him with “can’t be repaired” from technicians and thousands of favorite vinyl records. And a void that his family failed to understand. Except his grandson Alan.

Alan was browsing through Larry’s collection of records one afternoon and was intrigued by how rare it was. His excitement was answered by a proud smile from Larry. And the 70-year-old found camaraderie in his teenage grandson.

It was Alan who gave him the idea to digitize his collection. “It’s easy, Grandpa, they’ll just convert it and give it to you on a USB drive for storage. And then you can listen to your collection anytime you want, from anywhere,” Alan said.

Alan’s easy solution seemed like a dream come true. A couple of store visits later, Larry realized his records could be converted to high-resolution audio files, and he could carry them anywhere. The age of technology, he smiled to himself. Who knew he could enjoy the authentic sound of vinyl, with all its warmth and smoothness, anytime and anywhere with digital convenience?

As Larry savored his morning tea, the sunshine made him happy. His plan for the day was already made. His favorite show on television, a date with the past, and then he would head to the store to get the means to digitize his prized collection.

And then, his 40-year-old morning routine would resume. Thanks to digital archiving!

It is indeed a bright sunny Saturday morning!