
Managing Data Sets: Metadata Standards

Metadata Overview

A metadata standard is a set of elements, standardized for a particular field, that is used to describe data. Some scientific disciplines already have established metadata standards for data sets, and some data repositories have their own standards as well. If no standard is in place for your discipline, you can adapt one of several general-purpose schemas to meet your needs.

Introduction to Metadata provides an overview of metadata—its types, roles, and characteristics—as well as a discussion of how it relates to Web resources; a description of methods, tools, standards, and protocols for publishing and disseminating digital collections; and a handy glossary.

The Metadata Research Center at the UNC School of Library and Information Science is developing the HIVE model for dynamically integrating multiple controlled vocabularies.

Descriptive Metadata Standards

The Dublin Core Metadata Element Set, maintained by the Dublin Core Metadata Initiative (DCMI), is a simple yet effective element set for describing a wide range of networked resources. The Dublin Core standard has two levels: Simple and Qualified. Its semantics were established by an international, cross-disciplinary group of professionals from librarianship, computer science, text encoding, the museum community, and other related fields of scholarship and practice.

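Simple Dublin Core comprises fifteen elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.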

Qualified Dublin Core includes three additional elements (Audience, Provenance and RightsHolder), as well as a group of element refinements (also called qualifiers) that refine the semantics of the elements in ways that may be useful in resource discovery.
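To illustrate how flat Simple Dublin Core is, the sketch below builds a record in the `oai_dc` XML format commonly exchanged over OAI-PMH, using only Python's standard library (Python 3.9+ assumed). The function name `simple_dc_record` and all sample values are hypothetical. In Simple Dublin Core every element is optional and repeatable, which is why each value is passed as a list.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the Dublin Core elements and the OAI-PMH oai_dc wrapper.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The fifteen Simple Dublin Core elements.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def simple_dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build an oai_dc record; every element is optional and repeatable."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Simple Dublin Core element: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


# Hypothetical example record for a data set.
record = simple_dc_record({
    "title": ["Stream temperature observations, 2019"],
    "creator": ["Doe, Jane"],
    "date": ["2020-01-15"],
    "type": ["Dataset"],
    "format": ["text/csv"],
})
print(ET.tostring(record, encoding="unicode"))
```

Qualified elements such as Audience are rejected here on purpose, since this sketch only covers the Simple level.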

Structural Metadata Standards

Metadata Encoding & Transmission Standard (METS) is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium. The standard is maintained by the Network Development and MARC Standards Office of the Library of Congress and was developed as an initiative of the Digital Library Federation.
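The core of METS structural metadata is a two-step link: files are declared once in a `fileSec`, and a `structMap` (the one section the METS schema requires) arranges them by pointing at their IDs with `fptr` elements. The Python sketch below builds that skeleton with the standard library. The function names, the `USE` and `TYPE` values, and the example.org URLs are hypothetical, and a real METS document would typically also carry a `metsHdr` and descriptive and administrative sections (`dmdSec`, `amdSec`).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def minimal_mets(files: list[tuple[str, str, str]]) -> ET.Element:
    """files: (file ID, MIME type, URL) triples, in reading order."""
    mets = ET.Element(m("mets"))
    # fileSec must precede structMap in a METS document.
    file_grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"), USE="master")
    root_div = ET.SubElement(ET.SubElement(mets, m("structMap")), m("div"), TYPE="dataset")
    for order, (file_id, mime, url) in enumerate(files, start=1):
        # Declare the file and where it lives.
        f = ET.SubElement(file_grp, m("file"), ID=file_id, MIMETYPE=mime)
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})
        # Place it in the structure by reference to its ID.
        part = ET.SubElement(root_div, m("div"), TYPE="file", ORDER=str(order))
        ET.SubElement(part, m("fptr"), FILEID=file_id)
    return mets


doc = minimal_mets([
    ("FILE-0001", "text/csv", "https://example.org/data/observations.csv"),
    ("FILE-0002", "text/plain", "https://example.org/data/README.txt"),
])
print(ET.tostring(doc, encoding="unicode"))
```

Keeping file declarations separate from structure lets the same file appear in more than one `structMap` without being described twice.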

Metadata Object Description Schema (MODS) is a schema for a bibliographic element set that may be used for a variety of purposes, and particularly for library applications. The standard is maintained by the Network Development and MARC Standards Office of the Library of Congress.
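Unlike Simple Dublin Core, MODS nests its elements: a title sits inside `titleInfo`, a name carries typed parts and its own role, and a publication date sits inside `originInfo`. The sketch below builds a small MODS record with Python's standard library. The helper names and values are hypothetical, and only a handful of MODS elements are shown; the role uses the MARC relator code `cre` (creator).

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS v3 namespace."""
    return f"{{{MODS_NS}}}{tag}"


def mods_record(title: str, family: str, given: str, date_issued: str) -> ET.Element:
    mods = ET.Element(q("mods"))
    # The title lives inside a titleInfo wrapper.
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    # A personal name is split into typed parts and carries its own role.
    name = ET.SubElement(mods, q("name"), type="personal")
    ET.SubElement(name, q("namePart"), type="family").text = family
    ET.SubElement(name, q("namePart"), type="given").text = given
    role = ET.SubElement(name, q("role"))
    ET.SubElement(role, q("roleTerm"), type="code", authority="marcrelator").text = "cre"
    # Publication details are grouped under originInfo.
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateIssued"), encoding="w3cdtf").text = date_issued
    return mods


record = mods_record("Stream temperature observations, 2019", "Doe", "Jane", "2020-01-15")
print(ET.tostring(record, encoding="unicode"))
```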

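XML Formatted Data Unit (XFDU) is a packaging standard developed by the Consultative Committee for Space Data Systems (CCSDS), an international standards body whose member agencies include NASA.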

FOXML (Fedora Object XML) is a simple XML format that directly expresses the Fedora digital object model.

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources.
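In OAI-ORE terms, a Resource Map describes an Aggregation, and the Aggregation lists its parts with `ore:aggregates`. The sketch below expresses that core as RDF triples and serializes them as N-Triples, using only Python's standard library (Python 3.9+ assumed). The URIs, including the `#aggregation` fragment convention, and the helper names are hypothetical, and a complete Resource Map would also carry metadata about itself, such as its creator and modification date.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


def resource_map_triples(rem_uri: str, agg_uri: str, parts: list[str]) -> list[Triple]:
    """Core ORE statements: the map describes the aggregation, which aggregates the parts."""
    triples: list[Triple] = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", agg_uri),
        (agg_uri, RDF_TYPE, ORE + "Aggregation"),
        (agg_uri, ORE + "isDescribedBy", rem_uri),
    ]
    triples.extend((agg_uri, ORE + "aggregates", part) for part in parts)
    return triples


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


rem = "https://example.org/rem/dataset-42"
triples = resource_map_triples(rem, rem + "#aggregation", [
    "https://example.org/data/observations.csv",
    "https://example.org/data/README.txt",
])
print(to_ntriples(triples), end="")
```

Separating the map from the aggregation it describes is what lets several maps (for example in different RDF syntaxes) describe the same aggregation.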

Domain Specific Metadata Standards

Biology

Darwin Core includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity.
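Darwin Core terms are often used directly as column headers in a flat file such as an occurrence table. The sketch below writes such a table with Python's `csv` module. The column names (`occurrenceID`, `basisOfRecord`, `scientificName`, and so on) are Darwin Core terms, but the record values and the `to_dwc_csv` helper are hypothetical, and a real data set would usually use more terms.

```python
import csv
import io

# Darwin Core term names used as column headers (a small subset of the standard).
DWC_TERMS = [
    "occurrenceID", "basisOfRecord", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "countryCode",
]


def to_dwc_csv(rows: list[dict[str, str]]) -> str:
    """Write rows as CSV; unknown keys raise ValueError, missing ones become empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DWC_TERMS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [{
    "occurrenceID": "urn:example:occurrence:0001",  # hypothetical identifier
    "basisOfRecord": "HumanObservation",
    "scientificName": "Quercus alba",
    "eventDate": "2019-06-12",
    "decimalLatitude": "35.91",
    "decimalLongitude": "-79.05",
    "countryCode": "US",
}]
print(to_dwc_csv(rows), end="")
```

Restricting the writer to a fixed list of term names catches misspelled column names early, since `csv.DictWriter` rejects keys that are not in `fieldnames` by default.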

GUID and Life Science Identifiers provides guidance on how to use Globally Unique Identifiers to meet specific requirements of the biodiversity information community.
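Life Science Identifiers (LSIDs) follow the URN pattern `urn:lsid:<authority>:<namespace>:<object>`, optionally followed by `:<revision>`. The parser below splits an LSID into those parts. It is a minimal sketch (the names `LSID` and `parse_lsid` and the example identifier are hypothetical) that checks only the overall shape, not the full character rules of the LSID specification, and it does not resolve the identifier.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = text.strip().split(":")
    # Five parts without a revision, six with one. The "urn" scheme and the
    # "lsid" namespace identifier are compared case-insensitively, as in URN syntax.
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return LSID(*parts[2:])


print(parse_lsid("urn:lsid:example.org:taxonnames:12345:2"))
```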

TAPIR (TDWG Access Protocol for Information Retrieval) is a standardized, stateless, HTTP-transmittable, XML-based request and response protocol for accessing structured data that may be stored on any number of distributed databases of varied physical and logical structure.

Ecology

Ecological Metadata Language (EML) is a metadata specification developed for the ecology discipline. It is based on prior work done by the Ecological Society of America and associated efforts. EML is implemented as a series of XML document types that can be used in a modular and extensible manner to document ecological data.
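A minimal EML document wraps a `dataset` in an `eml:eml` root element that carries `packageId` and `system` attributes. The sketch below assumes EML 2.2.0 (namespace `https://eml.ecoinformatics.org/eml-2.2.0`) and fills in only a title, a creator, and a contact; the helper names, package identifier, and values are hypothetical, and a real record would normally document much more (coverage, methods, data tables).

```python
import xml.etree.ElementTree as ET

EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"
ET.register_namespace("eml", EML_NS)


def add_person(parent: ET.Element, tag: str, given: str, sur: str) -> ET.Element:
    """Add a responsible party (e.g. creator or contact) with an individual name."""
    party = ET.SubElement(parent, tag)
    name = ET.SubElement(party, "individualName")
    ET.SubElement(name, "givenName").text = given
    ET.SubElement(name, "surName").text = sur
    return party


def minimal_eml(package_id: str, system: str, title: str, given: str, sur: str) -> ET.Element:
    # Only the root is namespace-qualified; EML child elements are unqualified.
    root = ET.Element(f"{{{EML_NS}}}eml", packageId=package_id, system=system)
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    add_person(dataset, "creator", given, sur)
    add_person(dataset, "contact", given, sur)
    return root


doc = minimal_eml("example.1.1", "https://example.org",
                  "Stream temperature observations, 2019", "Jane", "Doe")
print(ET.tostring(doc, encoding="unicode"))
```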

Education

Learning Object Metadata (LOM) is a multi-part standard designed to facilitate search, evaluation, acquisition, and use of learning objects. It also allows for the sharing and exchange of learning objects by enabling the development of catalogs and inventories, while taking into account the diversity of cultural and linguistic contexts in which the learning objects and their metadata are reused.

Geospatial

The Content Standard for Digital Geospatial Metadata (CSDGM) provides a common set of terminology and definitions for the documentation of digital geospatial data. The standard establishes the names of data elements and compound elements (groups of data elements) to be used for these purposes, the definitions of these compound elements and data elements, and information about the values that are to be provided for the data elements.

ISO 19115:2003 defines the schema required for describing geographic information and services. It provides information about the identification, extent, quality, spatial and temporal schema, spatial reference, and distribution of digital geographic data. The 2003 edition has since been revised as ISO 19115-1:2014.

The Federal Geographic Data Committee (FGDC) is a U.S. interagency committee that promotes the coordinated development, use, sharing, and dissemination of geospatial data on a national basis. This nationwide data publishing effort is known as the National Spatial Data Infrastructure (NSDI). The NSDI is a physical, organizational, and virtual network designed to enable the development and sharing of the nation's digital geographic information resources.

Humanities

The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics.

The VRA Core is a data standard for the description of works of visual culture as well as the images that document them.

Social Science

The Data Documentation Initiative (DDI) is an effort to create an international standard for describing data from the social, behavioral, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data life cycle. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving.