Metadata Primer -- A "How To" Guide on Metadata Implementation


Authors

David Hart
University of Wisconsin-Madison, Land Information and Computer Graphics Facility

Hugh Phillips
3001, Inc., Gainesville, FL
formerly with Wisconsin State Cartographer's Office

Fax questions not answered during the Q&A sessions of the October 1997 Metadata Satellite Videoconference.

The Metadata Primer is one phase of a larger metadata research and education project undertaken by the National States Geographic Information Council and funded by the Federal Geographic Data Committee's Competitive Cooperative Agreements Program (CCAP). For more information on the NSGIC Metadata Research and Education Project, please click here. Send comments on the metadata primer to: dhart@macc.wisc.edu or hphillips@ibm.net.


Organization of the Primer

This primer is designed to provide a practical overview of the issues associated with developing and maintaining metadata for digital spatial data. It is targeted toward an audience of state, local, and tribal government personnel. The document provides a "cook book" approach to the creation of metadata. Because much of the most current information on metadata resides on the Internet, the primer summarizes relevant material available from other World Wide Web (WWW) home pages. The primer begins with a discussion of what metadata is and why metadata is important. This is followed by an overview of the Content Standards for Digital Geospatial Metadata (CSDGM) adopted by the Federal Geographic Data Committee (FGDC). Next, the primer focuses on the steps required to begin collecting and using metadata. The fourth section deals with how to select the proper metadata creation tool from the growing number being developed. Section five discusses the mechanics of documenting a data set, including strategies on reviewing the output to make sure it is in a useable form. The primer concludes with a discussion of other assorted metadata issues.

While we call this a 'cook book' of sorts for metadata, the metadata recipes you derive from it will be flavored by your institutional arrangements, your GIS and operating platform, and your data variety and volume. Bon appétit!


Table of Contents

Section 1 -- Metadata: What is it and Why is it Important?
Section 2 -- Get Acquainted with the Content Standards for Digital Geospatial Metadata (CSDGM)
Section 3 -- Where Does One Begin?
Section 4 -- Select the Proper Metadata Tool
Section 5 -- Start Out Simple
Section 6 -- Other Metadata Issues

Section 1

Metadata: What is it and Why is it Important?

1.1 What is it?
1.2 Importance of Metadata
1.3 Forms of Metadata
1.4 Geospatial Data Clearinghouses
1.5 Who is Going to Create the Metadata?
1.6 Additional Information on Metadata

1.1 What is it?

At first glance, the term metadata evokes a technical image and almost guarantees a trip to the dictionary. Metadata is not viewed as a "user friendly" topic, but this image is not completely deserved. Simply defined, metadata is "data about data." Used in the context of digital spatial data, metadata is the background information which describes the content, quality, condition, and other appropriate characteristics of the data. Paper maps contain metadata, primarily as part of the map legend. In this form, metadata is readily apparent and easily transferred between map producers and map users. When map data are in a digital form, metadata is equally important, but its development and maintenance often require a more conscious effort on the part of data producers and the chain of subsequent users who may modify the data to suit their particular needs.

1.2 Importance of Metadata

Metadata serves many important purposes.

Metadata can be organized into several levels ranging from a simple listing of basic information about available data to detailed documentation about an individual data set. At a fundamental level, metadata may support the creation of an inventory of the data holdings of a state or local government agency. Metadata is also important in the creation of a spatial data clearinghouse, where potential users can search to find the data they need for their intended application. At a more detailed level, metadata may be considered as insurance. Metadata ensures that potential data users can make an informed decision about whether data are appropriate for the intended use. Metadata also ensures that the data holdings of an agency are well documented and that agencies are not vulnerable to losing all the knowledge about their data when key employees retire or accept other jobs.

Metadata may soon play an important role in the provision of actual insurance policies within the GIS profession. Gary Hunter of the University of Melbourne recently wrote an article for URISA News (Issue 155, September/October 1996, pp. 1-3 -- contact URISA at (202) 289-1685) on the implications of the increasing trend toward purchase of database insurance policies for spatial data sets used in high-risk GIS application areas, e.g. emergency response. Hunter points out that insurance companies will likely require conditions for the issuance of such policies ranging from detailed background information on the organizations producing and using the data to a certified quality assurance program in place. Another obvious condition would be comprehensive metadata on the data sets in question.

1.3 Forms of Metadata

Metadata may exist in forms other than ones compliant with the Content Standards for Digital Geospatial Metadata. (The Content Standards will be described in detail in section 2.) Perhaps the most common form of metadata is a file folder filled with notes on data sources and procedures used to build the data. Less common is complete, organized metadata such as the Wisconsin Department of Natural Resources' GIS Data Users Guide. This catalog provides concise metadata in a form that is easy to read and has recently been made available on-line.

CSDGM compliant digital metadata may be created, stored, and used in a variety of formats. The most basic is an ASCII text document. An ASCII document is easy to transfer to other users independent of the hardware/software platform they use. Another common format is Hypertext Markup Language (HTML). HTML provides an attractive way to view metadata using a browser such as Netscape Navigator, Mosaic, or Microsoft Internet Explorer. Recently, there has been strong interest in creating metadata in Standard Generalized Markup Language (SGML). SGML provides an effective way to tag metadata elements. This will be important for indexing and searching metadata on Clearinghouses and to provide a means to exchange metadata between metadata users, metadata databases, and metadata tools.
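As a rough illustration of how these formats differ, the sketch below renders one small metadata fragment first as indented ASCII text and then as tagged markup. The sample values, the function names, and the tag spelling are assumptions made for illustration only; a real SGML encoding would follow a published document type definition, and this sketch does not escape markup characters that might appear inside values.

```python
# Hypothetical metadata fragment as nested (name, value-or-children) pairs.
record = [
    ("Identification_Information", [
        ("Abstract", "Road centerlines for a sample county."),
        ("Purpose", "Illustration only."),
    ]),
]


def to_text(elements, depth=0):
    """Render elements as indented ASCII text, one element per line."""
    lines = []
    for name, value in elements:
        indent = "  " * depth
        if isinstance(value, list):          # compound element with children
            lines.append(f"{indent}{name}:")
            lines.extend(to_text(value, depth + 1))
        else:                                # simple element with a text value
            lines.append(f"{indent}{name}: {value}")
    return lines


def to_tagged(elements, depth=0):
    """Render the same elements with explicit open and close tags."""
    lines = []
    for name, value in elements:
        indent = "  " * depth
        if isinstance(value, list):
            lines.append(f"{indent}<{name}>")
            lines.extend(to_tagged(value, depth + 1))
            lines.append(f"{indent}</{name}>")
        else:
            lines.append(f"{indent}<{name}>{value}</{name}>")
    return lines


print("\n".join(to_text(record)))
print("\n".join(to_tagged(record)))
```

The tagged form makes the start and end of every element explicit, which is what allows indexing and exchange software to find elements without relying on layout.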

1.4 Geospatial Data Clearinghouses

A Geospatial Data Clearinghouse is a location, typically accessed through a home page on the World Wide Web (WWW), to search for spatial data sets. A Clearinghouse may contain specific data sets which can be downloaded or may contain information about data sets that aid the user in making a determination about whether it is worthwhile to obtain the data set and how to do so. The existence of many Clearinghouses of geospatial data and metadata may seem surprising to persons new even to the CSDGM. Well, Clearinghouses are out there and they have both data and metadata! If you are hesitant to jump into metadata, don't worry, the water is fine! Here are some robust examples of spatial data/metadata clearinghouses:

1.5 Who is Going to Create the Metadata?

This is not an idle question. Metadata creation is typically considered to be an obligation of the data producer. Although you may be a data producer and custodian, it may not be most effective for you to learn the CSDGM and produce the metadata if you only produce/document a few data sets per year. If your state has a GIS coordinating council, there may be individuals specially trained to help you produce metadata for your data sets if you will contribute the metadata to a Clearinghouse.

1.6 Additional Information on Metadata

For additional background information on the importance of metadata, please review the following brochures and on-line material:


Section 2

Get Acquainted with the Content Standards for Digital Geospatial Metadata (CSDGM)

2.1 Overview of Content Standards for Digital Geospatial Metadata
2.2 Resources that Document the Content Standards or Help in Understanding Them
2.3 Tutorials
2.4 Metadata FAQs
2.5 General Metadata Resources

2.1 Overview of Content Standards for Digital Geospatial Metadata

Metadata, or "data about data," describes the content, quality, condition, and other characteristics of data. The Content Standards for Digital Geospatial Metadata, Version 2 (CSDGM) specify the information content of metadata for a set of digital geospatial data. The purpose of the content standards is to provide a common set of terminology and definitions for documentation related to these metadata. Information about which elements of the metadata are mandatory, optional, repeatable, or one of a choice is encoded in the production rules of the CSDGM.

The first impression of the CSDGM is its apparent complexity; in printed form it is about 75 pages long. This is necessary to convey the definitions of the 334 different metadata elements and their production rules. Do not let the length dismay you; the CSDGM is meant to be a reference, not recreational reading! The content standards are meant to be a framework to convey those things you need to know about a data set to evaluate its usability, to obtain it, and to use it effectively. To help better understand the CSDGM, it is useful to break it down into its major sections, not all of which may necessarily be required or present in the metadata for a data set.

2.1.1 Major sections of the CSDGM

Identification Information
data set title, area covered, keywords, purpose, abstract, access and use restrictions
Data Quality Information
horizontal and vertical accuracy assessment, data set completeness and lineage
Spatial Data Organization Information
raster, vector, or an indirect (e.g. address) link to location
Spatial Reference Information
lat/long, coordinate system, or map projection
Entity and Attribute Information
definitions of the attributes of the data set
Distribution Information
distributor, file format of data, off-line media types, on-line link to data, fees
Metadata Reference Information
who created the metadata and when

Additionally, the content standards define three 'floating' minor sections.

2.1.2 Minor sections of the CSDGM
Citation Information
originator, title, publication date, publisher
Time Period Information
single date, multiple dates, range of dates
Contact Information
contact person and/or organization, address, phone, email

The minor sections are never used alone, but are always inserted into one of the major sections as a block. Because they are used multiple times, it made the content standards more compact to define them separately. Contact Information, in particular, is a section of which a site might have several instances, each common to all or many of the site's metadata documents. This can be used to advantage in site-specific metadata template documents or in database-based metadata tools.
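The reuse of a Contact Information block can be sketched as follows. This is a minimal illustration, assuming a site keeps one contact block and copies it into each place a major section calls for it. The contact values and the `new_template` function are hypothetical, and the nesting is simplified compared with the full production rules.

```python
import copy

# Hypothetical site-wide contact block, defined once.
SITE_CONTACT = {
    "Contact_Organization": "Example County Land Information Office",
    "Contact_Voice_Telephone": "555-0100",
}


def new_template(title):
    """Return a metadata skeleton with the shared contact block inserted
    wherever a major section calls for Contact Information."""
    return {
        "Identification_Information": {
            "Title": title,
            "Point_of_Contact": copy.deepcopy(SITE_CONTACT),
        },
        "Distribution_Information": {
            "Distributor": copy.deepcopy(SITE_CONTACT),
        },
        "Metadata_Reference_Information": {
            "Metadata_Contact": copy.deepcopy(SITE_CONTACT),
        },
    }


parcels = new_template("Tax Parcels")
roads = new_template("Road Centerlines")
print(parcels["Distribution_Information"]["Distributor"]["Contact_Organization"])
```

Each template receives its own copy of the block, so a data set with a different distributor can be edited without disturbing the site default.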

If only the 'mandatory' elements of metadata are included, the metadata is rather brief and may be quickly prepared as shown in this tongue-in-cheek 'minimum metadata document.' The minimum mandatory elements may be appropriate for initial data set documentation, but should not be substituted for complete documentation.

Some elements of the content standards are termed compound elements, because they (parents) are composed of, or are containers for, other sub-elements (children). Typically this parent/child relationship is indicated in metadata by indenting the child element one level deeper than its parent (hierarchical indentation), or with numbers, by adding another decimal point refinement to the element number, e.g. a 2.1 parent might have 2.1.1 and 2.1.2 children. Neither the indentation nor the numbering is a required part of the content standards.
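To make the parent/child idea concrete, the following sketch reads a small indented text document and rebuilds the hierarchy of compound elements and their children. It assumes consistent indentation of two spaces per level and a "Name: value" layout, neither of which the content standards require; the `parse_indented` function and the sample values are hypothetical.

```python
def parse_indented(text, indent_width=2):
    """Build a nested tree from indented 'Name: value' lines.

    Each node is a dict with 'name', 'value', and 'children'.
    Assumes consistent indentation of indent_width spaces per level.
    """
    root = {"name": None, "value": None, "children": []}
    stack = [(-1, root)]                      # (depth, node) pairs
    for raw in text.splitlines():
        if not raw.strip():
            continue                          # skip blank lines
        depth = (len(raw) - len(raw.lstrip(" "))) // indent_width
        name, _, value = raw.strip().partition(":")
        node = {"name": name.strip(), "value": value.strip() or None, "children": []}
        # Pop back up until the top of the stack is this node's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root["children"]


sample = """\
Identification_Information:
  Description:
    Abstract: Sample abstract.
    Purpose: Sample purpose.
Metadata_Reference_Information:
  Metadata_Date: 19980101
"""

tree = parse_indented(sample)
for top in tree:
    print(top["name"], [child["name"] for child in top["children"]])
```

A compound element shows up here as a node with children and no value of its own, while a simple element has a value and no children.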


2.2 Resources that Document the Content Standards or Help in Understanding Them

2.2.1 Example metadata
While the CSDGM does provide definitions for the various metadata elements, it is sometimes difficult to discern from them exactly what is meant, or what would be an example for that element. In this case, there is nothing which will substitute for example metadata. It is a good exercise to look at example metadata side-by-side with the CSDGM.

Metadata, even that conforming to the CSDGM, does not all look the same. This is because the CSDGM, as its name implies, specifies only the content of the metadata, not its format. In the past, this has led to a number of difficulties in incorporating data from different organizations that use different metadata tools or template documents into a common clearinghouse. The different forms might employ hierarchical indentation of the elements, the numbering system of the CSDGM, colons following the element names (or none), and element names either strung together with underlines (e.g. Identification_Information) or left as separate words. Luckily, there is a tool which will bring some order to this allowed chaos, Peter Schweitzer's cns. The cns software (which stands for chew 'n spit) reads in metadata files and produces output with consistent element names which can be read by other metadata tools. Some CSDGM compliant documents in various formats follow.

Modern Average Global Sea Surface Temperature
- a metadata document showing hierarchical indentation of metadata elements in a text form document. (Peter Schweitzer, USGS)
Dodge County GPS Net
- a metadata document in HTML form with hierarchical indentation (Wisconsin NSDI Clearinghouse)
Montana National Forests
- a metadata document in HTML form (Montana NSDI Clearinghouse)
Vilas County WI SSURGO Soils
- a metadata document in text form with centered major headings (NRCS)
Montgomery North (AL) DRG
- a metadata document with numbering and element names (USGS)

These are just a few examples of metadata conforming to the CSDGM. It appears that there is a lot of latitude in the form of the metadata. This is true, and content is much more important than form, but there are several points to keep in mind if you want your metadata to be easily incorporated into a National Spatial Data Infrastructure (NSDI) Clearinghouse:

The element names must be spelled out exactly as in the CSDGM
(That rule has some explicit exceptions: cns can rectify aliases for the element names (that it knows about), it can fix mixed capitalization, and will supply underlines as needed)

The parent/child relationships must be maintained, i.e. the child elements must always follow their respective parent elements.
(The hierarchical relationship between parent and child does not have to be indicated with hierarchical indentation or numbering - cns can figure this out. What is necessary is the correct ordering of elements - i.e. children of a parent must follow the parent and come before any additional elements at the level of the parent or above.)

Don't leave out intermediate compound element headings
(cns can supply some of these, but not always.)
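The kind of name cleanup described in the first point can be sketched in a few lines. This is not how cns works internally, which the primer does not describe; it is only an illustration of matching loosely formatted headings against an exact vocabulary. The three names in `KNOWN_ELEMENTS` stand in for the full CSDGM element list, and `normalize_name` is a hypothetical helper.

```python
import re

# A few element names as an example vocabulary; a real check would use the
# full CSDGM element list.
KNOWN_ELEMENTS = {
    "Identification_Information",
    "Data_Quality_Information",
    "Metadata_Reference_Information",
}

# Lookup keyed by a lowercase, separator-free form of each known name.
_CANONICAL = {re.sub(r"[^a-z0-9]", "", n.lower()): n for n in KNOWN_ELEMENTS}


def normalize_name(raw):
    """Map a loosely formatted element heading to its exact known spelling.

    Strips leading outline numbers and trailing colons, and ignores case and
    the choice of space versus underline. Returns None if nothing matches.
    """
    text = re.sub(r"^\s*\d+(\.\d+)*\.?\s*", "", raw)   # drop "2.1 " style numbers
    text = text.strip().rstrip(":")
    key = re.sub(r"[^a-z0-9]", "", text.lower())
    return _CANONICAL.get(key)


for heading in ["identification information:", "2 Data Quality Information", "Lineage Notes"]:
    print(repr(heading), "->", normalize_name(heading))
```

Headings that do not match are returned as None rather than guessed at, so a person can decide whether they are aliases, misspellings, or elements outside the standard.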

There are innumerable other metadata documents which can be utilized as sample metadata, although one must be cautious about what is selected for use as an example. Just because a document is billed as metadata doesn't mean that it is CSDGM compliant. Peter Schweitzer of the USGS has put together a package of sample metadata documents which may be useful as examples for metadata. Although they are CSDGM compliant, there are no examples in the sample metadata of data sets used by local governments - e.g. tax parcels or road centerlines.

The FGDC has recognized that example metadata is important to help understand the content standards. Moreover, encapsulating what geospatial data are available will be helpful to users who do not currently have good Internet access. For these reasons, the FGDC has commissioned Applied Geographics, Inc. of Boston, MA to compile metadata from existing Clearinghouses and solicit metadata from the general NSDI user community for the production of a CD of metadata, with a (hopeful) estimated six month update cycle. When that becomes available, you will find information on how to obtain it here!

2.2.2 Printable versions of the content standards
When documenting data sets it is almost impossible to have enough copies of the CSDGM floating around to always have one handy. The FGDC has the 1998 version of the CSDGM (the most recent one) available for download and printing in several forms:

2.2.3 The Green Book
When metadata wranglers gather at the watering hole to spin yarns about the data sets *they* documented in the 'good ol' days' they might mention the famous 'Green Book,' that is, the Content Standards for Digital Geospatial Metadata Workbook, Version 1.0 (FGDC, March 24, 1995). This handy book contains the definitions of the metadata elements and some useful FAQs about specific elements. The workbook does not contain the production rules for the elements in text form; they have been replaced by a subset of Susan Stitt's graphical representation of the standard. Additionally, the workbook contains two example metadata documents, albeit each with some minor errors in the metadata. You can sometimes pick up a copy of this comb-bound document at FGDC workshops or you can obtain one (or several) by requesting them from:

Publications
US Geological Survey
590 National Center
Reston, VA 22092

Include in your order (as applicable) your Name, Position, Organization, Street Address, City, State, ZIP (or postal) code, Country, Telephone number, FAX number, and email address.

You can also FAX your request for the Workbook to (703) 648-5755

2.2.4 The On-Line version of the CSDGM
For those all-too-frequent times when you have misplaced your printed copy of the CSDGM, there is a quick reference (in fact the complete content standards) available on the Internet, the on-line CSDGM, produced by Peter Schweitzer of the USGS.

2.2.5 Graphical Representation of the CSDGM
Although the production rules of the content standards are very explicit, they are somewhat cumbersome to follow because compound elements are not fully expanded. In a fine example of one picture being worth a thousand words, and one of the most useful representations of the standard ever produced, Susan Stitt of the National Biological Division of the U.S. Geological Survey has encoded the production rules of the CSDGM into graphics. Nested boxes are used to indicate compound elements, color is used to indicate mandatory and optional elements, and a prism with a three-dimensional appearance in a compound element box is used to indicate an element that requires some text entry by the user (if its parent is used). This representation is also available as a Power Point slide show.

2.2.6 Colorized CSDGM
The Colorized CSDGM is meant to provide an alternate view of the content standards wherein mandatory, optional, and choice elements are encoded through font style and color in an HTML document. It is essentially the "flatland" version of Susan Stitt's metadata graphics. If you have a color inkjet printer there is a version especially designed for printing to that sort of device.

2.2.7 Power Point presentations
The FGDC has developed a series of presentations, slide shows essentially, which discuss metadata and clearinghouse implementation issues. However, these presentations suffer from the drawback that there is no way to go directly to a specific slide; one must proceed through the presentation in serial fashion (forwards or backwards).

2.2.8 Metadata Resource CD
The FGDC has contracted with PlanGraphics, Inc. to produce a CD of FGDC, metadata, and NSDI related miscellany. It will contain the hypertext version of the Content Standards for Digital Geospatial Metadata (CSDGM) embellished with expanded element definitions and examples, tutorials on accuracy assessment and map projections/coordinate systems, the graphical and the colorized versions of the content standards, available metadata tools, and other metadata educational materials developed by the FGDC and Competitive Cooperative Agreements Program (CCAP) participants as the result of their projects.

Like the example metadata CD, when the Metadata Resource CD becomes available you will be able to find out here how to obtain it. In a wicked recursion, this primer might also be found on the Resource CD.

2.2.9 Future Revisions to the Content Standards for Digital Geospatial Metadata
Although you may find the CSDGM complicated enough already, in time you will discover some problems with it that others have already noted. In fact, the FGDC has been encouraging feedback on the CSDGM since it was released on June 8, 1994. MITRE Corporation has compiled the essence of these comments into a report that FGDC will use in deciding revisions to the CSDGM. If you have any suggested revisions to the CSDGM, then send them to the FGDC. The "What's New With Metadata" section of the FGDC Home Page summarizes the latest news regarding review of the CSDGM. The FGDC is currently working with the International Organization for Standardization (ISO) on development of an international metadata standard (ISO Technical Committee 211, Working Group 3).


2.3 Tutorials

Barney, the BLM dinosaur turned metadata tutor (and now turned invisible) provides a light-hearted approach to learning the CSDGM. Be aware that some of the suggested responses are geared towards BLM internal operations.


2.4 Metadata FAQs

The definitions of the metadata elements in the CSDGM are succinct. Unfortunately, their brevity sometimes leaves users wishing for a more expanded explanation or an actual example of what is being asked for. It was mentioned above that example metadata can be useful in this regard. Sometimes, though, one must just ask for additional explanation. Some of these questions come up on the listserver nsdi-l; however, they have never been collected into a package. Eric Miller of Ohio State University and the Online Computer Library Center, Inc. has a searchable archive of postings to nsdi-l and GeoWeb which might uncover a question about a particular element.

The Green Book contains a sprinkling of FAQs throughout its pages, and Peter Schweitzer of the USGS has compiled a collection of FGDC Metadata FAQs, including FAQs about his three metadata tools, cns, mp, and xtme.

Ideally, the on-line version of the CSDGM would be expanded to include FAQs and examples.


2.5 General Metadata Resources

In summary, there are many ways to learn about the CSDGM, but you will also need to learn about strategies for implementing metadata. In addition, you will need to learn about tools for creating metadata, checking compliance with the CSDGM, and querying metadata. The next three sections will help guide those efforts.

Section 3

Where Does One Begin?

3.1 Inventory Data Sets
3.2 Prioritize Data Sets
3.3 Metadata Examples

This primer started with a discussion of the importance of metadata. This was followed by a review of the Content Standards for Digital Geospatial Metadata. Now is the time to roll up our collective sleeves and tackle the task. This section outlines a strategy for collecting metadata at the agency or corporate level. The first step often involves getting an organizational commitment to "do metadata." This may involve explaining the tradeoff of the short term costs versus the long term benefits of metadata implementation. Once an organizational commitment is secured, the next step involves taking an inventory of the spatial data holdings of the agency and prioritizing their importance for documentation. Examination of sample metadata can help one become more familiar with the range of approaches to documenting spatial data and to avoid later misinterpretation of the CSDGM. Finally, this section will examine the detail to which metadata can be collected, including complete (and exhausting) documentation, partial documentation through a template or profile of the CSDGM, and thumbnail sketches of the most basic information about a data set.


3.1 Inventory Data Sets

Spatial data sets seem to have the ability to multiply and fill up all available disk space and proliferate beyond the initial source of creation. Agencies often find it difficult just to keep track of the growing inventory of spatial data they have developed, much less fully document it. Given the significant resources that many agencies invest in database development, however, the time spent keeping a current inventory is certainly justified. For those using GIS software by Environmental Systems Research Institute (ESRI), there is a very useful spatial data inventory software program called Findarc by Geographic Designs, Inc. (web page no longer in service). Findarc is a proprietary UNIX-based product presently running on the SUN Sparcstation platform that will search a file space or directory and locate Arc/Info coverages, GRIDs, ArcView shapefiles and ArcView projects. Findarc identifies file names, software version, feature type, size, date and time of last edit, file ownership, status of topology, and information about map projections. The output from a Findarc session is a comma-delimited file that can be imported into ArcView, a spreadsheet, or a database for sorting and querying, or filtered with a UNIX tool such as grep.
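Working with a comma-delimited inventory of this kind might look like the sketch below. The actual Findarc column layout is not reproduced in this primer, so the column names, paths, and values here are assumptions for illustration; only the general approach of loading, sorting, and filtering carries over.

```python
import csv
import io

# Hypothetical inventory rows; the real Findarc column layout is not shown here.
SAMPLE = """\
path,feature_type,size_kb,last_edit
/gis/parcels,polygon,5120,1997-03-14
/gis/roads,line,2048,1996-11-02
/gis/wells,point,64,1997-08-30
"""


def load_inventory(handle):
    """Read a comma-delimited inventory into a list of dicts."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        row["size_kb"] = int(row["size_kb"])   # convert sizes for arithmetic
    return rows


def oldest_first(rows):
    """Sort by last edit date; ISO-style dates sort correctly as plain strings."""
    return sorted(rows, key=lambda r: r["last_edit"])


inventory = load_inventory(io.StringIO(SAMPLE))
for row in oldest_first(inventory):
    print(row["last_edit"], row["path"], row["feature_type"])
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle, and the same rows could feed a spreadsheet or database import.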


3.2 Prioritize Data Sets

With a spatial data inventory such as that produced by Findarc, one can begin the task of prioritizing the data most important to the agency. These data sets are prime candidates for early documentation, along with those that will be shared with other agencies or sold. There may be other reasons particular to an individual agency for placing a high priority on early metadata creation.
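One simple way to turn these criteria into a working order is to score each inventoried data set. The sketch below is only an illustration: the flags, the weights, and the data set names are assumptions, and each agency would substitute its own reasons for giving a data set priority.

```python
# Hypothetical inventory entries with flags an agency might record.
datasets = [
    {"name": "parcels", "core_to_agency": True, "shared": True, "sold": False},
    {"name": "wells", "core_to_agency": False, "shared": False, "sold": False},
    {"name": "roads", "core_to_agency": True, "shared": True, "sold": True},
    {"name": "zoning", "core_to_agency": False, "shared": True, "sold": False},
]

# Assumed weights; each agency would choose its own.
WEIGHTS = {"core_to_agency": 3, "shared": 2, "sold": 2}


def priority(entry):
    """Sum the weights of every flag that is set for this data set."""
    return sum(weight for flag, weight in WEIGHTS.items() if entry[flag])


def documentation_order(entries):
    """Highest score first; ties keep their original inventory order."""
    return sorted(entries, key=priority, reverse=True)


for entry in documentation_order(datasets):
    print(priority(entry), entry["name"])
```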

This is a good point to discuss the timing of collecting metadata. Database developers often intend to document spatial data shortly after completion of data entry. However, it is human nature to put off technical tasks such as database documentation. Months or years may slip by before metadata creation is undertaken. Key attributes of the data may be forgotten in the ensuing time. The most efficient strategy for metadata creation is to make it an ongoing process during database development.


3.3 Metadata Examples

An important step before actually initiating the creation of the metadata document is to review some sample metadata. Metadata may be collected in a variety of forms and to varying levels of detail. Examining how other agencies have documented their data holdings may provide insight into the most appropriate strategy for your agency. Examining sample metadata may also result in time savings through reduced effort associated with metadata creation. An example would be duplication of the documentation of a coordinate system already completed by another agency.

Metadata may be collected at many different levels of detail, ranging from that supporting a quick "thumbnail sketch" of a data set to very detailed documentation of a data set which may support decisions involving life-threatening situations or protection of multi-million dollar investments. Some agencies may wish to go into great detail describing the purpose, access constraints, use constraints, or distribution liability associated with the data set. Other agencies may wish to document aspects of data sets which are not part of the content standards.

Agencies may review the CSDGM and identify a minimal subset on which they wish to focus in order to minimize the effort associated with development of metadata. These subsets are often referred to as "core metadata." When structuring core metadata, agencies should make sure that the resulting metadata meets the needs of the agency in areas such as data management and archival, data sharing and transfer, clearinghouse support, and subsequent decisions on fitness of use in future situations. Core metadata for the sole purpose of minimizing metadata implementation effort may not serve the agency well.

It should be noted that although the FGDC has considered the issue of 'core' metadata, there is resistance to defining such a set, i.e. there isn't an accepted set of 'core' metadata elements. The 'Dublin Core' is an example of a 'minimum searchable set' of metadata elements. The 'Metadata Summit' in Denver, February 1996, identified another set of metadata elements which is sometimes referred to as the 'Denver Core.' Although a 'minimum searchable set' and a 'core' set of metadata elements are probably closely related, most would agree that these would not be identical sets.

Several states have developed metadata profiles to meet their specific needs. Metadata profiles are closer to the full CSDGM and may involve reordering or renaming FGDC metadata elements, adding or subtracting elements, or developing customized production rules or response lists for specific elements.

The Minnesota Governor's Council on Geographic Information has developed the Minnesota State Geospatial Metadata Guidelines (WordPerfect document). The guidelines include a table that cross-references FGDC and Minnesota State metadata element names and identifies where there is either no FGDC equivalent of the state element or where there is a compound element related to the state element.

In general, metadata profiles serve a useful function, customizing metadata implementation for specific circumstances. Unfortunately, this can lead to metadata which may not be compatible with mainstream metadata software and the CSDGM.
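A cross-reference table between profile and FGDC element names can be applied mechanically when profile metadata has to be handed to CSDGM-based software. The sketch below assumes a made-up three-entry crosswalk; the profile element names are hypothetical and are not taken from the Minnesota guidelines or any other published profile.

```python
# Hypothetical crosswalk from profile element names to CSDGM names.
# None marks a profile element with no CSDGM equivalent.
CROSSWALK = {
    "Dataset_Name": "Title",
    "Summary": "Abstract",
    "Agency_Tracking_Code": None,
}


def to_csdgm(profile_record):
    """Rename profile elements to CSDGM names.

    Returns (converted, unmapped): converted holds the renamed elements,
    unmapped holds elements that have no CSDGM equivalent or are missing
    from the crosswalk, so nothing is silently dropped.
    """
    converted, unmapped = {}, {}
    for name, value in profile_record.items():
        target = CROSSWALK.get(name)
        if target is None:
            unmapped[name] = value
        else:
            converted[target] = value
    return converted, unmapped


profile_record = {
    "Dataset_Name": "County Wetlands",
    "Summary": "Sample wetlands inventory.",
    "Agency_Tracking_Code": "W-17",
}
converted, unmapped = to_csdgm(profile_record)
print(converted)
print(unmapped)
```

Keeping the unmapped elements visible shows how much of a profile falls outside the CSDGM, which is one measure of how far the profile has drifted from mainstream tools.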

You have been exposed to the full CSDGM, 'core' metadata, and metadata profiles. It's now time to 'fish or cut bait,' i.e. it is time to pick one of these approaches to metadata. The authors recommend, in order to get started, that you stay with the CSDGM as opposed to defining any metadata profile which differs only slightly from the CSDGM.


Section 4

Select the Proper Metadata Tool

4.1 Deciding Between a Metadata Database or Discrete Metadata Documents
4.2 GIS / Operating System and Other Considerations
4.3 Categories of Metadata Tools
4.4 The Available Metadata Tools and How to Get Them

4.1 Deciding Between a Metadata Database or Discrete Metadata Documents

Deciding between holding your metadata in a database and producing discrete metadata documents for each data set depends somewhat on the variety and volume of your data sets, as well as how often they (and the metadata) are updated. This decision will determine which metadata tools are appropriate to consider for use.

The FGDC recommends that metadata be stored in a database if your data sets are subject to frequent change, or if some of the metadata is common to many of your data sets (e.g. a data set maintained in tiles). With a database for metadata, it may be necessary to write a specialized output report generator to produce CSDGM metadata when needed or for submission to an NSDI Clearinghouse where it can be made searchable (Metamaker and NOAA metadata tools already have built-in report generators to produce CSDGM compliant, or nearly compliant metadata). On the other hand, if your site is an NSDI Clearinghouse, then it may be possible to write the appropriate SQL interface between your Isite server and your database to allow the database to be queried directly, without the need to produce any intermediate discrete metadata documents.
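The database approach with a report generator can be sketched with an in-memory SQLite database. This is a minimal illustration, not the design of Metamaker or the NOAA tools: the table layout, the sample rows, and the `report` function are assumptions, and the output nesting is simplified compared with full CSDGM metadata. Metadata common to a tiled data set is stored once and joined to each tile when a discrete document is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE series (id INTEGER PRIMARY KEY, abstract TEXT, contact_org TEXT);
    CREATE TABLE tile (id INTEGER PRIMARY KEY,
                       series_id INTEGER REFERENCES series(id),
                       title TEXT, last_update TEXT);
    INSERT INTO series VALUES (1, 'Sample soils series.', 'Example Agency');
    INSERT INTO tile VALUES (1, 1, 'Soils tile A', '19970101');
    INSERT INTO tile VALUES (2, 1, 'Soils tile B', '19970615');
""")


def report(conn, tile_id):
    """Assemble one discrete text document from shared and per-tile rows."""
    row = conn.execute(
        "SELECT t.title, t.last_update, s.abstract, s.contact_org "
        "FROM tile t JOIN series s ON s.id = t.series_id WHERE t.id = ?",
        (tile_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no tile with id {tile_id}")
    title, last_update, abstract, contact_org = row
    return "\n".join([
        "Identification_Information:",
        f"  Title: {title}",
        f"  Abstract: {abstract}",
        "Metadata_Reference_Information:",
        f"  Metadata_Date: {last_update}",
        f"  Contact_Organization: {contact_org}",
    ])


print(report(conn, 2))
```

Because the abstract and contact organization live in one row, a single update changes the generated metadata for every tile in the series.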

If your data holdings have few metadata elements in common, then discrete metadata documents are a simple way to hold your metadata, and almost any tool can be used to produce it.

Making the change from discrete metadata documents to a database system is entirely possible if your metadata is brought to a common form with cns and mp. It is likely that some standard metadata database structures will be developed and conversion programs will allow compliant metadata to be imported into those structures.


4.2 GIS / Operating System and Other Considerations

The tools which are an option for your site depend on the hardware and operating systems that are available, what your GIS package is, and perhaps even the approval of management at your agency. It is certainly desirable to use a tool which operates on hardware and under the operating system that you are familiar with. If your GIS supports a metadata function, or there are metadata tools which are specific for it, then using that built-in or specific tool is probably the most efficient way to produce at least a portion of the metadata. Unfortunately, at the present time, there are no Geographic Information Systems with extensive or CSDGM compliant metadata functionality built-in. There are some metadata tools which are specific for workstation level Arc/Info.

If you are working under a specific metadata profile, then there may be a tool which is tuned to it. Examples are the new Metamaker for National Biological Service extensions, DataLogr for the IMAGIN Data Sharing Network, and mdc for the Florida Data Directory. The following is a breakdown of metadata tools by GIS/platform/OS:

UNIX with Arc/Info
blmdoc (aml), data dictionary (aml), document (aml), fgdcmeta (aml) 1.1, metalite (aml) Beta 1.8, findarc
UNIX (and possibly Linux)
cns, mp, mdc, Oklahoma metadata creator, xtme
MS-Windows
NOAA FGDC Metadata Toolkit 1.0 Beta, Metamaker 2.10, DataLogr 1.0, The MDC (Metadata Collector),
KMDD (Klamath Metadata Dictionary), Corpsmet95, Dataset Cataloger 4.0, Metadata Manager Professional 2.0,
Metadata Management System, Metagen32
MS-DOS
cns, mp, Corpsmet, Oklahoma metadata creator
Any platform with a Web browser
Metamorph, BIC Metadata Form, Metadata Lite Entry Form, Metadata Validation Service
Any platform with a text editor or a word processor
ASCII templates

4.3 Categories of Metadata Tools

Metadata tools may be separated into categories based on their operating characteristics and function. The following four categories of metadata tools seem distinct:

Intelligent
These tools extract some information from spatial data sets without the user having to determine it and then separately record it. Examples in this category are data dictionary (aml), document (aml), fgdcmeta (aml), blmdoc (aml), metalite (aml), and findarc. The sorts of information automatically determined from Arc/Info coverages include bounds, projection information, attributes, and vector feature count. None of these tools performs all of the documentation - the user will need to supply descriptive information such as the abstract, contact and distribution information, and explanation of attributes, although the ability to do this may be built into the editing functions of the tool.
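The kind of automatic extraction described here can be illustrated with a minimal sketch. It does not show how any of the AML tools work internally; it assumes a hypothetical in-memory list of point features with made-up coordinates, and derives only the bounding coordinates and the feature count. Descriptive elements are left empty for the user to supply.

```python
def extract_basic_metadata(points):
    """Derive bounding coordinates and a feature count from (x, y) point features."""
    if not points:
        raise ValueError("no features to document")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "West_Bounding_Coordinate": min(xs),
        "East_Bounding_Coordinate": max(xs),
        "North_Bounding_Coordinate": max(ys),
        "South_Bounding_Coordinate": min(ys),
        "feature_count": len(points),
        # Descriptive elements cannot be derived from geometry; the user supplies them.
        "Abstract": None,
    }


wells = [(-89.40, 43.07), (-89.52, 43.01), (-89.31, 43.12)]  # made-up well locations
meta = extract_basic_metadata(wells)
```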

Forms-based
These tools provide a user interface which helps guide the user throughout the documentation process. Typically a series of forms with fill in boxes or pick lists is central to the tool. Some of these tools indicate which are the optional and mandatory elements and have on-line help. Several of these are built on the framework of a database which makes it easy to recycle portions of metadata which may repeat between data sets. This category has the most representatives and includes: NOAA FGDC Metadata Toolkit 1.0 Beta, Metamaker 2.10, xtme, Corpsmet 1.02, Oklahoma Metadata Creator, The MDC (Metadata Collector), DataLogr 1.0, Metamorph, BIC Metadata Form, Corpsmet95, Dataset Cataloger 4.0, Metadata Lite Entry Form, Metadata Management System, Metadata Manager Professional 2.0, Metagen32, and KMDD (Klamath Metadata Dictionary).

ASCII and word processor templates
These are not metadata tools per se; instead, an existing text editor or word processor is used to edit template documents which contain all or most of the possible metadata elements and to add text to those elements that are appropriate. Unneeded or empty elements are deleted, and repeating elements must be copied and pasted as many times as needed. ASCII templates are simple to use, require no GIS software or other specialized software, and may be cloned for parts of the metadata which are common to several data sets. A major drawback of templates is that there is no built-in control of the structure; in the process of cutting and pasting it is easy to damage the structure of the template so that it is no longer CSDGM compliant. There are a number of representative templates around in various word processor and ASCII forms.
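Because templates offer no structural control, a quick check after editing can catch the most obvious damage. The following is a rough sketch only, and is no substitute for a real validator such as mp. It assumes a template indented with two spaces per level and checks just two things: that no line jumps more than one level deeper than the line before it, and that the two mandatory top-level sections (Identification_Information and Metadata_Reference_Information) are still present.

```python
REQUIRED_SECTIONS = ("Identification_Information", "Metadata_Reference_Information")
INDENT = 2  # assumed number of spaces per nesting level in the template


def check_template(text):
    """Return a list of structural problems found in an indented text template."""
    problems = []
    top_level = set()
    previous_depth = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no structure
        spaces = len(line) - len(line.lstrip(" "))
        if spaces % INDENT:
            problems.append(f"line {number}: indentation is not a multiple of {INDENT}")
            continue
        depth = spaces // INDENT
        if depth > previous_depth + 1:
            problems.append(f"line {number}: nested more than one level below its parent")
        if depth == 0:
            top_level.add(line.split(":", 1)[0].strip())
        previous_depth = depth
    for section in REQUIRED_SECTIONS:
        if section not in top_level:
            problems.append(f"missing section: {section}")
    return problems


good = (
    "Identification_Information:\n"
    "  Description:\n"
    "    Abstract: Example\n"
    "Metadata_Reference_Information:\n"
    "  Metadata_Date: 19980610\n"
)
# A careless cut and paste: the parent element and a whole section were deleted.
damaged = "Identification_Information:\n      Abstract: Example\n"
```

Here `check_template(good)` returns an empty list, while `check_template(damaged)` reports both the orphaned Abstract line and the missing section.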

Utilities
This category includes tools and services which are not used for the primary production of metadata, but rather are used to process it in some form. In this category there are tools to find data sets (findarc), to pre-process metadata into a consistent format (cns), and to validate metadata (mp and the Metadata Validation Service, mp's on-line counterpart).

4.4 The Available Metadata Tools and How to Get Them

Based on the above material, you may have now arrived at a specific category of tools, or possibly even the most appropriate tool. Now you may want some more in depth information about it, or perhaps even a critical review by someone who has used it. You are in luck -- there are several reviews available which also give links to where to obtain these tools.

Finally, keep in mind that metadata tools are evolving rapidly. Watch for postings to nsdi-l announcing new tools or new versions of tools which may not be covered in the above reviews.


Section 5

Start Out Simple

5.1 Choosing and Loading the Metadata Tool
5.2 Using the Metadata Tool
5.3 Metadata Validation and Review
5.4 WLIA Metadata Mentoring Exercise

The previous two sections developed a strategy for metadata collection and reviewed metadata creation tools. With this information, you are ready to begin documentation of spatial data. As the title of this section implies, metadata novices may wish to pick simple spatial data to begin with. Examples may include data created from a single well understood source such as a planimetric feature from a specific aerial flight or a small set of discrete points such as well locations and associated attributes. Complicated spatial data generated from a variety of sources and containing hundreds of attributes will distract from the purpose of gaining a better understanding of the content standards and the chosen metadata tool. Once a degree of experience with the content standards is obtained, more complicated data sets can be tackled. It is important to note that even with simple data sets, there may be a number of confounding factors that complicate metadata creation.

To illustrate the metadata creation process, we have chosen a relatively simple spatial data set documenting coastal recession rates on the Great Lakes coast of Wisconsin. The graphic data are derived from the Public Land Survey System (PLSS) Land Net as originally developed by the U.S. Geological Survey and enhanced by the Wisconsin Department of Natural Resources. Sections bordering on the Wisconsin coasts of Lake Michigan and Lake Superior were extracted from the PLSS Land Net and attributes were added representing minimum and maximum annual recession rate, minimum and maximum bluff height, and recommended construction setback.


5.1 Choosing and Loading the Metadata Tool

The metadata creation tool selected to develop the coastal recession metadata was Peter Schweitzer's xtme, a program that runs on UNIX workstations. It was chosen because the menu driven X-windows interface is easy to use, there is an extensive help file system which incorporates the production rules of the content standards, its cut and paste function supports copying sections within or between xtme sessions, and its output is readily ingested by mp, the metadata validation software.

The mp program checks metadata for consistency with the content standards and produces formatted output in a variety of forms, including ASCII text, HTML, and SGML. It also produces a diagnostic error report for metadata files that do not meet the CSDGM.

Both xtme and mp were downloaded from a server at the U.S. Geological Survey. Loading the software on a SUN workstation for use involved some minor work at the user shell level. Alias commands were added to the .cshrc file and a resource file for xtme was added to the .Xdefaults file. After these modifications, both programs were operational.


5.2 Using the Metadata Tool

Xtme has a split window interface where the upper window includes a pulldown menu system and an outline view of metadata elements arranged in an indented hierarchical form and the lower window is designed for data entry and editing. Here is a graphical view of the interface.

Because metadata already exists for the PLSS Land Net as developed by the Wisconsin Department of Natural Resources, it was decided to use this as a starting point for the coastal recession metadata. Working from this existing basic metadata saved time associated with metadata creation for the coastal recession data set. In this case, the existing PLSS Land Net metadata included only the two most basic sections of the content standards. These are Section 1 - Identification Information and Section 7 - Metadata Reference Information. Xtme was used to edit the elements of these sections to make them relevant to the coastal recession data set. Metadata for the remaining five sections (Section 2 - Data Quality, Section 3 - Spatial Data Organization Information, Section 4 - Spatial Reference Information, Section 5 - Entity and Attribute Information, and Section 6 - Distribution Information) was entered from scratch, which took significantly more time. The help system in xtme was very useful in completing these sections.


5.3 Metadata Validation and Review

After completion, the coastal recession metadata was run through the mp software to test compliance with the content standards. Several iterations of metadata validation were undertaken. "Error messages" from mp were very useful in fine-tuning the metadata and actually aided in learning the production rules of the Content Standards. A final step in the metadata creation process is to have someone experienced with the Content Standards check the final metadata document and point out any inconsistencies in the metadata or ways in which it could be enhanced.
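The validate, fix, and revalidate cycle is easy to script. The sketch below wraps a call to mp that asks for an error report plus text, HTML, and SGML output. The option letters (-e, -t, -h, -s) are believed to match mp's usage message but are an assumption here and should be verified against the installed version. The file names are hypothetical.

```python
import os
import shutil
import subprocess


def build_mp_command(metadata_file, stem):
    """Build an mp command line; option letters are assumed, check them against your mp."""
    return [
        "mp", metadata_file,
        "-e", f"{stem}.err",   # diagnostic error report
        "-t", f"{stem}.txt",   # formatted ASCII text
        "-h", f"{stem}.html",  # HTML
        "-s", f"{stem}.sgml",  # SGML
    ]


def validate(metadata_file, stem):
    """Run mp if it is installed; return the error report text, or None if unavailable."""
    if shutil.which("mp") is None:
        return None
    subprocess.run(build_mp_command(metadata_file, stem), check=False)
    error_file = f"{stem}.err"
    if not os.path.exists(error_file):
        return None
    with open(error_file, encoding="utf-8", errors="replace") as report:
        return report.read()


command = build_mp_command("coastal.met", "coastal")  # hypothetical file names
```

Reading the error report after each run, correcting the metadata, and running the command again mirrors the iterations described above.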

Here is the (nearly) final version of the coastal recession metadata.


5.4 WLIA Metadata Mentoring Exercise

An exercise has been created to walk you through the metadata creation process. The exercise was developed by Hugh Phillips of the Wisconsin State Cartographer's Office to support a Metadata Mentoring Workshop at the Wisconsin Land Information Association 1997 Annual Conference in Lake Geneva, Wisconsin.


Section 6

Other Metadata Issues

6.1 What are you Going to do with the Metadata?
6.2 Making Your Metadata Exchangeable and Versatile
6.3 Metadata Maintenance
6.4 Data Set vs. Feature Metadata
6.5 Do the Content Standards Meet State and Local Needs?
6.6 Using Metadata to Document Items which are not Geospatial Data, e.g. Application Software or Applications of GIS Software

6.1 What are you Going to do with the Metadata?

If you have already created the metadata for your data sets you have accomplished the majority of the work. What you choose to do next can either preserve the benefits of your documentation work, or magnify it by making it more generally accessible.

6.1.1 Bundle metadata with data

If you distribute your data sets, distribute the metadata along with them.

6.1.2 Put it in a manila folder

If you put your metadata in a file folder and anyone asks about it specifically, you can point to the file cabinet and they can read about your data sets. If you have a lot of data sets they might spend a while looking at the metadata for many of them before they realize which one, if any, is appropriate. A paper copy of the metadata preserves much of the value of a data set within your own organization. The problem with paper copies of metadata is that they can get lost, there is not an efficient way to search them for specific content or criteria, and other users (who might be willing to pay money for your data) may never find out that the data exists.

6.1.3 Print a metadata catalog

You could distribute your metadata as a printed catalog describing your data sets. This makes the benefits of your documentation work available to a larger audience and may save time for your organization because it no longer has to dig metadata out of the file cabinet when requested or answer so many phone inquiries about the same. The problem with a printed catalog is that it is hard to keep up-to-date, may have limited distribution because of cost reasons, and also is not efficiently searched.

6.1.4 Set up an in-house data warehouse

You could make your metadata available internally to your agency through your own in-house electronic data warehouse. By using indexing software such as WAIS or by clever design of Web pages you could make it possible for members of your agency to easily access or search your own metadata holdings.

6.1.5 Set up your own NSDI Clearinghouse node

If your metadata was squeaky clean (CSDGM and mp compliant), you could convert it to SGML and set up your own registered NSDI Clearinghouse node on the Internet. The process is very streamlined now. If you have a computer (running UNIX or Linux) and continuous connection to the Internet, an operational site can be established in less than an hour. Such a site really maximizes the payoff from your metadata documentation work because the metadata is completely (and specifically) searchable, and available to all who use the USGS metadata gateway on the Internet.

6.1.6 Contribute your metadata to an established NSDI Clearinghouse

If the idea of maintaining your own NSDI node seems like an inconvenience, or you have security concerns, or if you don't have a good connection to the Internet, you should consider contributing your metadata (and possibly also your data) to an already established NSDI Clearinghouse node. If you don't know an appropriate one for your agency, contact the FGDC (fgdc@www.fgdc.gov) for advice.


6.2 Making Your Metadata Exchangeable and Versatile

It was mentioned above that if you distribute data, you should also distribute the metadata with it. Because the formats of GIS data files are well established, their exchange between agencies is facilitated. At the present time there are no formatting rules for metadata. This freedom has resulted in several unfortunate consequences: an entire generation of metadata tools has been developed which have no common import and export function; metadata from different agencies, even though it may be CSDGM compliant, may not all be easily incorporated into, and indexed on a Clearinghouse; and agencies receiving data and metadata may find the latter difficult to incorporate into their own in-house metadata repository.

In the absence of a format standard for metadata, the next best thing for exchangeability is to ensure that your metadata will pass Peter Schweitzer's metadata parser, mp. This tool will also produce an output in Standard Generalized Markup Language (SGML). SGML has been used for years in the printing industry and is likely to become *the* exchange format for metadata and metadata tools. You safeguard the value of your own in-house metadata by ensuring that it passes mp now so that its SGML may be loaded into more sophisticated metadata tools and databases of the future.

As an added benefit, different views and subsets of complete SGML metadata may be generated for the metadata searcher based on the SGML equivalent of the style sheet, a DSSSL (Document Style Semantics and Specification Language) document. This will be a feature on future NSDI Clearinghouses.


6.3 Metadata Maintenance

With the exception of legacy data sets, metadata is unlikely to be static for data sets in current use. Feature counts and processing information are likely to change as a data set is maintained. For this reason, it must be realized that metadata creation is not a one-time event for a data set, but rather an issue which must be revisited periodically.

Manually documenting individual minute data set processing steps in the metadata is generally agreed to be unworkable as well as unwieldy. Such operations are better tracked with dedicated lineage tools, and the major cumulative processing steps reserved for the metadata. Ideally this information would be added to the metadata after completion of major steps. Alternatively, an agency may choose to review its metadata periodically to determine if it warrants update. There is a field in the Metadata Reference Information for this purpose - a scheduled Metadata Review Date.
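A periodic review is simple to automate once the scheduled date is recorded. The sketch below assumes the scheduled date is held in the CSDGM element Metadata_Future_Review_Date, written as YYYYMMDD, and that each record is available as a plain dictionary. The dictionary layout and the sample records are hypothetical.

```python
from datetime import date, datetime


def parse_csdgm_date(value):
    """Parse a calendar date written as YYYYMMDD."""
    return datetime.strptime(value, "%Y%m%d").date()


def due_for_review(records, today):
    """Return titles of records whose scheduled review date is on or before `today`."""
    due = []
    for record in records:
        scheduled = record.get("Metadata_Future_Review_Date")
        if scheduled and parse_csdgm_date(scheduled) <= today:
            due.append(record["Title"])
    return due


# Made-up records for illustration.
holdings = [
    {"Title": "Parcels", "Metadata_Future_Review_Date": "19980101"},
    {"Title": "Wells", "Metadata_Future_Review_Date": "19990101"},
    {"Title": "Legacy soils"},  # no review scheduled
]
overdue = due_for_review(holdings, date(1998, 6, 10))
```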


6.4 Data Set vs. Feature Metadata

The issue of data set granularity as it affects metadata is the concern of a number of users. It is a special concern, e.g. to those who deal with parcel data, for which the reason of entry, reason for editing, and lineage of individual arcs may be of importance. An experienced GIS professional put it this way:

"My other big issue with the metadata as written by the feds is that it is theme based. My metadata is feature based. Every line, point, and text string knows who put it in, when they put it in, how they put it in, and how good it was. Land-based records have associated the history of transactions (book and page of deeds) which are the basis of the land base. Deeds are the data source for that data layer. This "metadata" is used by us because we need it to do business."

Thus, although most users consider a data set consisting of hundreds of features to be the basic unit of data, data distribution, and the level at which they would perform metadata documentation, it isn't detailed enough to satisfy those who are concerned with data at the feature level. Under the existing CSDGM there isn't a good way to deal with feature level metadata. As a short term solution, a limited amount of feature level metadata might be stored as the attributes of each feature.
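The short term solution of carrying feature level metadata as attributes might look like the following sketch. The attribute names are hypothetical, chosen to mirror the items in the quotation above (who entered a feature, when, how, how good it was, and the deed it came from); they are not part of the CSDGM.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ParcelLine:
    """A single line feature carrying its own lineage as ordinary attributes."""
    feature_id: int
    coordinates: list
    entered_by: str
    entered_on: date
    entry_method: str
    accuracy_note: str
    deed_reference: str  # book and page of the source deed


def features_entered_by(features, person):
    """Return the ids of all features entered by the given person."""
    return [f.feature_id for f in features if f.entered_by == person]


# Made-up features for illustration.
lines = [
    ParcelLine(1, [(0.0, 0.0), (10.0, 0.0)], "jdoe", date(1997, 5, 1),
               "coordinate geometry", "survey grade", "Book 12, Page 40"),
    ParcelLine(2, [(10.0, 0.0), (10.0, 8.0)], "asmith", date(1997, 6, 3),
               "digitized", "plus or minus 5 feet", "Book 14, Page 7"),
]
by_jdoe = features_entered_by(lines, "jdoe")
```

Because the lineage lives in ordinary attributes, it can be queried with the same tools used for any other attribute, but it remains invisible to CSDGM-based searches.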

If one goes the other direction in data granularity, then individual data sets (consisting of many features) are only parts of an even larger whole, the data series (e.g. a series of 7.5 minute DEMs or a series of photographs from an aerial photography project). Among individual data producers/custodians, typically one of these different scales (data series, data set, and feature) will be of primary importance.

At each different scale the variety in metadata will be quite different. The data series custodian will have many instances (perhaps thousands) of data sets whose metadata is identical with the exception of a few metadata elements (like data set name and bounding coordinates). The data set custodian will typically have a limited number of data sets, but they will all be very different in theme, and the metadata for each will be largely unique (except perhaps for contact and distribution information). The feature custodian may only work on a single (but very dynamic) data set for which the major portions of metadata are fairly static, but for which lineage may be extensive. Among these three custodians, it is probably the first who has the easiest job, the second who has the largest job initially, and the third who has the task to update the metadata most frequently.
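The data series case lends itself to a simple form of inheritance: series level metadata is written once, and each data set records only the elements that differ. The sketch below is one way to express that, using plain dictionaries with hypothetical values; it is not drawn from any existing cataloging tool.

```python
# Written once for the whole series (values are made up).
series_metadata = {
    "Originator": "Example Agency",
    "Abstract": "7.5 minute digital elevation models.",
    "Distribution_Contact": "Example Agency data desk",
}


def dataset_metadata(series, overrides):
    """Combine series defaults with data set values; data set values win on conflict."""
    return {**series, **overrides}


# Only the elements that differ are recorded per data set.
tile_12 = dataset_metadata(
    series_metadata,
    {"Title": "Tile 12", "West_Bounding_Coordinate": -89.5, "East_Bounding_Coordinate": -89.375},
)
```

A data set custodian whose holdings differ in theme would gain much less from this arrangement, since nearly every element would need an override.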

These issues of data set granularity and the need for software cataloging tools necessary to support inheritance, entry, update and reporting of metadata at the feature, data set, or data series level have been touched upon in the paper Draft Implementation Methods for Access to Digital Geospatial Metadata. Although the idea of inheritance of metadata may work from the feature to the data set level, or the data set level to the series level, it seems unlikely that it will work (in many cases) from the feature to the data set to the data series level for something as common as tax parcels without county or statewide standardization in the data set model and the GIS software. The simple truth is that different agencies in different cities use different GISs at different times to convert different base data into polygons in different coordinate systems, and assign a different number of differently named and defined attributes to what is basically the same desired end for each agency, the geospatial description for tax parcels. In this case, very little metadata can be inherited across scales.


6.5 Do the Content Standards Meet State and Local Needs?

It isn't the intention here to answer that question, but mainly to raise it; only your agency, city, county, or state can answer it and do something about it. But think about the following before you choose to deviate far from the CSDGM.

The usual first response to the CSDGM is that it is just too complex; that is, the CSDGM doesn't just meet state and local needs, it exceeds them. This reaction is experienced by data producers as they attempt to create metadata, and by users as they examine raw, unfiltered CSDGM metadata. To the first we say, for robust and useful metadata it is going to take some work, and for data sets that may have taken months or years and thousands of dollars to produce, some effort to produce quality metadata is not unwarranted. Some of this pain in producing metadata will be reduced as metadata tools evolve. To the second we say, compliant metadata can be converted to SGML, from which, with the proper templates, almost any view of any portion of the metadata will be obtainable (such templates are coming soon).

The usual reaction to the impression of too much complexity in the CSDGM is to try to use just a portion or a profile of the CSDGM, or to abandon it entirely and make up your own metadata elements and format. The latter course is not recommended, but if you are not a federal agency, and are not required through some other leverage to produce CSDGM metadata, you can do whatever you want (including nothing). If you hope to sell your data, but your metadata is not complete and CSDGM compliant, you may have more difficulty in marketing it, especially in the future, as more and more data producers and users adopt the CSDGM. It may also be hard to let others know about your data because your metadata won't be useable on an NSDI Clearinghouse. If you choose to adopt a profile of the content standards, at least try to include the mandatory elements from the CSDGM and ensure that it is compliant by testing it with mp. The MITRE Corporation study to recommend a minimum searchable set of metadata elements might also be consulted for ideas for a metadata profile, but does not itself constitute a compliant profile.


6.6 Using Metadata to Document Items which are not Geospatial Data, e.g. Application Software or Applications of GIS Software

Metadata forms a robust means to document and search for geospatial data. A question that occasionally arises is: "Can the CSDGM be used to document non-geospatial items like computer software?" Well, the answer is yes, and indeed there are examples of this, e.g. the metadata for Peter Schweitzer's metadata parser, mp. To satisfy the mandatory elements of the CSDGM however, some entries will be rather contrived (like the bounding coordinates).

Using the CSDGM to document software or mineral specimens is a little bit like using a wrench to pound nails - it will work, but is certainly not ideal. The CSDGM simply wasn't designed to document everything. One can easily envision several metadata elements which might be appropriate for documenting software, and which might be proposed as an extension to the CSDGM. The National Biological Service has incorporated into its metadata profile a brief extension section to document application software which is to be used with a data set, but that extension is really insufficient if the primary aim is to document the software.

Another area of consideration is the documentation of an application of GIS software and geospatial data. That is, how do you document the analytical results of modeling with geospatial data? Such a result may be derived from several geospatial data sets. This is an area for which the CSDGM is not perfectly suited. If you have such information then you may want to seek the guidance of FGDC or post a message to nsdi-l to see if anyone else is grappling with the problem.



Dessert Anyone?



last updated by David Hart on June 10, 1998