Metadata is information about a particular set of data; in simple terms, it is data about data. It is structured information that supports the retrieval, use and management of an information resource. Wherever information content exists, whether in print, electronic, image, text or video form, metadata about that content can record details such as its length, author and publishing date. Metadata may also include a brief summary, abstract or preview of the content. It is a very useful tool for locating and accessing electronically archived data.
Just as meeting a person requires knowing his or her name, contact number and address, accessing specific information requires knowing its structure, type and location, all of which are contained in metadata.

Metadata Defined

The term metadata is used widely, and differently, across organizations and communities. For some, metadata is machine-readable information, in the form of procedures, that is used to perform operations on information. For others, it is simply information that describes various kinds of electronic information resources.
Metadata is also used extensively in libraries to describe digital and non-digital material; library catalogues are a form of metadata. Metadata describes information objects ranging from published books to electronic data, artistic items, training materials and scientific knowledge. There are three main categories of metadata.

1. Descriptive: This type of metadata is used to find and locate an information resource and includes elements such as abstract, title, author and keywords.
2. Structural: This type is used for compound objects and indicates how single objects were combined to form a compound one, for example how pages are organized into chapters and how chapters are organized into a book.
3. Administrative: This type of metadata is used to manage an information resource and records details such as the creation date, the data and file type, who is authorized to access it and other technical information. Administrative metadata has two subsets.
   a. Rights management metadata: It relates to intellectual property rights.
   b. Preservation metadata: It contains the information needed to archive and store the content.

Metadata thus helps safeguard key information resources that are likely to be used in the future.
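The three categories can be pictured as fields of a single record. The following is only a minimal sketch in Python: the Metadata class, its field names and the sample values are hypothetical and are not taken from any metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical record grouping the three categories described above."""
    descriptive: dict = field(default_factory=dict)     # for finding the resource
    structural: dict = field(default_factory=dict)      # how the parts fit together
    administrative: dict = field(default_factory=dict)  # for managing the resource


# Sample values are invented for illustration.
book = Metadata(
    descriptive={"title": "An Example Book", "author": "A. Writer",
                 "keywords": ["metadata", "libraries"]},
    structural={"chapters": {"Chapter 1": [1, 2, 3], "Chapter 2": [4, 5]}},
    administrative={
        "date_created": "2010-08-10",
        "file_type": "application/pdf",
        "rights": {"copyright_holder": "A. Writer"},    # rights management subset
        "preservation": {"archive_format": "PDF/A"},    # preservation subset
    },
)


def find_by_keyword(records, keyword):
    """Return the titles of records whose descriptive keywords include `keyword`."""
    return [r.descriptive["title"] for r in records
            if keyword in r.descriptive.get("keywords", [])]


print(find_by_keyword([book], "metadata"))
```

Only the descriptive part is consulted when searching, which mirrors its role of finding and locating a resource; the structural and administrative parts serve assembly and management.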
Examples

There are many instances in which metadata is used to describe other data and information. Libraries are a familiar example: books are classified using the Dewey Decimal System, and traditional card catalogues, among the oldest such systems, use 3 x 5 inch cards to hold information about each book, such as title, author, subject and a brief abstract. Each book is also assigned an abbreviated alphanumeric code (a call number) that indicates its location on the shelves. Nowadays libraries use standards such as MARC and AACR2 to develop metadata about their resources.
These standards provide good bibliographic information about the resources held in the library. Apart from textual data, images and videos also use metadata to describe their content. Image metadata usually includes the name of the owner or creator, copyright information, the camera model and keywords. Photographic metadata standards include the Information Interchange Model (IIM) by the IPTC (International Press Telecommunications Council), the IPTC Core schema, the Extensible Metadata Platform (XMP), the Exchangeable Image File Format (Exif), Dublin Core and PLUS (Picture Licensing Universal System).
In videos, transcripts or textual descriptions are a very important component of metadata because they help computers interpret video content and so make the video searchable on a computer or the internet. The most common use of metadata is in the creation of web pages. The languages used to create web pages are HTML (HyperText Markup Language) and XHTML (Extensible HTML). These languages allow the web developer to store metadata in the header of the web page. In HTML, metadata is held in meta elements, commonly called meta tags. A meta element can carry up to four attributes: name, content, http-equiv and scheme (scheme is defined in HTML 4 and is obsolete in HTML5).
For example, a web page might contain the following two meta tags:

<meta name="title" content="Welcome to the Web Page" />
<meta name="date created" content="10/8/2010" />

Search engines such as Google and Yahoo can use this information to index the web page. Many organizations therefore use metadata creatively for search engine optimization so that their web site ranks highly when users search for relevant keywords.

Metadata Structure

Metadata has a definite structure, which is defined in metadata schemes. These schemes contain elements that define and describe an information resource.
Each element has a specific purpose, and together the elements provide the guidelines for writing metadata about information resources; these definitions are known as the semantics of the scheme. The writer of the metadata supplies a value for each element, and these values form the content of the metadata. There are also content rules, which direct how the content should be formulated, such as how to identify each element (name, date, publisher, keyword) and what form its values must take. The elements and their content also need to be encoded correctly, which is done by following syntax rules.
The best-known syntaxes used for encoding are SGML (Standard Generalized Markup Language), which is an ISO standard, and XML (Extensible Markup Language), which was developed by the W3C (World Wide Web Consortium). XML is a simplified subset of SGML; HTML, which is commonly used to develop websites, was originally defined as an application of SGML. Organizations have developed a large number of metadata schemes for different purposes and disciplines. Some of them are Dublin Core, TEI (Text Encoding Initiative), METS (Metadata Encoding and Transmission Standard), MODS (Metadata Object Description Schema), EAD (Encoded Archival Description) and LOM (Learning Object Metadata).
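The three layers of a scheme described above (semantics, content rules and syntax) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an implementation of any real scheme: the element list, the YYYY-MM-DD date rule and the metadata root tag are all invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Semantics: the elements this hypothetical scheme defines and what each means.
ELEMENTS = {
    "title": "name given to the resource",
    "creator": "person or body responsible for the resource",
    "date": "date the resource was created",
}

# Content rule (assumed for illustration): dates are written as YYYY-MM-DD.
DATE_RULE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate(record):
    """Return a list of problems found when checking a record against the scheme."""
    problems = [f"unknown element: {name}" for name in record if name not in ELEMENTS]
    if "date" in record and not DATE_RULE.fullmatch(record["date"]):
        problems.append("date must be YYYY-MM-DD")
    return problems


def encode(record):
    """Syntax: serialize the record as a small XML document."""
    root = ET.Element("metadata")
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


record = {"title": "Welcome to the Web Page", "creator": "A. Writer", "date": "2010-08-10"}
print(validate(record))  # an empty list means the record follows the rules
print(encode(record))
```

Keeping validation separate from encoding reflects the distinction in the text: the same elements and content rules could, in principle, be encoded in a different syntax.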
The following is a brief description of four of these metadata schemes.

Dublin Core: In 1995, at a workshop in Dublin, Ohio, organized by OCLC and the National Center for Supercomputing Applications (NCSA), the participants decided to develop a set of metadata elements that came to be known as the Dublin Core. Since then there have been many changes and improvements to the Dublin Core elements, which are managed by the Dublin Core Metadata Initiative (DCMI). The Dublin Core was initially created to help web developers describe their web resources.
Later, however, its utility increased with the proliferation of electronic resources, and it also proved to be an efficient way to catalogue a library's information resources. The Dublin Core initially consisted of 13 elements, which were later increased to 15 to cater to the wide use of metadata across different environments and information resources. An example of the Dublin Core scheme is provided in Fig. 1 in the Appendix.

The Text Encoding Initiative: This initiative provides two support functions: mark-up guidelines for electronic texts and the storage of metadata.
In other words, it helps in both writing and encoding the metadata in an electronic resource. The electronic text can be anything from plays and novels to poetry. TEI was originally based on SGML as its mark-up language (later versions use XML), which provides the syntax, structure and set of tags for writing metadata; this metadata is usually stored in the header of the resource. Metadata about electronic texts is very helpful for research in the sciences and humanities. There is now an increasing trend of converting printed text into electronic text that is encoded with metadata to provide bibliographic information.
Libraries are increasingly converting their MARC records into TEI headers so that a full range of bibliographic information is available to the researcher.

Metadata Encoding and Transmission Standard: There is also a need to define the structure of a data or information resource. This task is undertaken by METS, which describes complex digital library objects. METS thus primarily performs the function of structural metadata, while also carrying descriptive and administrative metadata.
There are instances when the pages of a book are digitized separately. METS then provides the information needed to ensure the correct organization of the complete book. As pages are digitized and added, the technical metadata for the digital book is updated and tells the reader what percentage of the original text has been digitized so far and what remains to be converted. The technical metadata also supports connectivity among digital libraries, helping to ensure that an information resource stays complete when data is transferred between them.
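The role of structural metadata in reassembling a book that is digitized page by page can be sketched as follows. This is a simplified illustration in Python and does not use the actual METS XML format; the chapter layout, page numbers and file names are hypothetical.

```python
# Hypothetical structural map: chapter name -> page numbers in reading order.
STRUCTURE = {
    "Chapter 1": [1, 2, 3, 4],
    "Chapter 2": [5, 6, 7, 8, 9, 10],
}

# Pages scanned so far, in the arbitrary order the files arrived.
digitized = {7: "p007.tif", 1: "p001.tif", 2: "p002.tif", 5: "p005.tif", 3: "p003.tif"}


def assemble(structure, scans):
    """Return the scanned files arranged by chapter and page order."""
    return {chapter: [scans[p] for p in pages if p in scans]
            for chapter, pages in structure.items()}


def progress(structure, scans):
    """Return (percent digitized, sorted list of pages still missing)."""
    all_pages = [p for pages in structure.values() for p in pages]
    missing = sorted(p for p in all_pages if p not in scans)
    percent = 100 * (len(all_pages) - len(missing)) / len(all_pages)
    return percent, missing


print(assemble(STRUCTURE, digitized))
print(progress(STRUCTURE, digitized))
```

Because the structural map is held apart from the scans themselves, pages can arrive in any order and still be placed correctly, and the share still to be converted can be reported at any time.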
Metadata Object Description Schema: MODS is a descriptive metadata scheme that uses XML as its syntax. It is used to carry existing MARC 21 records, which are used in library cataloguing, as well as to create original descriptions of resources. MODS is a very flexible scheme that can work with other metadata standards; for example, MODS can be integrated with METS to create comprehensive metadata using a wide variety of elements and tags. MODS elements are more comprehensive and extensive than those of Dublin Core, so MODS allows a richer description of information resources than Dublin Core does.
Ease of use, compatibility with other standards and an extensive list of elements make MODS a very powerful tool for embedding metadata in a resource. An example of a MODS record is given in Fig. 2 in the Appendix.

The Creation of Metadata

Metadata is usually created by the author of the information resource, but for some disciplines and resources the creator can be someone else. When data and objects are being digitized, the structural and administrative metadata is usually created by the assigned technical staff, or it can be generated through an automatic process.
Software tools are available that create structural and administrative metadata automatically by recording the type of data, the date on which it was created and the persons authorized to see it. The best strategy is to develop the required metadata as soon as the resource is created or published and then to update it as new information about the resource becomes known. Descriptive metadata, however, is best developed by the original creator of the resource, because only the creator knows the rationale for its creation and its intended use.
If the original creator does not provide the metadata, it is very difficult for an indexer to develop it, because the knowledge required for descriptive metadata is hard to obtain elsewhere. An author who lacks the time or skills to develop metadata can hire an indexer or information professional to assist with the technical details of standards and syntax. There are two main ways to store metadata. First, the metadata can be contained in the same file as the resource.
An example is a web page, where the meta tags are embedded in the HTML of the page itself. Likewise, image and video files store metadata in their headers. Second, the metadata can be stored in a separate repository, such as an external database, to which the resource is linked. Embedding the metadata in the resource file is preferable for three reasons: the metadata cannot be misplaced, there is no need to maintain a link between data and metadata, and because data and metadata are in the same file, both can be updated together.
However, storing metadata in a separate file has its own advantages: it makes the metadata easier to manage, and it facilitates searching for and accessing the data.

Metadata Usage in the Information Management Field

Metadata is a description of an information resource, and researchers use this description to locate, access and manage the resource. Metadata is used extensively in the field of information management. It supports the management of information by providing access to information to the authorized person at the right time and in the correct format.
Metadata also records who changed a database, what was changed, and when, where and how, and it is updated with every detail of the changes made. Metadata also supports the long-term survival of an information resource by saving its bibliographic information so that the resource is not lost or misplaced. The internet is composed of web pages, which are repositories of information and knowledge. The web is like a library, and metadata helps in searching for and locating the required information on it. A web page typically has embedded metadata such as the following.

<head>
<title>The use of metadata in information management</title>
<meta name="description" content="How to locate information from internet?">
<meta name="keywords" content="metadata, locate, information, internet, search engines">
</head>

This is the HTML coding of a typical web page header. All meta tags are inserted into the header of the web page. Apart from the title, none of these tags is displayed on the web page. This is where the search engine comes in: it is software that indexes web pages by crawling the web. Search engines look at the embedded tags, along with the body content, to index a web page.
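As a rough illustration of how an indexing program might read such header tags, the sketch below uses the HTMLParser class from Python's standard html.parser module. The MetaReader class is hypothetical and written only for this example; real search engines are far more elaborate and weigh many other signals.

```python
from html.parser import HTMLParser

PAGE = """<head>
<title>The use of metadata in information management</title>
<meta name="description" content="How to locate information from internet?">
<meta name="keywords" content="metadata, locate, information, internet, search engines">
</head>"""


class MetaReader(HTMLParser):
    """Collect the title and the name/content pairs of meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept.
        if self._in_title:
            self.title += data


reader = MetaReader()
reader.feed(PAGE)
keywords = [k.strip() for k in reader.meta["keywords"].split(",")]
print(reader.title)
print(keywords)
```

The extracted title and keyword list are the kind of terms an index could associate with the page's address.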
In the HTML example above, the use of important words in the title, description and keywords tags helps anyone searching for the topic “The use of metadata in information management” to locate the website. Nowadays, organizations increasingly practise search engine optimization, filling in the relevant tags so that their websites appear in the results list once the search engine has crawled the web. This is an effective technique for advertising on the internet. Metadata also has important applications in data warehouses, where it is used to locate and manage data.
Data mining, OLAP, reporting and query tools extract data from the warehouse so that analysts can draw meaningful conclusions from it. All of these tools rely on metadata. For example, suppose a marketing executive wants to analyse the sales potential of a certain region by extracting relevant data from the database. The executive does not know where to start the extraction, being unfamiliar with the characteristics, form and format of the data stored in the warehouse. The executive will therefore turn to the business metadata, which contains a detailed description of the records, such as field name, length, data type and index.
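As a small illustration of looking up such record descriptions, the sketch below uses Python's standard sqlite3 module and SQLite's PRAGMA table_info and PRAGMA index_list statements. The table, column and index names are hypothetical, and SQLite's built-in catalogue merely stands in for a warehouse's metadata repository, which in practice would hold much richer descriptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table; the names are assumptions for illustration.
conn.execute("""
    CREATE TABLE sales_record (
        region_name TEXT,
        date        TEXT,
        sales       REAL,
        profit      REAL
    )
""")
conn.execute("CREATE INDEX idx_region ON sales_record (region_name)")


def describe(connection, table):
    """Return (field name, declared data type) pairs for a table."""
    rows = connection.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]


def indexes(connection, table):
    """Return the names of the indexes defined on a table."""
    rows = connection.execute(f"PRAGMA index_list({table})").fetchall()
    # Each row starts with (seq, name, ...).
    return [row[1] for row in rows]


print(describe(conn, "sales_record"))
print(indexes(conn, "sales_record"))
```

With the field names and types in hand, a user can decide which columns to select and filter on before writing any query against the data itself.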
Business metadata is a category of metadata used in the data warehouse field to help connect the end user with the data. It answers the questions that come to the executive's mind when sitting down to extract data, such as:

- How are the sales figures stored: by product, store, day or month? Are they aggregates, or do they represent individual transactions?
- Can temporal comparisons be made, such as comparing this year's sales figures with last year's?
- What formula and accounting procedures are used to calculate profit, and what business rules apply?
- How are the regions divided: by state, city or county?
- How timely are the sales figures? Are they up to date, and do they represent the true picture of actual sales?

When metadata provides all of the above information, the executive's analysis is more likely to be accurate. An end user who fully understands the type and characteristics of the data finds it easy to write SQL queries to find data, such as:

SELECT region_name, date, sales, profit FROM sales_record WHERE region_name = 'Alaska' ORDER BY date ASC;

Importance of Metadata

As mentioned earlier, metadata is an effective tool for retrieving and managing information resources.
Metadata is indispensable for learning about a resource; in other words, it provides the full bibliographic detail. Scientists, journalists, lawyers, academics and researchers find it very useful when searching for reference books, articles, theories, philosophies and other knowledge in their fields of interest. A popular term nowadays is “information overload”. Since stepping into the internet world, people have been faced with far more information than the human mind can process. As a result, the sheer bulk of information can become a hindrance to finding what is relevant.
Metadata helps solve this problem by allowing users to narrow down their searches. The user reads the text in the meta elements to decide whether the resource is useful for the purpose at hand. This article has mainly discussed how to use metadata to find information on the internet and in data warehouses, but the importance of metadata extends to other vital aspects of information management. Metadata is a significant tool for administrative control, preservation, authentication and authorization, security, relational databases, and content format and presentation.
Traditional libraries, production and marketing companies, service companies, online scientific and humanities journals and news web sites all use metadata to help users reach their websites for relevant information. Recently, metadata has been increasingly used by geospatial organizations that use GPS to map locations. These organizations build navigation systems that allow users to find routes and places. A term increasingly used alongside metadata is taxonomy.
A taxonomy is essentially a hierarchy of concepts that helps to control vocabulary. Many organizations that make an effort to reach the public monitor the common search terms, concepts and words that end users employ to find information. The organization then uses this knowledge about the end user to manage and update the taxonomy stored in its database. The taxonomy is then attached to the metadata so that the organization's resources have a higher probability of being seen by the public. The use of taxonomies is increasing because they refine the search for information.
The concepts and vocabularies contained in a taxonomy also have synonyms and related words, so when a user types in any of the related words, the organization's taxonomy recognizes it and delivers the relevant resources. Publishing companies, for example, increasingly use taxonomies to sell the books on their lists. Consider a publisher whose list includes “Principles of Marketing” by Philip Kotler: when a user types in words such as distribution channels, product and strategy, that book might be listed as a potential source of information.
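The lookup described above can be sketched as a mapping from related terms to concepts, and from concepts to tagged resources. This is a minimal Python illustration: the taxonomy entries, the catalogue and the function names are hypothetical, and a real taxonomy would be hierarchical and far larger.

```python
# Hypothetical taxonomy: each concept lists synonyms and related terms.
TAXONOMY = {
    "marketing": ["distribution channels", "product", "strategy", "promotion"],
    "metadata": ["data about data", "cataloguing", "dublin core"],
}

# Hypothetical catalogue: each book's metadata carries taxonomy concepts.
CATALOGUE = [
    {"title": "Principles of Marketing", "concepts": ["marketing"]},
    {"title": "Introduction to Metadata", "concepts": ["metadata"]},
]


def concepts_for(term):
    """Map a search term to the taxonomy concepts it belongs to."""
    term = term.strip().lower()
    return [concept for concept, related in TAXONOMY.items()
            if term == concept or term in related]


def search(term):
    """Return titles whose metadata is tagged with a concept matching the term."""
    matched = set(concepts_for(term))
    return [book["title"] for book in CATALOGUE if matched & set(book["concepts"])]


print(search("Distribution Channels"))
```

The user never has to type the concept name itself; any related term recorded in the taxonomy leads to the same tagged resources.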
Conclusion

The application of metadata began with the simple use of descriptive metadata for identifying and locating resources. Today, however, the functions and capabilities of metadata have increased. Structural metadata describes how the parts of a resource are organized so that the data can be presented in a form the user can understand; administrative metadata defines who is authorized to access the information, helps preserve copyrights and patents, and supports the security of the data. There are also established standards and schemes for creating metadata, such as Dublin Core, METS, MODS and TEI.
However, to facilitate the increasing exchange and migration of digital objects, new standards are being developed to provide interoperability between systems. One such standard, developed by NISO and AIIM, is Z39.87, Data Dictionary – Technical Metadata for Digital Still Images. Metadata has revolutionized the way information can be located and used. The World Wide Web is the best example and archetype of the use of metadata. As the technology, metadata standards, schemes and elements are further developed and consistently used, the WWW will become an even more powerful hub of information resources.
References

Baca, M. (2008). Introduction to Metadata. 2nd ed. Getty Research Institute.
DCMI. (2010). DCMI Metadata Basic. Retrieved August 13, 2010, from http://dublincore.org/metadata-basics/
Hillmann, D. I. (2006). Metadata in Practice. Amer Library.
Inmon, W. H. & O’Neil, B. (2007). Business Metadata: Capturing Enterprise Knowledge. Morgan Kaufmann.
Ma, J. (2006). Managing metadata for digital project. Library Collections, Acquisition and Technical Services, 30(1), 3-17.