Dedicated to the research, development, implementation, and standardization of metadata for educational and research mathematics.
AMS Panel discussion: Wednesday, January 19. Ballroom Balcony A, Marriott Wardman Park Hotel. 2:15 - 5:15. Immediately followed by American Mathematics Metadata Task Force Meeting.
Our goal in writing this exposition is to aid those who must translate existing digital library classification schemes into IMS-compliant metadata and/or who must define new discipline-specific taxonomies. All of this can take place on the human side of the equation without any knowledge of how the results will be implemented. But it is still good to understand a little about how this all works in practice. The IMS metadata standards contain some implicit taxonomies that were put together from a number of other metadata projects. These are more or less self-explanatory. Thus we should have no trouble interpreting the meaning of author in the context of Characteristics.Create. But the guts of IMS metadata refer to external taxonomies, many of which do not currently exist. For example, there are no existing universally accepted taxonomies for the educational approach taken by a piece of pedagogic hypermedia or for the level and style of a mathematical proof. Yet IMS metadata anticipates that taxonomies such as these will (and indeed must) be defined, accepted, and made available to software using IMS metadata. What will the software need? At the very least, it will need ways to look up definitions of codes and/or keywords and to check that a taxonomic stairway is valid and consistent.
These are called taxonomic services. The Gateway to Educational Materials is planning to provide extensive taxonomic services for taxonomies written by and for member organizations, but as of this writing very little exists in the way of concrete examples. The Library of Congress Classification scheme exists in books and on Web pages, but to my knowledge it has not been put into a form that can interface with programs seeking to interpret codes or check that codes are valid. The notion of a taxonomic service is not all that relevant to the definition and acceptance of a taxonomy. We can leave the programming to someone else. The same applies to the exact manner in which IMS metadata is represented in a document, but we should at least understand how a tree can be encoded into a document. At some time, after all, we may be called upon to troubleshoot a document in which that has been done. It turns out that there are two standards, RDF and XML, that can be used for this purpose. We will take a look at XML.
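Before turning to XML, here is a minimal sketch of the two operations a taxonomic service would need to offer: looking up the definition of a code and checking that a stairway is valid. Everything in it is an assumption made for illustration. The codes, the definitions, the function names `define` and `is_valid_stairway`, and the reading of a "stairway" as a sequence of codes that starts at a root and descends one level at a time are not taken from IMS, GEM, or any existing taxonomy.

```python
# Hypothetical toy taxonomy, invented for illustration only.
# Each code maps to (definition, parent code or None for a root).
TAXONOMY = {
    "MATH": ("Mathematics", None),
    "MATH.NT": ("Number theory", "MATH"),
    "MATH.NT.PRIMES": ("Prime numbers and factorization", "MATH.NT"),
    "MATH.AN": ("Analysis", "MATH"),
}


def define(code):
    """Look up the human-readable definition of a code."""
    return TAXONOMY[code][0]


def is_valid_stairway(codes):
    """Check that the codes start at a root and that every later code
    is a direct child of the code before it."""
    if not codes:
        return False
    previous = None  # a root is a code whose parent is None
    for code in codes:
        if code not in TAXONOMY:
            return False  # unknown code
        if TAXONOMY[code][1] != previous:
            return False  # skipped a level or changed branch
        previous = code
    return True


print(define("MATH.NT"))
print(is_valid_stairway(["MATH", "MATH.NT", "MATH.NT.PRIMES"]))
print(is_valid_stairway(["MATH", "MATH.NT.PRIMES"]))
```

The first stairway descends one step at a time and is accepted; the second skips a level and is rejected. A real service would also have to deal with versions of a taxonomy and with keywords as well as codes, which this sketch ignores.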
XML stands for "extensible markup language". XML is in many respects like HTML with some very important differences. HTML is a quick and dirty language that specifically tells software how to display Hypertext. It really isn't designed for anything else, although a long set of ad-hoc extensions has added functionality over the past few years. The advantage of HTML is that browsers can interpret it directly and its disadvantage is that this is the only way it can be used. It also is not "grammatically correct", but that is another story. XML is a formal grammar that can be used to describe "documents" of any type. One of its intended applications is to describe metadata and one of its underlying principles is that it can be easily interpreted by software. To render an XML document it is necessary to have a dictionary that translates between the XML instructions and instructions to a browser, printer, or other display environment. MathML, for example, is an XML standard that cannot be displayed by current browsers but that can be displayed by the WebEQ Java applet. What is important for us is that XML has a well-defined block structure. Each block must be properly opened and closed. This creates a containment relationship among blocks with no loops, and hence a tree with an implied root. Figure III shows an XML version of the LifeCycle sub-tree of the example tree given in Figure II.
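As a rough illustration of how properly nested blocks give a tree, the following sketch parses a small XML fragment with Python's standard `xml.etree.ElementTree` module and lists the path from the root to each leaf. The element names (`Record`, `LifeCycle`, `Version`, `Status`) and their values are assumptions invented for this example; they are not the IMS schema and not a reproduction of Figure III.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names and values, chosen only for illustration.
WELL_FORMED = """
<Record>
  <LifeCycle>
    <Version>1.0</Version>
    <Status>Draft</Status>
  </LifeCycle>
</Record>
"""

# The same blocks, but Version and LifeCycle are closed in the wrong order.
BADLY_NESTED = "<Record><LifeCycle><Version>1.0</LifeCycle></Version></Record>"


def leaf_paths(element, prefix=""):
    """Return (path, text) pairs for every leaf below the given element."""
    path = f"{prefix}.{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        return [(path, (element.text or "").strip())]
    pairs = []
    for child in children:
        pairs.extend(leaf_paths(child, path))
    return pairs


def is_well_formed(text):
    """True if every block in the text is properly opened and closed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(WELL_FORMED)
paths = leaf_paths(root)
for path, value in paths:
    print(path, "=", value)
print(is_well_formed(BADLY_NESTED))
```

Each printed path is a walk from the implied root down to a leaf, which is exactly the containment relationship described above. The badly nested fragment is rejected by the parser, so no tree can be built from it.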
XML is a little more demanding than the code in this simplification, but what really is happening is that the human readable data that occupy the leaves of the IMS Metadata tree are, from the point of view of software, not really data at all. Before we can get to including real data, we need to define its type, and these types themselves consist of IMS metadata sub-schemes. For our purposes we can ignore this level of detail. We can think of trees as we did in Figure II and think of the block structure of trees as in Figure IV. We should just be prepared to read the more complete XML versions if and when the need arises.
IMS Specifications From My Point of View
A major reason for writing this document is to facilitate the translation of existing mathematical taxonomies into IMS-compliant metadata. To do that, we need to become familiar with the elements available to us in the IMS scheme. This section contains some sketchy information and makes some editorial comments about the IMS specifications in relation to this task.
It is probably a good idea to go through that document, especially the tables at the end that lay out the IMS metadata schema in table form. We will be using the entire scheme (the master scheme) as a starting point but may want (or need) to define our own restriction and/or extension. Here are a few of the specifications from the master scheme that seem important to our work.
The ones that we will have to work with the most are Characteristics, Educational Use Dependent, Relation, and Technical. Three of these contain pointers to external taxonomies that we will need to define and for which we will need to seek acceptance. The fourth, Relation, is supposed to describe relations among resources. This will presumably be used to describe the relation between a theorem and an example. Since these relations are essential to mathematical exposition, we might be the ones who are forced to come to grips with them.
Design Principles, No Examples, and Going Off the Deep Mathematical End
There does not seem to be a plethora of examples from which we can draw wisdom about designing taxonomies. However, we have been given one word of advice: don't overload any categories or elements. The principle behind this is that of orthogonality. There are some lengthy descriptions of this notion, but it is very easy to describe in mathematical terms. We will now go off the deep mathematical end to do this.
First, let us define a space $D$ of all on-line documents. Associated with $D$ is another space $M$ of metadata descriptions. The IMS specifications tell us how to define (but do not themselves define) a map $\mu : D \to M$. This map is not injective. The process of searching for documents using metadata is the process of computing $\mu^{-1}(S)$, where $S \subset M$ is a set of search criteria. We can go on to think of $M$ as a (finite-dimensional) vector space with the metadata elements as a basis. The values of metadata define linear functionals on this vector space. This involves using real or complex valued metadata, but we can deal with this by a coding scheme. The search strategy game involves endowing $M$ with an inner product in which the metadata elements become an orthogonal basis and in which the induced metric corresponds to a human notion of what it means for two documents to be similar. If that can be accomplished, searching would be greatly facilitated. In practice, there would be a small open subset of $M$ describing, for example, all documents that contain a proof of the fundamental theorem of arithmetic pitched at the undergraduate level in a style that I would find acceptable for my class. The smaller the open set, the more accurate the search. To do this, we need as a precondition that metadata elements are independent in terms of our human interpretation of similarity. For otherwise the implicit metric defined by us and the theoretical metric defined on $M$ cannot be equivalent. By the same token, if we put more information in one element than in another, then $\mu$ will tend to have larger pre-images for smaller sets. This will make the results of a search less meaningful, very much as they are now.
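The idea of searching as computing a pre-image can be made concrete with a toy sketch. Here `mu` stands for the map from documents to metadata descriptions and `S` for the set of search criteria; these names, the four documents, and their metadata values are all assumptions invented for illustration, and the vector-space structure discussed above is left out entirely.

```python
# Hypothetical documents and metadata values, invented for illustration.
# Each description is a pair (subject, level).
METADATA = {
    "doc1": ("number theory", "undergraduate"),
    "doc2": ("number theory", "undergraduate"),
    "doc3": ("number theory", "graduate"),
    "doc4": ("analysis", "undergraduate"),
}


def mu(document):
    """Map a document to its metadata description."""
    return METADATA[document]


def preimage(criteria):
    """Return every document whose description lies in the criteria set."""
    return sorted(d for d in METADATA if mu(d) in criteria)


# A search is a set S of acceptable metadata descriptions.
S = {("number theory", "undergraduate")}
result = preimage(S)
print(result)
```

Two different documents share the same description, so `mu` is not injective and the search returns both. The more finely the metadata distinguishes documents that a human would regard as different, the smaller these pre-images become.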