XML DTDs for IOF Standard Format Attributes

Introduction

This is a follow-on to an earlier document, IOF Standard Formats - Possible use of XML, which suggested that XML might be applicable to the data formats that the IOF is attempting to devise and gave some examples of its use. Since then there have been developments in the description of the attributes of the data and a lot of discussion about syntax, but no firm conclusions have been reached.

The benefits of XML have been acknowledged by most people involved in the discussions, but doubts have been expressed about the complexity and verbosity of the resulting data and the difficulty of preparing it (particularly if this has to be done manually). There seems to be agreement that a collection of simple file formats (e.g. tab or comma separated text data) may not be the way forward either, since such an approach is unlikely to be easily extensible as new trends (and hence data) emerge in the sport. As a consequence, progress seems to have halted. This document is an attempt to get things going again.

XML can be arbitrarily complex. The simplest XML document would just consist of a single element in which was embedded arbitrary text. For example :-

<?xml version="1.0"?>
<!DOCTYPE Data SYSTEM "Data.dtd">
<Data>
Lots of arbitrary text
which can include new lines and (most) other symbols $#@
</Data>

Here the first two lines simply declare that the document is XML and that its format is described in the file Data.dtd. The body of the XML is the single Data element. Clearly any simple text based data format can be expressed as an XML document with this minimal 'wrapper', although doing so would defeat much of the purpose of XML. Nevertheless, it does provide a possible way forward for our application. The XML document specification language (DTD, the document type definition) allows alternative element specifications for a particular piece of data, and this would enable a migratory approach to the problem. We could, for example, embed the current N3Sport ranking format or the OLEinzel archive formats in simple XML elements, but then have a more structured and extensible alternative for future use. Of course the migration process would never be entirely painless, but we ought to be able to avoid major upheavals if we get the design right now.
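
As a rough illustration of how little is needed to read such a wrapper, the following Python sketch recovers the embedded text from the minimal document. The choice of Python and of its standard xml.etree.ElementTree module is an assumption made for this example only, and the function name read_data_text is hypothetical. Note that this parser checks well-formedness only; it does not fetch Data.dtd or validate against it.

```python
import xml.etree.ElementTree as ET

# The minimal wrapper document discussed in the text.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE Data SYSTEM "Data.dtd">
<Data>
Lots of arbitrary text
which can include new lines and (most) other symbols $#@
</Data>
"""


def read_data_text(document):
    """Return the text embedded in the single Data element."""
    root = ET.fromstring(document)
    if root.tag != "Data":
        raise ValueError("expected a Data element, found " + root.tag)
    # root.text is None for an empty element, so fall back to "".
    return root.text or ""


if __name__ == "__main__":
    print(read_data_text(DOCUMENT))
```

Whatever simple text format is placed inside the element comes back unchanged, so existing software for that format could still be applied to the extracted text.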
 

How to start?

We now have a good (probably not perfect) set of attribute definitions. If we analyse these carefully and put together the best structured XML definitions we can manage, then these can be simplified down to text based alternatives where appropriate. This would seem to be the 'top down' way of approaching the problem and hence the most likely route to a consensus.

This document is therefore a first attempt to turn our attributes into XML DTDs and to provide examples of the various forms of data which will help discussion of the detail and the alternatives.
 

The Attribute Definitions

These attribute definitions have been published in tabular form. The detailed low level data still needs further specification, but most of the major categories are defined. It was felt that the first step from here to XML was to turn this into some form of object diagram. XML has a basically object oriented form in which it is natural to share objects rather than to duplicate information, and this sort of structure will be particularly important for sharing distributed data.

This attribute object diagram (needs a PostScript viewer) presents all the major attribute objects and shows how they are linked. Attributes only appear as objects on the diagram if they either have sub-structure or are natural candidates for sharing. Any other field is assumed to be expressed as an XML attribute, that is, a simple data field within an element. The object diagram does not correspond directly to the IOF attribute tables. The most obvious difference is the identification of a Person object, which seems to be a natural abstraction (and an obvious candidate for distributed data in, e.g., national membership databases). Nor is the decision about what should be an object entirely consistent. For example, Splits appears in the IOF attribute tables as simple character data but is shown here as an object in the belief that it probably ought to have sub-structure. It is not claimed that this picture is entirely correct, but it seems to be a good way of starting the DTD design. Hopefully we can refine this picture until we get it right.

There are other things that haven't been considered in the original attribute document which would become possible using XML. For example, the Control Description fields are assumed to be textual descriptions whereas they could readily be included as links to the appropriate graphical image objects.

The diagram has some annotations which need explanation. XML DTDs allow the specification of embedded elements which are either optional or repeated zero or more times. In the object diagram optional elements are shown by dotted links, whereas repetitions have a * alongside the link. Again, it is not claimed that these are all correct yet.
 

The DTDs and sample XML

From the object diagram it is a relatively mechanical step to produce the XML DTDs. Objects become elements and links become embeddings of elements. Anything that appears in the IOF attribute table which is not an object becomes an XML attribute. A set of complete DTDs has been written to illustrate this.

The following sample XML documents express the content of the 6 major data items described in the attributes document. These are not intended to be complete, but give the flavour of the data.
 

Parsing and Viewing XML

One of the worries about XML is the need for complex software to read (and write?) it. However, XML is rapidly becoming an accepted standard and software is becoming available. Of particular note is IBM's alphaWorks site, which has both Java and C++ parsers available for download. One of the easiest ways to view XML and get an idea of its structure, however, is to use the latest version of Internet Explorer 5, which is now available for a wide range of operating systems and has built in support for viewing XML documents.

The above sample XML documents are actually embedded in HTML so that they appear as the original XML source. However, the following links are to actual XML versions so that you can either download them (press <shift> as you click the link) for use with a parser or view them directly if you have Explorer 5.

 Competitor.xml
 Club.xml
 Event.xml
 Class.xml
 Course.xml
 Control.xml

Note that if you do not have XML support you may either see the XML source if you click on these links, or you may see nothing! (it seems to depend on the browser). With Explorer 5 you should see the data formatted with attributes etc. highlighted in different colours.
 

Simplifying the DTD

As discussed above, we have started with complex DTDs appropriate to the IOF attribute data but accept that it may be desirable to have much simpler versions. This is an area for much discussion, but this section presents an example of what is possible within the DTDs specified.

Take the definition of the Competitor. Currently we have the DTD:-

<!ELEMENT Competitor (Person,Times,CCardData)>
<!ATTLIST Competitor
  StartNumber CDATA #REQUIRED
  EnteredClass CDATA #REQUIRED
  DrawParameter CDATA #IMPLIED
  Status CDATA #REQUIRED
>

Where 'Person', 'Times' and 'CCardData' are themselves elements with a significant amount of structure, which results in the XML definition of a Competitor seen before. Instead of this, we could define the element as:-

<!ELEMENT Competitor (SimpleData|(Person,Times,CCardData))>

The | in the above means 'or', so that a Competitor is either 'SimpleData' or the structured form using Person etc. as above. If we then define:-

<!ELEMENT SimpleData (#PCDATA)>

this just says that SimpleData is text data (of any form we wish).
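
Software reading such data would need to tell the two alternatives apart. The sketch below shows one way this might be done, using Python and its standard xml.etree.ElementTree module purely for illustration. The function name competitor_form is hypothetical, and the empty Person, Times and CCardData elements are placeholders rather than the real structured content.

```python
import xml.etree.ElementTree as ET


def competitor_form(competitor):
    """Classify a Competitor element as 'simple' or 'structured'.

    Follows the content model (SimpleData|(Person,Times,CCardData)).
    """
    children = [child.tag for child in competitor]
    if children == ["SimpleData"]:
        return "simple"
    if children == ["Person", "Times", "CCardData"]:
        return "structured"
    raise ValueError("unexpected Competitor content: %r" % children)


# Placeholder instances: the child elements are left empty on purpose.
SIMPLE = "<Competitor><SimpleData>free text</SimpleData></Competitor>"
STRUCTURED = "<Competitor><Person/><Times/><CCardData/></Competitor>"

if __name__ == "__main__":
    for source in (SIMPLE, STRUCTURED):
        print(competitor_form(ET.fromstring(source)))
```

A validating parser would make this check against the DTD itself. The explicit test here only stands in for that step, since the parser used does not validate.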

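(Note here that XML syntax does not allow us to define:-

<!ELEMENT Competitor (#PCDATA|(Person,Times,CCardData))>

thus omitting the SimpleData element and making the alternative even less complex. In a DTD, #PCDATA may only appear on its own or in a mixed-content model of the form (#PCDATA|A|B)*, so it cannot be offered as an alternative to a sequence of elements. I think allowing it would make the parsing more difficult.)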

An alternative Competitor definition might now look like:-

<Competitor StartNumber="23" EnteredClass="M45" Status="OK" >
<SimpleData>
   M50,222671,Manchester and District Orienteering Club,MDOC,M,GBR,Watson,Ian,19:05:1947,
   52 Leegate Road,Heaton Moor,Stockport,SK4 4AX,UK,+44 161 7788,watson@cs.man.ac.uk,
   http://www.cs.man.ac.uk/~watson,18:6:1990,10.00.00,18:6:1990,11.15.34,1.15.34,
   Emit,45,1:33 2:40
</SimpleData>
</Competitor>

I.e. the main body of the data is just the sort of comma separated fields that we might generate from a database, spreadsheet etc.
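
To suggest how such data might be consumed, the following Python sketch extracts the SimpleData text and splits it into fields. It uses the standard xml.etree.ElementTree module, chosen here only for illustration. The function name simple_fields is hypothetical, and the sketch assumes that no field contains a comma, which the format as shown does not guarantee.

```python
import xml.etree.ElementTree as ET

# The alternative Competitor definition from the text.
COMPETITOR = """<Competitor StartNumber="23" EnteredClass="M45" Status="OK" >
<SimpleData>
   M50,222671,Manchester and District Orienteering Club,MDOC,M,GBR,Watson,Ian,19:05:1947,
   52 Leegate Road,Heaton Moor,Stockport,SK4 4AX,UK,+44 161 7788,watson@cs.man.ac.uk,
   http://www.cs.man.ac.uk/~watson,18:6:1990,10.00.00,18:6:1990,11.15.34,1.15.34,
   Emit,45,1:33 2:40
</SimpleData>
</Competitor>"""


def simple_fields(competitor_xml):
    """Return the comma separated SimpleData fields as a list of strings."""
    root = ET.fromstring(competitor_xml)
    simple = root.find("SimpleData")
    if simple is None:
        raise ValueError("Competitor has no SimpleData element")
    # Line breaks and indentation around each field are not significant.
    return [field.strip() for field in (simple.text or "").split(",")]


if __name__ == "__main__":
    fields = simple_fields(COMPETITOR)
    print(len(fields), "fields, starting with", fields[0])
```

The meaning of each position would still have to be agreed separately, which is exactly the information the structured form carries in its element and attribute names.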

In practice this still looks a bit messy, as we are left with the attribute bits (StartNumber etc.). The way to avoid this is not to use XML attributes for the definitions at all, only elements. So our DTD might become:-

<!ELEMENT Competitor (SimpleData|(Person,Times,CCardData,StartData))>

where StartData is another element (which could use attributes) to hold the StartNumber etc. If we do this then our simple data could be:-

<Competitor>
<SimpleData>
   M50,222671,Manchester and District Orienteering Club,MDOC,M,GBR,Watson,Ian,19:05:1947,
   52 Leegate Road,Heaton Moor,Stockport,SK4 4AX,UK,+44 161 7788,watson@cs.man.ac.uk,
   http://www.cs.man.ac.uk/~watson,18:6:1990,10.00.00,18:6:1990,11.15.34,1.15.34,
   Emit,45,1:33 2:40,23,M45,OK
</SimpleData>
</Competitor>
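
With the attributes gone, a reader has to recover the start information from the text itself. The Python sketch below assumes that the last three fields correspond, in order, to StartNumber, EnteredClass and Status; this is inferred from the values 23, M45 and OK in the earlier attribute form and is not something the DTD states. The function name start_data is hypothetical, and the standard xml.etree.ElementTree module is used for illustration only.

```python
import xml.etree.ElementTree as ET

# The attribute-free Competitor from the text.
COMPETITOR = """<Competitor>
<SimpleData>
   M50,222671,Manchester and District Orienteering Club,MDOC,M,GBR,Watson,Ian,19:05:1947,
   52 Leegate Road,Heaton Moor,Stockport,SK4 4AX,UK,+44 161 7788,watson@cs.man.ac.uk,
   http://www.cs.man.ac.uk/~watson,18:6:1990,10.00.00,18:6:1990,11.15.34,1.15.34,
   Emit,45,1:33 2:40,23,M45,OK
</SimpleData>
</Competitor>"""

# Assumed order of the trailing fields (an inference, not part of the DTD).
START_NAMES = ("StartNumber", "EnteredClass", "Status")


def start_data(competitor_xml):
    """Map the trailing SimpleData fields onto the former attribute names."""
    root = ET.fromstring(competitor_xml)
    text = root.findtext("SimpleData", default="")
    fields = [field.strip() for field in text.split(",")]
    if len(fields) < len(START_NAMES):
        raise ValueError("too few fields in SimpleData")
    return dict(zip(START_NAMES, fields[-len(START_NAMES):]))


if __name__ == "__main__":
    print(start_data(COMPETITOR))
```

This relies entirely on field position, so any later change to the field list would need to be agreed by everyone producing or reading the simple form.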

The two alternative XML files for a Competitor using the modified DTD are available here:-

 Comp.xml  the complex form
 Comp1.xml  the simple form
 

What Next?

More Discussion??