The benefits of XML have been acknowledged by most people involved in the discussions, but doubts have been expressed about the complexity and verbosity of the resulting data and the difficulty of preparing it (particularly if this has to be done manually). There seems to be agreement that a collection of simple file formats (e.g. tab- or comma-separated text data) may not be the way forward either: such an approach is unlikely to be easily extensible as new trends (and hence data) emerge in the sport. As a consequence, progress seems to have halted. This document is an attempt to get things going again.
XML can be arbitrarily complex. The simplest XML document consists of a single element containing arbitrary text. For example:-
<?xml version="1.0"?>
<!DOCTYPE Data SYSTEM "Data.dtd">
<Data>
Lots of arbitrary text
which can include new lines and (most) other symbols $#@
</Data>
Here the first two lines simply declare that the document is XML and that
its format is described in the file Data.dtd. The body of the XML is the
single Data element. Clearly, any simple text-based data format can be
expressed as an XML document with this minimal 'wrapper', although doing so
would defeat much of the purpose of XML. Nevertheless, it does provide a
possible way forward for our application. The XML document specification
language (DTD, the Document Type Definition) allows alternative element
specifications for a particular piece of data, and this would enable a
migratory approach to the problem. We could, for example, embed the current
N3Sport ranking format or the OLEinzel archive formats in simple XML
elements, but then have a more structured and extensible alternative for
future use. Of course the migration process would never be entirely
painless, but we ought to be able to avoid major upheavals if we get the
design right now.
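As a rough illustration of the 'wrapper' idea, the following Python sketch places a piece of legacy text inside a single element and reads it back. It is only a sketch: the function name wrap_legacy_text and the sample text are invented for this example, and the element name Data is taken from the document above. The point it shows is that characters which are special in XML (such as & and <) have to be escaped on the way in, which a standard library will do for us.

```python
import xml.etree.ElementTree as ET


def wrap_legacy_text(text: str, tag: str = "Data") -> str:
    """Wrap arbitrary legacy text in a single XML element (hypothetical helper)."""
    element = ET.Element(tag)
    element.text = text
    # ElementTree escapes &, < and > in the text so the result stays well-formed.
    return ET.tostring(element, encoding="unicode")


# Invented sample standing in for a line of an existing text format.
legacy = "Watson,Ian,MDOC\nTime < 1:20:00 & status OK $#@"

wrapped = wrap_legacy_text(legacy)

# Parsing the wrapper gives back exactly the text we started with.
recovered = ET.fromstring(wrapped).text
```

Note that the sketch produces only the element itself; the XML declaration and DOCTYPE lines shown earlier would still have to be written in front of it.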
This document is therefore a first attempt to turn our attributes into
XML DTDs and to provide examples of the various forms of data which will
help discussion of the detail and the alternatives.
This attribute object diagram (needs a PostScript viewer) presents all the major attribute objects and shows how they are linked. Attributes only appear as objects on the diagram if they either have sub-structure or are natural candidates for sharing. Any other field is assumed to be expressed as an XML attribute, that is, a simple data field within an element. The object diagram does not correspond directly to the IOF attribute tables. The most obvious difference is the identification of a Person object, which seems to be a natural abstraction (and an obvious candidate for distributed data in e.g. national membership databases). Nor is the decision about what should be an object entirely consistent. For example, Splits appears in the IOF attribute tables as simple character data but has been indicated here as an object, in the belief that it probably ought to have sub-structure. It is not claimed that this picture is entirely correct, but it seems to be a good way of starting the DTD design. Hopefully we can refine this picture until we get it right.
There are other things that haven't been considered in the original attribute document which would become possible using XML. For example, the Control Description fields are assumed to be textual descriptions whereas they could readily be included as links to the appropriate graphical image objects.
The diagram has some annotations which need explanation. XML DTDs allow
the specification of embedded elements which are either optional or repeated
zero or more times. In the object diagram optional elements are shown by
dotted links, whereas repetitions have a * alongside the link. Again it
is not claimed that these are all correct yet.
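To make the link between the diagram annotations and the DTD notation concrete, here is a small Python sketch that turns a list of child objects into an element declaration. A dotted (optional) link becomes a ? suffix and a link marked * becomes a * suffix. The function names are invented for this example, and the children given to Course (Name, Map, Control) are purely illustrative; they are not taken from the attribute tables.

```python
def content_model(children):
    """Build a DTD content model from (name, optional, repeated) triples."""
    parts = []
    for name, optional, repeated in children:
        if repeated:
            suffix = "*"  # zero or more occurrences (already allows none)
        elif optional:
            suffix = "?"  # zero or one occurrence
        else:
            suffix = ""   # exactly one occurrence
        parts.append(name + suffix)
    return "(" + ",".join(parts) + ")"


def element_decl(name, children):
    """Return the <!ELEMENT ...> declaration for one diagram object."""
    return f"<!ELEMENT {name} {content_model(children)}>"


# Hypothetical object: one Name, an optional Map, and any number of Controls.
decl = element_decl(
    "Course",
    [("Name", False, False), ("Map", True, False), ("Control", False, True)],
)
print(decl)
```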
The following sample XML documents express
the content of the 6 major data items described in the attributes document.
These are not intended to be complete, but give the flavour of the data.
The above sample XML documents are actually embedded in HTML so that they appear as the original XML source. However, the following links are to actual XML versions so that you can either download them (press <shift> as you click the link) for use with a parser or view them directly if you have Explorer 5.
Competitor.xml
Club.xml
Event.xml
Class.xml
Course.xml
Control.xml
Note that if you do not have XML support you may either see the XML
source if you click on these links, or you may see nothing! (it seems to
depend on the browser). With Explorer 5 you should see the data formatted
with attributes etc. highlighted in different colours.
Take the definition of the Competitor. Currently we have the DTD:-
<!ELEMENT Competitor (Person,Times,CCardData)>
<!ATTLIST Competitor
StartNumber CDATA #REQUIRED
EnteredClass CDATA #REQUIRED
DrawParameter CDATA #IMPLIED
Status CDATA #REQUIRED
>
Here 'Person', 'Times' and 'CCardData' are themselves elements with a significant amount of structure, which results in the XML definition of a Competitor seen before. Instead of this, we could define the element as:-
<!ELEMENT Competitor (SimpleData|(Person,Times,CCardData))>
The | above means 'or', so that a Competitor is either 'SimpleData' or the structured form using Person etc. as above. If we then define:-
<!ELEMENT SimpleData (#PCDATA)>
This just says that SimpleData is text data (of any form we wish).
(Note that XML syntax does not allow us to define:-
<!ELEMENT Competitor (#PCDATA|(Person,Times,CCardData))>
which would omit the SimpleData element and make the alternative even simpler. A DTD only permits #PCDATA to be mixed with element names in the form (#PCDATA|A|B)*, not with a sequence group such as (Person,Times,CCardData). I think it would also make the parsing more difficult.)
An alternative Competitor definition might now look like:-
<Competitor StartNumber="23" EnteredClass="M45" Status="OK" >
<SimpleData>
M50,222671,Manchester and District Orienteering Club,MDOC,M,GBR,Watson,Ian,19:05:1947,
52 Leegate Road,Heaton Moor,Stockport,SK4 4AX,UK,+44 161
7788,watson@cs.man.ac.uk,
http://www.cs.man.ac.uk/~watson,18:6:1990,10.00.00,18:6:1990,11.15.34,1.15.34,
Emit,45,1:33 2:40
</SimpleData>
</Competitor>
That is, the main body of the data is just the sort of comma-separated fields that we might generate from a database, spreadsheet etc.
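A reader of such files would have to cope with both alternatives. The Python sketch below shows one way this might be done: it looks for a SimpleData child and, if one is present, splits its text into comma-separated fields; otherwise it treats the element as the structured form. The function name read_competitor and the shortened sample record are invented for this example, and it assumes that line breaks inside SimpleData carry no meaning, which is a guess based on the wrapped sample above.

```python
import csv
import xml.etree.ElementTree as ET


def read_competitor(xml_text):
    """Return a dict describing a Competitor in either of its two forms."""
    root = ET.fromstring(xml_text)
    simple = root.find("SimpleData")
    if simple is not None:
        # Assumption: line breaks inside SimpleData are only layout,
        # so collapse all runs of whitespace into single spaces.
        text = " ".join((simple.text or "").split())
        fields = [field.strip() for field in next(csv.reader([text]))]
        return {"form": "simple", "attributes": dict(root.attrib), "fields": fields}
    # Otherwise the children are the structured elements (Person etc.).
    return {
        "form": "structured",
        "attributes": dict(root.attrib),
        "children": [child.tag for child in root],
    }


# Shortened, invented record in the simple form.
simple_xml = """<Competitor StartNumber="23" EnteredClass="M45" Status="OK">
<SimpleData>
M50,222671,MDOC,Watson,
Ian
</SimpleData>
</Competitor>"""

# Skeleton of the structured form with empty child elements.
structured_xml = (
    '<Competitor StartNumber="23" EnteredClass="M45" Status="OK">'
    "<Person/><Times/><CCardData/></Competitor>"
)

simple_result = read_competitor(simple_xml)
structured_result = read_competitor(structured_xml)
```

This sketch only checks which child is present; it does not validate the document against the DTD, which would need a validating parser.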
In practice this still looks a bit messy, as we are left with the attribute bits (StartNumber etc.). The way to avoid this is not to use XML attributes for the definitions at all, only elements. So our DTD might become:-
<!ELEMENT Competitor (SimpleData|(Person,Times,CCardData,StartData))>
where StartData is another element (which could use attributes) to hold the StartNumber etc. If we do this then our simple data could be:-
<Competitor>
<SimpleData>
M50,222671,Manchester and District Orienteering Club,MDOC,M,GBR,Watson,Ian,19:05:1947,
52 Leegate Road,Heaton Moor,Stockport,SK4 4AX,UK,+44 161
7788,watson@cs.man.ac.uk,
http://www.cs.man.ac.uk/~watson,18:6:1990,10.00.00,18:6:1990,11.15.34,1.15.34,
Emit,45,1:33 2:40,23,M45,OK
</SimpleData>
</Competitor>
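With this layout the start data simply becomes the last few fields of the record. A minimal Python sketch of pulling them back out is given below. It assumes that the final three fields are StartNumber, EnteredClass and Status in that order, which is inferred from the sample (23, M45, OK) rather than defined anywhere; the function name split_start_data is likewise invented for this example.

```python
# Assumed order of the trailing fields, inferred from the sample record.
START_FIELDS = ("StartNumber", "EnteredClass", "Status")


def split_start_data(fields):
    """Separate the trailing start data from the rest of a simple record."""
    count = len(START_FIELDS)
    if len(fields) < count:
        raise ValueError("record is too short to contain the start data")
    body = fields[:-count]
    start = dict(zip(START_FIELDS, fields[-count:]))
    return body, start


# The tail end of the sample record, already split on commas.
record = ["Emit", "45", "1:33 2:40", "23", "M45", "OK"]
body, start = split_start_data(record)
```

The same fields could equally be placed at the front of the record; the design question is only that the position is agreed, since nothing in the simple form labels them.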
The two alternative XML files for a Competitor using the modified DTD are available here:-
Comp.xml the complex form
Comp1.xml the simple form