IOF Standard Formats - Possible use of XML
Introduction
This page is an attempt to examine the use of XML as the way of exchanging
Orienteering data in a standard format. This is part of the work being
done by the IOF Information Standards Project. XML is the Extensible
Markup Language, which has been defined by the World Wide Web
Consortium in an attempt to provide a more flexible alternative to HTML
for general exchange of formatted data across the Web. One major reason
for considering XML is that it is itself being proposed as an international
standard and, as such, tools to generate and read it will become (are?)
widely available.
An initial reaction to the suggestion of using XML is likely to be that
it is too general purpose and too complex for the simple task of representing
Orienteering data where all that is required is a set of simple file formats
specifying things like names, age classes, course lengths, times etc. However,
there are a number of reasons why a simple solution may not be adequate.
-
Orienteering data is actually quite variable in length. There are
a large number of areas where the amount of data in a particular category
is unknown and hence the ability to handle variable length input is essential.
An obvious example of this is results information where the number of competitors
varies from event to event. Such simple cases can clearly be handled by
End of File markers but it is not that easy. For example, with the advent
of electronic punching, each individual competitor's data may contain split
times and the number of these will vary depending on the course which has
been run. And what if we want a standard results format for single and
multi-day events? If we predict all circumstances where variable length
data may occur we can probably still use 'End of Data' markers but it will
get more complex.
-
Orienteering data can contain lots of 'optional' components. Events using
manual punching will continue for many years. We don't want to have totally
separate formats for this but neither do we want producers of data from
such events to have to include lots of 'dummy' fields which are only of
use to electronic timing. Many nations will have particular pieces of information
(e.g. for local ranking systems) which are irrelevant when exchanging information
internationally but we don't want a format which either requires all nations
to include all fields, again with lots of dummies, or individual nations
to have to produce differing formats for national or international consumption.
-
Orienteering data will evolve with the sport and with technology. If we
had tried this exercise five years ago, would we have thought of everything
necessary for electronic punching? Would we have foreseen the addition of
Mountain Bike Orienteering to the interests of the IOF and put in necessary
data fields? We will not even get the formats right first time for the
data that we do know about!
To cope with the above, the data formats will have to be flexible and extensible
if they are to last for any length of time. This can only be achieved
by using a well designed data format description framework, and that
is what XML provides. This document presents, by example, some simple XML
formats of Orienteering data to demonstrate how it handles these issues.
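As a rough illustration of the points above, the following Python sketch reads a small results document in which the number of competitors and of split times varies and one field is optional. The element names (ResultList, Competitor, SplitTime, RankingPoints) are invented for this example and are not part of any agreed IOF format.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, chosen for illustration only.
RESULTS_XML = """
<ResultList>
  <Competitor>
    <Name>A. Runner</Name>
    <Class>M21</Class>
    <Time>45:12</Time>
    <SplitTime control="31">3:05</SplitTime>
    <SplitTime control="32">7:40</SplitTime>
    <SplitTime control="33">12:18</SplitTime>
  </Competitor>
  <Competitor>
    <Name>B. Walker</Name>
    <Class>W35</Class>
    <Time>52:30</Time>
    <RankingPoints>1042</RankingPoints>
  </Competitor>
</ResultList>
"""

def read_results(xml_text):
    """Return one dict per competitor, however many there are."""
    root = ET.fromstring(xml_text)
    results = []
    for comp in root.findall("Competitor"):
        results.append({
            "name": comp.findtext("Name"),
            "class": comp.findtext("Class"),
            "time": comp.findtext("Time"),
            # Zero or more split times: no end-of-data marker is needed.
            "splits": [(s.get("control"), s.text)
                       for s in comp.findall("SplitTime")],
            # Optional element: None when absent, no dummy field required.
            "ranking_points": comp.findtext("RankingPoints"),
        })
    return results

results = read_results(RESULTS_XML)
for r in results:
    print(r["name"], r["time"], len(r["splits"]), r["ranking_points"])
```

The manually punched competitor simply has no SplitTime elements, and the nationally specific RankingPoints element appears only where it is relevant.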
XML and the World Wide Web
Apart from specifying data in a flexible way, XML has also been designed
in the context of the World Wide Web and as a result provides some important
additional capabilities. The Web provides many opportunities for distributed
information handling. What this means is that it is not necessary when
exchanging information to either send or store copies of all information
which might be relevant to a particular application. Instead, it may be
sufficient to send basic information but containing links to additional
data which can be retrieved if and when required. For example, simple results
data may only need simple fields such as name, class, club and time. But,
it is possible that bigger events might want more - say telephone number
or postal address or email address - for query or prizegiving purposes.
Not only will it be a significant problem for an event official to assemble
all this data (in an up to date form!) but in a lot of cases it will be
redundant. It would be a lot easier if the data could simply provide sufficient
information (a web URL + ID number) to enable this information to be retrieved
automatically from a National Federation database. Such a system would
be particularly effective in preventing the problems which occur when multiple
copies of data are distributed and could make the administration of the
sport significantly easier.
Facilities to do this are already provided in existing XML and associated
tools (and a working example is given later in an associated document).
It is, of course, unlikely that we will be ready to make extensive use
of this in the near future but the future capability may prove invaluable.
This facility does not, in practice, add any real complexity to the use
of XML for straightforward data and its use can be ignored until wanted.
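The idea of sending a URL plus an ID number instead of full personal details might look something like the following Python sketch. The PersonRef element, its attributes, the federation.example address and the query format are all assumptions made for illustration; no real federation service is implied, and the sketch only builds the lookup address without fetching anything.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical competitor record carrying a link instead of contact details.
COMPETITOR_XML = """
<Competitor>
  <Name>A. Runner</Name>
  <Class>M21</Class>
  <Time>45:12</Time>
  <PersonRef database="https://federation.example/members" id="12345"/>
</Competitor>
"""

def person_lookup_url(competitor):
    """Build the address where further details could be retrieved.

    Returns None when the record carries no reference, so applications
    that do not need the extra data can ignore the facility entirely.
    """
    ref = competitor.find("PersonRef")
    if ref is None:
        return None
    return ref.get("database") + "?" + urlencode({"id": ref.get("id")})

competitor = ET.fromstring(COMPETITOR_XML)
lookup_url = person_lookup_url(competitor)
print(lookup_url)  # https://federation.example/members?id=12345
```

Only the application that actually needs a telephone number or address would follow the link, so the event official never has to assemble that data.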
Basic XML Principles
This is not intended to be a comprehensive XML tutorial; many of these can
be found via the XML Home Page. However, the relevant points will be outlined.
Any well defined data format must have a formal Syntax Definition which
says what the elements of the data are and how they are put together. There
is a commonly used notation called Backus Naur Form (BNF) which
has been used for many years in computing to describe things like the syntax
of programming languages. Data formats such as HTML, MIME encodings and
other web related data can be described in a limited form of BNF but one
which is powerful enough to provide the ability to express the sort of variable
and optional data discussed above.
However, a fixed format language such as HTML has a fixed syntax and
this means it is not capable of expressing general data nor of being extended
easily. A major feature of XML is that it provides the capability for a
user to write both the BNF syntax and the content of a general piece of
data. For maximum flexibility an XML document can contain both syntax and
data and a system which accepts XML may have to interpret the data according
to arbitrary rules of syntax. This is, however, far too flexible for most
applications. After all, if the syntax of data is totally flexible then
it can also have arbitrary content, and no specific application
would be able to handle this.
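To make "a document containing both syntax and data" concrete, here is a small Python sketch using the standard library's expat bindings. The document carries its own element declarations in an internal DTD subset, followed by the data. The element names are again invented for illustration. Note that this parser only reports the declarations; it does not validate the data against them, which would need a validating parser.

```python
import xml.parsers.expat

# One document holding both the syntax (DTD declarations) and the data.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Competitor [
  <!ELEMENT Competitor (Name, Time, SplitTime*)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Time (#PCDATA)>
  <!ELEMENT SplitTime (#PCDATA)>
]>
<Competitor><Name>A. Runner</Name><Time>45:12</Time></Competitor>
"""

declared = []  # element names declared in the syntax part
used = []      # element names actually appearing in the data part

parser = xml.parsers.expat.ParserCreate()
# Called once per <!ELEMENT ...> declaration in the DTD.
parser.ElementDeclHandler = lambda name, model: declared.append(name)
# Called once per opening tag in the data.
parser.StartElementHandler = lambda name, attrs: used.append(name)
parser.Parse(DOC, True)

print("declared:", declared)
print("used:", used)
```

Here SplitTime is declared as "zero or more" and happens not to be used, which is exactly the kind of variable and optional content discussed earlier.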
Instead, it is normal for a particular XML data format to have a fixed
syntax definition written specifically for the application. There are then
varying degrees of flexibility which can be exploited in applications which
handle the data. Two examples are:
Syntax rules built in
The applications programmer can take the syntax definition
and write a program which parses the data according to these fixed
rules. This is probably the simplest approach but the least flexible. Depending
on the exact approach taken, it may be possible to cope with extended data
formats but retrieving additional data will require re-programming.
Dynamic Interpretation of Syntax Data
An XML document can define either a completely new set of syntax
rules or extensions to an existing rule set. As mentioned previously, it
is unlikely that a program would know how to deal with totally general
data formats. However, there are a number of advantages to providing applications
which can deal with XML documents that use a standard base syntax but also
define extensions. One very useful extension might be one which allows
the use of references to external data via a general URL defining, for
example, access to a remote database. There already exist general purpose
parsers which understand and handle general XML syntax and applications
built on these can exhibit significant flexibility.
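The difference between the two approaches can be sketched in Python as follows. This is only an approximation: the standard library parser used here does not interpret a DTD, so the "dynamic" side is represented by a generic walk over whatever elements are present. The element names, including the RankingPoints extension, are hypothetical.

```python
import xml.etree.ElementTree as ET

# A record using a base format (Name, Club, Time) plus one extension.
EXTENDED_XML = """
<Competitor>
  <Name>A. Runner</Name>
  <Club>Forest OC</Club>
  <Time>45:12</Time>
  <RankingPoints>1042</RankingPoints>
</Competitor>
"""

def parse_fixed(xml_text):
    """Syntax rules built in: only fields known when the program was written."""
    comp = ET.fromstring(xml_text)
    return {tag: comp.findtext(tag) for tag in ("Name", "Club", "Time")}

def parse_generic(xml_text):
    """Generic handling: collect every child element, known or not."""
    comp = ET.fromstring(xml_text)
    return {child.tag: child.text for child in comp}

fixed = parse_fixed(EXTENDED_XML)
generic = parse_generic(EXTENDED_XML)

# Fields the fixed program silently skipped but the generic one retained.
extensions = sorted(set(generic) - set(fixed))
print(extensions)  # ['RankingPoints']
```

The fixed parser copes with the extended document but cannot retrieve the new field without re-programming, whereas the generic one passes it through for the application to use or ignore.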
XML syntax definitions are generally provided in Document Type Definitions
(DTDs), and the following examples are DTDs which define data formats based
on the requirements presented in basic IOF data formats.