IOF Standard Formats - Possible use of XML


This page is an attempt to examine the use of XML as the way of exchanging Orienteering data in a standard format. This is part of the work being done by the IOF Information Standards Project. XML is the Extensible Markup Language, which has been defined by the World Wide Web Consortium to provide a more flexible alternative to HTML for the general exchange of formatted data across the Web. One major reason for considering XML is that it is itself being proposed as an international standard and, as such, tools to generate and read it will become (and in some cases already are) widely available.

An initial reaction to the suggestion of using XML is likely to be that it is too general purpose and too complex for the simple task of representing Orienteering data where all that is required is a set of simple file formats specifying things like names, age classes, course lengths, times etc. However, there are a number of reasons why a simple solution may not be adequate.

To cope with these requirements, the data formats will have to be flexible and extensible if they are to last for any length of time. This can only be achieved with a well-designed framework for describing data formats, and that is what XML provides. This document presents, by example, some simple XML formats for Orienteering data to demonstrate how XML handles these issues.

XML and the World Wide Web

Apart from specifying data in a flexible way, XML has also been designed in the context of the World Wide Web and, as a result, provides some important additional capabilities. The Web provides many opportunities for distributed information handling. This means that, when exchanging information, it is not necessary to send or store copies of all the information that might be relevant to a particular application. Instead, it may be sufficient to send the basic information together with links to additional data, which can be retrieved if and when required. For example, simple results data may only need fields such as name, class, club and time. Bigger events, however, might want more (say a telephone number, postal address or email address) for query or prizegiving purposes. Not only would it be a significant problem for an event official to assemble all this data (in an up-to-date form!), but in many cases it would be redundant. It would be much easier if the data simply provided enough information (a web URL plus an ID number) for these details to be retrieved automatically from a National Federation database. Such a system would be particularly effective in preventing the problems that occur when multiple copies of data are distributed, and could make the administration of the sport significantly easier.
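As a rough illustration of the "web URL plus ID number" idea, the Python sketch below assumes a hypothetical `FederationRef` element that carries a database base URL and a member ID. The element names, attribute names and the example.org address are invented for this example and are not part of any IOF format. The code only builds the lookup address from the record; it does not fetch anything.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urljoin

# Hypothetical result record. The element names, the FederationRef
# element and the example.org address are assumptions for illustration.
RECORD = """<Result>
  <Name>A. Runner</Name>
  <Class>M21</Class>
  <Club>Example OK</Club>
  <Time>42:17</Time>
  <FederationRef base="https://federation.example.org/members/" id="12345"/>
</Result>"""


def lookup_url(result: ET.Element) -> Optional[str]:
    """Build the address from which fuller competitor details could be
    retrieved, or return None if the record carries no usable reference."""
    ref = result.find("FederationRef")
    if ref is None:
        return None
    base, member_id = ref.get("base"), ref.get("id")
    if not base or not member_id:
        return None
    # Join the database location and the competitor's ID number.
    return urljoin(base, member_id)


result = ET.fromstring(RECORD)
print(result.findtext("Name"), lookup_url(result))
# A. Runner https://federation.example.org/members/12345
```

How the returned address would actually be queried, and what a federation database would send back, is left open here, as it is in the text above.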

Facilities to do this are already provided by XML and its associated tools (a working example is given later in an associated document). It is, of course, unlikely that we will be ready to make extensive use of this in the near future, but the capability may prove invaluable later. In practice, this facility adds no real complexity to the use of XML for straightforward data, and it can be ignored until it is wanted.
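A minimal sketch of why the link can be ignored: a reader that asks only for name, class, club and time treats records with and without a link in exactly the same way, because ElementTree's `findtext` returns only the elements it is asked for. The `FederationRef` element and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical records; the second carries an optional link to
# external data (the FederationRef element is assumed for illustration).
PLAIN = ("<Result><Name>A. Runner</Name><Class>W21</Class>"
         "<Club>Example OK</Club><Time>38:05</Time></Result>")
LINKED = ("<Result><Name>B. Runner</Name><Class>M35</Class>"
          "<Club>Example OK</Club><Time>44:50</Time>"
          "<FederationRef base='https://federation.example.org/members/' id='678'/>"
          "</Result>")

FIELDS = ("Name", "Class", "Club", "Time")


def read_basic(xml_text: str) -> dict:
    """Extract only the basic result fields; any other elements,
    such as an optional link, are never looked at."""
    root = ET.fromstring(xml_text)
    return {field: root.findtext(field) for field in FIELDS}


for text in (PLAIN, LINKED):
    print(read_basic(text))
```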

Basic XML Principles

This is not intended to be a comprehensive XML tutorial (many of these can be found via the XML Home Page); however, the relevant points are outlined below.

Any well-defined data format must have a formal syntax definition, which states what the elements of the data are and how they are put together. A commonly used notation called Backus-Naur Form (BNF) has been used for many years in computing to describe things like the syntax of programming languages. Data formats such as HTML, MIME encodings and other web-related data can be described in a limited form of BNF, but one which is powerful enough to express the sort of variable and optional data discussed above.
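To illustrate how a BNF rule can express optional data, the sketch below defines a tiny hypothetical grammar for a comma-separated result line with an optional club (invented for this example, not an IOF format) and implements it in Python. Because this particular grammar contains no recursion, it can be translated directly into a regular expression, with the part in square brackets becoming an optional group.

```python
import re
from typing import Optional

# Hypothetical grammar (illustrative only), [ ] marks an optional part:
#
#   <result> ::= <field> "," <field> [ "," <field> ] "," <time>
#   <field>  ::= one or more characters other than ","
#   <time>   ::= <digit>+ ":" <digit> <digit>
#
# The fields are, in order: name, age class, optional club.
FIELD = r"[^,]+"
TIME = r"\d+:\d\d"
RESULT = re.compile(
    rf"(?P<name>{FIELD}),(?P<age_class>{FIELD})"
    rf"(?:,(?P<club>{FIELD}))?,(?P<time>{TIME})"
)


def parse_result(line: str) -> Optional[dict]:
    """Return the fields of a line matching <result>, or None."""
    match = RESULT.fullmatch(line)
    return match.groupdict() if match else None


print(parse_result("A. Runner,M21,42:17"))
# {'name': 'A. Runner', 'age_class': 'M21', 'club': None, 'time': '42:17'}
print(parse_result("A. Runner,M21,Example OK,42:17"))
print(parse_result("A. Runner,M21,fast"))  # None: not a valid <time>
```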

However, a fixed-format language such as HTML has a fixed syntax, which means it cannot express general data or be extended easily. A major feature of XML is that it lets the user write both the syntax definition (in a BNF-like notation) and the content of a general piece of data. For maximum flexibility, an XML document can contain both syntax and data, and a system which accepts XML may then have to interpret the data according to arbitrary rules of syntax. This is, however, far too flexible for most applications: if the syntax of data is totally flexible, then its content can also be arbitrary, and no specific application would be able to handle it.
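To make the "syntax and data in one document" point concrete, the sketch below uses Python's standard `xml.etree.ElementTree` to parse a small document whose syntax rules sit in an internal DTD subset ahead of the data. The element names and the entity are made up for illustration. It also shows a limitation worth knowing: ElementTree reads the DTD far enough to expand the declared entity, but it is a non-validating parser and does not enforce the declared element rules.

```python
import xml.etree.ElementTree as ET

# One document carrying both its syntax rules (the internal DTD subset
# between [ and ]) and its data. All names here are hypothetical.
DOC = """<!DOCTYPE Result [
  <!ELEMENT Result (Name, Class, Time)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT Class (#PCDATA)>
  <!ELEMENT Time (#PCDATA)>
  <!ENTITY elite "M21E">
]>
<Result><Name>A. Runner</Name><Class>&elite;</Class><Time>42:17</Time></Result>"""

root = ET.fromstring(DOC)
# The entity declared in the DTD has been expanded by the parser.
print(root.findtext("Class"))  # M21E

# A copy that breaks the declared (Name, Class, Time) order: Time first.
BAD = DOC.replace("<Time>42:17</Time>", "").replace(
    "<Result>", "<Result><Time>42:17</Time>")
# Still accepted, because ElementTree does not validate against the DTD.
print([child.tag for child in ET.fromstring(BAD)])  # ['Time', 'Name', 'Class']
```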

Instead, it is normal for a particular XML data format to have a fixed syntax definition written specifically for the application. There are then varying degrees of flexibility which can be exploited by applications that handle the data. Two examples are:

Syntax rules built in
The applications programmer can take the syntax definition and write a program which parses the data according to these fixed rules. This is probably the simplest approach but the least flexible. Depending on the exact approach taken, it may be possible to cope with extended data formats, but retrieving additional data will require re-programming.
Dynamic Interpretation of Syntax Data
An XML document can define either a completely new set of syntax rules or extensions to an existing rule set. As mentioned previously, it is unlikely that a program would know how to deal with totally general data formats. However, there are a number of advantages in providing applications which can deal with XML documents that use a standard base syntax but also define extensions. One very useful extension might be one which allows references to external data via a general URL defining, for example, access to a remote database. General-purpose parsers which understand and handle general XML syntax already exist, and applications built on these can exhibit significant flexibility.
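The two approaches above can be contrasted in a small Python sketch. The record layout, the `FederationRef` element and both DTD fragments are hypothetical, and the DTD reader handles only plain sequences with optional (`?`) items, so it is far from a general validator; a real application would more likely rely on an existing validating parser. The first function hard-codes its rule, so the extended record is rejected until the program is changed; the second takes its rule from the DTD text, so extending the DTD is enough.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical records; element names are illustrative assumptions.
PLAIN = ("<Result><Name>A. Runner</Name><Class>W21</Class>"
         "<Club>Example OK</Club><Time>38:05</Time></Result>")
LINKED = ("<Result><Name>B. Runner</Name><Class>M35</Class><Time>44:50</Time>"
          "<FederationRef base='https://federation.example.org/members/' id='678'/>"
          "</Result>")


# Approach 1, syntax rules built in: the rule
#   Result ::= Name Class [Club] Time
# has been read by the programmer and coded directly.
def check_builtin(xml_text: str) -> List[str]:
    """Return a list of problems; empty means the record is acceptable."""
    root = ET.fromstring(xml_text)
    tags = [child.tag for child in root]
    if root.tag != "Result" or tags not in (
        ["Name", "Class", "Time"],
        ["Name", "Class", "Club", "Time"],
    ):
        return [f"unexpected structure: <{root.tag}> containing {tags}"]
    return []


# Approach 2, dynamic interpretation: the rule is read at run time from a
# DTD element declaration. Only plain sequences whose items may carry a
# "?" are handled; real DTDs also allow |, *, + and nested groups.
def rule_from_dtd(dtd: str, element: str):
    decl = re.search(rf"<!ELEMENT\s+{re.escape(element)}\s+\(([^)]*)\)\s*>", dtd)
    if decl is None:
        raise KeyError(f"no declaration for {element}")
    pattern = ""
    for item in decl.group(1).split(","):
        item = item.strip()
        tag = f"<{re.escape(item.rstrip('?'))}>"
        pattern += f"(?:{tag})?" if item.endswith("?") else tag
    return re.compile(pattern)


def conforms(xml_text: str, dtd: str) -> bool:
    """True if the children of the root element follow the DTD's rule."""
    root = ET.fromstring(xml_text)
    sequence = "".join(f"<{child.tag}>" for child in root)
    return rule_from_dtd(dtd, root.tag).fullmatch(sequence) is not None


BASE_DTD = "<!ELEMENT Result (Name, Class, Club?, Time)>"
EXTENDED_DTD = "<!ELEMENT Result (Name, Class, Club?, Time, FederationRef?)>"

print(check_builtin(PLAIN), check_builtin(LINKED))
print(conforms(LINKED, BASE_DTD), conforms(LINKED, EXTENDED_DTD))  # False True
```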
XML syntax definitions are generally provided in Document Type Definitions (DTDs), and the following examples are DTDs which define data formats based on the requirements presented in basic IOF data formats.