XML for formatting (Re: Environment monitoring)

Donald Becker becker at scyld.com
Thu Oct 16 12:02:18 EDT 2003

On Thu, 16 Oct 2003 graham.mullier at syngenta.com wrote:

> [Hmm, and will the rants be longer or shorter after he's bought the mental
> lubricant?]

Buy the right amount and they will be eloquent enough that you won't
mind, or too much and they will be short and slurred ;-o

> I'm in support of the original rant, however, having had to reverse-engineer
> several data formats in the past. Most recently a set of molecular-orbital
> output data. Very frustrating trying to count through data fields and
> convince myself that we have mapped it correctly.

What you want is not XML, but a data format description language.

When I first read about XML, that's what I believed it was.  I was
expecting that a file optionally described the data format as a prologue,
and then had a sequence of efficiently packed data structures.
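
To make the "prologue plus packed data" idea concrete, here is a minimal
sketch in Python.  Everything in it is an assumption made for illustration,
not an existing format: the one-line JSON prologue, the field names
(node_id, cpu_temp, load_one), and the helper functions.  It only shows
that a reader can recover named fields from compact records using nothing
but the description at the head of the stream.

```python
import io
import json
import struct

# Hypothetical prologue: byte order plus an ordered list of named fields,
# each with a struct type code (H = uint16, f = float32).
DESCRIPTION = {
    "byte_order": "<",
    "fields": [
        {"name": "node_id", "type": "H"},
        {"name": "cpu_temp", "type": "f"},
        {"name": "load_one", "type": "f"},
    ],
}

def record_struct(description):
    """Build a struct.Struct for one record from the description."""
    codes = "".join(field["type"] for field in description["fields"])
    return struct.Struct(description["byte_order"] + codes)

def write_stream(out, description, rows):
    """Write the description as a one-line prologue, then packed records."""
    out.write(json.dumps(description).encode("ascii") + b"\n")
    packer = record_struct(description)
    for row in rows:
        out.write(packer.pack(*row))

def read_stream(inp):
    """Read the prologue, then decode every complete record into a dict."""
    description = json.loads(inp.readline().decode("ascii"))
    packer = record_struct(description)
    names = [field["name"] for field in description["fields"]]
    records = []
    while True:
        chunk = inp.read(packer.size)
        if len(chunk) < packer.size:
            break
        records.append(dict(zip(names, packer.unpack(chunk))))
    return records

buf = io.BytesIO()
write_stream(buf, DESCRIPTION, [(1, 41.5, 0.25), (2, 39.0, 1.5)])
buf.seek(0)
records = read_stream(buf)
```

Each record here occupies 10 bytes, and the field names appear once in the
prologue rather than once per data element.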

But the XML designers created the evil twin of that idea.  The header is
a schema of parser rules, and each data element has verbose syntax that
conveys little semantic information.  An XML file
  - is difficult for humans to read, yet is even larger than
    human-oriented output
  - requires both syntax and rule checking after human editing, yet is
    complex for machines to parse
  - is intended for large data sets, where the negative impacts are
  - encourages "cdata" shortcuts that bypass the few supposed advantages.

> Anecdote from a different field (weather models) that's related - for a
> while, a weather model used calibration data a bit wrong - sea temperature
> and sea surface wind speed were swapped. All because someone had to look at
> a data dump and guess which column was which.

Versus looking at an XML output and guessing what "load_one" means?
I see very little difference: repeating a low-content label once for
each data element doesn't convey more information.  The only thing XML
adds here is avoiding miscounting fields for undocumented data structures.

What we really want in both the weather code case and when reporting
cluster statistics is a data format description language.  That
description includes the format of the packed fields, and should
include what the fields mean and their units, which is what we are
missing in both cases.  With such an approach we can efficiently
assemble, transmit and deconstruct packed data while having automatic
tools to check its validity.  And general-purpose tools can even
combine a description and compact data set to produce XML.
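
A rough sketch of that last point, in Python, under stated assumptions:
the FIELDS table, its field names, units, and plausible ranges are all
hypothetical, chosen to echo the weather example.  The description
carries the packed layout, the meaning, and the units of each field; one
generic function checks a packed record against it, and another combines
the description with the decoded record to produce XML.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description: (name, struct code, meaning, units, valid range).
FIELDS = [
    ("sea_temp", "f", "sea surface temperature", "degC", (-5.0, 40.0)),
    ("wind_speed", "f", "sea surface wind speed", "m/s", (0.0, 120.0)),
]
# Little-endian, no padding: two float32 values per record.
PACKER = struct.Struct("<" + "".join(code for _, code, _, _, _ in FIELDS))

def unpack_checked(blob):
    """Decode one packed record and range-check each field."""
    values = PACKER.unpack(blob)
    record = {}
    for (name, _, _, units, (lo, hi)), value in zip(FIELDS, values):
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} {units} outside [{lo}, {hi}]")
        record[name] = value
    return record

def to_xml(record):
    """Combine the description and a decoded record into an XML string."""
    root = ET.Element("record")
    for name, _, meaning, units, _ in FIELDS:
        elem = ET.SubElement(root, name, units=units, meaning=meaning)
        elem.text = repr(record[name])
    return ET.tostring(root, encoding="unicode")

blob = PACKER.pack(18.5, 7.25)   # 8 bytes on the wire
record = unpack_checked(blob)
xml_text = to_xml(record)
```

A range check like this only catches swapped columns when the swapped
values fall outside the declared ranges.  The stronger protection is that
names and units travel with the description, so nobody has to guess which
column is which.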

Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
914 Bay Ridge Road, Suite 220		Scyld Beowulf cluster system
Annapolis MD 21403			410-990-9993

