::scr Ramblings of a Classic Refugee or How I Learned To Stop Worrying and Love OS X
Andy Wardley
scr@thegestalt.org
Wed, 6 Feb 2002 09:21:00 +0000
On Wed, Feb 06, 2002 at 01:02:53AM +0000, David Cantrell wrote:
> If you say that plain ol' ASCII is not special*, you have a bootstrapping
> problem. If all files need to conform to a DTD-a-like, then the DTD-a-like
> needs to conform too, and so does whatever describes that, and ...
But all these things operate at different levels. I'm not sure I see why
you have a bootstrapping problem.
ASCII is just an arbitrary number<->character mapping scheme which makes
a stream of binary numbers easier for humans to read. There is nothing
in ASCII per se that describes the structure of files. Even control
characters like newlines and carriage returns are still just regular
bytes with no special significance other than the one ASCII prescribes
for them.
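As a rough illustration of that point, here is a minimal Python sketch (the byte values are arbitrary picks, not anything from the thread). It shows ASCII acting purely as a lookup table in both directions; the newline is byte 10 and nothing more.

```python
# ASCII as a plain number <-> character table: nothing here knows
# about lines, fields, or any other file structure.
raw = bytes([72, 105, 10, 33])      # four arbitrary numbers

text = raw.decode("ascii")          # numbers -> characters
codes = [ord(ch) for ch in text]    # characters -> numbers

print(repr(text))    # the newline shows up as an ordinary character
print(codes)         # and round-trips back to the same numbers
```

Any meaning beyond "byte 10 is called newline" has to come from a layer built on top.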
XML is a standard built on top of ASCII (but it could just as easily have
chosen EBCDIC had that been the encoding standard du jour) which describes
a syntax for representing structured information as an ASCII byte stream.
A DTD, or in more general terms, a schema, is a description of the permitted
structure and content of a particular class of XML documents.
In comparison to natural language, ASCII describes the alphabet, XML
describes the rules for joining letters into words and words into
sentences, and the Schema describes the valid kinds of sentences (e.g.
noun phrase) and associates a particular human meaning with them.
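The three layers can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. The `<note>` document and the `is_valid_note` check are made up for illustration; the check is a hand-written stand-in for a schema, not a real DTD or XML Schema validator.

```python
import xml.etree.ElementTree as ET

data = b"<note><to>David</to><body>Hello</body></note>"

# Layer 1: bytes -> characters (the "alphabet").
text = data.decode("ascii")

# Layer 2: characters -> element tree (the XML syntax rules).
root = ET.fromstring(text)

# Layer 3: a hypothetical stand-in for a schema. It says a <note>
# must contain exactly a <to> followed by a <body>.
def is_valid_note(elem):
    return elem.tag == "note" and [child.tag for child in elem] == ["to", "body"]

print(is_valid_note(root))
```

Each layer only needs the one directly beneath it, which is why the levels don't chase each other in a circle.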
So I agree that you have to put a stick in the ground and say "We'll use
ASCII and build on top of that", or even before that, say "We'll use a
binary encoding system and build on top", but after that you're up and
running.
> At some point, you will need to say "enough!" and make arbitrary definitions.
> You may as well stick with tried-and-tested convention and make plain ol'
> ASCII special
Yes, that sounds like I'm agreeing with what you're saying. But I still
don't see the bootstrapping problem.
> * - or rather that nothing is special and that the whole system is
> self-describing
Now, the issue of self-description is very interesting, particularly with
regard to XML Schema. An XML Schema is an XML document which describes,
using the syntax of XML to spell it out, the permitted structure and content
of a class of XML instance documents. And because XML Schemata are a
particular class of XML instance documents, you can write an XML Schema to
describe XML Schema documents themselves.
With the XML::Schema modules, I wrote a minimally conformant parser which
allows you to build schema driven parsers. The "minimally conformant"
part means that you have to build the parser using Perl code. "Fully
conformant" means that you can feed an XML Schema document in one end
and have it build the parser for you.
However, with the Perl toolkit, I can build a schema driven XML parser which
parses XML Schema documents. Or in other words, I can use the minimally
conformant parser to bootstrap a fully conformant parser. Bingo!
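The shape of that bootstrap can be sketched with a toy in Python. To be clear about the assumptions: this is not XML Schema and not the XML::Schema API. The `<schema>`/`<element>` vocabulary, `validate` and `load_schema` are all invented here, and a "schema" is reduced to a table of which child tags each element may contain.

```python
import xml.etree.ElementTree as ET

def validate(elem, rules):
    """True if elem and all its descendants contain only permitted children."""
    allowed = rules.get(elem.tag)
    if allowed is None:          # tag not described by the rules at all
        return False
    return all(child.tag in allowed and validate(child, rules) for child in elem)

def load_schema(xml_text, meta_rules):
    """Check a toy schema document against meta_rules, then turn it into rules."""
    root = ET.fromstring(xml_text)
    if not validate(root, meta_rules):
        raise ValueError("not a valid schema document")
    return {e.get("name"): set(e.get("children", "").split()) for e in root}

# Step 1: rules for schema documents, built by hand in code.
hand_built = {"schema": {"element"}, "element": set()}

# Step 2: the same rules, written out as a schema document.
schema_of_schemas = """
<schema>
  <element name="schema" children="element"/>
  <element name="element"/>
</schema>
"""

# Step 3: the hand-built rules bootstrap the loaded ones...
loaded = load_schema(schema_of_schemas, hand_built)
# ...and the loaded rules can then load their own description.
reloaded = load_schema(schema_of_schemas, loaded)

print(loaded == hand_built == reloaded)
```

The hand-built table is only needed once; after that the schema document describing schema documents is enough to rebuild it.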
Of course, the thing that makes this possible is that we've put various
sticks in the ground and said "We're using binary, ASCII, XML and XML
schema". Once we have those stilts in place, we can build a house on
top which rises out of the swamp.
I guess that's the bootstrapping problem. Once you reach a certain level,
your documents can become self-describing and, in a rather contrived sense,
self-aware. But you can only get that far by making assumptions set in
stone about how you're going to build the foundations.
A