Re: ::scr Ramblings of a Classic Refugee or How I Learned To Stop Worrying and Love OS X
On Wed, Feb 06, 2002 at 10:48:26AM +0000, Alaric Snell wrote:
> Unicode, not ASCII. Never forget that. An XML processor is a complex piece of
> software since it *must* operate at the Unicode level (even if it's just
> mapping some ASCII variant to Unicode) to be able to process valid XML
> documents, which may contain character references to Unicode, or be encoded
> in UTF-8.
Oops! Yes, of course, you're absolutely right. No wait, you're _almost_
right. To be extremely pedantic, the XML spec defines:
    A character is an atomic unit of text as specified by ISO/IEC 10646.
Quoting from http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html
    ISO/IEC 10646 is a relatively new character set standard, published in
    1993 by the International Organization for Standardization (ISO). Its
    name is "Universal Multiple-Octet Coded Character Set". (UCS)

    Unicode is a coded character set specified by a consortium of major
    American computer manufacturers, primarily to overcome the chaos of
    different coded character sets in use when creating multilingual programs
    and internationalizing software. From version 1.1 on, Unicode is
    scrupulously kept compatible with ISO/IEC 10646 and its extensions.
    The consortium is also an important contributor to the ISO work to
    further develop ISO/IEC 10646.
I never knew that until just now, and I'm not sure that my life has been
enhanced at all by it, but there you go... 10646 is the man :-)
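For anyone who wants to see the practical side of Alaric's point, here is a rough Python sketch (not taken from the spec or from anyone's processor; it assumes the standard-library xml.etree.ElementTree parser and a made-up <w> element). The same character, U+00E9, is written once as a character reference in pure-ASCII bytes and once as raw UTF-8 bytes, and the parser should hand back the same code point either way:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: the element name <w> is made up for illustration.
# 1) Pure ASCII bytes; the non-ASCII character is a numeric character reference.
as_reference = b"<w>caf&#xE9;</w>"
# 2) The same character encoded directly as UTF-8 (0xC3 0xA9).
#    With no encoding declaration, an XML parser assumes UTF-8.
as_utf8 = b"<w>caf\xc3\xa9</w>"

text_from_reference = ET.fromstring(as_reference).text
text_from_utf8 = ET.fromstring(as_utf8).text

# Both routes end up as the same sequence of code points.
print(text_from_reference == text_from_utf8)
print([hex(ord(ch)) for ch in text_from_reference])
```

So even a document whose bytes are all ASCII can force the processor to deal in code points rather than bytes.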
A