I love language. I love to read, of course. And I'll read virtually anything, assuming it qualifies as what the late, great Robert A. Heinlein characterized as "words in a line": newspapers, novels, flyers, cereal boxes... you name it. Sometimes I'll even read computer industry trade publications--if there's nothing else available.

But I'm talking here specifically about language on a more atomic level--language as grammar, as vocabulary, as etymology and as constituents of expression. I first got turned on to language in that sense while I was still in first grade. That was when I began systematically working my way through my parents' Webster's New Collegiate Dictionary--along with their freshly purchased Funk & Wagnalls Encyclopedia and the entire set of Tom Swift, Jr. adventures.

Reading the encyclopedia gave me a... well... encyclopedic overview of human knowledge. The Tom Swift, Jr. books--and later that year, the works of Heinlein, Asimov, Clarke and other greats of science fiction--helped me appreciate the possibilities of technological progress, the importance of taking the long view and the inevitability of change. And the dictionary helped me understand gestalt--the way in which the components of meaning fit together to form a whole that is greater than the sum of its parts.

It was fun. Naturally, making my way through Webster was a slow process, but it sped up toward the end. After all, the last three letters, X, Y and Z, take up less than a dozen pages between them--and X has far fewer entries than does any other letter.

The letter X itself has an impressive variety of meanings. For electrical engineers, it stands for reactance, while for organic chemists it represents a halogen. Classicists know it as the Roman numeral for "10", while mathematicians recognize it as the first variable in an equation. Depending on its context, X can warn film buffs of "adult" content, stand in for a signature or a vote, cancel a clause or designate an experimental creation.
It's in the experimental sense that computer scientists historically have employed the letter X. You might say that X has become the Internet standard for identifying non-standardness. Want to add a custom header to your email messages? Prefix it with an X and most mail transfer agents will ignore it. Want to use an internally developed application or file type on your intranet? Declare a MIME X-type and a helper application and away you go.

In the past year or so, the folks at the World Wide Web Consortium have contributed yet another definition for X via their efforts to craft an eXtensible Markup Language, or XML. Like sex in junior high school, XML is one of those things that everyone talks about--and with which practically no one has any real-world experience. For most people, it's X, the unknown. That's a shame, because, if folks understood what XML is really all about, they'd realize that--unlike sex--it's not as big a deal as it's made out to be. In fact, it may not even matter at all.

Looking for the Next Best Thing

So, what the heck is XML, anyway? In a nutshell, it's a metalanguage--a language of languages. Its purpose is to enable the creation of discipline-specific markup languages, such as the 1997 proof-of-concept Chemical Markup Language. Why? The XML initiative came about for two reasons:

1. Native HTML isn't powerful or flexible enough for the tastes of layout Nazis or programmers--and it's been subject to serious code balkanization as Microsoft, Netscape and the W3C engage in a three-cornered struggle over extending its capabilities.

2. The parent of both HTML and XML--Standard Generalized Markup Language, aka ISO 8879--is so complex and overburdened with obscure, seldom-used functions that it's completely impractical for commercial or consumer use. Not to mention that it has its own set of dialects, implementations and offshoots that keep it from achieving the status of a unified practical standard.
And, since it's an ISO standard, it takes the equivalent of an act of God to change it.

So, the gnomes of the W3C decided that what was needed was a "lite" version of SGML--something that had most of its power without all the outdated and cumbersome baggage that SGML hauls around with it like the linguistic equivalent of the White Knight. Unlike HTML--which isn't really a language, so much as it is a set of rendering instructions--XML would permit interested parties to create whole new markup languages, while providing a single standard for parsing those new languages.

In intent, at least, XML aims to be--like Adobe's PostScript--a comprehensive language of descriptors for rendering content with enormous precision and flexibility. It's intended to provide great freedom to implementors, who will have the responsibility of creating browsers that will display the theoretically infinite variety of new tags and elements--perhaps even incorporating vectorized graphics descriptors, so that images can be folded into XML documents themselves, rather than being separate files, as they are in the HTML universe.

As it stands, though, there exists precisely one XML browser--Jumbo, a Java browser written by Peter Murray-Rust. It exists solely to display examples of the Chemical Markup Language--another of Murray-Rust's creations. And Jumbo can't display other markup languages written in XML, because it doesn't know how. And there's the rub.

Across the Great Divide

Theoretically, every modern HTML document begins with a Document Type Definition that tells browsers what types of tags it will contain. In practice, none of the major browsers requires it and none of them really uses it. Instead, they each support the tags that they support and ignore any others, regardless of any DTD. As anyone who's tried to use Netscape's Navigator to view a page that uses Microsoft's proprietary <MARQUEE> tag knows, there just ain't no such thing as "standard" HTML.
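
To make the "support what you support, ignore the rest" behavior concrete, here is a minimal sketch in Python. It is not how any real browser is built: the SUPPORTED_TAGS set and the TolerantRenderer class are hypothetical names invented for illustration, and the only real API used is the standard library's html.parser module. Note that no DTD is consulted anywhere.

```python
from html.parser import HTMLParser

# Hypothetical tag set for an imaginary browser; real browsers each differ.
SUPPORTED_TAGS = {"html", "head", "title", "body", "p", "b"}


class TolerantRenderer(HTMLParser):
    """Collects page text and records which start tags were skipped."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag not in SUPPORTED_TAGS:
            self.ignored.append(tag)

    def handle_data(self, data):
        # Text inside an unknown tag is still kept; only the tag is dropped.
        if data.strip():
            self.text.append(data.strip())


def render(page):
    """Return the visible text and the list of ignored tag names."""
    renderer = TolerantRenderer()
    renderer.feed(page)
    renderer.close()
    return " ".join(renderer.text), renderer.ignored


page = "<HTML><BODY><P>Hello <MARQUEE>scrolling</MARQUEE> world</P></BODY></HTML>"
text, ignored = render(page)
print(text)     # the words survive, the scrolling does not
print(ignored)  # the tags this imaginary browser did not recognize
```

A browser with a different SUPPORTED_TAGS set would show the same page differently, which is the whole problem in miniature.
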
And the efforts of the W3C to impose standards on the Web have consistently fallen short, mostly because the major browser vendors are locked in a death struggle for hearts, minds and websites--and proprietary tags are the weapons they use. That's not going to change any time soon.

XML--and its bastard child, XHTML--will provide even greater opportunities for browser makers to distinguish their products from one another by the simple expedient of supporting certain markup languages ("modules" in XHTML parlance) and not others. Far from creating a homogeneous markup language standard, XML/XHTML will open the door to even greater incompatibilities between browsers.

I don't know about you, but Microsoft's determined attempts to make its technical support resources inaccessible to users of Netscape's browser tee me off. That kind of dog-in-the-manger spitefulness in the name of competition amounts to sheer bullying. XML will make it worse.

Already, the W3C's proposed XHTML 1.0 specification--like the HTML 4.0 spec on which it is based--has trifurcated into three subspecies: Transitional, Strict and Frameset. And the extended hyperlinking capabilities that were to have been a central feature of XML--multidirectional links, links to multiple destinations, links to annotations and links with jazzy new behaviors, among others--have been pulled out of XML proper and given a separate life of their own. If you ask me, there are plenty of Dark Riders with Frodo and the One Ring nowhere in sight.

Not to say that the XML initiative is wholly bad. For one thing, its focus on what it calls "well-formed" documents is a good thing. Far too many current HTML editors produce terrible code--and the WYSIWYG editors are, by and large, the worst offenders. They improperly nest tags, play cutesy games with non-breaking spaces, needlessly proliferate font tags and generally ignore the rules of HTML whenever they find it convenient.
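
For readers who want to see what "well-formed" means in practice, here is a minimal sketch that hands a few markup fragments to a generic XML parser, in this case the xml.etree.ElementTree module from Python's standard library. The is_well_formed helper and the sample strings are made up for illustration; the point is only that an XML parser rejects the sloppy nesting and missing closing tags that HTML browsers shrug off.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Properly nested: every tag closes in reverse order of opening.
good = "<p><b>bold <i>and italic</i></b></p>"

# Improperly nested: <b> is closed before the <i> it contains.
bad_nesting = "<p><b>bold <i>and italic</b></i></p>"

# The root element is never closed.
missing_close = "<html><body><p>no trailing tag</p></body>"

for sample in (good, bad_nesting, missing_close):
    print(is_well_formed(sample), sample)
```

Only the first sample passes. Well-formedness says nothing about whether the tags mean anything, only that they are opened, nested and closed consistently.
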
Maybe--and it's a big maybe--XHTML will persuade the folks who make HTML editors to clean up their act. And maybe it won't. After all, Microsoft's FrontPage often generates HTML documents that lack a trailing </HTML> tag. I think the Redmond skunkworks put that little "feature" into their product on purpose--because they know that Netscape's browser collapses in a pitiful heap when it runs across the little tag that wasn't there.

Mind you, the <HTML></HTML> tag pair is the very essence of web pages. It, <HEAD></HEAD> and <BODY></BODY> together make up the smallest HTML document you can write. If Microsoft can cavalierly ignore even the minimums--because Explorer can tolerate the omission and Navigator can't--is there a reason to believe that it will pay any closer attention to the W3C's dictates that XHTML--much less XML--documents be "well-formed"? As long as there's a competitive advantage to be gained, I wouldn't bet on it. In fact, I'd bet the other way.

But, what the heck, XHTML is evolution in action and there's no predicting the outcome--except that, as the late Jim Morrison put it, it'll be "slow, driven, mad and sweet--like some new language."

(Copyright © 1999 by Thom Stark--all rights reserved)