XML: Why it Matters
What is XML?
When the W3C calls XML "universal," they mean it. In the five years or so since its introduction, XML has begun to transform the Web from a means of transmitting documents into a general means of exchanging data. For the most part, XML is being used in three ways:
XML is not just a simple solution to a complex problem. It's a simple solution to several hundred complex problems. Among the many notable applications of XML:
People have used XML to structure everything from a dictionary of East Asian historical and literary terms to the virtual worlds of online adventure games. And when it comes to business-to-business (B2B) e-commerce systems, XML is king. XML is at the heart of Microsoft BizTalk, RosettaNet, and the Open Trading Protocol, not to mention the Electronic Business XML Initiative (ebXML) and the XML Common Business Library. Many standards are being developed in support of all this e-commerce activity. For example, the W3C is developing XML-based standards for security-related assertions, encryption, and digital signatures. Even where these are still drafts, they are largely functioning as de facto standards.
The genealogy of XML
XML's ancestry goes back to GML (Generalized Markup Language), created at IBM in 1969. Over the next seventeen years, GML slowly evolved into SGML (Standard Generalized Markup Language), an ISO standard since 1986. Full-blown SGML is powerful, but it's cumbersome. SGML has the sort of flexibility that practically invites trouble: confronted with two good ways to do something, SGML almost inevitably supports both, providing little guidance to developers.
In 1989, Tim Berners-Lee came up with the idea of the World Wide Web. In 1991 he went public with what is almost certainly the most important SGML application ever produced: HTML. The original HTML was also one of the simplest SGML applications ever produced. This tiny, lightweight, narrowly focused markup language evolved through several versions in the 1990s and successfully provided a means to describe the contents of documents for the emerging World Wide Web.
The general success of HTML inspired the W3C to look for a comparably simple, human-readable language to describe structured data: lightweight, and therefore well suited to the Web. One of HTML's few shortcomings also provided a stimulus: there was really no way to say that a particular HTML document was "correct" except to view it with lots of different browsers and see whether they could all handle it successfully.
As discussed below, XML introduces clear, testable notions of well-formed and valid documents. Between 1996 and 1998 so many individuals and companies contributed to the definition of XML that it is virtually impossible to say who "invented" it. (Credit is conventionally given to the W3C's SGML Editorial Review Board.) Nonetheless, XML completely defied the conventional wisdom that when a committee sets out to design a horse, it ends up with a camel. Years of collective experience led to a genuinely lightweight and highly general markup language. XML is a racehorse.
A Copernican revolution
Prior to XML, every time somebody needed a way to exchange data, they spent days making decisions that were ultimately beside the point. Instead of working out the characteristics of the particular data in question, they had either to work out a new scheme for exchanging data in general or to adopt one of over a dozen cumbersome existing approaches.
Prior to XML, every small discrepancy between how two companies represented data was a crisis. Now, if both companies use XML, it should be reasonably straightforward to use XSLT (itself an XML application) to convert between their formats.
Prior to XML, data formats probably outnumbered the companies using them. The few formats that were widely used carried almost no structural information. For example, the commonly used "comma-separated text" means something only to programs that agree exactly on which data appears in which position of each comma-separated list.
The virtue of simplicity
If you are familiar with HTML, XML will look familiar. (If you are not familiar with HTML, you might want to skip forward to the section "Why XHTML?") Just imagine an HTML-like language in which you could invent new tags and attributes to represent different types of information, rather than just the layout of Web pages. For example, the following would be a fragment of well-formed XML:
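As a hypothetical illustration, the following minimal Python sketch holds a small fragment in a string and reads it with the standard library's xml.etree.ElementTree module. The element names country, president, and mostpopularfood, and all of the values, are assumed for illustration. Nothing about these tags is predefined; the parser accepts them simply because the fragment follows XML's syntax rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: none of these tags or attributes is predefined by XML;
# they were simply invented to describe the data.
FRAGMENT = """<country name="Exampleland">
  <president>Jane Doe</president>
  <mostpopularfood year="2004">Noodles</mostpopularfood>
</country>"""


def describe_country(xml_text):
    """Parse the fragment and return its data as a plain dictionary."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    food = root.find("mostpopularfood")
    return {
        "country": root.get("name"),
        "president": root.findtext("president"),
        "food": food.text,
        "food_year": food.get("year"),  # None if the attribute is omitted
    }


if __name__ == "__main__":
    print(describe_country(FRAGMENT))
    # {'country': 'Exampleland', 'president': 'Jane Doe', 'food': 'Noodles', 'food_year': '2004'}
```

The parser only checks that the fragment is well-formed. It has no opinion on whether a country ought to have a president.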
Well-formed and Valid
Each of these rules affects XHTML, the XML-based successor to HTML 4.0, but the effects are small and straightforward. For example:
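As a concrete, testable illustration, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree parser as a well-formedness checker. The parser checks syntax but does not validate. The helper name is_well_formed and the sample snippets are hypothetical. The snippets pair habits that HTML browsers tolerate with their XML-conforming equivalents: unclosed br elements, unquoted attribute values, overlapping tags, inconsistent tag case (XML names are case-sensitive), and more than one root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)  # non-validating: checks syntax only
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets: HTML-era habits next to XML-conforming equivalents.
SAMPLES = {
    "unclosed <br>": "<p>Line one<br>Line two</p>",
    "self-closed <br />": "<p>Line one<br />Line two</p>",
    "unquoted attribute": "<td width=50>cell</td>",
    "quoted attribute": '<td width="50">cell</td>',
    "overlapping tags": "<p><b><i>text</b></i></p>",
    "properly nested tags": "<p><b><i>text</i></b></p>",
    "mismatched case": "<P>text</p>",
    "consistent case": "<p>text</p>",
    "two root elements": "<p>one</p><p>two</p>",
}

if __name__ == "__main__":
    for label, snippet in SAMPLES.items():
        print(f"{label:22} well-formed: {is_well_formed(snippet)}")
```

Because every rule is purely syntactic, any XML parser gives the same yes-or-no answer. No test in a browser is needed.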
In addition to the notion of a well-formed document, XML introduces the stricter concept of a valid document: a well-formed document that also conforms to one or more DTDs (Document Type Definitions) or XML Schemas. DTDs and XML Schemas are two different ways to define an XML application. For example, XHTML, MathML, XML Signature, etc., are each defined by a DTD or an XML Schema. (Creating a valid document requires declaring the document type at the start of the document, but we're not going to go into that here. This has always been good practice in HTML. In XHTML it is mandatory.) DTD is the older mechanism for defining a document type, dating back to SGML. Very shortly after XML appeared, Microsoft and others proposed XML-based schema languages. That work fed into XML Schema, which the W3C adopted as a Recommendation in 2001. XML Schema is itself an XML application, so developers can use all of their usual XML tools to handle schemas. DTD and XML Schema overlap heavily in what they can express, although XML Schema adds features such as data types. Important XML applications often provide both a DTD and an XML Schema.
Using XML Schemas or DTDs ensures that all of the elements, attributes, etc. are appropriate to the type of document in question and that they are nested appropriately. For example, our hypothetical president element in the example above would not be part of a valid XHTML document, because XHTML deals with the content of documents, not the presidents of countries. If you tried to insert a president element into an XHTML document, the document would still be well-formed XML, but it would no longer be valid XHTML. Similarly, in our example above, a DTD or Schema could enforce a rule that every country must have a name attribute, while leaving the year attribute of mostpopularfood optional. (There are also ways to mix, and validate, elements and attributes from multiple DTDs and Schemas in the same document.
While that is beyond the scope of this introductory article, it's a really useful feature, because it lets documents share common ways of achieving common goals. For example, any XML document that needs to display a mathematical expression can draw on MathML.)
How XML fits into the Internet world
XML fits easily into the Internet world because an XML document is just another type of document. Both HTML documents and XML documents consist simply of text. Within the file, the opening declarations indicate whether the document is HTML or XML: for example, an XML declaration such as <?xml version="1.0"?> or a DOCTYPE declaration. On the Web, HTML documents are delivered using HTTP. You can just as easily deliver an XML document with HTTP. This has a lot of interesting consequences:
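Here is a minimal, self-contained sketch of that point using only Python's standard library. A throwaway local HTTP server sends a hypothetical order document labeled with the application/xml media type, and a client fetches and parses it. The order document, the XMLHandler class, and the /order.xml path are all assumptions for illustration. In this sketch, the only XML-specific part of the HTTP exchange is the Content-Type header.

```python
import http.client
import threading
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical B2B-style document; the XML declaration must come first.
ORDER_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<order id="1001">
  <item sku="A-17" quantity="3"/>
</order>
"""


class XMLHandler(BaseHTTPRequestHandler):
    """Answers every GET with ORDER_XML, just as a server would send HTML."""

    def do_GET(self):
        self.send_response(200)
        # The media type is the only XML-specific part of the exchange.
        self.send_header("Content-Type", "application/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(ORDER_XML)))
        self.end_headers()
        self.wfile.write(ORDER_XML)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_xml(host, port, path):
    """GET path over plain HTTP; return (Content-Type, parsed root element)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        content_type = response.getheader("Content-Type")
        root = ET.fromstring(response.read())  # bytes in, Element out
    finally:
        conn.close()
    return content_type, root


def demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), XMLHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_xml("127.0.0.1", server.server_port, "/order.xml")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    content_type, root = demo()
    item = root.find("item")
    print(content_type, root.tag, root.get("id"), item.get("quantity"))
```

The same server code could send an HTML page by changing only the body and the media type. The receiving program, not the transport, decides what to do with the document.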
Naturally, when you deliver an XML document, the system that receives it must be able to do something with that document. Contemporary Web browsers render XHTML appropriately. Netscape 7.1 provides native support for MathML, allowing appropriate rendering of complex mathematical expressions; so far, Internet Explorer requires third-party software to do the same. However, unlike HTML, XML is not just a way to deliver content to a Web browser. Many XML document types are not primarily intended to be read by a human. For example, B2B e-commerce systems exchange XML documents over HTTP or other delivery protocols to share data, request actions, and so on. Contemporary Web browsers will display such a document in a readable form so that you can examine it, but they cannot do much more with it. The document is intended for an e-commerce system, not a graphical browser.
Why XHTML?
Lux and XML
We have worked with XML in a variety of environments, including Microsoft .NET. These projects have included use of both DOM (the Document Object Model) and SAX (the Simple API for XML), and have involved extensive work with XSLT, XPath, and related technologies.
Further Information About XML
A good discussion of mixing multiple namespaces (and hence multiple DTDs/Schemas) can be found at http://www-106.ibm.com/developerworks/library/x-nmspace2.html.
© 2004 The Lux Group, Inc. All rights reserved. This page is provided as a public service to the Web community. As with any copyrighted work, limited quotation with appropriate attribution for purposes of review is permitted. Links to this page are welcome. Explicit permission from Lux is required to otherwise publish, transmit, transfer or sell, reproduce, create derivative works from, or distribute this content, including by incorporating the content into any e-mail. If you wish to reproduce this content, please contact us for permission.
Lux believes that basic information like this should be shared rather than hoarded. Naturally, an article like this is only an introduction to a subject. If this article has been useful to you, we hope you will consider Lux if you need expertise in this area.
Lux
1725 Westlake Ave N., Suite 105
Seattle, Washington 98109
phone 206 328 9898 · fax 206 328 9899
For permission to reproduce articles: info@lux-seattle.com
New Business Inquiries: Todd Tibbetts toddt@lux-seattle.com
Public Relations: Ben Thompson ben@lux-seattle.com
Career Opportunities: jobs@lux-seattle.com