Introduction to XML

A gentle introduction to XML.

This was written by Ola Nordstrom and given on Wed Jun 11 2003.


1. Introduction

XML stands for eXtensible Markup Language. Along with XML comes a slew of other acronyms: XSL, XSLT, XSL-FO, XPath, DTD, CSS, DOM, SOAP, and so on (yes, I believe there is an O'Reilly book for each of them). A reader new to XML quickly gets lost in acronyms when reading articles about XML this and that. This guide gives a simple introduction to what XML is and how it can be used to publish things online. So forget what you think you know about the acronyms listed above; they were created to confuse you. The only way to learn something is to do it, starting off with a simple example.

2. XML vs. HTML

Conceptually you can think of XML as HTML. They look similar, and the way you end up writing XML documents is much like writing HTML: everything is enclosed in tags. The reason to use XML is that you are not burdened by the look of the document. Content is separated from style, and this is the main strength of XML.

There are, however, a few differences.

  • XML is case sensitive
  • XML tags must be properly nested and closed
  • attribute values must be wrapped in quotes ("")
  • every XML document must have exactly one root tag (more on this later)
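The rules above can be checked mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are made up for illustration and are not part of the original article.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule in the list above.
samples = {
    "well-formed":        '<note lang="en"><b>hi</b></note>',
    "case mismatch":      '<Note>hi</note>',
    "bad nesting":        '<a><b>hi</a></b>',
    "unquoted attribute": '<note lang=en>hi</note>',
    "two root elements":  '<a/><b/>',
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample parses; each of the others breaks exactly one of the rules, which is something an HTML browser would typically forgive but an XML parser will not.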

3. Example 1

Let's say I read a lot of papers and need to publish summaries of them on the web. I just want to write the summaries and have them look nice, with the option of changing the look of everything later on. This is where XML can be used in conjunction with XSLT.

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="summaries.xsl"?>

<catalog>

  <book>
    <author>D. Cantrell</author>
    <summary>This book was no good.</summary>
  </book>

</catalog>

The first line says that this is an XML document. The encoding attribute says that the file is stored as UTF-8, a variable-width encoding of Unicode that is a superset of ASCII, so the file can contain characters beyond the plain ASCII set. Another common encoding is ISO-8859-1. You can of course add multiple book entries to the file.

The following line specifies the stylesheet that is to be used to give meaning to the XML tags.

The next line defines the root tag, called catalog; this name is arbitrary. The summary tag is where we keep the actual information (text); everything else is XML cruft.

Here is the stylesheet; it tells an XML-aware reader how to render the XML as HTML.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
  <h2>My Summaries</h2>
     <h3>
      <xsl:for-each select="catalog/book">
      <tr>
        <td><xsl:value-of select="author"/></td>
        <td><xsl:value-of select="summary"/></td>
      </tr>
      </xsl:for-each>
     </h3>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>

The astute reader will notice that the XSL file is also an XML file.

So to read this you could fire up your XML-aware browser (Mozilla or a derivative of it) and open the XML file from the first listing. Make sure the summaries.xsl file is located in the same directory.

Reading through the stylesheet, it simply matches the root of the document (/), spits out the HTML tags, and then for each book in the catalog prints the value of the author and summary tags. In our case that is D. Cantrell and This book was no good.
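To make the loop in that stylesheet concrete, here is a rough Python equivalent. It is only a sketch under stated assumptions: it uses the standard xml.etree.ElementTree module rather than a real XSLT processor, the function names render and string_value are hypothetical, and the HTML shell is simplified.

```python
import xml.etree.ElementTree as ET
from html import escape

CATALOG = """<catalog>
  <book>
    <author>D. Cantrell</author>
    <summary>This book was no good.</summary>
  </book>
</catalog>"""

def string_value(parent, tag):
    """Text of the first child named tag, roughly what xsl:value-of yields."""
    child = parent.find(tag)
    return "" if child is None else "".join(child.itertext())

def render(xml_text):
    root = ET.fromstring(xml_text)        # root is the <catalog> element
    rows = []
    for book in root.findall("book"):     # like xsl:for-each select="catalog/book"
        author = escape(string_value(book, "author"))
        summary = escape(string_value(book, "summary"))
        rows.append(f"<tr><td>{author}</td><td>{summary}</td></tr>")
    return "<html><body><h2>My Summaries</h2>" + "".join(rows) + "</body></html>"

print(render(CATALOG))
```

Adding a second book element to CATALOG produces a second table row, just as the for-each in the stylesheet would.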

4. Converting XML to HTML

Not all browsers can read XML files and know how to pull in the associated XSL file. So to keep those browsers happy we can generate the HTML file from our source XML. There are many tools to do this, written in Java, C++, C and so on, that work on a variety of platforms. One such tool is xsltproc, which is part of libxslt. It is very easy to use.

xsltproc -o summaries.html summaries.xsl summaries.xml

The command will generate summaries.html from the corresponding xsl and xml files. Note that xsltproc expects the stylesheet first and the XML document second.

5. That was very poor XSL

The previous example was a simple XML document with an oversimplified stylesheet. The only good thing about the stylesheet is that it is easy to read. In fact you should never write stylesheets like that. To explain why, I must digress into some of the messy acronyms (or rather, into what they do).

5.1. The XSLT Parser

To render or convert XML documents into another format they must be parsed (duh!). To do this the parser must generate a tree starting with the root element and trickling down to all the elements that make up the document. XSL then matches these elements in order to figure out what they mean and, in our case, gives meaning to the tags by printing out the corresponding HTML tags.

This has all the smelly characteristics of recursion.
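A recursive walk over the parsed tree might look like the sketch below. It assumes Python's standard xml.etree.ElementTree parser; the function name outline and the inline sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

def outline(element, depth=0):
    """Return one indented line per element, visiting children recursively."""
    lines = ["  " * depth + element.tag]
    for child in element:                 # iterating an element yields its children
        lines.extend(outline(child, depth + 1))
    return lines

doc = ("<catalog><book><author>D. Cantrell</author>"
       "<summary>This book was no good.</summary></book></catalog>")

print("\n".join(outline(ET.fromstring(doc))))
```

The function calls itself once per child, so it handles tags nested to any depth without knowing the document's structure in advance.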

To be able to nest tags within one another such as:

<book>
  <author>
    <chapter>
    ...

The XSL parser must also be able to recurse and match tags within tags and so on. The XSL in Example 1 does not allow for this. Instead it performs a for loop, printing out the author and summary. What if you wanted to add a bold tag to make a word boldface? This is not possible without completely rewriting the XSL.

<xsl:for-each select="catalog/book">
<tr>
  <td><xsl:value-of select="author"/></td>
  <td><xsl:value-of select="summary"/></td>
</tr>
</xsl:for-each>

As you can see, for each book in the catalog we select the author and summary and wrap them inside table data HTML tags. Thus there is no recursion and we have hamstrung the XSL parser, preventing us from using its full potential.
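The bold tag problem can be seen in a few lines. xsl:value-of returns the string value of an element, that is, the text of all its descendants with the markup dropped. The sketch below imitates that with Python's standard xml.etree.ElementTree module; the b element inside summary is a hypothetical addition to the Example 1 data.

```python
import xml.etree.ElementTree as ET

# Hypothetical summary with a nested bold tag.
summary = ET.fromstring("<summary>This book was <b>no</b> good.</summary>")

# Roughly what xsl:value-of select="summary" produces: text only, tags dropped.
flattened = "".join(summary.itertext())
print(flattened)

# The nested element is still in the tree; the flat loop just never visits it.
child_tags = [child.tag for child in summary]
print(child_tags)
```

The b element exists in the parsed tree, but nothing in the for-each loop ever looks at it, so the emphasis is lost in the output.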

6. Good XSL

A better way to write XSL stylesheets is to use template matching. A template simply matches one tag and then tells the parser to continue (and match more tags).

First a longer XML example.

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="summaries.xsl"?>

<catalog>
<!-- I am Joe Comment -->

<summary>
  <title>Route Oscillations in I-BGP with Route Reflections</title>
  <published>ACM SIGCOMM 2002, Pittsburgh, PA, August 19-23, 2002.</published>
  <author>
    <name>Anindya Basu</name>
    <homepage>http://www.research.att.com/~griffin/</homepage>
  </author>
  <author>
    <name>Chih-Hao Luke Ong</name>
    <homepage>http://web.comlab.ox.ac.uk/oucl/work/luke.ong/</homepage>
  </author>

  <localcopy>sigcomm2002.2.ps</localcopy>
  <papersummary>
  <p>This paper also analyzes the behavior of route
  oscillations due to anomalies in I-BGP behavior. Again the
  "correctness of IBGP" is in question.  The authors define route
  oscillations as "persistent route oscillation" and "transient
  route oscillations". The former is when routers exchange
  UPDATEs without ever settling on a stable path. The latter case
  is when routers undergo route oscillations due to timing
  situations.</p>

  <p>Bla Bla ...</p>
  </papersummary>
</summary>

<summary>
        <!-- Another summary should go in here -->
</summary>

</catalog>

As you can see, we have expanded on the previous example, providing more tags which are hopefully self-explanatory. That is another goal of XML: to have tags that make sense.

The XSL for this looks like this.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="/">   <!--root rule-->
  <html><head><title>Paper Summaries</title></head>
    <body bgcolor="white">
      <xsl:apply-templates select="/catalog"/>
    </body>
  </html>
</xsl:template>

<xsl:template match="catalog">
  <xsl:apply-templates select="summary"/>
  <hr/>
  Last Modified: Sun Jan 19 22:27:14 EST 2003
</xsl:template>

<xsl:template match="summary">   <!--processing for each record-->
  <table border="0" cellpadding="0" cellspacing="2">
    <xsl:apply-templates select="title"/>
    <xsl:apply-templates select="published"/>
    <xsl:apply-templates select="author"/>
    <tr>
      <td>Local Paper Copy:</td>
      <xsl:apply-templates select="localcopy"/>
    </tr>
    <tr>
      <td colspan="3">
        <!--<xsl:value-of select="papersummary"/>-->
        <xsl:apply-templates select="papersummary"/>
      </td>
    </tr>
  </table>
  <br/>
</xsl:template>

<xsl:template match="author">   <!--this is often recursed since many authors-->
  <tr>
    <xsl:apply-templates select="name"/>
    <xsl:apply-templates select="homepage"/>
  </tr>
</xsl:template>

<xsl:template match="name">
  <td>Author:</td><td><xsl:value-of select="."/></td>
</xsl:template>

<xsl:template match="homepage">
  <!-- print the link instead <xsl:value-of select="."/> -->
  <td><a href="{.}"><xsl:value-of select="."/></a></td>
</xsl:template>

<xsl:template match="title">
  <tr bgcolor="#a8caff"><td colspan="3"><i><xsl:value-of select="."/></i></td></tr>
</xsl:template>

<xsl:template match="published">
  <tr><td colspan="3">Published:<xsl:value-of select="."/></td></tr>
</xsl:template>

<xsl:template match="localcopy">
  <td><a href="{.}"><xsl:value-of select="."/></a></td>
</xsl:template>

<xsl:template match="papersummary">
    <xsl:apply-templates select="p"/>
</xsl:template>

<xsl:template match="p">
    <p><xsl:value-of select="."/></p>
</xsl:template>

</xsl:stylesheet>

The first template matches the root of the document and wraps everything in opening and closing HTML tags. The xsl:apply-templates instruction then hands the selected tags over to whichever templates match them, so the individual tags are matched one at a time.

What is important is that the xsl:template just matches a tag. To get the information out one simply selects it.

<xsl:value-of select="."/>

Templates are what allow us to have multiple authors, paragraphs, etc.
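The idea of matching a tag and then telling the parser to continue can be imitated in ordinary code. What follows is a loose Python sketch, not how an XSLT processor is actually implemented: the names TEMPLATES, template, apply_templates and value_of are hypothetical, only a few of the tags from the example are handled, and unlike real XSLT (which has built-in default templates) elements without a registered template are simply skipped.

```python
import xml.etree.ElementTree as ET
from html import escape

TEMPLATES = {}                            # tag name -> rendering function

def template(tag):
    """Register a function as the 'template' for one tag name."""
    def register(func):
        TEMPLATES[tag] = func
        return func
    return register

def apply_templates(element, select=None):
    """Render children of element (only those named select, if given)."""
    children = element.findall(select) if select else list(element)
    return "".join(TEMPLATES[c.tag](c) for c in children if c.tag in TEMPLATES)

def value_of(element):
    """Text content of the element, escaped for HTML output."""
    return escape("".join(element.itertext()))

@template("summary")
def render_summary(el):
    return ("<table>" + apply_templates(el, "title")
            + apply_templates(el, "author") + "</table>")

@template("title")
def render_title(el):
    return f"<tr><td><i>{value_of(el)}</i></td></tr>"

@template("author")
def render_author(el):
    return "<tr>" + apply_templates(el, "name") + "</tr>"

@template("name")
def render_name(el):
    return f"<td>Author:</td><td>{value_of(el)}</td>"

doc = ET.fromstring("""<catalog><summary>
  <title>Route Oscillations in I-BGP with Route Reflections</title>
  <author><name>Anindya Basu</name></author>
  <author><name>Chih-Hao Luke Ong</name></author>
</summary></catalog>""")

page = apply_templates(doc, "summary")
print(page)
```

Because the author function is called once per matching element, a second or third author needs no change to the code, which is the same property the template-based stylesheet has.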

7. Conclusion

This tutorial has hopefully given you a good enough introduction to start using XML. A natural place for XML to be used is of course the Web, hence XSL was also presented. XSL is very important since it gives meaning to the XML tags. Perhaps this should have been called an intro to XSL, but anybody who does not already know XML will not know XSL either, and thus would not be caught by the "Buzz Word" that is XML.

XML is becoming more and more pervasive: it is being used within databases and word processors, to do RPC/RMI, and so on.

