Presented by Ola Nordstrom on June 11, 2003
Table of Contents
- 1. Introduction
- 2. XML vs. HTML
- 3. Example 1
- 4. Converting XML to HTML
- 5. That was very poor XSL
- 5.1. The XSLT Parser
- 6. Good XSL
- 7. Conclusion
1. Introduction
XML is the acronym given to the eXtensible Markup Language. Along with XML
comes a slew of other acronyms; to name a few: XSL, XSLT, XSL-FO, XPath, DTD,
CSS, DOM, SOAP, etc. (yes, I believe there is an O’Reilly book for each of
them). Thus when reading articles talking about XML this and that, the reader
new to XML quickly becomes lost in acronyms. This guide is written to give
the user a simple introduction to what XML is and how it can be used to
publish stuff online. So forget what you think you know about the acronyms
listed above; they were created to confuse you. The only way to learn
something is to do it, starting off with a simple example.
2. XML vs. HTML
Conceptually you can think of XML as HTML. They look similar, and the way you
end up writing XML documents is much like writing HTML: everything is enclosed
in tags. The reason to use XML is that you are not burdened by the look of
the document; that is, content is separated from style. This is the main
strength of XML.
There are, however, a few differences.
- XML is case sensitive
- XML tags must be properly nested and closed
- attribute values must be wrapped in quotes ("")
- every XML document must have exactly one root tag (more on this later)
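As a small illustration of these rules, here is a well-formed fragment. The
element and attribute names (note, to, body, em, priority) are made up for
this sketch and do not come from the examples that follow.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical names, for illustration only -->
<!-- exactly one root element (note) wraps everything -->
<note priority="high">   <!-- the attribute value is wrapped in quotes -->
  <to>Reader</to>        <!-- every opening tag has a matching closing tag -->
  <body>Tags nest <em>inside</em> one another and never overlap.</body>
  <!-- Body and body would be two different tags: XML is case sensitive -->
</note>
```

An HTML browser would forgive a missing closing tag or an unquoted attribute;
an XML parser would reject the whole document instead.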
3. Example 1
Let's say I read a lot of papers and need to publish summaries of them on the
web. I just want to write summaries and have them look nice, and I want the
option of being able to change the look of everything later on. This is where
XML can be used in conjunction with XSLT.
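A minimal sketch of what such a file, here assumed to be called
summaries.xml, might look like. The tag names (catalog, book, author,
summary), the stylesheet name summaries.xsl, and the sample text are the ones
mentioned in this guide; the UTF-8 encoding and the exact layout are
assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="summaries.xsl"?>
<catalog>
  <book>
    <author>D. Cantrell</author>
    <summary>This book was no good.</summary>
  </book>
</catalog>
<!-- sketch only: the encoding and layout are assumptions -->
```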
The first line says that this is an XML document; the encoding declaration
names the character encoding the file is saved in. An 8-bit-based encoding
such as UTF-8 lets us use characters beyond plain 7-bit ASCII in our XML
file. Another common encoding is ISO-8859-1.
You can of course add multiple book entries in the file.
The following line specifies the stylesheet that is to be used to give meaning
to the XML tags.
The next line defines the root tag, called catalog; this name is
arbitrary. The summary tag is where we keep the actual information
(text); everything else is XML cruft.
Here is the style sheet; it tells an XML aware reader how to render the XML as HTML.
Its root element declares the XSL namespace:
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
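A minimal sketch of such a stylesheet, assumed to be saved as summaries.xsl.
It follows the description in this guide (a loop over each book that prints
the author and summary inside table cells); the surrounding HTML and the
table border are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- match the document root and emit the HTML skeleton -->
  <xsl:template match="/">
    <html>
      <body>
        <table border="1">
          <!-- loop over every book element inside the catalog -->
          <xsl:for-each select="catalog/book">
            <tr>
              <!-- value-of prints the text content of the selected tag -->
              <td><xsl:value-of select="author"/></td>
              <td><xsl:value-of select="summary"/></td>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```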
The astute reader will notice that the XSL file is also an XML file.
So to read this you could fire up your XML aware browser (Mozilla or a
derivative) and open up the first piece of code. Make sure the
summaries.xsl file is located in the same directory.
Reading through the stylesheet, it simply matches the root of the document
(the "/" pattern), spits out the HTML tags, and then for each book in the
catalog prints the value of the author and summary tags. In our case these
are "D. Cantrell" and "This book was no good."
4. Converting XML to HTML
Not all browsers can read XML files and know how to suck in the associated
XSL file. So to keep those browsers happy we can generate the HTML file from
our source XML. There are many tools to do this, written in Java, C++, C and
so on, that work on a variety of platforms. One such tool is xsltproc, which
is part of libxslt. It is very easy to use.
xsltproc -o summaries.html summaries.xsl summaries.xml
The command will generate summaries.html from the corresponding XML and
XSL files. Note that xsltproc expects the stylesheet first and the XML
document second.
5. That was very poor XSL
The previous example was a simple XML document with an oversimplified
stylesheet. The only good thing about the stylesheet is that it is easy to
read. In fact, you should never write stylesheets like that. To explain why, I
must digress into some of the messy acronyms (not really, just what they do).
5.1. The XSLT Parser
To render or convert XML documents into another format they must be parsed
(duh!). To do this the parser must generate a tree starting with the root
element and trickling down to all the elements that make up the document. XSL
then matches these elements in order to figure out what they mean and, in our
case, give meaning to the tags by printing out corresponding HTML tags.
This has all the smelly characteristics of recursion.
To be able to nest tags within one another, the XSL parser must also be able
to recurse and match tags within tags and so on. The XSL in Example 1 does
not allow for this. Instead it performs a for-each loop, printing out the
author and summary. What if you wanted to add a bold tag to make a word
boldface? This is not possible without completely rewriting the XSL.
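To make the limitation concrete, here is a sketch of a book entry whose
summary contains a nested bold tag. The tag name b is an assumption chosen
for illustration. A stylesheet that reads the summary with xsl:value-of only
gets the text content of the tag, so the nested b markup is silently dropped
and the words come out in plain text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book>
    <author>D. Cantrell</author>
    <!-- b is a hypothetical tag name for boldface text -->
    <summary>This book was <b>no good</b>.</summary>
  </book>
</catalog>
```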
The stylesheet in Example 1 loops over each book in the catalog, selects the
author and summary, and wraps them inside table data HTML tags. Thus there is
no recursion and we have hobbled the XSL parser, preventing us from using its
full potential.
6. Good XSL
A better way to write XSL stylesheets is to use template matching. This simply
matches one tag and then tells the parser to continue (and match more tags).
First a longer XML example.
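A sketch of what a longer document could look like. Apart from catalog, book,
author and summary, the tag names (title, para, b) and all the sample text
are hypothetical, chosen to show multiple authors, multiple paragraphs and a
nested tag.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="summaries.xsl"?>
<catalog>
  <book>
    <!-- title, para and b are hypothetical tag names -->
    <title>An Example Book</title>
    <author>A. Author</author>
    <author>B. Coauthor</author>
    <summary>
      <para>The first paragraph of the summary.</para>
      <para>A second paragraph with one <b>important</b> word.</para>
    </summary>
  </book>
</catalog>
```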
As you can see we have expanded on the previous example, providing more
tags which are hopefully self-explanatory. That is another goal of XML: to
have tags that make sense.
The XSL for this looks like this.
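A minimal sketch of a template-matching stylesheet. It assumes a document
with catalog, book, title, author, summary, para and b tags; title, para and
b are hypothetical names, and the HTML chosen for each tag is only one
possible rendering.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- document root: wrap everything in HTML, then keep matching -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- one block per book; the children are matched by their own templates -->
  <xsl:template match="book">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- select="." picks the text of the tag that was just matched -->
  <xsl:template match="title">
    <h2><xsl:value-of select="."/></h2>
  </xsl:template>

  <!-- applied once per author tag, however many there are -->
  <xsl:template match="author">
    <p><i><xsl:value-of select="."/></i></p>
  </xsl:template>

  <!-- paragraphs recurse, so nested tags such as b are still matched -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="b">
    <b><xsl:apply-templates/></b>
  </xsl:template>
</xsl:stylesheet>
```

Tags without a template of their own (catalog and summary in this sketch) are
handled by XSLT's built-in rules, which simply continue matching their
children.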
The first cluster of XSL stuff matches the root of the document and wraps
everything in open and close HTML tags. Then the special xsl:apply-templates
instruction is applied so that the individual tags are matched.
What is important is that xsl:template just matches a tag. To get
the information out, one simply selects it.
The templates are what allow us to have multiple authors, paragraphs, etc.
7. Conclusion
This tutorial has hopefully given you a good enough introduction to start
using XML. A natural place for XML to be used is of course the Web, hence XSL
was also presented. XSL is very important since it gives meaning to the XML
tags. Perhaps this should have been called an intro to XSL, but anybody who
does not already know XML will not know XSL either, and thus would not be
caught by the "Buzz Word" that is XML.
XML is becoming more and more pervasive; it is being used within databases
and word processors, to do RPC/RMI, etc.