Saturday, May 10, 2008

Transforming XML with Tom

In this post I'm going to show some of the features that the Tom pattern matching compiler provides for Xml manipulation.

Xml support

Tom supports Xml literals, which can be used either to create a tree or to pattern match on an existing tree.

The Manipulating Xml documents section of the documentation provides a nice presentation of this feature. The Tom distribution also comes with some nice examples.

The example

To illustrate Tom's Xml capabilities, an existing simple XSLT sheet will be converted to an equivalent Tom program.

The following simple XSLT sheet takes an RSS feed and converts it into an HTML file with a table that contains the headlines and the categories listed in the RSS file.


<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/TR/xhtml1/strict">

<xsl:template match="/">
<html>
<head>
<title>Headlines</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>

<xsl:template match="channel">
<table style="border-style: solid">
<tr>
<th>Headline</th>
<th>Categories</th>
</tr>

<xsl:for-each select="item">
<tr>
<td>
<xsl:value-of select="title/text()"/>
</td>
<td>
<ol>
<xsl:apply-templates select="category"/>
</ol>
</td>
</tr>
</xsl:for-each>
</table>
</xsl:template >

<xsl:template match="category">
<li>
<xsl:value-of select="./text()"/>
</li>
</xsl:template>
<xsl:template match="text()">

</xsl:template>
</xsl:stylesheet>
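
For reference, a sheet like this can be run from Java with the standard javax.xml.transform API. The following is only a minimal sketch: the class name XsltRunner, the cut-down stylesheet (it lists only the categories) and the tiny sample feed are my own assumptions, made up to keep the example short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltRunner {
    // Cut-down stylesheet (hypothetical, for brevity): emits one <li> per category.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"channel\">"
      + "<ol><xsl:apply-templates select=\"item/category\"/></ol>"
      + "</xsl:template>"
      + "<xsl:template match=\"category\">"
      + "<li><xsl:value-of select=\"./text()\"/></li>"
      + "</xsl:template>"
      + "<xsl:template match=\"text()\"/>"
      + "</xsl:stylesheet>";

    // Tiny sample feed (hypothetical data).
    static final String SAMPLE_RSS =
        "<rss><channel><title>Feed</title>"
      + "<item><title>A</title><category>x</category><category>y</category></item>"
      + "</channel></rss>";

    /** Applies the stylesheet text to the XML text and returns the result as a string. */
    static String apply(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(apply(SAMPLE_XSLT, SAMPLE_RSS));
    }
}
```

To run the full sheet, pass its text and the contents of a real feed to apply instead of the two sample constants.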


Loading the Xml

The first step is to load the Xml file. The following code shows the main method that calls the load, transform and print methods for the input file.


%include{ adt/tnode/TNode.tom }

static XmlTools xtools = new XmlTools();

static TNode loadOpmlDocument(String filename) {
return (TNode)xtools.convertXMLToTNode(filename);
}

public static void main(String[] args) {
TNode opmlDocument =
loadOpmlDocument("anrss.rss");

TNode docElem = opmlDocument.getDocElem();
TNode transformedHtml = transform(docElem);

xtools.printXMLFromTNode(transformedHtml);
}
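
For comparison, loading a document and getting its root element with the plain JDK parser looks roughly like the sketch below. This is not how XmlTools works internally (I have not looked at that); it only shows the W3C DOM counterpart of the load step, where getDocumentElement plays the role of getDocElem. The class name LoadWithDom and the inline sample feed are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class LoadWithDom {
    /** Parses XML text and returns the document (root) element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical inline feed; a real program would read the file instead.
        Element root = parseRoot("<rss><channel><title>Feed</title></channel></rss>");
        // Prints the name of the element a transform step would start from.
        System.out.println(root.getTagName());
    }
}
```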


HTML document generation

The transform method is equivalent to the XSLT template that matches the root element. It generates only the outer HTML document skeleton and delegates the body to transformBody.


static TNode transform(TNode docElem) {
return `xml(
<html>
<head>
<title>#TEXT("Headlines")</title>
</head>
<body>
transformBody(docElem)
</body>
</html>);
}


Note the use of the xml(...) construct, which lets you write literal Xml. Also note the leading backquote (`) character, which marks Tom-specific syntax.

Table generation

The transformBody method is equivalent to the XSLT template that matches the channel element. It creates the table with its headers.


static TNode transformBody(TNode docElem) {
%match(docElem) {
<rss>
channel@<channel>
_*
</channel>
</rss> ->
{TNodeList itemRows = transformItems(`channel);
return
`xml(<table style="border-style: solid">
<tr>
<th>#TEXT("Headline")</th>
<th>#TEXT("Categories")</th>
</tr>
itemRows*
</table> );}
}
return `xml(#TEXT(""));
}


Note that the %match construct has Xml code in the pattern section.

Also note that we call transformItems to generate a node list (itemRows) containing all the rows of the table. The itemRows* syntax then expands that list in the middle of the table.

Transforming items

The transformItems method processes every item and generates a list of HTML table rows.


static TNodeList transformItems(TNode channel) {
TNodeList result = EmptyconcTNode.make();

%match (channel){
<channel>
item@<item>
<title>theTitle</title>
</item>
</channel> -> {
result =
ConsconcTNode.make(
`xml(<tr>
<td>
theTitle
</td>
<td>
categories(item)
</td>
</tr>),
result);
}
}
return result.reverse();
}


Here the multiple results generated by the %match construct are used to fill the list with all the rows.

Something that might be confusing is that the %match pattern seems to look for a single item with title as its only child (because there are no explicit _* constructs). This is specific to the Xml literal syntax: according to the documentation, implicit _* constructs are added between Xml literals.
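
To make the effect of those implicit _* constructs more concrete, here is a rough plain Java sketch of what the pattern asks for, written against the W3C DOM classes. It is only my reading of the semantics, not what Tom generates: every item child of the channel is visited, whatever surrounds it, and every title child of that item yields one result, whatever its position. The class name ImplicitWildcards and the sample data are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class ImplicitWildcards {
    /** Collects the title text of every <item> child of the channel, in document order. */
    static List<String> itemTitles(Element channel) {
        List<String> titles = new ArrayList<>();
        for (Node n = channel.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Siblings that are not <item> elements are skipped: the implicit _* around item.
            if (!(n instanceof Element) || !"item".equals(n.getNodeName())) continue;
            for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                // Children that are not <title> are skipped: the implicit _* around title.
                if (c instanceof Element && "title".equals(c.getNodeName())) {
                    titles.add(c.getTextContent()); // one result per (item, title) pair
                }
            }
        }
        return titles;
    }

    /** Parses XML text and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element channel = parseRoot(
            "<channel><title>Feed</title>"
          + "<item><category>x</category><title>A</title></item>"
          + "<item><title>B</title><link>u</link></item></channel>");
        System.out.println(itemTitles(channel)); // prints [A, B]
    }
}
```

The channel's own title is not reported, because it is not inside an item, and the first item still matches even though its title is not the first child.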

Mapping categories

The categories method maps each category and is equivalent to the XSLT template that matches a category.


private static TNode categories(TNode item){
TNodeList listItems = EmptyconcTNode.make();

%match(item) {
<item>
<category>
category
</category>
</item> -> {
String value = getTextFromCategory(`category);
listItems =
ConsconcTNode.make(
`xml(<li>#TEXT(value)</li>),
listItems);
}
}
listItems = listItems.reverse();
return `xml(<ol>listItems*</ol>);
}


Final words

Although I'm not a big fan of Xml literals, they are a nice way to create a new tree compared to using the W3C DOM classes. Xml literals are available in several languages today, such as Scala and Visual Basic 9. A nice alternative is Groovy Builders, which provide a nice syntax for creating tree structures that is independent of the backend.
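
To give an idea of the difference, this is roughly what building just the header row of the table looks like with the W3C DOM classes. It is a small sketch; the class and method names (DomTableHeader, headerRow, newDocument) are made up for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DomTableHeader {
    /** Builds <tr><th>Headline</th><th>Categories</th></tr> node by node. */
    static Element headerRow(Document doc) {
        Element tr = doc.createElement("tr");
        for (String label : new String[] {"Headline", "Categories"}) {
            Element th = doc.createElement("th");
            th.appendChild(doc.createTextNode(label));
            tr.appendChild(th);
        }
        return tr;
    }

    /** Creates an empty DOM document to own the new nodes. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("could not create a DOM document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element table = doc.createElement("table");
        table.appendChild(headerRow(doc));
        doc.appendChild(table);
        // Number of <th> cells in the header row.
        System.out.println(table.getFirstChild().getChildNodes().getLength());
    }
}
```

Every element and text node needs its own call, so the shape of the tree is much harder to see than in the literal version.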

One thing that was missing (at least from the documentation) was direct support for Xml namespaces, which is useful when working with multiple Xml Schemas from different sources.
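
As an illustration of why this matters, RSS feeds often mix in elements from other vocabularies, such as Dublin Core's dc:creator. The sketch below uses the standard JDK DOM parser (not Tom) to select elements by namespace URI rather than by prefix. The class name NamespaceLookup and the sample feed are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceLookup {
    // Namespace URI of the Dublin Core element set.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Returns the text of every Dublin Core creator element in the document. */
    static List<String> creators(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace processing is off by default
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagNameNS(DC_NS, "creator");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String rss =
            "<rss xmlns:dc=\"" + DC_NS + "\"><channel><item>"
          + "<title>A</title><dc:creator>Ann</dc:creator><creator>not DC</creator>"
          + "</item></channel></rss>";
        System.out.println(creators(rss)); // prints [Ann]
    }
}
```

The plain creator element is ignored because it is in no namespace, which is exactly the distinction a prefix-only match cannot make reliably.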