Tuesday, May 20, 2008

Processing Xml in Snobol4

This post presents a quick way to process XML files in SNOBOL4 using a SAX-like method.

I couldn't find a tool for XML parsing or processing in SNOBOL4, so I decided to write one as a way to learn more about the language. I chose a SAX-like method because it is simpler to implement and lets me focus on the text processing part of the code.

Since SNOBOL4 consumes its input line by line, I wrote a function that flattens the whole input into a single string. Lines are joined with no separator between them. This removes the need to handle line breaks in the patterns, but it is also inefficient, since it builds one big string holding the entire contents of the XML file.


        Define('ReadAll()content,tcontent') :(RA_END)
ReadAll
        content = ''
RA_LOOP
        tcontent = INPUT :F(ERA_LOOP)
        content = content tcontent :(RA_LOOP)
ERA_LOOP
        ReadAll = content :(RETURN)
RA_END
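
For readers less familiar with SNOBOL4, the flattening step can be sketched in Java. This is only an illustration and not part of the SNOBOL4 program; the class and method names (FlattenInput, readAll) are invented for the example. Like ReadAll, it joins the lines with no separator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class FlattenInput {
    // Hypothetical helper: reads every line and joins them with no separator,
    // mirroring what ReadAll does with INPUT.
    static String readAll(Reader source) {
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        String xml = "<uno>\n<dos id=\"3\">\nasdf\n</dos>\n</uno>\n";
        System.out.println(readAll(new StringReader(xml)));
        // prints: <uno><dos id="3">asdf</dos></uno>
    }
}
```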



With this problem solved, the XML processing function looks as follows:


        Define('ReadXml(inputStr,iPos,fTStart,fTEnd,fText)fPos,name,closing,attsString,attsTable,text') :(RX_END)
ReadXml
        &anchor = 0
Init
* Skip directives such as <?xml ... ?>
XmlDirectiveL
        inputStr POS(iPos) '<?' ARB '?>' @fPos :F(TagStartL)
        iPos = fPos :(Init)

* Opening or self-closing tag: report its name and attributes
TagStartL
        inputStr POS(iPos) '<' SPAN(TagChar) $ name ARB $ attsString ('/>' | '>') $ closing @fPos :F(EndTagL)
        attsTable = ReadAttributes(attsString)
        iPos = fPos
        APPLY(fTStart,name,attsTable) :(Init)

* Closing tag: report its name
EndTagL
        inputStr POS(iPos) '</' SPAN(TagChar) $ name '>' @fPos :F(BlanksL)
        iPos = fPos
        APPLY(fTEnd,name) :(Init)

* Skip runs of blanks between constructs
BlanksL
        inputStr POS(iPos) SPAN(Blank) @fPos :F(TextL)
        iPos = fPos :(Init)
* Text up to the next '<'
TextL
        inputStr POS(iPos) BREAK('<') $ text @fPos :F(RXXS_END)
        iPos = fPos
        APPLY(fText,text) :(Init)

RXXS_END :(RETURN)
RX_END


This code uses the iPos variable to keep track of the position in the string just past the last XML construct that matched. The '@' operator followed by a variable stores the current cursor position of the match in that variable, so @fPos records where each match ended.

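Each part of this function, marked by the labels XmlDirectiveL, TagStartL, EndTagL, BlanksL and TextL, matches one kind of XML construct. The tag start, tag end and text parts then invoke the callback functions given by the fTStart, fTEnd and fText parameters; directives and blanks are simply skipped. The calls are made with the APPLY function. A self-closing tag (one ending in '/>') triggers only fTStart; no fTEnd call is made for it.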
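
The cursor technique described above can be sketched in Java as a rough analogy. This is not a translation of ReadXml: the class name CursorScan, the regular expressions and the event strings are all assumptions made for the illustration. Matcher.region together with lookingAt plays the role of POS(iPos), and Matcher.end plays the role of @fPos.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CursorScan {
    // Tried in this order at the current position, like the labelled parts of ReadXml.
    private static final Pattern DIRECTIVE = Pattern.compile("<\\?.*?\\?>");
    private static final Pattern START_TAG = Pattern.compile("<([A-Za-z:-]+)(.*?)(/>|>)");
    private static final Pattern END_TAG = Pattern.compile("</([A-Za-z:-]+)>");
    private static final Pattern BLANKS = Pattern.compile(" +");
    private static final Pattern TEXT = Pattern.compile("[^<]+");

    // Anchored match at pos (the POS(iPos) analogue); returns null when it fails.
    static Matcher at(Pattern p, String input, int pos) {
        Matcher m = p.matcher(input);
        m.region(pos, input.length());
        return m.lookingAt() ? m : null;
    }

    // Collects events as strings instead of invoking callbacks.
    static List<String> scan(String input) {
        List<String> events = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Matcher m;
            if ((m = at(DIRECTIVE, input, pos)) != null) {
                // directives are skipped
            } else if ((m = at(START_TAG, input, pos)) != null) {
                events.add("start " + m.group(1));
            } else if ((m = at(END_TAG, input, pos)) != null) {
                events.add("end " + m.group(1));
            } else if ((m = at(BLANKS, input, pos)) != null) {
                // blanks are skipped
            } else if ((m = at(TEXT, input, pos)) != null) {
                events.add("text " + m.group());
            } else {
                break; // nothing matches here: stop scanning
            }
            pos = m.end(); // the @fPos analogue: continue where the match ended
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(scan("<uno><dos id=\"3\">asdf</dos></uno>"));
        // prints: [start uno, start dos, text asdf, end dos, end uno]
    }
}
```

One deliberate difference in this sketch: the text pattern requires at least one character, so the position always advances, whereas BREAK('<') can also match the empty string.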

The ReadAttributes function, along with the character sets used by the patterns, is defined as follows:


        TagChar = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ:-'
        AttNameChar = TagChar
        Blank = " "

        Define("ReadAttributes(text)result,attsText,iPOS,fPOS,name,value") :(ratts)
ReadAttributes
        result = table()
        attsText = trim(text)
        iPOS = 0
ratts_loop
        attsText POS(iPOS) ARBNO(' ') SPAN(AttNameChar) $ name ARBNO(' ') '=' ARBNO(' ') '"' BREAK('"') $ value '"' @fPOS :F(ratts_loop_end)
* Store the value under the attribute name
        result[name] = value
        iPOS = fPOS :(ratts_loop)
ratts_loop_end
        ReadAttributes = result :(Return)
ratts
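
The attribute loop follows the same cursor idea: match one name="value" pair at the current position, store it, and continue from where the match ended. A minimal Java sketch of that loop, under the assumption that a regular expression is an acceptable stand-in for the SNOBOL4 pattern (the class name AttributeScan and the regex are invented here):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeScan {
    // name = "value", with optional blanks before the name and around '='.
    private static final Pattern ATTRIBUTE =
        Pattern.compile(" *([A-Za-z:-]+) *= *\"([^\"]*)\"");

    static Map<String, String> readAttributes(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ATTRIBUTE.matcher(text);
        int pos = 0;
        // Anchor each attempt at pos and advance to the end of the match.
        while (m.region(pos, text.length()).lookingAt()) {
            result.put(m.group(1), m.group(2)); // store the value under its name
            pos = m.end();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(readAttributes(" id=\"3\" text = \"hello\""));
        // prints: {id=3, text=hello}
    }
}
```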



An example of the use of these functions is the following:


-include "xmlp.sno"

        Define('MyTSFunc(name,attributesTable)') :(MTS_END)
MyTSFunc
        OUTPUT = "Into " name
        OUTPUT = "id=" attributesTable["id"] :(RETURN)
MTS_END

        Define('MyTEFunc(name)') :(MTE_END)
MyTEFunc
        OUTPUT = "Out of " name :(RETURN)
MTE_END

        Define('MyTTFunc(text)') :(MTT_END)
MyTTFunc
        OUTPUT = "Text: " text :(RETURN)
MTT_END

        OUTPUT = "XML Test"
        Txt = ReadAll()
        ReadXml(Txt,0,.MyTSFunc,.MyTEFunc,.MyTTFunc)
END



Given the following input:

<uno>
<dos id="3">
asdf
<tres id="4">
h hh
</tres>
<cuatro>
iasdl
</cuatro>
<cinco id="42"/>
</dos>
</uno>


The program generates:



XML Test
Into uno
id=
Into dos
id=3
Text: asdf
Into tres
id=4
Text: h hh
Out of tres
Into cuatro
id=
Text: iasdl
Out of cuatro
Into cinco
id=42
Out of dos
Out of uno


The benefit of a SAX-like approach is that the parsing code can be reused by other programs. For example, the following program prints the titles and links from an OPML file exported from Google Reader.


-include "xmlp.sno"


        Define('TagVisitHandler(name,attributesTable)theUrl,title') :(TVH_END)
TagVisitHandler
        name "outline" :F(Return)
        title = attributesTable["text"]
        theUrl = attributesTable["htmlUrl"]
        ident(theUrl , '') :s(return)
        OUTPUT = "Link for " title " : " theUrl :(RETURN)
TVH_END

        Define('MiTEFunc(name)') :(MTE_END)
MiTEFunc :(RETURN)
MTE_END

        Define('MiTTFunc(text)') :(MTT_END)
MiTTFunc :(RETURN)
MTT_END

        Txt = ReadAll()
        ReadXml(Txt,0,.TagVisitHandler,.MiTEFunc,.MiTTFunc)
END



Documentation from SNOBOL4.ORG was used as reference.

Saturday, May 10, 2008

Transforming XML with Tom

In this post I'm going to show some of the features that the Tom pattern matching compiler provides for Xml manipulation.

Xml support

Tom supports Xml literals, both for creating trees and for pattern matching on an existing tree.

The Manipulating Xml documents section of the documentation gives a nice presentation of this feature. The Tom distribution also comes with nice examples.

The example

To illustrate Tom's Xml capabilities, a simple existing XSLT sheet will be converted to an equivalent Tom program.

The following simple XSLT sheet takes an RSS feed and converts it into an HTML file with a table that contains the headlines and the categories listed in the RSS file.


<?xml version="1.0" encoding="utf-8" ?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/TR/xhtml1/strict">

<xsl:template match="/">
<html>
<head>
<title>Headlines</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>

<xsl:template match="channel">
<table style="border-style=solid">
<tr>
<th>Headline</th>
<th>Categories</th>
</tr>

<xsl:for-each select="item">
<tr>
<td>
<xsl:value-of select="title/text()"/>
</td>
<td>
<ol>
<xsl:apply-templates select="category"/>
</ol>
</td>
</tr>
</xsl:for-each>
</table>
</xsl:template >

<xsl:template match="category">
<li>
<xsl:value-of select="./text()"/>
</li>
</xsl:template>
<xsl:template match="text()">

</xsl:template>
</xsl:stylesheet>


Loading the Xml

The first step is to load the Xml file. The following code shows the main method that calls the load, transform and print methods for the input file.


%include{ adt/tnode/TNode.tom }

static XmlTools xtools = new XmlTools();

static TNode loadOpmlDocument(String filename) {
return (TNode)xtools.convertXMLToTNode(filename);
}

public static void main(String[] args) {
TNode opmlDocument =
loadOpmlDocument("anrss.rss");

TNode docElem = opmlDocument.getDocElem();
TNode transformedHtml = transform(docElem);

xtools.printXMLFromTNode(transformedHtml);
}


HTML document generation

The transform method is equivalent to the XSLT template that matches the root element. It generates only the outer skeleton of the HTML document and delegates the body to transformBody.


static TNode transform(TNode docElem) {
return `xml(
<html>
<head>
<title>#TEXT("Headlines")</title>
</head>
<body>
transformBody(docElem)
</body>
</html>);
}


Note the use of the xml(...) construct, which allows literal Xml to be written directly in the code. Also note the backquote (`) character, which marks the start of Tom-specific syntax.

Table generation

The transformBody method is equivalent to the XSLT template that matches the channel element. It creates the table with its headers.


static TNode transformBody(TNode docElem) {
%match(docElem) {
<rss>
channel@<channel>
_*
</channel>
</rss> ->
{TNodeList itemRows = transformItems(`channel);
return
`xml(<table style="border-style=solid">
<tr>
<th>#TEXT("Headline")</th>
<th>#TEXT("Categories")</th>
</tr>
itemRows*
</table> );}
}
return `xml(#TEXT(""));
}


Note that the %match construct has Xml code in the pattern section.

Also note that transformItems is called to generate a node list (itemRows) holding all the rows of the table, and that this list is expanded in the middle of the table with the itemRows* syntax.

Transforming items

The transformItems method processes every item and generates a list of HTML table rows.


static TNodeList transformItems(TNode channel) {
TNodeList result = EmptyconcTNode.make();

%match (channel){
<channel>
item@<item>
<title>theTitle</title>
</item>
</channel> -> {
result =
ConsconcTNode.make(
`xml(<tr>
<td>
theTitle
</td>
<td>
categories(item)
</td>
</tr>),
result);
}
}
return result.reverse();
}


Here the multiple results generated by the %match construct are used to fill the list with all the rows. Each row is added at the front of the list, which is why the list is reversed before it is returned.

Something that might be confusing is that the %match pattern seems to look for a single item with title as its only child (because of the lack of _* constructs). This is specific to the Xml literal syntax: as the documentation says, implicit _* constructs are added between Xml literals.

Mapping categories

The categories method maps each category of an item to a list entry and is equivalent to the XSLT template that matches a category.


private static TNode categories(TNode item){
TNodeList listItems = EmptyconcTNode.make();

%match(item) {
<item>
<category>
category
</category>
</item> -> {
String value = getTextFromCategory(`category);
listItems =
ConsconcTNode.make(
`xml(<li>#TEXT(value)</li>),
listItems);
}
}
listItems = listItems.reverse();
return `xml(<ol>listItems*</ol>);
}


Final words

Although I'm not a big fan of Xml literals, they are a nice way to create a new tree compared to using the W3C DOM classes. Xml literals are available in several languages today, such as Scala and Visual Basic 9. A nice alternative is Groovy Builders, which provide a nice syntax for creating tree structures that is independent of the backend.
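
To make the comparison with the W3C DOM classes concrete, here is a rough Java sketch that builds the same kind of ol/li structure that the categories method produces with an Xml literal. It is not taken from the Tom program; the class name DomListItem and the sample values are made up for the illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomListItem {
    // Builds <ol><li>value</li>...</ol> node by node with the W3C DOM API.
    static Element buildList(String... values) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                                                 .newDocumentBuilder()
                                                 .newDocument();
            Element ol = doc.createElement("ol");
            for (String value : values) {
                Element li = doc.createElement("li");
                li.appendChild(doc.createTextNode(value));
                ol.appendChild(li);
            }
            doc.appendChild(ol);
            return ol;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element ol = buildList("news", "sports");
        System.out.println(ol.getChildNodes().getLength());      // prints: 2
        System.out.println(ol.getFirstChild().getTextContent()); // prints: news
    }
}
```

Every element and text node needs its own create and append call, which is the verbosity that the Xml literal avoids.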

One thing that was missing (at least from the documentation) was direct support for Xml namespaces, which is useful when working with multiple Xml Schemas from different sources.

Thursday, May 1, 2008

Multiple representations of an object with Tom mappings

This post shows an example of how Tom object mappings could be used to provide multiple pattern matching representations of objects of the same class.

Introduction

As shown in previous posts, I really like F# Active Patterns and Scala Extractors. Among other things, these features allow creating multiple pattern matching representations of an object. One of the ideas behind these concepts is "Views", presented by Philip Wadler in the paper Views: A way for pattern matching to cohabit with data abstraction.

It is possible to use Tom object mappings (explained here) to get a similar effect.

Example

For this example, an alternative representation of a complex number will be created. A complex number can be represented in Cartesian coordinates (real and imaginary parts) or in Polar coordinates (modulus and angle).

The same example is used to present Views, Extractors and Active Patterns.

The Apache Commons Complex class will be used as the complex number implementation.

First, a mapping for the Complex sort needs to be created.


%include { double.tom }

%typeterm Complex {
implement { org.apache.commons.math.complex.Complex }
is_sort(t) { t instanceof org.apache.commons.math.complex.Complex }
equals(t1,t2) { t1.equals(t2) }
}



Then a mapping for the Complex number in Cartesian coordinates is created. Since the Complex class stores numbers in Cartesian coordinates, only the getReal and getImaginary accessors are required.


%op Complex Complex(real:double,img :double ) {
is_fsym(t) { t instanceof org.apache.commons.math.complex.Complex }
get_slot(real, t) { t.getReal() }
get_slot(img, t) { t.getImaginary() }
make(real,img) { new org.apache.commons.math.complex.Complex(real,img) }
}


Finally, a mapping for the Polar representation is created. Since the Complex class stores the number in Cartesian coordinates, a conversion must be applied to get the modulus and angle slots. The polar2Complex method is used to create a Complex instance from the Polar symbol.


%op Complex Polar(m:double,a :double ) {
is_fsym(t) { t instanceof org.apache.commons.math.complex.Complex }

get_slot(a, t) { Math.atan2(t.getImaginary(), t.getReal()) }

get_slot(m, t) {
Math.sqrt(t.getReal() * t.getReal() +
t.getImaginary() * t.getImaginary()) }

make(m,a) {
org.apache.commons.math.complex.ComplexUtils.polar2Complex(
m,
a) }
}



A use of these mappings is the following:


Complex c = new Complex(3,3);

%match(c) {
Polar(m,a) -> {
System.out.println(
String.format("%f,%f",`m,`a ));
}
}


Also, given that a make declaration was added to the mappings, we can use the object creation syntax as follows:


Complex c2 = `Polar(3,Math.PI/2.0);

%match ( c2){
Complex(r,i) -> {
System.out.println(
String.format("%f,%f",`r,`i));
}
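
As a sanity check on the two snippets above, the conversions can be reproduced in plain Java without Tom or Apache Commons. This is a sketch that assumes the usual formulas (modulus = sqrt(re^2 + im^2), angle = atan2(im, re), re = m cos a, im = m sin a); the class name PolarCheck and its helper methods are invented for the illustration.

```java
import java.util.Locale;

class PolarCheck {
    // Same computation as the get_slot(m, t) declaration of the Polar mapping.
    static double modulus(double real, double imaginary) {
        return Math.sqrt(real * real + imaginary * imaginary);
    }

    // Same computation as the get_slot(a, t) declaration of the Polar mapping.
    static double angle(double real, double imaginary) {
        return Math.atan2(imaginary, real);
    }

    // Assumed polar-to-Cartesian conversion: {m * cos(a), m * sin(a)}.
    static double[] fromPolar(double m, double a) {
        return new double[] { m * Math.cos(a), m * Math.sin(a) };
    }

    public static void main(String[] args) {
        // The number 3 + 3i seen through the Polar representation.
        System.out.println(String.format(Locale.ROOT, "%f,%f", modulus(3, 3), angle(3, 3)));
        // prints: 4.242641,0.785398

        // Polar(3, PI/2) seen through the Cartesian representation.
        double[] c2 = fromPolar(3, Math.PI / 2.0);
        System.out.println(String.format(Locale.ROOT, "%f,%f", c2[0], c2[1]));
        // prints: 0.000000,3.000000
    }
}
```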
}