I couldn't find a tool for XML parsing/processing for SNOBOL4, so I decided to write one as a way to learn more about the language. I chose a SAX-style approach because it is simpler to implement and lets me focus on the text-processing part of the code.
Since SNOBOL4 reads its input line by line, I wrote a function that flattens the whole input into a single string. This removes the need to deal with line breaks, but it is also very inefficient, since it builds one long string containing the entire XML file.
        Define('ReadAll()content,tcontent') :(RA_END)
ReadAll
        content = ''
RA_LOOP
        tcontent = INPUT :F(ERA_LOOP)
        content = content tcontent :(RA_LOOP)
ERA_LOOP
        ReadAll = content :(RETURN)
RA_END
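ReadAll joins the lines with nothing between them, so a word that ends one line and a word that begins the next run together in the flattened string. A minimal sketch of a variant that inserts a separator is shown below; the name ReadAllSep and its sep parameter are hypothetical and not part of the original code. Note that with a blank as separator, the text passed to the text callback may carry trailing blanks.

```snobol
* Hypothetical variant of ReadAll: joins input lines with a separator.
        DEFINE('ReadAllSep(sep)line')                      :(RAS_END)
ReadAllSep
* INPUT fails at end of file; the accumulated value is then returned.
RAS_LOOP line = INPUT                                      :F(RETURN)
        ReadAllSep = ReadAllSep line sep                   :(RAS_LOOP)
RAS_END
* Usage: echo the flattened standard input, lines separated by a blank.
        OUTPUT = ReadAllSep(' ')
END
```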
With this problem solved, the XML processing function looks as follows:
        Define('ReadXml(inputStr,iPos,fTStart,fTEnd,fText)fPos,name,closing,attsString,attsTable,text') :(RX_END)
ReadXml
        &anchor = 0
Init
* Skip directives such as <?xml ... ?>
XmlDirectiveL
        inputStr POS(iPos) '<?' ARB '?>' @fPos :F(TagStartL)
        iPos = fPos :(Init)
* Start tag: capture the name, the raw attribute text and the terminator
TagStartL
        inputStr POS(iPos) '<' SPAN(TagChar) $ name ARB $ attsString ('/>' | '>') $ closing @fPos :F(EndTagL)
        attsTable = ReadAttributes(attsString)
        iPos = fPos
        APPLY(fTStart,name,attsTable) :(Init)
* End tag
EndTagL
        inputStr POS(iPos) '</' SPAN(TagChar) $ name '>' @fPos :F(BlanksL)
        iPos = fPos
        APPLY(fTEnd,name) :(Init)
* Skip blanks between constructs
BlanksL
        inputStr POS(iPos) SPAN(Blank) @fPos :F(TextL)
        iPos = fPos :(Init)
* Text up to the next '<'; when none is left, the input is exhausted
TextL
        inputStr POS(iPos) BREAK('<') $ text @fPos :F(RXXS_END)
        iPos = fPos
        APPLY(fText,text) :(Init)
RXXS_END
        :(RETURN)
RX_END
This code uses the iPos variable to keep track of the position in the string where the last matched XML construct ended, and POS(iPos) pins each pattern to that position. The '@' operator followed by a variable records the cursor position in the input string at that point in the match. Each part of the function, marked by the labels XmlDirectiveL, TagStartL, EndTagL, BlanksL and TextL, matches one kind of XML construct. The tag and text branches then call the callback function given by the fTStart, fTEnd or fText parameter, using the APPLY function.
The ReadAttributes function is defined as follows:
        TagChar = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ:-'
        AttNameChar = TagChar
        Blank = " "
        Define("ReadAttributes(text)result,attsText,iPOS,fPOS,name,value") :(ratts)
ReadAttributes
        result = table()
        attsText = trim(text)
        iPOS = 0
ratts_loop
        attsText POS(iPOS) ARBNO(' ') SPAN(AttNameChar) $ name ARBNO(' ') '=' ARBNO(' ') '"' BREAK('"') $ value '"' @fPOS :F(ratts_loop_end)
* Store the value under its attribute name
        result[name] = value
        iPOS = fPOS :(ratts_loop)
ratts_loop_end
        ReadAttributes = result :(RETURN)
ratts
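Both ReadXml and ReadAttributes rely on the same cursor bookkeeping: POS(iPos) forces the match to start where the previous one ended, and '@' stores the cursor reached by the current match. The following standalone sketch shows that idea in isolation on a made-up string; the variable names s and word and the labels are chosen only for this illustration.

```snobol
* Minimal sketch: walk a string word by word with POS() and '@'.
        &ANCHOR = 0
        s = 'aa bb cc'
        iPos = 0
* Match one word and the blank after it, starting exactly at iPos.
LOOP    s POS(iPos) BREAK(' ') $ word ' ' @fPos            :F(LAST)
        OUTPUT = 'word=' word ' next=' fPos
* Resume the next match where this one ended.
        iPos = fPos                                        :(LOOP)
* No blank is left, so take the rest of the string.
LAST    s POS(iPos) REM $ word
        OUTPUT = 'last=' word
END
```

Run as is, this should print word=aa next=3, word=bb next=6 and last=cc, one per line. The loop ends the same way TextL does in ReadXml: the last pattern fails once no delimiter remains.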
An example of the use of these functions is the following:
-include "xmlp.sno"
        Define('MyTSFunc(name,attributesTable)') :(MTS_END)
MyTSFunc
        OUTPUT = "Into " name
        OUTPUT = "id=" attributesTable["id"] :(RETURN)
MTS_END
        Define('MyTEFunc(name)') :(MTE_END)
MyTEFunc
        OUTPUT = "Out of " name :(RETURN)
MTE_END
        Define('MyTTFunc(text)') :(MTT_END)
MyTTFunc
        OUTPUT = "Text: " text :(RETURN)
MTT_END
        OUTPUT = "XML Test"
        Txt = ReadAll()
        ReadXml(Txt,0,.MyTSFunc,.MyTEFunc,.MyTTFunc)
END
Given the following input:
<uno>
<dos id="3">
asdf
<tres id="4">
h hh
</tres>
<cuatro>
iasdl
</cuatro>
<cinco id="42"/>
</dos>
</uno>
The program generates:
XML Test
Into uno
id=
Into dos
id=3
Text: asdf
Into tres
id=4
Text: h hh
Out of tres
Into cuatro
id=
Text: iasdl
Out of cuatro
Into cinco
id=42
Out of dos
Out of uno
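In this output the self-closing cinco tag produces an "Into" line but no "Out of" line: ReadXml captures the tag terminator in closing but never examines it. If an end notification for self-closing tags is wanted, one possible approach is to test closing after the start callback. The sketch below is a standalone illustration of that idea, not the code in xmlp.sno; the handlers OnStart and OnEnd and the shortened TagChar are hypothetical.

```snobol
* Hypothetical handlers, defined only for this sketch.
        DEFINE('OnStart(name)')                            :(ONSTART_END)
OnStart OUTPUT = 'Into ' name                              :(RETURN)
ONSTART_END
        DEFINE('OnEnd(name)')                              :(ONEND_END)
OnEnd   OUTPUT = 'Out of ' name                            :(RETURN)
ONEND_END
        &ANCHOR = 0
        TagChar = 'abcdefghijklmnopqrstuvwxyz'
        tag = '<cinco id="42"/>'
* Same shape as the TagStartL pattern: closing receives '/>' or '>'.
        tag '<' SPAN(TagChar) $ name ARB ('/>' | '>') $ closing  :F(END)
        APPLY(.OnStart, name)
* A self-closing tag has no separate end tag, so report the end here.
        IDENT(closing, '/>')                               :F(END)
        APPLY(.OnEnd, name)
END
```

For the tag above this should print "Into cinco" followed by "Out of cinco"; for a tag ending in a plain '>' only the first line would appear.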
The benefit of a SAX-like approach is that the parsing code can be reused by other programs. For example, the following program prints all the links and titles from an OPML file exported from Google Reader.
-include "xmlp.sno"
        Define('TagVisitHandler(name,attributesTable)theUrl,title') :(TVH_END)
TagVisitHandler
        name "outline" :F(RETURN)
        title = attributesTable["text"]
        theUrl = attributesTable["htmlUrl"]
        ident(theUrl , '') :S(RETURN)
        OUTPUT = "Link for " title " : " theUrl :(RETURN)
TVH_END
        Define('MiTEFunc(name)') :(MTE_END)
MiTEFunc :(RETURN)
MTE_END
        Define('MiTTFunc(text)') :(MTT_END)
MiTTFunc :(RETURN)
MTT_END
        Txt = ReadAll()
        ReadXml(Txt,0,.TagVisitHandler,.MiTEFunc,.MiTTFunc)
END
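One detail of TagVisitHandler worth knowing: the test name "outline" is a pattern match, and because ReadXml sets &anchor to 0 it succeeds for any tag name that merely contains "outline". That is harmless for OPML, but where an exact comparison matters IDENT could be used instead. The following standalone sketch contrasts the two on a made-up tag name.

```snobol
* Minimal sketch: substring pattern match versus exact comparison.
        &ANCHOR = 0
        name = 'suboutline'
* Unanchored pattern match: succeeds because 'outline' occurs in name.
        name 'outline'                                     :F(NOSUB)
        OUTPUT = 'pattern match succeeds for ' name
* IDENT compares the whole strings, so it fails here.
NOSUB   IDENT(name, 'outline')                             :S(END)
        OUTPUT = 'IDENT fails for ' name
END
```

Both OUTPUT lines should be printed for this input, showing that the two tests disagree on 'suboutline'.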
The documentation at SNOBOL4.ORG was used as a reference.