I couldn't find an XML parsing tool for SNOBOL4, so I decided to write one as a way to learn more about the language. I chose a SAX-style approach because it is simpler to implement and lets me focus on the text-processing part of the code.
Since SNOBOL4 consumes its input line by line, I wrote a function that flattens the whole input into a single string. This removes the need to deal with line breaks, but it is also very inefficient, since it builds one big string holding the entire contents of the XML file.
   Define('ReadAll()content,tcontent') :(RA_END) 
ReadAll
    content = ''
RA_LOOP
    tcontent = INPUT           : F(ERA_LOOP)
    content = content tcontent : (RA_LOOP)
ERA_LOOP
    ReadAll = content :(RETURN)
RA_END
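One side effect of joining lines with nothing between them is that a word ending one line fuses with the word starting the next. Below is a minimal sketch of a variant that appends a single blank after each line. The name READALLSP is hypothetical and not part of the original code, and everything is written in upper case so the sketch does not depend on case folding.

```snobol4
* Hypothetical variant of ReadAll: joins input lines with one blank.
        DEFINE('READALLSP()LINE')                :(RAS_END)
READALLSP
        READALLSP =
RAS_LOOP
* INPUT fails at end of file; return what has been collected so far.
        LINE = INPUT                             :F(RETURN)
        READALLSP = READALLSP LINE ' '           :(RAS_LOOP)
RAS_END
        TXT = READALLSP()
        OUTPUT = 'Read ' SIZE(TXT) ' characters'
END
```

The extra blanks should be harmless to a parser that already skips runs of blanks between tags, although they can show up as trailing whitespace in text content.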
With this problem solved, the XML processing function looks as follows:
   Define('ReadXml(inputStr,iPos,fTStart,fTEnd,fText)fPos,name,closing,attsString,attsTable,text') :(RX_END) 
ReadXml
    &anchor = 0
Init
XmlDirectiveL
    inputStr POS(iPos) '<?' ARB '?>'  @fPos  :F(TagStartL)
    iPos = fPos     :(Init)
TagStartL    
    inputStr POS(iPos) '<' SPAN(TagChar) $ name ARB $ attsString ('/>' | '>') $ closing  @fPos  :F(EndTagL)
    attsTable = ReadAttributes(attsString) 
    iPos = fPos
    APPLY(fTStart,name,attsTable) :(Init)
EndTagL
    inputStr POS(iPos) '</' SPAN(TagChar) $ name '>'  @fPos  :F(BlanksL)
    iPos = fPos
    APPLY(fTEnd,name)  :(Init)
BlanksL
    inputStr POS(iPos) SPAN(Blank) @fPos  :F(TextL)
    iPos = fPos :(Init)
TextL
    inputStr POS(iPos) BREAK('<') $ text @fPos  :F(RXXS_END)
    iPos = fPos 
    APPLY(fText,text) :(Init)
RXXS_END 
    
     :(RETURN)
RX_END
This code uses the iPos variable to keep track of the position in the string where the last matched XML construct ended. The '@' operator followed by a variable stores the current cursor position of the match in that variable. Each part of the function, marked by the labels XmlDirectiveL, TagStartL, EndTagL, BlanksL and TextL, matches one kind of XML construct; the tag and text parts then call the corresponding callback function passed in the fTStart, fTEnd and fText parameters. The call is made with the APPLY function.
The contents of the ReadAttributes function are as follows:
   TagChar = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ:-'
   AttNameChar = TagChar
   Blank = " "
   Define("ReadAttributes(text)result,attsText,iPOS,fPOS,name,value") :(ratts)
ReadAttributes
   result = table()
   attsText = trim(text)
   iPOS = 0
ratts_loop   
   attsText POS(iPOS) ARBNO(' ') SPAN(AttNameChar) $ name ARBNO(' ') '=' ARBNO(' ') '"' BREAK('"') $ value  '"' @fPOS  :F(ratts_loop_end)
   result[name] = value 
   iPOS = fPOS :(ratts_loop)
ratts_loop_end     
   ReadAttributes = result :(Return)
ratts
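The two mechanisms ReadXml relies on, cursor capture with '@' and indirect calls through APPLY, can be tried in isolation. The following is a small sketch rather than part of the parser: SHOW, SUBJECT, HEAD and POSN are names invented for the illustration, and the function name is passed with the unary dot in the same style as the callbacks given to ReadXml.

```snobol4
* Hypothetical callback: prints whatever text it is handed.
        DEFINE('SHOW(S)')                        :(SHOW_END)
SHOW    OUTPUT = 'Got: ' S                       :(RETURN)
SHOW_END
        SUBJECT = 'abc<def>'
* BREAK('<') matches 'abc'; $ stores it in HEAD, @ stores the cursor in POSN.
        SUBJECT POS(0) BREAK('<') $ HEAD @POSN   :F(END)
        OUTPUT = 'Cursor after match: ' POSN
* Call the function whose name was passed, as ReadXml does with fText.
        APPLY(.SHOW, HEAD)
END
```

Run on its own, this should print "Cursor after match: 3" followed by "Got: abc", since the match stops just before the '<' at offset 3.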
An example of the use of these functions is the following:
-include "xmlp.sno"
     Define('MyTSFunc(name,attributesTable)') :(MTS_END)
MyTSFunc
     OUTPUT = "Into " name 
     OUTPUT = "id=" attributesTable["id"] :(RETURN)
MTS_END
     Define('MyTEFunc(name)') :(MTE_END)
MyTEFunc
     OUTPUT = "Out of " name :(RETURN)
MTE_END
     Define('MyTTFunc(text)') :(MTT_END)
MyTTFunc
     OUTPUT = "Text: " text :(RETURN)
MTT_END
  OUTPUT = "XML Test"
  Txt = ReadAll()
  ReadXml(Txt,0,.MyTSFunc,.MyTEFunc,.MyTTFunc)
END 
Given the following input:
<uno>
   <dos id="3">
     asdf
      <tres id="4">
       h hh
      </tres>
      <cuatro>
        iasdl
      </cuatro>
      <cinco id="42"/>
   </dos>
</uno>
The program generates:
XML Test
Into uno
id=
Into dos
id=3
Text: asdf      
Into tres
id=4
Text: h hh      
Out of tres
Into cuatro
id=
Text: iasdl      
Out of cuatro
Into cinco
id=42
Out of dos
Out of uno
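Note that the output has no "Out of cinco" line: ReadXml captures the tag's closing delimiter in the closing variable but never acts on it, so a self-closing tag only triggers the start callback. A minimal sketch of how the captured delimiter could be tested is shown below. It is a standalone illustration under simplified assumptions (lower-case tag names only, hypothetical variable names), not a change taken from the original code.

```snobol4
* Standalone illustration: detect a self-closing tag from its delimiter.
        TAGCH = 'abcdefghijklmnopqrstuvwxyz'
        S = '<cinco id="42"/>'
* CLOSING receives whichever delimiter ended the tag: '/>' or '>'.
        S POS(0) '<' SPAN(TAGCH) $ NAME ARB ('/>' | '>') $ CLOSING   :F(END)
        OUTPUT = 'Into ' NAME
* Only a self-closing tag also reports its own end.
        IDENT(CLOSING, '/>')                                         :F(END)
        OUTPUT = 'Out of ' NAME
END
```

In ReadXml the same IDENT test could sit after the call to fTStart and decide whether fTEnd is also called. Like the original pattern, this sketch would be confused by a '>' inside an attribute value.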
The benefit of a SAX-like approach is that the parser can be reused by other programs. For example, the following program prints the titles and links found in an OPML file from Google Reader.
-include "xmlp.sno"
     Define('TagVisitHandler(name,attributesTable)theUrl,title') :(TVH_END)
TagVisitHandler
     name "outline" :F(Return)
     title = attributesTable["text"]  
     theUrl = attributesTable["htmlUrl"]  
     ident(theUrl , '') :s(return)
     OUTPUT = "Link for " title " : " theUrl :(RETURN)
TVH_END
     Define('MiTEFunc(name)') :(MTE_END)
MiTEFunc :(RETURN)
MTE_END
     Define('MiTTFunc(text)') :(MTT_END)
MiTTFunc :(RETURN)
MTT_END
  Txt = ReadAll()
  ReadXml(Txt,0,.TagVisitHandler,.MiTEFunc,.MiTTFunc)
END 
Documentation from SNOBOL4.ORG was used as a reference.
