SNOBOL (String Oriented Symbolic Language) is a computer programming language developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky.
SNOBOL is a language for string manipulations. Also from its Wikipedia entry:
... SNOBOL was widely used in the 1970s and 1980s as a text manipulation language ... its popularity has faded as newer languages such as Awk and Perl have made string manipulation by means of regular expressions popular ...
This language caught my attention while listening to the OOPSLA podcast episode on the excellent 50-in-50 talk by Guy Steele and Richard Gabriel.
Given that this is a programming language exploration blog, learning more about this language provides an excellent opportunity to get to know one of the first languages for text manipulation.
The best place to learn about the language is http://www.snobol4.org/, where a lot of SNOBOL resources can be found. One of the best is a link to The SNOBOL4 Programming Language (the Green Book).
All the examples presented in this post were created using the Macro SNOBOL4 in C implementation.
A "Hello world" program in SNOBOL looks like this:
        OUTPUT = 'Hello World'
END
As shown here, assigning a value to the special OUTPUT variable writes that value to the standard output.
The inverse is also true for the INPUT variable: reading its value reads a line from the standard input. For example, the following program asks for the name of the user.
        OUTPUT = "Your name? "
        NAME = INPUT
        OUTPUT = "Hello " NAME
END
Flow of control is expressed as jumps to labels, taken depending on whether a statement succeeds or fails. For example:
ASK     OUTPUT = "Your name? "
        NAME = INPUT :F(DONE)
        OUTPUT = "Hello " NAME :(ASK)
DONE    OUTPUT = "Finished"
END
This example keeps asking for a name until the input is closed, that is, at end of file or Ctrl+D (in Linux). ASK, DONE and END are labels; all of them (except for END) are user-specified names. A label starts in the first column of a line, and statements without a label are indented. The :F(DONE) modifier means "jump to DONE if the statement fails" and the :(ASK) modifier means "jump to ASK unconditionally".
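The goto field can also name a label for the success case, and both cases can be combined in one statement. As a minimal sketch (the COUNT variable and the limit of three are made up for illustration), this variant of the program above stops after three names. It relies on the built-in EQ predicate, which succeeds when its two numeric arguments are equal:

```snobol4
* Greet at most three users, then stop.
        COUNT = 0
ASK     OUTPUT = "Your name? "
        NAME = INPUT                        :F(DONE)
        OUTPUT = "Hello " NAME
        COUNT = COUNT + 1
* Succeeds once three names were read, otherwise ask again.
        EQ(COUNT, 3)                        :S(DONE)F(ASK)
DONE    OUTPUT = "Finished"
END
```

Here :S(DONE) is taken when the statement succeeds and F(ASK) when it fails, so the statement always jumps to one of the two labels.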
The most interesting thing about the language is its string pattern matching capabilities. Here's a small (and very incomplete) example that extracts the parts of a simplified URL string:
        LETTER = "abcdefghijklmnopqrstuvwxyz"
        LETTERORDOT = "." LETTER
        LETTERORSLASH = "/" LETTER
        LINE = INPUT
* Match LINE against the pattern and capture the three parts.
        LINE SPAN(LETTER) . PROTO "://" SPAN(LETTERORDOT) . HOST "/" SPAN(LETTERORSLASH) . RES
        OUTPUT = PROTO
        OUTPUT = HOST
        OUTPUT = RES
END
In the pattern-matching statement (the one that begins with LINE SPAN(LETTER)), the contents of the LINE variable are matched against a pattern. The pattern contains the following elements:
- The SPAN(LETTER) . PROTO "://" section says: identify a sequence of letters followed by "://" and assign the letters to the variable called PROTO
- The SPAN(LETTERORDOT) . HOST "/" section says: take a sequence of letters and dots followed by "/" and assign it to the variable called HOST
- Finally, the last section takes the remaining letters and slash characters and assigns them to the RES variable
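A pattern does not have to be written inline; it can also be stored in a variable and reused. The sketch below builds the same pattern in pieces and reports when the match fails. It assumes a hard-coded sample URL instead of reading INPUT, and the names PROTOPAT, HOSTPAT, RESPAT, URLPAT and NOMATCH are illustrative only:

```snobol4
        LETTER = "abcdefghijklmnopqrstuvwxyz"
* Each piece captures one part of the URL with the . operator.
        PROTOPAT = SPAN(LETTER) . PROTO
        HOSTPAT = SPAN("." LETTER) . HOST
        RESPAT = SPAN("/" LETTER) . RES
        URLPAT = PROTOPAT "://" HOSTPAT "/" RESPAT
* Sample value, assumed for illustration.
        LINE = "http://www.example.com/docs/index"
        LINE URLPAT                         :F(NOMATCH)
        OUTPUT = "protocol: " PROTO
        OUTPUT = "host: " HOST
        OUTPUT = "resource: " RES           :(END)
NOMATCH OUTPUT = "Not a URL: " LINE
END
```

With the sample value, PROTO should end up as http, HOST as www.example.com and RES as docs/index. The assignments made by the . operator only happen when the whole pattern matches, so the three variables are left untouched on the NOMATCH path.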
To show a little program that uses all the elements presented here, I wanted to create a small example that takes as input the authentication log /var/log/auth.log and shows all the uses of sudo along with the program that was executed. The desired lines look like this:
Dec 28 08:21:42 glorfindel sudo: lfallas : TTY=pts/3 ; PWD=/home/lfallas ; USER=root ; COMMAND=/bin/bash
This file also contains entries other than sudo usages, so we have to ignore them.
Here's the program:
* Unanchored mode (the default): a pattern may match anywhere in the subject.
        &ANCHOR = 0
        UCASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        LCASE = "abcdefghijklmnopqrstuvwxyz"
        DIGIT = "0123456789"
        USERNAMECHAR = DIGIT LCASE UCASE
        USERNAMEPAT = SPAN(USERNAMECHAR)
* Read lines until end of file, skipping the ones that are not sudo entries.
READLINE LINE = INPUT                         :F(DONE)
        LINE " sudo: " USERNAMEPAT . USER     :F(READLINE)
        LINE "COMMAND=" ARB . COMMAND RPOS(0) :F(READLINE)
        OUTPUT = USER ":" COMMAND             :(READLINE)
DONE
END
Here the &ANCHOR = 0 assignment tells SNOBOL that a pattern may match at any position of the subject string (this is also the default). The ARB element matches any sequence of characters, as short as possible, until the rest of the pattern succeeds, and the RPOS(0) element matches only at the end of the line.
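To make the effect of &ANCHOR more concrete, here is a small sketch that runs the same match twice, first unanchored and then anchored. The sample string and the CMD, ANCH and ODD names are assumptions made for this illustration:

```snobol4
* Sample value, assumed for illustration.
        LINE = "USER=root ; COMMAND=/bin/bash"
* Unanchored: the scanner may start the match at any position.
        &ANCHOR = 0
        LINE "COMMAND=" ARB . CMD RPOS(0)   :F(ANCH)
        OUTPUT = "unanchored: " CMD
* Anchored: the match must start at the first character of LINE.
ANCH    &ANCHOR = 1
        LINE "COMMAND=" ARB . CMD RPOS(0)   :S(ODD)
        OUTPUT = "anchored: no match"       :(END)
ODD     OUTPUT = "anchored: " CMD
END
```

In unanchored mode the match should succeed and capture /bin/bash, because ARB grows one character at a time until RPOS(0) is satisfied at the end of the string. In anchored mode the same statement should fail, since LINE starts with USER rather than COMMAND=.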
For future entries I'm going to show more interesting SNOBOL features.