We're going to start by defining an AST for Scheme-like expressions:
data Expr = ScSymbol String
| ScString String
| ScNumber Integer
| ScDouble Double
| ScCons Expr Expr
| ScNil
| ScBool Bool
| ScQuote Expr
deriving Show
For this experiment I'm going to use the Text.Parsec.Token and Text.Parsec.Language packages which have useful functions for creating language parsers.
idSymbol = oneOf ":!$%&*+/<=>?@\\^|-~"
schemeLanguageDef :: LanguageDef st
schemeLanguageDef = emptyDef {
commentLine = ";;"
, identStart = letter <|> idSymbol
, identLetter = alphaNum <|> idSymbol
, opStart = parserZero
, opLetter = parserZero
}
schemeTokenParser = makeTokenParser schemeLanguageDef
TokenParser {
identifier = idParser,
reservedOp = opParser,
stringLiteral = stringLiteralParser,
parens = parParser,
lexeme = lexParser,
naturalOrFloat = naturalOrFloatParser
} = schemeTokenParser
Given these definitions we get parsers for identifiers, numbers, string literals "for free".
Boolean literals are parsed using the following parser:
boolLiteral = lexParser (
do
char '#'
val <- (char 't') <|> (char 'f')
return $ ScBool $ val == 't'
)
Basic quotations are parsed as follows:
quoteParser = lexParser (
do
char '\''
val <- expressionParser
return $ ScQuote val
)
Atomic expressions are parsed as follows:
atomParser =
(do
id <- idParser
return $ ScSymbol id)
<|>
(do
fnumber <- naturalOrFloatParser
return $ case fnumber of
Left num -> ScNumber num
Right num -> ScDouble num)
<|>
(do str <- stringLiteralParser
return $ ScString str)
<|> boolLiteral
<|> quoteParser
Next we need to consider S-Expressions. This kind of expressions are combinations of the above elements surrounded by parentheses.
For example:
(+ 1 2 3)
dottedSuffixParser =
do
dotParser
finalExpr <- expressionParser
return finalExpr
parExpressionParser =
do (exprs, last) <- parParser
(do
seq <- many expressionParser
dottedSuffix <- optionMaybe dottedSuffixParser
return (case dottedSuffix of
Just lastExpr -> (seq, lastExpr)
Nothing -> (seq, ScNil)))
return $ foldr ScCons last exprs
As shown above, the parser need to verify if there was a dot ('.') before the last element for example:
(1 . 2)
More details on the dot syntax can be found here.
Finally we define expressions:
expressionParser =
atomParser <|> parExpressionParser
parseIt input = parse expressionParser "" input
We can use this function to parse this expressions as follows:
*SCParser> parseIt "(- x 1)"
Right (ScCons (ScSymbol "-") (ScCons (ScSymbol "x") (ScCons (ScNumber 1) ScNil)))
There's a great a series of articles on implementing a Scheme interpreter in Haskell called "Write Yourself a Scheme in 48 Hours/Parsing" which is a great source of information on this topic.