The focus of the second part will be expression parsing. The desired grammar for the expression small language experiment is the following (here in pseudo BNF):
NESTED_EXPRESSION = '(' EXPRESSION ')'
PRIMARY_EXPRESSION = SYMBOL | NUMBER | STRING | NESTED
UNARY_EXPRESSION = NOT_EXPRESSION | CALL_EXPRESSION | ARRAY_ACCESS_EXPRESSION |
PRIMARY_EXPRESSION
MULTIPLICATIVE_EXPRESSION = UNARY_EXPRESSION ( ('*' | '/') UNARY_EXPRESSION)*
ADDITIVE_EXPRESSION = MULTIPLICATIVE_EXPRESSION ( ('*' | '/') MULTIPLICATIVE_EXPRESSION)*
RELATIONAL_EXPRESSION = ADDITIVE_EXPRESSION ( ('<' | '>') ADDITIVE_EXPRESSION)*
EQUALITY_EXPRESSION = RELATIONAL_EXPRESSION ( ('=' | '<>') RELATIONAL_EXPRESSION)*
LOGICAL_OR_EXPRESSION = EQUALITY_EXPRESSION ('||' EQUALITY_EXPRESSION)*
LOGICAL_AND_EXPRESSION = LOGICAL_OR_EXPRESSION ('&&' LOGICAL_OR_EXPRESSION)*
EXPRESSION = LOGICAL_AND_EXPRESSION
In this post we're going to build some key techniques for creating the code for parsing a subset of this grammar. Not every element of expression grammar will be presented since it gets repetitive at some point.
For this small experiment we're going to use the following F# types as the AST of the program:
type Operator =
| Plus
| Minus
| Times
| Div
| And
| Or
| Equal
| NotEqual
| Assign
| Lt
| Gt
type PExpr =
| PSymbol of string
| PString of string
| PNumber of string
| PBoolean of bool
| PNot of PExpr
| PCall of string * (PExpr list)
| PNested of PExpr
| PArrayAccess of PExpr * PExpr
| PBinaryOperation of Operator * PExpr * PExpr
Simple atomic expressions
We're going to start with the atomic expressions:
- Symbols or variables (ex.
x
,y
,z
) - Number literals (ex
1
,1.2
) - String literals (ex.
"Hello"
)
We already got these parsers from the previous post:
let pSymbol =
concatParsers2
(readWithConditionOnChar (fun c -> System.Char.IsLetter(c, 0)))
(fun initialChar ->
concatParsers2
(readZeroOrMoreChars (fun c -> System.Char.IsLetter(c) || System.Char.IsDigit(c)))
(fun suffixString -> (preturn (PSymbol (initialChar + suffixString))))
)
let pNumber =
( (optionalP (readSpecificChar '-') "") >>= (fun neg ->
digitP >>= (fun firstChar ->
(readZeroOrMoreChars (fun c -> System.Char.IsDigit(c))) >>= (fun chars ->
(optionalP decimalPartP "") >>= (fun dec ->
preturn (PNumber (neg + firstChar + chars + dec)))))))
let pString =
whitespace >>
readSpecificChar '"' >>
readZeroOrMoreStringChars (fun previous current ->
(previous = '\\' && current = '"') || current <> '"')
>>= (fun stringContents ->
readSpecificChar '"' >> (preturn (PString stringContents)))
We can call these expressions "primary expressions", which are used in conjunction with operators to create more complex elements. We need a parser that recognizes any one these elements. We can use the disjParser
from the previous post :
> let primaryExpression = (disjParser (disjParser pSymbol pNumber) pString);;
val primaryExpression : (ReaderState -> (PExpr * ReaderState) option)
> parse "10" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
Some (PNumber "10", {Data = "10";
Position = 2;
Indentation = [0];})
> parse "x" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
Some (PSymbol "x", {Data = "x";
Position = 1;
Indentation = [0];})
> parse "\"hello\"" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
Some (PString "hello", {Data = ""hello"";
Position = 7;
Indentation = [0];})
We can use List.reduce
function to improve this code since we could have several parsers as the primary expression.
> let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString] ;;
val myPrimaryExpression : (ReaderState -> (PExpr * ReaderState) option)
> parse "10" myPrimaryExpression
- ;;
val it : (PExpr * ReaderState) option =
Some (PNumber "10", {Data = "10";
Position = 2;
Indentation = [0];})
> parse "x1" myPrimaryExpression
- ;;
val it : (PExpr * ReaderState) option =
Some (PSymbol "x1", {Data = "x1";
Position = 2;
Indentation = [0];})
Binary operations
Binary expressions are composed of two expressions and an operator which connects them. For example: a + b
. For these operations we can define the following utility parser creation function:
let buildExpressions (leftExpr:PExpr) (rightExprs:(Operator * PExpr) list) =
(List.fold (fun left (op,right) -> PBinaryOperation(op, left, right)) leftExpr rightExprs)
let pBinaryExpression operators lowerLevelElementParser =
lowerLevelElementParser
>>= (fun leftTerm ->
(zeroOrMore
(operators
>>= (fun op ->
lowerLevelElementParser >>= (fun expr -> preturn (op, expr))))
[])
>>= (fun acc -> preturn (buildExpressions leftTerm acc) ))
This function allows us to build parsers for binary expressions . The same expressed in pseudo BNF may look like this:
pBinaryExpression = lowerLevelElementParser ( operators lowerLevelElementParser )*
Which means that we'll like to recognize an element with lowerLevelElementParser
and zero or more pairs of an element recognized by operators
followed by another lowerLevelElementParser
. What looks strange at first is the use of zero or more
which implies that this parser can recognize just one element with lowerLevelElementParser
. This will be useful in the following section when we try to implement operator precedence.
As an example we can define a parser for plus
as follows:
> let identifyOperator operatorChar operatorResult =
- concatParsers
- whitespaceNoNl
- ((readSpecificChar operatorChar) >> (preturn operatorResult))
- ;;
val identifyOperator :
operatorChar:char ->
operatorResult:'a -> (ReaderState -> ('a * ReaderState) option)
> let plusOperator = identifyOperator '+' Plus;;
val plusOperator : (ReaderState -> (Operator * ReaderState) option)
> let myPlus = pBinaryExpression plusOperator myPrimaryExpression;;
val myPlus : (ReaderState -> (PExpr * ReaderState) option)
> parse "1+2" myPlus;;
val it : (PExpr * ReaderState) option =
Some (PBinaryOperation (Plus,PNumber "1",PNumber "2"), {Data = "1+2";
Position = 3;
Indentation = [0];})
Arithmetic operations
Now we're going to add support for basic arithmetic operations: +
, -
, /
, *
. By using the pBinaryExpression
utility function defined in the previous section we can preserve operator precedence. To do this we define the operations as follows:
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
What we say here is that an additive expression
is composed of addition or subtraction (disjParser plusOperator minusOperator)
of multiplicative expressions
. Also we said that a multiplicative expression
is composed of multiplication or division (disjParser pDivOperator pTimesOperator)
of primary expressions.
An example of using these operators looks like this:
> parse "1 * 2 + 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Plus,PBinaryOperation (Times,PNumber "1",PNumber "2"),PNumber "3"),
{Data = "1 * 2 + 3";
Position = 9;
Indentation = [0];})
> parse "1 + 2 * 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Plus,PNumber "1",PBinaryOperation (Times,PNumber "2",PNumber "3")),
{Data = "1 + 2 * 3";
Position = 9;
Indentation = [0];})
Recursion
One thing that was tricky to implement at first was recursion. At some point in the grammar we need to refer back to the top parser . For example a parentheses expression can have any other expression and its recognized as a primary expression. For example:
1 * (2 + 3)
Needs to be parsed as:
*
/ \
/ \
1 (2+3)
|
+
/ \
2 3
What we need to do is to define a top level pExpression
parser used for recognizing all our expressions . This parser will be used to define the parenthesis or nested expression. At this point our grammar written using pseudo BNF looks like this:
pPrimaryExpression = pNumber | pSymbol | pString
pAdditiveExpression = pPrimaryExpression ( (plusOperator | minusOperator) pPrimaryExpression )*
pMultiplicativeExpression = pAdditiveExpression ( (divOperator | timesOperator) pPrimaryExpression )*
pTopExpression = pMultiplicativeExpression
In F# this using our set of combinators it looks like:
let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString]
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
let pTopExpression = pAdditiveExpression
We can use pTopExpression
to parse any expression:
> parse "3+10*x-1/32" pTopExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Minus,
PBinaryOperation
(Plus,PNumber "3",PBinaryOperation (Times,PNumber "10",PSymbol "x")),
PBinaryOperation (Div,PNumber "1",PNumber "32")),
{Data = "3+10*x-1/32";
Position = 11;
Indentation = [0];})
> parse "x" pTopExpression;;
val it : (PExpr * ReaderState) option =
Some (PSymbol "x", {Data = "x";
Position = 1;
Indentation = [0];})
But now we want to use pTopExpression
to define one of our primary expressions:
//WARNING THE FOLLOWING CODE WILL NOT COMPILE
let readLPar =
concatParsers whitespace (readSpecificChar '(')
let readRPar = readSpecificChar ')'
let pNested = readLPar >>
pTopExpression >>= (fun expr ->
readRPar >> (preturn (PNested expr)))
let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
let pTopExpression = pAdditiveExpression
When we try to compile this code we get the following error:
pTopExpression >>= (fun expr ->
--------------^^^^^^^^^^^^^^
/dir/stdin(6,15): error FS0039: The value or constructor 'pTopExpression' is not defined
Which is totally correct, pTopExpression
is defined after pNested
. It has to because pTopExpression is defined using pAdditiveExpression
has a dependency on pPrimaryExpression. Until now we only used function composition to create our parsers.
What we want to do is to define:
number symbol string nested
^ ^ ^ ^ |
| | | | |
| | | | |
+------+------+-----+ |
| |
| |
primary |
| |
| |
additive |
| |
| |
multiplicative |
| |
| |
top <-----------+
To solve this problem we're going to take advantage of reference variables in F#. And do the following trick:
let pTopExpressions = ref []
let pTopExpression =
fun state -> (List.reduce disjParser !pTopExpressions) state
let pNested = readLPar >>
pTopExpression >>= (fun expr ->
readRPar >> (preturn (PNested expr)))
let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
pTopExpressions := [pAdditiveExpression]
Using a reference (ref
) allows us to create the recursive parser that we need . Here's an example of using it:
> parse "1*(2+3)" pTopExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Times,PNumber "1",
PNested (PBinaryOperation (Plus,PNumber "2",PNumber "3"))),
{Data = "1*(2+3)";
Position = 7;
Indentation = [0];})
Next steps
Using the same techniques presented so far we can finish our basic grammar. This is part #2 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html.