The focus of the second part will be expression parsing. The desired grammar for the expression small language experiment is the following (here in pseudo BNF):
NESTED_EXPRESSION = '(' EXPRESSION ')'
PRIMARY_EXPRESSION = SYMBOL | NUMBER | STRING | NESTED
UNARY_EXPRESSION = NOT_EXPRESSION | CALL_EXPRESSION | ARRAY_ACCESS_EXPRESSION |
PRIMARY_EXPRESSION
MULTIPLICATIVE_EXPRESSION = UNARY_EXPRESSION ( ('*' | '/') UNARY_EXPRESSION)*
ADDITIVE_EXPRESSION = MULTIPLICATIVE_EXPRESSION ( ('*' | '/') MULTIPLICATIVE_EXPRESSION)*
RELATIONAL_EXPRESSION = ADDITIVE_EXPRESSION ( ('<' | '>') ADDITIVE_EXPRESSION)*
EQUALITY_EXPRESSION = RELATIONAL_EXPRESSION ( ('=' | '<>') RELATIONAL_EXPRESSION)*
LOGICAL_OR_EXPRESSION = EQUALITY_EXPRESSION ('||' EQUALITY_EXPRESSION)*
LOGICAL_AND_EXPRESSION = LOGICAL_OR_EXPRESSION ('&&' LOGICAL_OR_EXPRESSION)*
EXPRESSION = LOGICAL_AND_EXPRESSIONIn this post we're going to build some key techniques for creating the code for parsing a subset of this grammar. Not every element of expression grammar will be presented since it gets repetitive at some point.
For this small experiment we're going to use the following F# types as the AST of the program:
type Operator =
| Plus
| Minus
| Times
| Div
| And
| Or
| Equal
| NotEqual
| Assign
| Lt
| Gt
type PExpr =
| PSymbol of string
| PString of string
| PNumber of string
| PBoolean of bool
| PNot of PExpr
| PCall of string * (PExpr list)
| PNested of PExpr
| PArrayAccess of PExpr * PExpr
| PBinaryOperation of Operator * PExpr * PExprSimple atomic expressions
We're going to start with the atomic expressions:
- Symbols or variables (ex.
x,y,z) - Number literals (ex
1,1.2) - String literals (ex.
"Hello")
We already got these parsers from the previous post:
let pSymbol =
concatParsers2
(readWithConditionOnChar (fun c -> System.Char.IsLetter(c, 0)))
(fun initialChar ->
concatParsers2
(readZeroOrMoreChars (fun c -> System.Char.IsLetter(c) || System.Char.IsDigit(c)))
(fun suffixString -> (preturn (PSymbol (initialChar + suffixString))))
)
let pNumber =
( (optionalP (readSpecificChar '-') "") >>= (fun neg ->
digitP >>= (fun firstChar ->
(readZeroOrMoreChars (fun c -> System.Char.IsDigit(c))) >>= (fun chars ->
(optionalP decimalPartP "") >>= (fun dec ->
preturn (PNumber (neg + firstChar + chars + dec)))))))
let pString =
whitespace >>
readSpecificChar '"' >>
readZeroOrMoreStringChars (fun previous current ->
(previous = '\\' && current = '"') || current <> '"')
>>= (fun stringContents ->
readSpecificChar '"' >> (preturn (PString stringContents)))We can call these expressions "primary expressions", which are used in conjunction with operators to create more complex elements. We need a parser that recognizes any one these elements. We can use the disjParser from the previous post :
> let primaryExpression = (disjParser (disjParser pSymbol pNumber) pString);;
val primaryExpression : (ReaderState -> (PExpr * ReaderState) option)
> parse "10" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
Some (PNumber "10", {Data = "10";
Position = 2;
Indentation = [0];})
> parse "x" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
Some (PSymbol "x", {Data = "x";
Position = 1;
Indentation = [0];})
> parse "\"hello\"" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
Some (PString "hello", {Data = ""hello"";
Position = 7;
Indentation = [0];})We can use List.reduce function to improve this code since we could have several parsers as the primary expression.
> let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString] ;;
val myPrimaryExpression : (ReaderState -> (PExpr * ReaderState) option)
> parse "10" myPrimaryExpression
- ;;
val it : (PExpr * ReaderState) option =
Some (PNumber "10", {Data = "10";
Position = 2;
Indentation = [0];})
> parse "x1" myPrimaryExpression
- ;;
val it : (PExpr * ReaderState) option =
Some (PSymbol "x1", {Data = "x1";
Position = 2;
Indentation = [0];})Binary operations
Binary expressions are composed of two expressions and an operator which connects them. For example: a + b . For these operations we can define the following utility parser creation function:
let buildExpressions (leftExpr:PExpr) (rightExprs:(Operator * PExpr) list) =
(List.fold (fun left (op,right) -> PBinaryOperation(op, left, right)) leftExpr rightExprs)
let pBinaryExpression operators lowerLevelElementParser =
lowerLevelElementParser
>>= (fun leftTerm ->
(zeroOrMore
(operators
>>= (fun op ->
lowerLevelElementParser >>= (fun expr -> preturn (op, expr))))
[])
>>= (fun acc -> preturn (buildExpressions leftTerm acc) ))This function allows us to build parsers for binary expressions . The same expressed in pseudo BNF may look like this:
pBinaryExpression = lowerLevelElementParser ( operators lowerLevelElementParser )*Which means that we'll like to recognize an element with lowerLevelElementParser and zero or more pairs of an element recognized by operators followed by another lowerLevelElementParser. What looks strange at first is the use of zero or more which implies that this parser can recognize just one element with lowerLevelElementParser. This will be useful in the following section when we try to implement operator precedence.
As an example we can define a parser for plus as follows:
> let identifyOperator operatorChar operatorResult =
- concatParsers
- whitespaceNoNl
- ((readSpecificChar operatorChar) >> (preturn operatorResult))
- ;;
val identifyOperator :
operatorChar:char ->
operatorResult:'a -> (ReaderState -> ('a * ReaderState) option)
> let plusOperator = identifyOperator '+' Plus;;
val plusOperator : (ReaderState -> (Operator * ReaderState) option)
> let myPlus = pBinaryExpression plusOperator myPrimaryExpression;;
val myPlus : (ReaderState -> (PExpr * ReaderState) option)
> parse "1+2" myPlus;;
val it : (PExpr * ReaderState) option =
Some (PBinaryOperation (Plus,PNumber "1",PNumber "2"), {Data = "1+2";
Position = 3;
Indentation = [0];})Arithmetic operations
Now we're going to add support for basic arithmetic operations: +, -, /, *. By using the pBinaryExpression utility function defined in the previous section we can preserve operator precedence. To do this we define the operations as follows:
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpressionWhat we say here is that an additive expression is composed of addition or subtraction (disjParser plusOperator minusOperator) of multiplicative expressions. Also we said that a multiplicative expression is composed of multiplication or division (disjParser pDivOperator pTimesOperator) of primary expressions.
An example of using these operators looks like this:
> parse "1 * 2 + 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Plus,PBinaryOperation (Times,PNumber "1",PNumber "2"),PNumber "3"),
{Data = "1 * 2 + 3";
Position = 9;
Indentation = [0];})
> parse "1 + 2 * 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Plus,PNumber "1",PBinaryOperation (Times,PNumber "2",PNumber "3")),
{Data = "1 + 2 * 3";
Position = 9;
Indentation = [0];})Recursion
One thing that was tricky to implement at first was recursion. At some point in the grammar we need to refer back to the top parser . For example a parentheses expression can have any other expression and its recognized as a primary expression. For example:
1 * (2 + 3)Needs to be parsed as:
*
/ \
/ \
1 (2+3)
|
+
/ \
2 3What we need to do is to define a top level pExpression parser used for recognizing all our expressions . This parser will be used to define the parenthesis or nested expression. At this point our grammar written using pseudo BNF looks like this:
pPrimaryExpression = pNumber | pSymbol | pString
pAdditiveExpression = pPrimaryExpression ( (plusOperator | minusOperator) pPrimaryExpression )*
pMultiplicativeExpression = pAdditiveExpression ( (divOperator | timesOperator) pPrimaryExpression )*
pTopExpression = pMultiplicativeExpressionIn F# this using our set of combinators it looks like:
let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString]
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
let pTopExpression = pAdditiveExpressionWe can use pTopExpression to parse any expression:
> parse "3+10*x-1/32" pTopExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Minus,
PBinaryOperation
(Plus,PNumber "3",PBinaryOperation (Times,PNumber "10",PSymbol "x")),
PBinaryOperation (Div,PNumber "1",PNumber "32")),
{Data = "3+10*x-1/32";
Position = 11;
Indentation = [0];})
> parse "x" pTopExpression;;
val it : (PExpr * ReaderState) option =
Some (PSymbol "x", {Data = "x";
Position = 1;
Indentation = [0];})But now we want to use pTopExpression to define one of our primary expressions:
//WARNING THE FOLLOWING CODE WILL NOT COMPILE
let readLPar =
concatParsers whitespace (readSpecificChar '(')
let readRPar = readSpecificChar ')'
let pNested = readLPar >>
pTopExpression >>= (fun expr ->
readRPar >> (preturn (PNested expr)))
let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
let pTopExpression = pAdditiveExpressionWhen we try to compile this code we get the following error:
pTopExpression >>= (fun expr ->
--------------^^^^^^^^^^^^^^
/dir/stdin(6,15): error FS0039: The value or constructor 'pTopExpression' is not definedWhich is totally correct, pTopExpression is defined after pNested. It has to because pTopExpression is defined using pAdditiveExpression has a dependency on pPrimaryExpression. Until now we only used function composition to create our parsers.
What we want to do is to define:
number symbol string nested
^ ^ ^ ^ |
| | | | |
| | | | |
+------+------+-----+ |
| |
| |
primary |
| |
| |
additive |
| |
| |
multiplicative |
| |
| |
top <-----------+To solve this problem we're going to take advantage of reference variables in F#. And do the following trick:
let pTopExpressions = ref []
let pTopExpression =
fun state -> (List.reduce disjParser !pTopExpressions) state
let pNested = readLPar >>
pTopExpression >>= (fun expr ->
readRPar >> (preturn (PNested expr)))
let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
pTopExpressions := [pAdditiveExpression]Using a reference (ref) allows us to create the recursive parser that we need . Here's an example of using it:
> parse "1*(2+3)" pTopExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Times,PNumber "1",
PNested (PBinaryOperation (Plus,PNumber "2",PNumber "3"))),
{Data = "1*(2+3)";
Position = 7;
Indentation = [0];})Next steps
Using the same techniques presented so far we can finish our basic grammar. This is part #2 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html.