Sunday, February 26, 2017

A simple language with indentation based blocks. Part 2: Expressions

The focus of the second part will be expression parsing. The desired grammar for the expression small language experiment is the following (here in pseudo BNF):

NESTED_EXPRESSION = '(' EXPRESSION ')'

PRIMARY_EXPRESSION = SYMBOL | NUMBER | STRING | NESTED

UNARY_EXPRESSION =  NOT_EXPRESSION | CALL_EXPRESSION | ARRAY_ACCESS_EXPRESSION |
                    PRIMARY_EXPRESSION

MULTIPLICATIVE_EXPRESSION = UNARY_EXPRESSION ( ('*' | '/') UNARY_EXPRESSION)*

ADDITIVE_EXPRESSION = MULTIPLICATIVE_EXPRESSION ( ('*' | '/') MULTIPLICATIVE_EXPRESSION)*

RELATIONAL_EXPRESSION = ADDITIVE_EXPRESSION ( ('<' | '>') ADDITIVE_EXPRESSION)*

EQUALITY_EXPRESSION = RELATIONAL_EXPRESSION ( ('=' | '<>') RELATIONAL_EXPRESSION)*

LOGICAL_OR_EXPRESSION = EQUALITY_EXPRESSION  ('||'  EQUALITY_EXPRESSION)* 

LOGICAL_AND_EXPRESSION = LOGICAL_OR_EXPRESSION ('&&'  LOGICAL_OR_EXPRESSION)* 

EXPRESSION = LOGICAL_AND_EXPRESSION

In this post we're going to build some key techniques for creating the code for parsing a subset of this grammar. Not every element of expression grammar will be presented since it gets repetitive at some point.

For this small experiment we're going to use the following F# types as the AST of the program:

type Operator =
    | Plus
    | Minus
    | Times
    | Div
    | And
    | Or
    | Equal
    | NotEqual
    | Assign
    | Lt
    | Gt
    

type PExpr =
    | PSymbol  of string
    | PString  of string
    | PNumber  of string
    | PBoolean of bool
    | PNot     of PExpr
    | PCall    of string * (PExpr list)
    | PNested  of PExpr
    | PArrayAccess of PExpr * PExpr
    | PBinaryOperation of Operator * PExpr * PExpr

Simple atomic expressions

We're going to start with the atomic expressions:

  • Symbols or variables (ex. x,y,z)
  • Number literals (ex 1, 1.2)
  • String literals (ex. "Hello")

We already got these parsers from the previous post:

   let pSymbol =
       concatParsers2
          (readWithConditionOnChar  (fun c -> System.Char.IsLetter(c, 0)))
          (fun initialChar ->
               concatParsers2
                  (readZeroOrMoreChars (fun c -> System.Char.IsLetter(c) || System.Char.IsDigit(c)))
                  (fun suffixString -> (preturn (PSymbol (initialChar + suffixString))))
           )


   let pNumber =
              ( (optionalP (readSpecificChar '-') "") >>= (fun neg -> 
                digitP  >>= (fun firstChar -> 
                (readZeroOrMoreChars (fun c ->  System.Char.IsDigit(c))) >>= (fun chars ->
                (optionalP decimalPartP "") >>= (fun dec ->                                                                               
                preturn (PNumber (neg + firstChar + chars + dec)))))))


   let pString =
       whitespace >>
       readSpecificChar '"' >>
       readZeroOrMoreStringChars (fun previous current ->
                 (previous = '\\' && current = '"') || current <> '"')
       >>= (fun stringContents ->
            readSpecificChar '"' >> (preturn (PString stringContents)))

We can call these expressions "primary expressions", which are used in conjunction with operators to create more complex elements. We need a parser that recognizes any one these elements. We can use the disjParser from the previous post :

> let primaryExpression = (disjParser (disjParser pSymbol pNumber) pString);;

val primaryExpression : (ReaderState -> (PExpr * ReaderState) option)

> parse "10" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
  Some (PNumber "10", {Data = "10";
                       Position = 2;
                       Indentation = [0];})
> parse "x" myPrimaryExpression;; 
val it : (PExpr * ReaderState) option =
  Some (PSymbol "x", {Data = "x";
                      Position = 1;
                      Indentation = [0];})
> parse "\"hello\"" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
  Some (PString "hello", {Data = ""hello"";
                          Position = 7;
                          Indentation = [0];})

We can use List.reduce function to improve this code since we could have several parsers as the primary expression.

> let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString]  ;;

val myPrimaryExpression : (ReaderState -> (PExpr * ReaderState) option)

> parse "10" myPrimaryExpression
- ;;
val it : (PExpr * ReaderState) option =
  Some (PNumber "10", {Data = "10";
                       Position = 2;
                       Indentation = [0];})
> parse "x1" myPrimaryExpression 
- ;; 
val it : (PExpr * ReaderState) option =
  Some (PSymbol "x1", {Data = "x1";
                       Position = 2;
                       Indentation = [0];})

Binary operations

Binary expressions are composed of two expressions and an operator which connects them. For example: a + b . For these operations we can define the following utility parser creation function:

   let buildExpressions (leftExpr:PExpr) (rightExprs:(Operator * PExpr) list) =
       (List.fold (fun left (op,right) -> PBinaryOperation(op, left, right)) leftExpr rightExprs)

   let pBinaryExpression operators lowerLevelElementParser  =
       lowerLevelElementParser
         >>= (fun leftTerm ->
                 (zeroOrMore
                     (operators
                        >>= (fun op ->
                               lowerLevelElementParser >>= (fun expr -> preturn (op, expr))))
                     [])
                   >>= (fun acc -> preturn (buildExpressions leftTerm acc) ))

This function allows us to build parsers for binary expressions . The same expressed in pseudo BNF may look like this:

   pBinaryExpression = lowerLevelElementParser ( operators  lowerLevelElementParser )*

Which means that we'll like to recognize an element with lowerLevelElementParser and zero or more pairs of an element recognized by operators followed by another lowerLevelElementParser. What looks strange at first is the use of zero or more which implies that this parser can recognize just one element with lowerLevelElementParser. This will be useful in the following section when we try to implement operator precedence.

As an example we can define a parser for plus as follows:

> let identifyOperator operatorChar operatorResult =
-        concatParsers
-           whitespaceNoNl
-           ((readSpecificChar operatorChar) >> (preturn operatorResult))
- ;;

val identifyOperator :
  operatorChar:char ->
    operatorResult:'a -> (ReaderState -> ('a * ReaderState) option)
          
> let plusOperator = identifyOperator '+' Plus;;

val plusOperator : (ReaderState -> (Operator * ReaderState) option)

> let myPlus = pBinaryExpression plusOperator myPrimaryExpression;;

val myPlus : (ReaderState -> (PExpr * ReaderState) option)

> parse "1+2" myPlus;;
val it : (PExpr * ReaderState) option =
  Some (PBinaryOperation (Plus,PNumber "1",PNumber "2"), {Data = "1+2";
                                                          Position = 3;
                                                          Indentation = [0];})

Arithmetic operations

Now we're going to add support for basic arithmetic operations: +, -, /, *. By using the pBinaryExpression utility function defined in the previous section we can preserve operator precedence. To do this we define the operations as follows:

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression

What we say here is that an additive expression is composed of addition or subtraction (disjParser plusOperator minusOperator) of multiplicative expressions. Also we said that a multiplicative expression is composed of multiplication or division (disjParser pDivOperator pTimesOperator) of primary expressions.

An example of using these operators looks like this:

> parse "1 * 2 + 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Plus,PBinaryOperation (Times,PNumber "1",PNumber "2"),PNumber "3"),
     {Data = "1 * 2 + 3";
      Position = 9;
      Indentation = [0];})
> parse "1 + 2 * 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Plus,PNumber "1",PBinaryOperation (Times,PNumber "2",PNumber "3")),
     {Data = "1 + 2 * 3";
      Position = 9;
      Indentation = [0];})

Recursion

One thing that was tricky to implement at first was recursion. At some point in the grammar we need to refer back to the top parser . For example a parentheses expression can have any other expression and its recognized as a primary expression. For example:

1 * (2 + 3)

Needs to be parsed as:

     *
    / \
   /   \
  1  (2+3)
       |
       +
      / \
     2   3

What we need to do is to define a top level pExpression parser used for recognizing all our expressions . This parser will be used to define the parenthesis or nested expression. At this point our grammar written using pseudo BNF looks like this:

pPrimaryExpression = pNumber | pSymbol | pString

pAdditiveExpression = pPrimaryExpression ( (plusOperator |  minusOperator) pPrimaryExpression )*

pMultiplicativeExpression = pAdditiveExpression ( (divOperator |  timesOperator) pPrimaryExpression )*

pTopExpression = pMultiplicativeExpression

In F# this using our set of combinators it looks like:

let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString]

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression

let pTopExpression = pAdditiveExpression

We can use pTopExpression to parse any expression:

> parse "3+10*x-1/32" pTopExpression;;
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Minus,
        PBinaryOperation
          (Plus,PNumber "3",PBinaryOperation (Times,PNumber "10",PSymbol "x")),
        PBinaryOperation (Div,PNumber "1",PNumber "32")),
     {Data = "3+10*x-1/32";
      Position = 11;
      Indentation = [0];})
> parse "x" pTopExpression;;          
val it : (PExpr * ReaderState) option =
  Some (PSymbol "x", {Data = "x";
                      Position = 1;
                      Indentation = [0];})

But now we want to use pTopExpression to define one of our primary expressions:

//WARNING THE FOLLOWING CODE WILL NOT COMPILE
let readLPar =
       concatParsers whitespace (readSpecificChar '(')
let readRPar = readSpecificChar ')'

let pNested = readLPar    >>
              pTopExpression >>= (fun expr ->
              readRPar    >>  (preturn (PNested expr)))

let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression

let pTopExpression = pAdditiveExpression

When we try to compile this code we get the following error:

                pTopExpression >>= (fun expr ->
  --------------^^^^^^^^^^^^^^

/dir/stdin(6,15): error FS0039: The value or constructor 'pTopExpression' is not defined

Which is totally correct, pTopExpression is defined after pNested. It has to because pTopExpression is defined using pAdditiveExpression has a dependency on pPrimaryExpression. Until now we only used function composition to create our parsers.

What we want to do is to define:

   number symbol string nested
     ^      ^      ^     ^  |
     |      |      |     |  |
     |      |      |     |  |
     +------+------+-----+  |
            |               |
            |               |
         primary            |
            |               |
            |               |
        additive            |
            |               |
            |               |
      multiplicative        |
            |               |
            |               |
           top  <-----------+

To solve this problem we're going to take advantage of reference variables in F#. And do the following trick:

let pTopExpressions = ref []

let pTopExpression =
       fun state -> (List.reduce disjParser !pTopExpressions) state


let pNested = readLPar    >>
              pTopExpression >>= (fun expr ->
              readRPar    >>  (preturn (PNested expr)))

let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression


pTopExpressions := [pAdditiveExpression]

Using a reference (ref) allows us to create the recursive parser that we need . Here's an example of using it:

> parse "1*(2+3)" pTopExpression;; 
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Times,PNumber "1",
        PNested (PBinaryOperation (Plus,PNumber "2",PNumber "3"))),
     {Data = "1*(2+3)";
      Position = 7;
      Indentation = [0];})

Next steps

Using the same techniques presented so far we can finish our basic grammar. This is part #2 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html.

No comments: