Sunday, March 10, 2019

A quick note about programming in Prolog

Prolog predicates work in different ways depending on how you use them. A simple example is the append predicate. One may say that append is used to get the result of appending two lists. For example:

?- append([1,2,3], [4,5,6], NewList).
NewList = [1, 2, 3, 4, 5, 6].

But we can also use append to get the prefix of a list given a suffix.

?- append(Prefix, [5,6], [1,2,3,4,5,6]).
Prefix = [1, 2, 3, 4] ;

Or we can get all the possible combinations of lists that can concatenated produce a specified list:

?- append(First, Second, [1,2,3,4,5,6]).
First = [],
Second = [1, 2, 3, 4, 5, 6] ;
First = [1],
Second = [2, 3, 4, 5, 6] ;
First = [1, 2],
Second = [3, 4, 5, 6] ;
First = [1, 2, 3],
Second = [4, 5, 6] ;
First = [1, 2, 3, 4],
Second = [5, 6] ;
First = [1, 2, 3, 4, 5],
Second = [6] ;
First = [1, 2, 3, 4, 5, 6],
Second = [] ;

This is a powerful concept that is accomplished by Prolog's execution mecanism which involves unification, resolution and backtracking. This mechanism tries to find a way to bind free variables so the goal makes sense.

Recently, I found an example of this behavior while writing some quick experiments with a JavaScript parser I'm working on (which deserves a separate post).

It all started while trying to define a predicate to translate from a JavaScript switch statement to an equivalent sequence of if/else statements. An example of this conversion it's take the following statement:

switch(myVar) {
  case 1: 
     print('one');
     break;
  case 2:
     print('two');
     break;
}

And convert it to something like:

if (myVar == 1) {
   print('one');
} else if (myVar == 2) {
   print('two');
}

A simple version of a predicate that performs this conversion is the following:

switch_if(js_switch(js_identifier(Variable, _),
                    Cases,
                   _),
          IfStat) :-
       switch_if_cases(Variable, Cases, IfStat).

switch_if_cases(Variable,
                [js_case(Value, Body, _)|Rest],
                js_if(Comparison,
                      js_block(NoBreakBody, _),
                      RestIf,
                      _)) :-
     value_comparison(Comparison, Variable, Value),
     body_with_no_break(Body, NoBreakBody),
     switch_if_cases(Variable, Rest, RestIf).

switch_if_cases(Variable,
                [js_default( Body, _)],
                js_block(NoBreakBody, _)) :-
     body_with_no_break(Body, NoBreakBody).

value_comparison(js_binary_operation(
                          equals_op,
                          js_identifier(Variable, _),
                          Value,
                          _),
           Variable, Value).

body_with_no_break([js_break(_)], []).
body_with_no_break([Stat|Rest1], [Stat|Rest2]) :-
    body_with_no_break(Rest1, Rest2)

Here's an example of using the switch_if predicate to convert a small code snippet. The input contains a switch statement. The result is unified in the IfString variable.

?- parse_js_stat_string("switch(w) {
case 1:
   print('first');
   break;
case 2:
   print('second');
   break;
default:
   print('Nothing');
   break;
}", SwitchAst), switch_if(SwitchAst, IfAst), print_js_ast_to_string(IfAst, IfString), writef(IfString).
if (w == 1) {
 print('first', );
} else if (w == 2) {
 print('second', );
} else {
 print('Nothing', );
}
SwitchAst = js_switch(...),
IfString = "if (w == 1) {\n print('first', );\n} else if (w == 2) {\n print('second', );\n} else {\n print('Nothing', );\n}"

The parse_js_stat_string predicate parses a string containing a statement. On success this predicate generates a simple AST representation of the input element as a Prolog term. For example:

?- parse_js_stat_string("switch(w) {
       case 1:
         print('first');
         break;
       case 2:
         print('second');
         break;
       default:
         print('Nothing');
         break;
    }", SwitchAst).
SwitchAst = js_switch(js_identifier([119], lex_info(1, [])), [js_case(js_literal(number, [49], lex_info(2, [ws(19, false)])), [js_expr_stat(js_call(js_identifier([112, 114|...], lex_info(3, [ws(..., ...)])), js_arguments([js_literal(string, [...|...], lex_info(..., ...))], lex_info(3, [], 3, [])), null)), js_break(lex_info(4, [ws(43, true)]))], lex_info(2, [ws(11, true)])), js_case(js_literal(number, [50], lex_info(5, [ws(63, false)])), [js_expr_stat(js_call(js_identifier([112|...], lex_info(6, [...])), js_arguments([js_literal(..., ..., ...)], lex_info(6, [], 6, [])), null)), js_break(lex_info(7, [ws(..., ...)]))], lex_info(5, [ws(55, true)])), js_default([js_expr_stat(js_call(js_identifier([...|...], lex_info(..., ...)), js_arguments([...], lex_info(..., ..., ..., ...)), null)), js_break(lex_info(10, [...]))], lex_info(8, [ws(100, true)]))], lex_info(1, []))

Going the other way

An awesome surprise it's that the switch_if predicate can be used in "the order direction" without any modification. That is, we can specify an if AST and get a switch version of it. Of course this can only be done if the conversion makes sense. For example:

?- parse_js_stat_string("
if (x == 10) {
    print('10 val');
} else if (x == 20) {
    print('20 val');
} else {
    print('None');
}", IfAst), switch_if(SwitchAst, IfAst), print_js_ast_to_string(SwitchAst, SwitchString), writef(SwitchString).
switch(x) {
 case 10:
  print('10 val', );
  break;
 case 20:
  print('20 val', );
  break;
 default:
   print('None', );
  break;
}
IfAst = js_if(js_binary_operation(equals_op,...),
SwitchAst = js_switch(...),
SwitchString = "switch(x) {\n case 10:\n  print('10 val', );\n  break;\n case 20:\n  print('20 val', );\n  break;\n default:\n   print('None', );\n  break;\n}\n" 

Several alternatives for converting between switch and if

One of the most intriguing characteristics of Prolog is the posibility of giving different valid results for a given goal.

For example we can give another posibility to the value_comparison predicate:

value_comparison(js_binary_operation(
                          equals_op,
                          js_identifier(Variable, _),
                          Value,
                          _),
           Variable, Value).
value_comparison(js_binary_operation(
                          equals_op,
                          Value,
                          js_identifier(Variable, _),
                          _),
           Variable, Value).

By doing this what we are saying is that either x == 1 and 1 == x are valid possibilities for the condition of the if statement replacing a case section.

If we do this we can get more alternatives:

?- quick_switch_if_test("switch(w) {
case 1:
   print('first');
   break;
case 2:
   print('second');
   break;
default:
   print('Nothing');
   break;
}").
if (w == 1) {
 print('first', );
} else if (w == 2) {
 print('second', );
} else {
 print('Nothing', );
}
true ;
if (w == 1) {
 print('first', );
} else if (2 == w) {
 print('second', );
} else {
 print('Nothing', );
}
true ;
if (1 == w) {
 print('first', );
} else if (w == 2) {
 print('second', );
} else {
 print('Nothing', );
}
true ;
if (1 == w) {
 print('first', );
} else if (2 == w) {
 print('second', );
} else {
 print('Nothing', );
}
true ;
false.

Code for this post can be found here.

Saturday, December 29, 2018

Using Racklog for parsing

This post shows a small experiment of creating parsing predicates using the Racket's Racklog package . This package enables the use of logic programming features inside Racket.

Logic programming languages, like Prolog, use the Definite clause grammars syntax for this purpose. In this post this technique is not directly used, the goal is to express the grammar in the same way the DCG syntax is expanded. A nice experiment for a future post is to create macros for hiding state change variables.

As always the goal is to experiment with a concept without worrying too much about performance!

1.1 How it looks

Here's an example of a predicate for parsing an if statement for a fictional language.

(define %if-stat
  (%rel (ctxt new-ctxt)
        [(ctxt new-ctxt)
         (%let (ifkw-ctxt lpar-ctxt rpar-ctxt
                cond-ctxt then-ctxt
                stats-ctxt else-ctxt end-ctxt)
               (%and
                ((%literal-id "if")   ctxt ifkw-ctxt) 
                (%lpar                ifkw-ctxt lpar-ctxt)
                (%expression          lpar-ctxt cond-ctxt)
                (%rpar                cond-ctxt rpar-ctxt)
                ((%literal-id "then") rpar-ctxt then-ctxt)
                (%stat                then-ctxt stats-ctxt)
                ((%opt %else-block)   stats-ctxt else-ctxt)
                (%is new-ctxt (with-result
                                `(-if ,(parse-context-result cond-ctxt)
                                      ,(parse-context-result stats-ctxt)
                                      ,(parse-context-result else-ctxt))
                                else-ctxt))))]))

Running this predicate from the REPL using a string with sample code produces the following output:

racklogexperiments.rkt> (parse-context-result
 (cdar
   (%which (result-ctxt)
           (%if-stat (string-context "if (c) then
                                        foo(c);
                                      else
                                         moo(d);")
                 result-ctxt))))
'(-if
  (-id "c")
  (-call-stat (-call "foo" ((-id "c"))))
  (-else (-call-stat (-call "moo" ((-id "d"))))))
racklogexperiments.rkt>

1.2 Creating parsing predicates

An important piece that we need to define our parsers it's the "parsing context". This element is represented as as simple structure which keeps track of :

  1. The result or output of the last parser.
  2. The text used as input for the parser.
  3. The current position inside the string as a zero-based index.
(struct
  parse-context
  (result text position))

(define (string-context str)
        (parse-context '() str 0))

(define (with-result value ctxt)
  (struct-copy parse-context ctxt [result value]))

The way we define parsers using Racklog predicates is that we transform an input parsing context to new context. The new context is going to have a the result of the last parsing operation and it will move the position as many characters as consumed by the last parsing operation.

We use a couple of parsers as the foundation for all must all other parsers. The first one, %a-char has the next available character as the result.

(define %a-char
  (%rel (a-char ctxt new-ctxt)
        [(a-char ctxt new-ctxt)
            (%is #t (has-more-chars? ctxt))
            (%is new-ctxt (get-char-from-context ctxt))
            (%is a-char (parse-context-result new-ctxt))
            ]))

The code for the utility functions used in these parsers is the following:

(define (get-char-from-context ctxt)
  (struct-copy
     parse-context
     ctxt
     [result (string-ref (parse-context-text ctxt) (parse-context-position ctxt))]
     [position (+ 1 (parse-context-position ctxt))]))

(define (has-more-chars? ctx)
   (<
      (parse-context-position ctx)
      (string-length (parse-context-text ctx))))

The get-char-from-context function creates a new context with the current character as result and advances the context to the next position. We use Racklog unification and %and to chain together the parsing contexts. The following interaction illustrates this:

racklogexperiments.rkt> (define result (%which (c1 result-ctxt) 
                                               (%and (%a-char #\a (string-context "abc") c1)
                                                     (%a-char #\b c1 result-ctxt))))
racklogexperiments.rkt> result
'((c1 . #<parse-context>) (result-ctxt . #<parse-context>))
racklogexperiments.rkt> (assoc 'result-ctxt result)
'(result-ctxt . #<parse-context>)
racklogexperiments.rkt> (parse-context-result (cdr (assoc 'result-ctxt result)))
#\b
racklogexperiments.rkt> (parse-context-position (cdr (assoc 'result-ctxt result)))
2
racklogexperiments.rkt>

Here we create a very simple parser that recognizes the sequence: "ab" . As presented above, the position of the resulting parsing context is 2 which is the zero based position inside the string after 'b'.

1.3 Sequences

A very useful parser is one that let's you apply another parser zero or more times. The parser that archives this is the following:

(define %p-simple-sequence
  (λ (pred)
    (%rel (ctxt new-ctxt)
          [(ctxt new-ctxt)
           (%let (tmp-ctxt tmp-result)
                 (%or
                  (%and
                   (pred ctxt tmp-ctxt)
                   (%is tmp-result (with-result                                     
                                     (cons (parse-context-result tmp-ctxt)
                                           (parse-context-result ctxt))
                                     tmp-ctxt))
                   ((%p-simple-sequence pred) tmp-result new-ctxt))
                  (%is new-ctxt ctxt)))])))

(define %p-sequence
  (λ (pred)
    (%rel ( ctxt new-ctxt)
          [(ctxt new-ctxt)
           (%let (tmp-ctxt)
               (%and
                 (%is tmp-ctxt (with-result '() ctxt))
                 ((%p-simple-sequence pred)
                  tmp-ctxt new-ctxt)
                 ))]
          )))

We can apply this parser as follows:

racklogexperiments.rkt> (parse-context-result 
       (cdar (%which (result-ctxt) 
               ((%p-sequence 
                    (%rel (c r) [(c r) (%a-char #\a c r)])) 
                (string-context "aaaa") result-ctxt)))  )
'(#\a #\a #\a #\a)
racklogexperiments.rkt>

1.4 Optionals

Another useful element we need is a way to parse optional elements. We used this in our if example above for the else section.

To implement this we use %or to try to parse the optional parser first or succeed with an empty result. Using this technique will enable multiple solutions (see an example of this below).

(define %opt
  (λ (parser)
    (%rel (ctxt new-ctxt)
          [(ctxt new-ctxt)
           (%let (tmp-ctxt)
                 (%and
                  (%or (parser ctxt new-ctxt)
                       (%is new-ctxt (with-result '() ctxt)))))])))

1.5 Multiple possible ASTs

One interesting possibility of using a Racklog (or Prolog's DCGs) is that you can get many possible interpretations of the grammar. Although it may not be of practical use it looks rather interesting.

An example of these shows up when parsing an if statement with a https://en.wikipedia.org/wiki/Dangling_else .

(define sample-code
  "if (x) then if (y) then foo(); else goo();")

Here there are two possible valid interpretations of this if statement. The default one:

racklogexperiments.rkt> (parse-context-result
 (cdar
  (%which (result-ctxt)
          (%if-stat (string-context sample-code)
                    result-ctxt))))

'(-if
  (-id "x")
  (-if
   (-id "y")
   (-call-stat (-call "foo" ()))
   (-else (-call-stat (-call "goo" ()))))
  ())
racklogexperiments.rkt>

Here's an alternative visualization of this tree:

We can now ask Racklog for another solution using the %more function. See the result here:

racklogexperiments.rkt> (parse-context-result (cdar (%more)))
'(-if
  (-id "x")
  (-if (-id "y") (-call-stat (-call "foo" ())) ())
  (-else (-call-stat (-call "goo" ()))))
racklogexperiments.rkt>

Here's the other alternative visualization:

1.6 Code

The code for this post can be found here.

Thursday, May 11, 2017

A small Tetris-like clone using J and ncurses. Part 2

This is part two of our Tetris-like clone in J series. In this part we're going to see how the ncurses interface was created.

Creating the ncurses UI

To create the user interface we used the ncurses UI using the api/ncurses package. Sadly all interactions with this API makes the code look like straightforward imperative code.

Since we represented the game field using a matrix, we need a way to visualize this matrix. The next snippet shows how we used the wattr_on and mvwprintw functions to print each cell of the game field with the given color.

drawGame=: 3 : 0
matrix =. >{.}.y
win =. >{.y
cols =. }. $ matrix
rows =. {. $ matrix
for_row. i.rows do.
  for_col. i.cols do.
     value =. (<row, col) { matrix
     wattr_on_ncurses_ win;(COLOR_PAIR_ncurses_ (value+1));0
     mvwprintw_1_ncurses_ win; row; (2*col); (('  '))
  end.
end.
)

The game loop handles user interactions and game rules . Here's how it looks:

NB. Game loop
while. 1 do.
   c =. wgetch_ncurses_ vin 
   if. c = KEY_UP_ncurses_ do.
      game =: put_in_matrix (current*_1);k;j;game
      current =. rotate current
      needs_refresh =. 1
   elseif. c = KEY_RIGHT_ncurses_ do.
      game_tmp =. put_in_matrix (current*_1);k;j;game
      if. can_put_in_matrix current;k;(j + 1);game_tmp do.
         game =: game_tmp
         j =. j + 1
         needs_refresh =. 1
      end.
   elseif. c = KEY_LEFT_ncurses_ do.
      game_tmp =. put_in_matrix (current*_1);k;j;game
      if. can_put_in_matrix (current);k;(j - 1);game_tmp do.
         game =: game_tmp
         j =. j - 1
         needs_refresh =. 1
      end.
   elseif. 1 do. 
      if. ((seconds_from_start'') - timestamp) < 0.1 do.
         continue.
      else.
         timestamp =. seconds_from_start'' 
      end.
   
      if. automove = 0 do.
         game =: put_in_matrix (current*_1);k;j;game
         if. can_put_in_matrix (current);(k+1);j;game do.
            k =. k + 1
         else.
           game =: put_in_matrix (current);k;j;game
           k =. 0
           j =. 0
           if. can_put_in_matrix current;k;j;game do.
              current =. (?@$ tetriminos) {:: tetriminos
           else.
             mvwprintw_1_ncurses_ vin; 0; 0; ' Game over '
             nodelay_ncurses_ vin ;'0'
             wgetch_ncurses_ vin 
             exit''
          
           end.
         end.
         automove =. 2
         needs_refresh =. 1
      else.
          automove =. automove - 1
      end.
   end.
   unget_wch_ncurses_ c
   if. needs_refresh do.
      game =: put_in_matrix (current);k;j;game
      game =: remove_full_rows game
      drawGame vin; game
      wrefresh_ncurses_  vin
      needs_refresh =. 0
   end.
end.

The rest of the code is pure ncurses initialization which is not that interesting. Code for this post can be found here: : https://github.com/ldfallas/jcurtris .

Sunday, April 30, 2017

A small Tetris-like clone using J and ncurses. Part 1

For me, the J programming language it's a very intriguing. It is full of ideas and concepts that I'm not familiar with. Getting to know a programming language it's not only about learning the syntax. It is learning the techniques that people use to take advantage of it what gives you more insight . This is particularly true for J.

For me the best way to learn more about a programming language is to try to solve a small problem with it. In this post I'm going to describe an attempt to write a small and incomplete Tetris-like clone using J and the ncurses library. Here's a preview of how it looks:

Tetriminos

According to the Wikipedia page for Tetris, the pieces are named Tetriminos https://en.wikipedia.org/wiki/Tetris#Gameplay. Each piece is composed of blocks. In J we can represent this pieces as matrices.

To create this matrices we use the Shape verb ($) For example:

  • The "L" tetrimino:
   ] l_tetrimino =. 2 3 $ 1 1 1 1 0 0 
1 1 1
1 0 0
  • The "O" tetrimino:
   ] b_tetrimino =. 2 2 $ 4 4 4 4 
4 4
4 4
  • The "S" tetrimino:
   ] s_tetrimino =. 2 3 $ 0 5 5 5 5 0 
0 5 5
5 5 0

Tetrimino rotation

In Tetris pressing the 'up' arrow is going to rotate the current piece. We can use matrix Transpose (|:) and Reverse (|.) verbs and compose them together using the Atop conjunction (@). Here's the definition:

rotate =: |.@|:

Here we can see how this verb works:

   l_tetrimino
1 1 1
1 0 0
   rotate l_tetrimino
1 0
1 0
1 1
   rotate rotate l_tetrimino
0 0 1
1 1 1
   rotate rotate rotate l_tetrimino
1 1
0 1
0 1
   rotate rotate rotate rotate l_tetrimino
1 1 1
1 0 0

We can apply this transformation to the other tetriminos for example:

   ] s_tetrimino =. 2 3 $ 0 5 5 5 5 0 
0 5 5
5 5 0
   rotate s_tetrimino
5 0
5 5
0 5
   rotate rotate s_tetrimino
0 5 5
5 5 0

We use different numbers for each tetrimino so we can use different colors to paint them.

Tetrimino placement

A natural way to model the game is to use a matrix representing the playing field. We use a 10 columns by 20 rows matrix for this effect. We use the Shape verb ($) to do this:

   ] game =:  20 10 $ 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

A fundamental piece of functionality that we need is a way to put a tetrimino inside this matrix. This proved to be tricky (maybe because of lack of J knowledge). We're going to incrementally create this verb.

Reading the J documentation, it seems that we can use the Amend (m } _ _ _) verb to change just a set of cells of the game matrix. Here's an example on how to use this verb

   ] sample =. 5 5 $ 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

   1 2 3 4 (1 1;2 1;1 2;2 2) } sample
0 0 0 0 0
0 1 3 0 0
0 2 4 0 0
0 0 0 0 0
0 0 0 0 0

What we are saying here is that we can to change the following cells in sample:

  • row 1 column 1 with value 1
  • row 2 column 1 with value 2
  • row 1 column 2 with value 3
  • row 2 column 2 with value 4

Now to take advantage of this verb we need to calculate the target coordinates to change the value of a tetrimino. First we start by generating coordinates for each of the cells of the tetrimino.

We're going to use the following predifined values:

   ] l_tetrimino =. 2 3 $ 1 1 1 1 0 0 
1 1 1
1 0 0
   sample
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

We start by determining how many rows and columns. We use the Shape of verb ($) to do this:

   $ l_tetrimino
2 3

Now we want to generate (0, 0);(0, 1);...(1, 2);(2, 2). With the result of Shape of we generate a sequence of numbers for each of the axis. To do this we use the Integers (i.) verbwith the number of rows and the number of columns. For example:

   NB. Get the number of rows:
   (0&{@$) l_tetrimino
2
   NB. Get the number of columns:
   (1&{@$) l_tetrimino
3
   NB. Get an integer sequence from zero to number of rows or columns
   i.(1&{@$) l_tetrimino
0 1 2
   i.(0&{@$) l_tetrimino
0 1

Now this is very cool, we can use the Table verb (/)to pair this two arrays. From the documentation:

In general, each cell of x is applied to the entire of y . Thus x u/ y is equivalent to x u"(lu,_) y where lu is the left rank of u .

This is very important!. To taken advantage of this we need to use the Append verb (,) but changing the rack to operate on each of the elements from the right argument. See this example:

   0 (,"0) 0 1 2
0 0
0 1
0 2

Now we can take advantage of this and write:

   (  (i.@(0&{@$)) (<@,"0)/ (i.@(1&{@$))) l_tetrimino 
+---+---+---+
|0 0|0 1|0 2|
+---+---+---+
|1 0|1 1|1 2|
+---+---+---+

Now this is almost what we need. We can use the Ravel veb (,) to flatten this box:


, (  (i.@(0&{@$)) (<@,"0)/ (i.@(1&{@$))) l_tetrimino
+---+---+---+---+---+---+
|0 0|0 1|0 2|1 0|1 1|1 2|
+---+---+---+---+---+---+

With this positions we can use the Amend verb to change our game matrix:

   (,l_tetrimino) positions } sample
1 1 1 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

We need something else since this technique only allows us to put the tetrimino at the top of the matrix. In order to do this we need to sum the target position to the coordinates that we generated. We can use the powerful Under verb (&.) which allows us to apply an operation to each of the cells of a box.

   (3 2)&+ &.> positions
+---+---+---+---+---+---+
|3 2|3 3|3 4|4 2|4 3|4 4|
+---+---+---+---+---+---+

We construct this operation by:

  1. using Bond conjuction (&) to tie together the position (3 2) with the plus operation (+) . That is (3 2)&+ .
  2. we apply this operation to each of the elements of the box and then assemble the box again. That is &.>

Now we can put the tetrimino in row 3, column 2.

   target_position =. 3 2
   target_position =. 2 1
   (,l_tetrimino) (target_position&+&.>positions) } sample
0 0 0 0 0
0 0 0 0 0
0 1 0 0 0
0 1 1 1 0
0 0 0 0 0

We cannot just pust the tetrimino in the target position. It may also "blend" with existing values. For example say the following game field and the following tetrminio:

   ] game
0 0 0 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 1 0
0 0 0 1 1

positions =., (  (i.@(0&{@$)) (<@,"0)/ (i.@(1&{@$))) tetrimino

   (,tetrimino) ((1 2)&+&.> positions) } game
0 0 0 0 0
0 0 1 1 0
0 0 1 0 0
0 0 1 0 0
0 0 0 1 1

Because of this we need to mix the tetrimino with the current values of the target region. We do this by extracting the values of the target position:

   ($tetrimino) $ ((1 2)&+&.> positions) { game
0 0
0 1
0 1

We can combine this array with the tetrimino and we get the desired target value:

   ] target_tetrimino =. +&tetrimino ($tetrimino) $ ((1 2)&+&.> positions) { game
1 1
1 1
1 1

   (,target_tetrimino) ((1 2)&+&.> positions) } game
0 0 0 0 0
0 0 1 1 0
0 0 1 1 0
0 0 1 1 0
0 0 0 1 1

The final verb definition looks like this:

put_in_matrix =: 3 : 0
 NB. unpack argumets
 tetrimino =. > 0 { y
 i =. > 1 { y
 j =. > 2 { y
 game =. > 3 { y

 NB. calculate target positions 
 positions =. , ((i.@(0&{)@$)(<@,"0)/(i.@(1&{)@$)) tetrimino

 NB. combine tetrimino with target section
 tetrimino =. +&tetrimino ($tetrimino) $ ((+&(i,j))&.> positions) { game
  
 NB. change matrix
 (,tetrimino) ((+&(i,j))&.> positions)} game
)

Checking if space is available

The other piece of functionality that we need is a way to verify if we can put the tetrimino in a target position. We need to verify two conditions: 1. We can put the tetrimino inside the game field. 2. There's space available in the target position.

To check the boundaries we use common comparison operators:

 NB. Verify field boundaries
 is_inside =. (xpos >: 0) *. (ypos >: 0) *. ( (xpos+tetrimino_width - 1) < game_width) *. ((ypos+tetrimino_height - 1) < game_height)

The second criteria it's more intersesting. To illustrate how we did the detection we're going to start with a predifined game field:

   game
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 0 0
   ] tetrimino =. 2 3 $ 2 0 0 2 2 2
2 0 0
2 2 2

The first step is to reset the values of the tetrimino to be either zero or one:

   ] tetrimino =. 0&< tetrimino
1 0 0
1 1 1

Now we extract the elements of the target position (in this example column 1, row 3)

   ypos =. 3
   xpos =. 1
   positions =. , ((i.@(0&{)@$)(<@,"0)/(i.@(1&{)@$)) tetrimino
   ($tetrimino) $ ((+&(ypos,xpos))&.> positions){ game
   
0 0 1
0 0 0
   xpos =. 1
   ypos =. 2
   positions =. , ((i.@(0&{)@$)(<@,"0)/(i.@(1&{)@$)) tetrimino
   ($tetrimino) $ ((+&(ypos,xpos))&.> positions){ game
0 0 0
0 0 1

Now we can multiply the tetrimino by the target value:

   target_segment =. ($tetrimino) $ ((+&(ypos,xpos))&.> positions){ game
   ] hits =. +/ , *&tetrimino target_segment
1

Now the variable hits contains the number of elements with a target cell value. The final predicate looks like this:

can_put_in_matrix =: 3 : 0
 NB. Unpack the arguments
 tetrimino =. 0&< > 0 { y
 tetrimino_width =. 1 { $ tetrimino
 tetrimino_height =. 0 { $ tetrimino
 ypos =. > 1 { y
 xpos =. > 2 { y
 game =. > 3 { y
 game_width  =. 1 { $ game
 game_height =. 0 { $ game

 NB. Verify field boundaries
 is_inside =. (xpos >: 0) *. (ypos >: 0) *. ( (xpos+tetrimino_width - 1) < game_width) *. ((ypos+tetrimino_height - 1) < game_height)

 NB. Check if we hit an occupied cell
 hits =. 0
 if. is_inside do.
   positions =. , ((i.@(0&{)@$)(<@,"0)/(i.@(1&{)@$)) tetrimino
   hits =. +/ , *&tetrimino ($tetrimino) $ ((+&(ypos,xpos))&.> positions){ game
 end.

 is_inside *. (hits = 0)
)

End words

As it was said in the beginning, J it's very interesting. For me there are many things to learn (you can tell that by looking at all those parenthesis in some expressions). Also there are many strategies in array languages that one needs to understand in other to write idiomatic code.

The ncurses interface will be discussed in part 2. For future posts it will be interesting to talk about concepts like the obverse (http://www.jsoftware.com/help/jforc/u_1_uv_uv_and_u_v.htm#_Toc191734413) and state machines (http://www.jsoftware.com/help/jforc/loopless_code_vii_sequential.htm#_Toc191734470) .

Code for this post can be found here: https://github.com/ldfallas/jcurtris

Monday, April 10, 2017

A simple language with indentation based blocks. Part 4: Improving error messages

Going back to our first post, we defined the type of a parser function as :

ReaderState ->  (('a * ReaderState) option )

Which means that a parser is a function from a reader state to an Option<'a> of a value and a new reader state.

Using Option<'a> is handy for writing code that may result on a value or a failure indicator. In our case it's possible that the parser fails to recognize its input. The most common example it's a program with syntax errors. For example the following program is incorrect:

if cond1:
   if x y z:
      print(x)

A parser failure may also be something we expected. This is the case when need to use failure to choose a parser from a list of possible parsers. For example the expression parser:

   pStatements := [pReturn; ifParser; pCallStatement; pAssignStatement; whileParser]

   let pStatement =
       fun state -> (List.reduce disjParser !pStatements) state

Using the bultin Option<'a> type comes handy, but it has the problem that it doesn't provide information about the parsing failure. When a parser cannot recognize some input string, the only response we get is None. That's why we introduced the ParsingResult<'a> type to replace the Option<'a> type. Here's the definition:

type ReaderState = { Data:        string;
                     Position:    int;
                     Line:        int;
                     Indentation: int list }

type ParserFailure =
    | Fatal of string * int
    | Fail

   
type ParsingResult<'a> =
     | Success of ('a * ReaderState)
     | Failure of ParserFailure

We still have two possible states: Success and Failure. However now we have a space to add more information about a syntax error . In this case Fatal has a couple of slots to specify an error message and a line number.

Failure information

The Failure has a parameter of type ParserFailure. This type is defined as follows:

   type ParserFailure =
       | Fatal of string * int
       | Fail

We use this two alternatives to represent the scenarios described at the beginning of this post:

  • Fatal : a syntax error in the input string
  • Fail : a failure to completely recognize something

We use Fatal for scenarios such as the following program which has a syntax error in the while condition ( x y ):

while x y:
   print(1)

This is the definition of the whileParser:

   let whileParser =
        whileKeyword    >>
        pExpression     >>= (fun condition ->
        colon           >>
        pBlock          >>= (fun block ->
        preturn (PWhile(condition, block))))

We can say that any failure after the while keyword is a fatal failure. However a failure detecting the while keyword is a simple failure . In order to represent this situation we're going to introduce the +>> which is going to sequence two parsers and produce a fatal failure in case that the first parser fails. The definition of this operation looks like this:

   let inline (+>>)  (parser1 : (ReaderState ->  ParsingResult<'a>))
                     (parser2 : (ReaderState ->  ParsingResult<'b>)) =
       fun input -> match (parser1 input) with
                    | Success (_, restState) -> parser2 restState
                    | Failure(f & Fatal(_, _)) -> Failure(f)
                    | Failure(_) -> Failure(Fatal("Parsing error ", input.Line))

   let inline (+>>=) (parser1 : (ReaderState ->  ParsingResult<'a>))
                     (parser2N : ('a ->  (ReaderState ->  ParsingResult<'b>))) =
       fun input -> match (parser1 input) with
                    | Success (matchedTxt, restState) -> (parser2N matchedTxt) restState
                    | Failure(f & Fatal(_)) -> Failure(f)
                    | Failure(_) -> Failure(Fatal("Parse problem ", input.Line))                    

Now with this operator we can modify the definition of whileParser to be as follows:

   let whileParser =
        whileKeyword    >>
        pExpression     +>>= (fun condition ->
        colon           +>>
        pBlock          +>>= (fun block ->
        preturn (PWhile(condition, block))))

Example

Now we can see the result of parsing a file:

> parse "if x:
- 
-       while x y:
-           print(1)
- " pStatement;;
val it : ParsingResult<PStat> = Failure (Fatal ("Parsing error ",3))

Expected failure

Now we also we parser failures for choosing from a set of possible parsers. We do that in the disjParser which chooses between two parsers. The old definition of the disjParser looks like this:

let disjParser (parser1 : (ReaderState ->  (('a * ReaderState) option )))
               (parser2 : (ReaderState ->  (('a * ReaderState) option ))) =
    fun input -> match (parser1 input) with
                 | result & Some _ -> result
                 | _ -> parser2 input

Notice that this definition uses the Option<'a> cases (Some and None) to determine if it succeeds or if it needs to give a chance to parser2. With our new ParserResult<'a> type we need to make a distinction between fatal and a "controlled" failure. We can now change the definition of this parser to be:

let disjParser (parser1 : (ReaderState ->  ParsingResult<'a>))
               (parser2 : (ReaderState ->  ParsingResult<'a>)) =
    fun input -> match (parser1 input) with
                 | success & Success(_) -> success
                 | Failure(fatal & Fatal(_, _)) -> Failure(fatal)
                 | _ -> parser2 input

Next steps

In the next post we're going to deal with code comments.

This is part #4 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html and a GitHub repository with the code can be found there: https://github.com/ldfallas/fsharpparsing.

Saturday, March 25, 2017

A simple language with indentation based blocks. Part 3: Blocks

In this post we're going to add support for statements containing blocks which is the main goal of these series of posts.

Identifying blocks by using indentation

The technique we're going to use to identify blocks is based the following description from the (excellent) Python documentation:

Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line's indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.

This section was taken from: https://docs.python.org/3/reference/lexical_analysis.html#indentation

The IDENT and DEDENT tokens act as begin/end block indicators. They have the same function as '{' and '}' in C-based languages. The following Python grammar fragment shows how this tokens are used.

...
suite: simple_stmt | NEWLINE IDENT stmt+ DEDENT
...
if_stmt: 'if' test ':' suite ...
...

Since our little parser uses parsing combinators without a tokenizer, we need to adjust this strategy a little bit. We're going to jump right ahead to show how we define the parser for the if statement.

   let pBlock =
       newlines   >>
       indent     >>
       pStatement >>= (fun firstStat ->
       (zeroOrMore
           (newlines  >>
            indented  >>
            pStatement) [])  >>= (fun restStats ->
       dedent      >> preturn (firstStat::restStats)))

   let ifParser =
       ifKeyword   >>
       pExpression >>= (fun expr ->
       colon       >>
       pBlock      >>= (fun block ->
       (optionalP pElse []) >>= (fun optElseBlockStats ->
                     preturn (PIf(expr,
                                  block,
                                  (match optElseBlockStats with
                                   | [] -> None
                                   | elements -> Some elements))))))

Here's an example of the generated AST for a simple if statement:

> parse "if x:
-           print(x)
-           return x*x
- " ifParser;;
val it : (PStat * ReaderState) option =
  Some
    (PIf
       (PSymbol "x",
        [PCallStat (PCall ("print",[PSymbol "x"]));
         PReturn (PBinaryOperation (Times,PSymbol "x",PSymbol "x"))],null),
     {Data = "if x:
          print(x)
          return x*x
";
      Position = 45;
      Indentation = [0];})

The key element here is the definition of the pBlock parser. In the definition of this parser we try to emulate the same strategy as the Python grammar definition:

   let pBlock =
       newlines   >>
       indent     >>
       pStatement >>= (fun firstStat ->
       (zeroOrMore
           (newlines  >>
            indented  >>
            pStatement) [])  >>= (fun restStats ->
       dedent      >> preturn (firstStat::restStats)))

We define indent, dedent and indented as follows:

   let indentation =
          pGetIndentation >>=(fun indentation ->                          
                (readZeroOrMoreChars (fun c -> c = ' '))
                >>=
                (fun spaces ->
                   match (spaces.Length,
                          indentation) with
                   | (lessThan, top::_) when lessThan > top ->
                       (pSetIndentation lessThan) >> (preturn "INDENT")
                   | (lessThan, top::_) when lessThan = top ->
preturn "INDENTED"
                   | (identifiedIndentation, top::rest) ->
                          (pSetFullIndentation rest) >> (preturn "DEDENT")
                   | _ -> pfail))

   let indent = indentation >>= (fun result -> if result = "INDENT" then preturn result else pfail)
   let dedent = indentation >>= (fun result -> if result = "DEDENT" then preturn result else pfail)
   let indented = indentation >>= (fun result -> if result = "INDENTED" then preturn result else pfail)

We define each of these indentation elements as:

  • INDENTED verifies that the expected indentation level is present. It consumes this indentation. For example, if expecting 3 space indentation, it will consume and verify that there are three spaces from the current position.
  • INDENT Verifies that there's a new indentation level can be found from the current position. It changes the 'Indentation' value in the current reader state.
  • DEDENT decreases the expected indentation level from the reader.

We can test these combinators to see how they affect the state of the parser.

We start with the following fragment:

> let code = "if x:
  if y:
     return 1
  else:
     return 2
"

We can test parts of our complex parsers by specifying just pieces of others. For example let's get to the then part of the parser.

> parse code (ifKeyword >> pExpression >> colon >> newlines >> indent);;
val it : (string * ReaderState) option =
  Some
    ("INDENT", {Data = "if x:
  if y:
     return 1
  else:
     return 2
";
                Position = 8;
                Indentation = [2; 0];})

As you can see, after executing these parsers we get to position 8. For example:

> code.Substring(8);;
val it : string = "if y:
     return 1
  else:
     return 2
"

Also the indentation stack now has 2 and 0. If we execute these series of parsers again to get to the then section of the first nested if we get:

> parse code (ifKeyword >> pExpression >> colon >> newlines >> indent >>
-             ifKeyword >> pExpression >> colon >> newlines >> indent);;
val it : (string * ReaderState) option =
  Some
    ("INDENT", {Data = "if x:
  if y:
     return 1
  else:
     return 2
";
                Position = 19;
                Indentation = [5; 2; 0];})

Now we have three indentation levels on our stack and we got to position 19.

Blocks

Just as the Python grammar reuses suite production, we can reuse the pBlock parser to define other kinds of statements. For example we can define the while statement as follows:

   let whileParser =
        whileKeyword    >>
        pExpression     >>= (fun condition ->
        colon           >>
        pBlock          >>= (fun block ->
        preturn (PWhile(condition, block))))

Now we can define more interesting statements such as:

val code : string = "if x > 0:
  while x < 10:
    print(x)
    x := x + 1
"

> parse code pStatement;;
val it : (PStat * ReaderState) option =
  Some
    (PIf
       (PBinaryOperation (Gt,PSymbol "x",PNumber "0"),
        [PWhile
           (PBinaryOperation (Lt,PSymbol "x",PNumber "10"),
            [PCallStat (PCall ("print",[PSymbol "x"]));
             PAssignStat
               (PSymbol "x",PBinaryOperation (Plus,PSymbol "x",PNumber "1"))])],
        null),
     {Data = "if x > 0:
  while x < 10:
    print(x)
    x := x + 1
";
      Position = 53;
      Indentation = [0];})

Next steps

Now that we can define blocks we can focus in other parts of our little language parser. One thing we can definetly improve is error messages. For example, let's try to parse the following fragment:

> parse "notanif x y z" ifParser;;
val it : (PStat * ReaderState) option = None

Here's where the Option<'a> type is not that useful since it doesn't allow you to specify a failure value such as a error message or location.

Another thing that we need to implement is comments. Our parser combinators work directly with the text. Because of this it will be interesting to see to introduce comments support without changing all our parsers.

This is part #3 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html.

Sunday, February 26, 2017

A simple language with indentation based blocks. Part 2: Expressions

The focus of the second part will be expression parsing. The desired grammar for the expression small language experiment is the following (here in pseudo BNF):

NESTED_EXPRESSION = '(' EXPRESSION ')'

PRIMARY_EXPRESSION = SYMBOL | NUMBER | STRING | NESTED

UNARY_EXPRESSION =  NOT_EXPRESSION | CALL_EXPRESSION | ARRAY_ACCESS_EXPRESSION |
                    PRIMARY_EXPRESSION

MULTIPLICATIVE_EXPRESSION = UNARY_EXPRESSION ( ('*' | '/') UNARY_EXPRESSION)*

ADDITIVE_EXPRESSION = MULTIPLICATIVE_EXPRESSION ( ('*' | '/') MULTIPLICATIVE_EXPRESSION)*

RELATIONAL_EXPRESSION = ADDITIVE_EXPRESSION ( ('<' | '>') ADDITIVE_EXPRESSION)*

EQUALITY_EXPRESSION = RELATIONAL_EXPRESSION ( ('=' | '<>') RELATIONAL_EXPRESSION)*

LOGICAL_OR_EXPRESSION = EQUALITY_EXPRESSION  ('||'  EQUALITY_EXPRESSION)* 

LOGICAL_AND_EXPRESSION = LOGICAL_OR_EXPRESSION ('&&'  LOGICAL_OR_EXPRESSION)* 

EXPRESSION = LOGICAL_AND_EXPRESSION

In this post we're going to build some key techniques for creating the code for parsing a subset of this grammar. Not every element of expression grammar will be presented since it gets repetitive at some point.

For this small experiment we're going to use the following F# types as the AST of the program:

type Operator =
    | Plus
    | Minus
    | Times
    | Div
    | And
    | Or
    | Equal
    | NotEqual
    | Assign
    | Lt
    | Gt
    

type PExpr =
    | PSymbol  of string
    | PString  of string
    | PNumber  of string
    | PBoolean of bool
    | PNot     of PExpr
    | PCall    of string * (PExpr list)
    | PNested  of PExpr
    | PArrayAccess of PExpr * PExpr
    | PBinaryOperation of Operator * PExpr * PExpr

Simple atomic expressions

We're going to start with the atomic expressions:

  • Symbols or variables (ex. x,y,z)
  • Number literals (ex 1, 1.2)
  • String literals (ex. "Hello")

We already got these parsers from the previous post:

   let pSymbol =
       concatParsers2
          (readWithConditionOnChar  (fun c -> System.Char.IsLetter(c, 0)))
          (fun initialChar ->
               concatParsers2
                  (readZeroOrMoreChars (fun c -> System.Char.IsLetter(c) || System.Char.IsDigit(c)))
                  (fun suffixString -> (preturn (PSymbol (initialChar + suffixString))))
           )


   let pNumber =
              ( (optionalP (readSpecificChar '-') "") >>= (fun neg -> 
                digitP  >>= (fun firstChar -> 
                (readZeroOrMoreChars (fun c ->  System.Char.IsDigit(c))) >>= (fun chars ->
                (optionalP decimalPartP "") >>= (fun dec ->                                                                               
                preturn (PNumber (neg + firstChar + chars + dec)))))))


   let pString =
       whitespace >>
       readSpecificChar '"' >>
       readZeroOrMoreStringChars (fun previous current ->
                 (previous = '\\' && current = '"') || current <> '"')
       >>= (fun stringContents ->
            readSpecificChar '"' >> (preturn (PString stringContents)))

We can call these expressions "primary expressions", which are used in conjunction with operators to create more complex elements. We need a parser that recognizes any one these elements. We can use the disjParser from the previous post :

> let primaryExpression = (disjParser (disjParser pSymbol pNumber) pString);;

val primaryExpression : (ReaderState -> (PExpr * ReaderState) option)

> parse "10" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
  Some (PNumber "10", {Data = "10";
                       Position = 2;
                       Indentation = [0];})
> parse "x" myPrimaryExpression;; 
val it : (PExpr * ReaderState) option =
  Some (PSymbol "x", {Data = "x";
                      Position = 1;
                      Indentation = [0];})
> parse "\"hello\"" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
  Some (PString "hello", {Data = ""hello"";
                          Position = 7;
                          Indentation = [0];})

We can use List.reduce function to improve this code since we could have several parsers as the primary expression.

> let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString]  ;;

val myPrimaryExpression : (ReaderState -> (PExpr * ReaderState) option)

> parse "10" myPrimaryExpression
- ;;
val it : (PExpr * ReaderState) option =
  Some (PNumber "10", {Data = "10";
                       Position = 2;
                       Indentation = [0];})
> parse "x1" myPrimaryExpression 
- ;; 
val it : (PExpr * ReaderState) option =
  Some (PSymbol "x1", {Data = "x1";
                       Position = 2;
                       Indentation = [0];})

Binary operations

Binary expressions are composed of two expressions and an operator which connects them. For example: a + b . For these operations we can define the following utility parser creation function:

   let buildExpressions (leftExpr:PExpr) (rightExprs:(Operator * PExpr) list) =
       (List.fold (fun left (op,right) -> PBinaryOperation(op, left, right)) leftExpr rightExprs)

   let pBinaryExpression operators lowerLevelElementParser  =
       lowerLevelElementParser
         >>= (fun leftTerm ->
                 (zeroOrMore
                     (operators
                        >>= (fun op ->
                               lowerLevelElementParser >>= (fun expr -> preturn (op, expr))))
                     [])
                   >>= (fun acc -> preturn (buildExpressions leftTerm acc) ))

This function allows us to build parsers for binary expressions . The same expressed in pseudo BNF may look like this:

   pBinaryExpression = lowerLevelElementParser ( operators  lowerLevelElementParser )*

Which means that we'll like to recognize an element with lowerLevelElementParser and zero or more pairs of an element recognized by operators followed by another lowerLevelElementParser. What looks strange at first is the use of zero or more which implies that this parser can recognize just one element with lowerLevelElementParser. This will be useful in the following section when we try to implement operator precedence.

As an example we can define a parser for plus as follows:

> let identifyOperator operatorChar operatorResult =
-        concatParsers
-           whitespaceNoNl
-           ((readSpecificChar operatorChar) >> (preturn operatorResult))
- ;;

val identifyOperator :
  operatorChar:char ->
    operatorResult:'a -> (ReaderState -> ('a * ReaderState) option)
          
> let plusOperator = identifyOperator '+' Plus;;

val plusOperator : (ReaderState -> (Operator * ReaderState) option)

> let myPlus = pBinaryExpression plusOperator myPrimaryExpression;;

val myPlus : (ReaderState -> (PExpr * ReaderState) option)

> parse "1+2" myPlus;;
val it : (PExpr * ReaderState) option =
  Some (PBinaryOperation (Plus,PNumber "1",PNumber "2"), {Data = "1+2";
                                                          Position = 3;
                                                          Indentation = [0];})

Arithmetic operations

Now we're going to add support for basic arithmetic operations: +, -, /, *. By using the pBinaryExpression utility function defined in the previous section we can preserve operator precedence. To do this we define the operations as follows:

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression

What we say here is that an additive expression is composed of addition or subtraction (disjParser plusOperator minusOperator) of multiplicative expressions. Also we said that a multiplicative expression is composed of multiplication or division (disjParser pDivOperator pTimesOperator) of primary expressions.

An example of using these operators looks like this:

> parse "1 * 2 + 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Plus,PBinaryOperation (Times,PNumber "1",PNumber "2"),PNumber "3"),
     {Data = "1 * 2 + 3";
      Position = 9;
      Indentation = [0];})
> parse "1 + 2 * 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Plus,PNumber "1",PBinaryOperation (Times,PNumber "2",PNumber "3")),
     {Data = "1 + 2 * 3";
      Position = 9;
      Indentation = [0];})

Recursion

One thing that was tricky to implement at first was recursion. At some point in the grammar we need to refer back to the top parser . For example a parentheses expression can have any other expression and its recognized as a primary expression. For example:

1 * (2 + 3)

Needs to be parsed as:

     *
    / \
   /   \
  1  (2+3)
       |
       +
      / \
     2   3

What we need to do is to define a top level pExpression parser used for recognizing all our expressions . This parser will be used to define the parenthesis or nested expression. At this point our grammar written using pseudo BNF looks like this:

pPrimaryExpression = pNumber | pSymbol | pString

pAdditiveExpression = pPrimaryExpression ( (plusOperator |  minusOperator) pPrimaryExpression )*

pMultiplicativeExpression = pAdditiveExpression ( (divOperator |  timesOperator) pPrimaryExpression )*

pTopExpression = pMultiplicativeExpression

In F# this using our set of combinators it looks like:

let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString]

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression

let pTopExpression = pAdditiveExpression

We can use pTopExpression to parse any expression:

> parse "3+10*x-1/32" pTopExpression;;
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Minus,
        PBinaryOperation
          (Plus,PNumber "3",PBinaryOperation (Times,PNumber "10",PSymbol "x")),
        PBinaryOperation (Div,PNumber "1",PNumber "32")),
     {Data = "3+10*x-1/32";
      Position = 11;
      Indentation = [0];})
> parse "x" pTopExpression;;          
val it : (PExpr * ReaderState) option =
  Some (PSymbol "x", {Data = "x";
                      Position = 1;
                      Indentation = [0];})

But now we want to use pTopExpression to define one of our primary expressions:

//WARNING THE FOLLOWING CODE WILL NOT COMPILE
let readLPar =
       concatParsers whitespace (readSpecificChar '(')
let readRPar = readSpecificChar ')'

let pNested = readLPar    >>
              pTopExpression >>= (fun expr ->
              readRPar    >>  (preturn (PNested expr)))

let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression

let pTopExpression = pAdditiveExpression

When we try to compile this code we get the following error:

                pTopExpression >>= (fun expr ->
  --------------^^^^^^^^^^^^^^

/dir/stdin(6,15): error FS0039: The value or constructor 'pTopExpression' is not defined

Which is totally correct, pTopExpression is defined after pNested. It has to because pTopExpression is defined using pAdditiveExpression has a dependency on pPrimaryExpression. Until now we only used function composition to create our parsers.

What we want to do is to define:

   number symbol string nested
     ^      ^      ^     ^  |
     |      |      |     |  |
     |      |      |     |  |
     +------+------+-----+  |
            |               |
            |               |
         primary            |
            |               |
            |               |
        additive            |
            |               |
            |               |
      multiplicative        |
            |               |
            |               |
           top  <-----------+

To solve this problem we're going to take advantage of reference variables in F#. And do the following trick:

let pTopExpressions = ref []

let pTopExpression =
       fun state -> (List.reduce disjParser !pTopExpressions) state


let pNested = readLPar    >>
              pTopExpression >>= (fun expr ->
              readRPar    >>  (preturn (PNested expr)))

let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression


pTopExpressions := [pAdditiveExpression]

Using a reference (ref) allows us to create the recursive parser that we need . Here's an example of using it:

> parse "1*(2+3)" pTopExpression;; 
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Times,PNumber "1",
        PNested (PBinaryOperation (Plus,PNumber "2",PNumber "3"))),
     {Data = "1*(2+3)";
      Position = 7;
      Indentation = [0];})

Next steps

Using the same techniques presented so far we can finish our basic grammar. This is part #2 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html.