Monday, April 10, 2017

A simple language with indentation based blocks. Part 4: Improving error messages

Going back to our first post, we defined the type of a parser function as :

ReaderState ->  (('a * ReaderState) option )

Which means that a parser is a function from a reader state to an Option<'a> of a value and a new reader state.

Using Option<'a> is handy for writing code that may result on a value or a failure indicator. In our case it's possible that the parser fails to recognize its input. The most common example it's a program with syntax errors. For example the following program is incorrect:

if cond1:
   if x y z:

A parser failure may also be something we expected. This is the case when need to use failure to choose a parser from a list of possible parsers. For example the expression parser:

   pStatements := [pReturn; ifParser; pCallStatement; pAssignStatement; whileParser]

   let pStatement =
       fun state -> (List.reduce disjParser !pStatements) state

Using the bultin Option<'a> type comes handy, but it has the problem that it doesn't provide information about the parsing failure. When a parser cannot recognize some input string, the only response we get is None. That's why we introduced the ParsingResult<'a> type to replace the Option<'a> type. Here's the definition:

type ReaderState = { Data:        string;
                     Position:    int;
                     Line:        int;
                     Indentation: int list }

type ParserFailure =
    | Fatal of string * int
    | Fail

type ParsingResult<'a> =
     | Success of ('a * ReaderState)
     | Failure of ParserFailure

We still have two possible states: Success and Failure. However now we have a space to add more information about a syntax error . In this case Fatal has a couple of slots to specify an error message and a line number.

Failure information

The Failure has a parameter of type ParserFailure. This type is defined as follows:

   type ParserFailure =
       | Fatal of string * int
       | Fail

We use this two alternatives to represent the scenarios described at the beginning of this post:

  • Fatal : a syntax error in the input string
  • Fail : a failure to completely recognize something

We use Fatal for scenarios such as the following program which has a syntax error in the while condition ( x y ):

while x y:

This is the definition of the whileParser:

   let whileParser =
        whileKeyword    >>
        pExpression     >>= (fun condition ->
        colon           >>
        pBlock          >>= (fun block ->
        preturn (PWhile(condition, block))))

We can say that any failure after the while keyword is a fatal failure. However a failure detecting the while keyword is a simple failure . In order to represent this situation we're going to introduce the +>> which is going to sequence two parsers and produce a fatal failure in case that the first parser fails. The definition of this operation looks like this:

   let inline (+>>)  (parser1 : (ReaderState ->  ParsingResult<'a>))
                     (parser2 : (ReaderState ->  ParsingResult<'b>)) =
       fun input -> match (parser1 input) with
                    | Success (_, restState) -> parser2 restState
                    | Failure(f & Fatal(_, _)) -> Failure(f)
                    | Failure(_) -> Failure(Fatal("Parsing error ", input.Line))

   let inline (+>>=) (parser1 : (ReaderState ->  ParsingResult<'a>))
                     (parser2N : ('a ->  (ReaderState ->  ParsingResult<'b>))) =
       fun input -> match (parser1 input) with
                    | Success (matchedTxt, restState) -> (parser2N matchedTxt) restState
                    | Failure(f & Fatal(_)) -> Failure(f)
                    | Failure(_) -> Failure(Fatal("Parse problem ", input.Line))                    

Now with this operator we can modify the definition of whileParser to be as follows:

   let whileParser =
        whileKeyword    >>
        pExpression     +>>= (fun condition ->
        colon           +>>
        pBlock          +>>= (fun block ->
        preturn (PWhile(condition, block))))


Now we can see the result of parsing a file:

> parse "if x:
-       while x y:
-           print(1)
- " pStatement;;
val it : ParsingResult<PStat> = Failure (Fatal ("Parsing error ",3))

Expected failure

Now we also we parser failures for choosing from a set of possible parsers. We do that in the disjParser which chooses between two parsers. The old definition of the disjParser looks like this:

let disjParser (parser1 : (ReaderState ->  (('a * ReaderState) option )))
               (parser2 : (ReaderState ->  (('a * ReaderState) option ))) =
    fun input -> match (parser1 input) with
                 | result & Some _ -> result
                 | _ -> parser2 input

Notice that this definition uses the Option<'a> cases (Some and None) to determine if it succeeds or if it needs to give a chance to parser2. With our new ParserResult<'a> type we need to make a distinction between fatal and a "controlled" failure. We can now change the definition of this parser to be:

let disjParser (parser1 : (ReaderState ->  ParsingResult<'a>))
               (parser2 : (ReaderState ->  ParsingResult<'a>)) =
    fun input -> match (parser1 input) with
                 | success & Success(_) -> success
                 | Failure(fatal & Fatal(_, _)) -> Failure(fatal)
                 | _ -> parser2 input

Next steps

In the next post we're going to deal with code comments.

This is part #4 of an ongoing series of posts on building a parser for a small language. The first post can be found here: and a GitHub repository with the code can be found there: