Going back to our first post, we defined the type of a parser function as :
ReaderState -> (('a * ReaderState) option )
Which means that a parser is a function from a reader state to an Option<'a>
of a value and a new reader state.
Using Option<'a>
is handy for writing code that may result on a value or a failure indicator. In our case it's possible that the parser fails to recognize its input. The most common example it's a program with syntax errors. For example the following program is incorrect:
if cond1:
if x y z:
print(x)
A parser failure may also be something we expected. This is the case when need to use failure to choose a parser from a list of possible parsers. For example the expression parser:
pStatements := [pReturn; ifParser; pCallStatement; pAssignStatement; whileParser]
let pStatement =
fun state -> (List.reduce disjParser !pStatements) state
Using the bultin Option<'a>
type comes handy, but it has the problem that it doesn't provide information about the parsing failure. When a parser cannot recognize some input string, the only response we get is None
. That's why we introduced the ParsingResult<'a>
type to replace the Option<'a>
type. Here's the definition:
type ReaderState = { Data: string;
Position: int;
Line: int;
Indentation: int list }
type ParserFailure =
| Fatal of string * int
| Fail
type ParsingResult<'a> =
| Success of ('a * ReaderState)
| Failure of ParserFailure
We still have two possible states: Success
and Failure
. However now we have a space to add more information about a syntax error . In this case Fatal
has a couple of slots to specify an error message and a line number.
Failure information
The Failure
has a parameter of type ParserFailure
. This type is defined as follows:
type ParserFailure =
| Fatal of string * int
| Fail
We use this two alternatives to represent the scenarios described at the beginning of this post:
Fatal
: a syntax error in the input stringFail
: a failure to completely recognize something
We use Fatal
for scenarios such as the following program which has a syntax error in the while
condition ( x y
):
while x y:
print(1)
This is the definition of the whileParser
:
let whileParser =
whileKeyword >>
pExpression >>= (fun condition ->
colon >>
pBlock >>= (fun block ->
preturn (PWhile(condition, block))))
We can say that any failure after the while
keyword is a fatal failure. However a failure detecting the while
keyword is a simple failure . In order to represent this situation we're going to introduce the +>>
which is going to sequence two parsers and produce a fatal failure in case that the first parser fails. The definition of this operation looks like this:
let inline (+>>) (parser1 : (ReaderState -> ParsingResult<'a>))
(parser2 : (ReaderState -> ParsingResult<'b>)) =
fun input -> match (parser1 input) with
| Success (_, restState) -> parser2 restState
| Failure(f & Fatal(_, _)) -> Failure(f)
| Failure(_) -> Failure(Fatal("Parsing error ", input.Line))
let inline (+>>=) (parser1 : (ReaderState -> ParsingResult<'a>))
(parser2N : ('a -> (ReaderState -> ParsingResult<'b>))) =
fun input -> match (parser1 input) with
| Success (matchedTxt, restState) -> (parser2N matchedTxt) restState
| Failure(f & Fatal(_)) -> Failure(f)
| Failure(_) -> Failure(Fatal("Parse problem ", input.Line))
Now with this operator we can modify the definition of whileParser
to be as follows:
let whileParser =
whileKeyword >>
pExpression +>>= (fun condition ->
colon +>>
pBlock +>>= (fun block ->
preturn (PWhile(condition, block))))
Example
Now we can see the result of parsing a file:
> parse "if x:
-
- while x y:
- print(1)
- " pStatement;;
val it : ParsingResult<PStat> = Failure (Fatal ("Parsing error ",3))
Expected failure
Now we also we parser failures for choosing from a set of possible parsers. We do that in the disjParser
which chooses between two parsers. The old definition of the disjParser
looks like this:
let disjParser (parser1 : (ReaderState -> (('a * ReaderState) option )))
(parser2 : (ReaderState -> (('a * ReaderState) option ))) =
fun input -> match (parser1 input) with
| result & Some _ -> result
| _ -> parser2 input
Notice that this definition uses the Option<'a>
cases (Some
and None
) to determine if it succeeds or if it needs to give a chance to parser2
. With our new ParserResult<'a>
type we need to make a distinction between fatal and a "controlled" failure. We can now change the definition of this parser to be:
let disjParser (parser1 : (ReaderState -> ParsingResult<'a>))
(parser2 : (ReaderState -> ParsingResult<'a>)) =
fun input -> match (parser1 input) with
| success & Success(_) -> success
| Failure(fatal & Fatal(_, _)) -> Failure(fatal)
| _ -> parser2 input
Next steps
In the next post we're going to deal with code comments.
This is part #4 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html and a GitHub repository with the code can be found there: https://github.com/ldfallas/fsharpparsing.