Monday, October 20, 2008

Using F# computation expressions to read binary files

In this post I'm going to show a little experiment of using F# computation expression syntax to build a binary file reader.

Computation expressions



F# Computation expressions provides a way to override the interpretation of a subset of language constructs(for example let,for,while) in a given context. This feature provides similar functionality to Haskell's do-notation or LINQ's query syntax.

Two excellent resources to learn about this feature are: Don Syme's "Some Details on F# Computation Expressions" and Robert Pickering's "Beyond Foundations of F# - Workflows".

As described by Don Syme, this feature is related to Monads:


... Likewise the kinds of operations used under the hood are much like the operations used in both LINQ and Haskell monads. Indeed, computation expressions can be seen as a general monadic syntax for F#.


There's a lot of nice documentation available on the net about this topic.

Monads and Parsers



While learning this feature I decided to create a little example based on the monadic parser combinators technique. The "Monadic Parsing in Haskell" paper by Graham Hutton and Erik Meijer provides a nice introduction to this topic.

There are some blog posts that talk about parser combinators in F#, for example the Inside F# blog has a nice post called Monadic parser combinators... in F#. Also Harry Pierson wrote a nice post called Monadic Philosophy Part 4 - The Parser Monad in F# treating this topic. Finally there's an adaptation of the Parsec library to F# called FParsec.

This time I wanted to create something that helps reading the content of a binary file.


Binary parser



The binary parser definition:



type ParseResult<'a> =
| Success of 'a * BinaryReader
| Failure of int64 * string

type BinParser<'a> =
| BinParser of (BinaryReader -> ParseResult<'a>)
with
member this.Function =
match this with
BinParser pFunc -> pFunc

end


Notice that I'm using a BinaryReader as the input instead of a byte list or array. By doing this I'm sacrificing flexibility (for example when creating a choice) for some useful methods included in BinaryReader.


The following code shows some basic binary parsers for bytes,int16 and int32 values.


let IOExceptionHandlingWrapper(f:BinaryReader -> ParseResult<'a>) =
fun i -> try
f(i)
with
(e & :? IOException ) -> Failure(i.BaseStream.Position,e.Message)


let RByte =
BinParser(IOExceptionHandlingWrapper(
fun (i:BinaryReader) -> Success(i.ReadByte(),i)))

let RShort =
BinParser(IOExceptionHandlingWrapper(
fun (i:BinaryReader) -> Success(i.ReadInt16(),i)))

let RInt =
BinParser(IOExceptionHandlingWrapper(
fun (i:BinaryReader) -> Success(i.ReadInt32(),i)))


Notice that the IOExceptionHandlingWrapper handles the case of a IOException and makes the parser fail.

An additional parser for reading an expected byte is the following:


let AByte(b:byte) =
BinParser(IOExceptionHandlingWrapper(
fun (i:BinaryReader) ->
let rB = i.ReadByte() in
if (rB = b) then
Success(byte(rB),i)
else
Failure(i.BaseStream.Position,
System.String.Format("Expecting {0}, got {1}",b,rB))))


Another useful parser let's you read a fixed sequence of elements recognized by another parser.


let ParsingStep (func:'a -> BinParser<'b>) (accumulatedResult:ParseResult<'b list>) currentSeqItem =
match accumulatedResult with
| Success(result,inp) ->
match ((func currentSeqItem).Function inp) with
| Success(result2,inp2) -> Success(result2::result,inp2)
| Failure(offset,description) -> Failure(offset,description)
| Failure(offset,description) -> Failure(offset,description)


let FixedSequence (s:seq<'b>,parser:BinParser<'a>) =
BinParser(fun i ->
match (Seq.fold (ParsingStep (fun _ -> parser)) (Success([],i)) s) with
| Success(result,input) -> Success(List.rev(result),input)
| Failure(offset,description) -> Failure(offset,description))


A fixed sequence of elements is recognized. The number of elements is given by the seq instance.

Finally the definitions required to use the computation expression are the following:


type BinParserBuilder() =
member this.Bind(p:BinParser<'a>,rest:'a -> BinParser<'b>) : BinParser<'b> =
BinParser(fun i -> match p.Function(i) with
| Success(r:'a,i2) -> ((rest r).Function i2)
| Failure(offset,description) -> Failure(offset,description)
)

member this.Return(x) = BinParser(fun i -> Success(x,i))


Notice that we're only defining the behavior for the let! and return elements.

Sample use



The following code shows a parser for uncompressed BMP file using the computational expression syntax.


let theBmpBinParser = pBuilder {
let! _ = AByte(byte(0x42))
let! _ = AByte(byte(0x4D))
let! size = RInt
let! reserved1 = RShort
let! reserved2 = RShort
let! dataOffset = RInt
let! headerSize = RInt
let! width = RInt
let! height = RInt
let! colorPlanes = RShort
let! bpp = RShort
let! compression = RInt
let! imageSize = RInt
let! hResolution = RInt
let! vResolution = RInt
let! paletteColors = RInt
let! importantColors = RInt
let numberOfColorsInPalette =
if int(bpp) < 16 && paletteColors = 0
then (pown 2 (int(bpp)))
else paletteColors
let! palette = FixedSequence({1..numberOfColorsInPalette },RInt)
let! content = FixedSequence({1..height},
FixedSequence({1..adjust_to_32_boundary(width,bpp)},RByte))
return (headerSize,bpp,width,height,compression,paletteColors,content)
}


The following example shows how to use this binary parser to print the contents of a monochrome BMP file:


let PrintMonoBmpData(data: byte list) =
Seq.iter(fun (x:byte) ->
Seq.iter (fun w -> match int(x &&& byte(128 >>> w)) with
| 0 -> (System.Console.Write(0))
| _ -> System.Console.Write(1)) { 0..7}
) data


let prsed = (theBmpBinParser.Function br)

match prsed with
| Success((_,bpp,_,_,_,pc,ctnt),_) when bpp = int16(1) ->
Seq.iter(fun r ->PrintMonoBmpData(r)
System.Console.WriteLine()) ctnt
| Failure(x,y) -> System.Console.WriteLine("Error: {0},in {1}",y,x)
| _ -> System.Console.WriteLine("Done");


Running this program with a monochrome BMP with a single square shows:


11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111110000000000000000001111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110111111111111111101111111
11111110000000000000000001111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111
11111111111111111111111111111111


Code for this post can be found here.