## Monday, October 20, 2008

### Using F# computation expressions to read binary files

In this post I'm going to show a little experiment of using F# computation expression syntax to build a binary file reader.

## Computation expressions

F# Computation expressions provides a way to override the interpretation of a subset of language constructs(for example let,for,while) in a given context. This feature provides similar functionality to Haskell's do-notation or LINQ's query syntax.

Two excellent resources to learn about this feature are: Don Syme's "Some Details on F# Computation Expressions" and Robert Pickering's "Beyond Foundations of F# - Workflows".

As described by Don Syme, this feature is related to Monads:

... Likewise the kinds of operations used under the hood are much like the operations used in both LINQ and Haskell monads. Indeed, computation expressions can be seen as a general monadic syntax for F#.

While learning this feature I decided to create a little example based on the monadic parser combinators technique. The "Monadic Parsing in Haskell" paper by Graham Hutton and Erik Meijer provides a nice introduction to this topic.

There are some blog posts that talk about parser combinators in F#, for example the Inside F# blog has a nice post called Monadic parser combinators... in F#. Also Harry Pierson wrote a nice post called Monadic Philosophy Part 4 - The Parser Monad in F# treating this topic. Finally there's an adaptation of the Parsec library to F# called FParsec.

This time I wanted to create something that helps reading the content of a binary file.

## Binary parser

The binary parser definition:

``type ParseResult<'a> =   | Success of 'a * BinaryReader   | Failure of int64 * stringtype BinParser<'a> =   | BinParser of (BinaryReader -> ParseResult<'a>)   with   member this.Function =             match this with                 BinParser pFunc -> pFunc                 end``

Notice that I'm using a BinaryReader as the input instead of a byte list or array. By doing this I'm sacrificing flexibility (for example when creating a choice) for some useful methods included in BinaryReader.

The following code shows some basic binary parsers for bytes,int16 and int32 values.

``let IOExceptionHandlingWrapper(f:BinaryReader -> ParseResult<'a>) =   fun i -> try               f(i)            with              (e & :? IOException ) -> Failure(i.BaseStream.Position,e.Message)    let RByte =    BinParser(IOExceptionHandlingWrapper(                fun (i:BinaryReader) -> Success(i.ReadByte(),i)))                   let RShort =    BinParser(IOExceptionHandlingWrapper(               fun (i:BinaryReader) -> Success(i.ReadInt16(),i)))let RInt =    BinParser(IOExceptionHandlingWrapper(               fun (i:BinaryReader) -> Success(i.ReadInt32(),i)))``

Notice that the `IOExceptionHandlingWrapper` handles the case of a IOException and makes the parser fail.

``let AByte(b:byte) =    BinParser(IOExceptionHandlingWrapper(               fun (i:BinaryReader) ->                  let rB = i.ReadByte() in                      if (rB = b) then                          Success(byte(rB),i)                      else                          Failure(i.BaseStream.Position,                               System.String.Format("Expecting {0}, got {1}",b,rB))))``

Another useful parser let's you read a fixed sequence of elements recognized by another parser.

``let ParsingStep (func:'a -> BinParser<'b>) (accumulatedResult:ParseResult<'b list>) currentSeqItem =   match accumulatedResult with    | Success(result,inp) ->            match ((func currentSeqItem).Function inp) with              | Success(result2,inp2) -> Success(result2::result,inp2)              | Failure(offset,description) -> Failure(offset,description)    | Failure(offset,description) -> Failure(offset,description)    let FixedSequence (s:seq<'b>,parser:BinParser<'a>) =       BinParser(fun i ->               match  (Seq.fold (ParsingStep (fun _ -> parser)) (Success([],i)) s) with                   | Success(result,input) -> Success(List.rev(result),input)                   | Failure(offset,description) -> Failure(offset,description))``

A fixed sequence of elements is recognized. The number of elements is given by the `seq` instance.

Finally the definitions required to use the computation expression are the following:

``type BinParserBuilder() =    member this.Bind(p:BinParser<'a>,rest:'a -> BinParser<'b>) : BinParser<'b> =         BinParser(fun i -> match p.Function(i) with                           | Success(r:'a,i2) -> ((rest r).Function i2)                            | Failure(offset,description) -> Failure(offset,description)                   )                                                                          member this.Return(x) = BinParser(fun i -> Success(x,i))``

Notice that we're only defining the behavior for the `let!` and `return` elements.

## Sample use

The following code shows a parser for uncompressed BMP file using the computational expression syntax.

``let theBmpBinParser = pBuilder {   let! _ = AByte(byte(0x42))   let! _ = AByte(byte(0x4D))     let! size = RInt   let! reserved1 = RShort   let! reserved2 = RShort   let! dataOffset = RInt   let! headerSize = RInt   let! width = RInt   let! height = RInt   let! colorPlanes = RShort   let! bpp = RShort   let! compression = RInt   let! imageSize = RInt   let! hResolution = RInt   let! vResolution = RInt   let! paletteColors = RInt   let! importantColors = RInt   let  numberOfColorsInPalette =         if int(bpp) < 16 && paletteColors = 0        then (pown 2 (int(bpp)))        else paletteColors   let! palette = FixedSequence({1..numberOfColorsInPalette },RInt)   let! content = FixedSequence({1..height},                       FixedSequence({1..adjust_to_32_boundary(width,bpp)},RByte))      return (headerSize,bpp,width,height,compression,paletteColors,content)}``

The following example shows how to use this binary parser to print the contents of a monochrome BMP file:

``let PrintMonoBmpData(data: byte list) =   Seq.iter(fun (x:byte) ->               Seq.iter (fun w -> match int(x &&& byte(128 >>> w)) with                                  | 0 -> (System.Console.Write(0))                                 | _ -> System.Console.Write(1)) { 0..7}                        ) datalet prsed = (theBmpBinParser.Function br)match prsed with  | Success((_,bpp,_,_,_,pc,ctnt),_)  when bpp = int16(1) ->                Seq.iter(fun r ->PrintMonoBmpData(r)                         System.Console.WriteLine()) ctnt  | Failure(x,y) -> System.Console.WriteLine("Error: {0},in {1}",y,x)  | _ -> System.Console.WriteLine("Done");``

Running this program with a monochrome BMP with a single square shows:

``1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111110000000000000000001111111111111101111111111111111011111111111111011111111111111110111111111111110111111111111111101111111111111101111111111111111011111111111111011111111111111110111111111111110111111111111111101111111111111101111111111111111011111111111111011111111111111110111111111111110111111111111111101111111111111101111111111111111011111111111111011111111111111110111111111111110111111111111111101111111111111101111111111111111011111111111111011111111111111110111111111111110000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111``

Code for this post can be found here.

Anonymous said...

It is an interesting example and it shows that everything can be done in F# as it can in many other languages but, beyond that, why would you ever parse binary files in F# unless somebody aimed the gun at your head and forced you to do it?

Anonymous said...

lol at comment above

Anonymous said...

As a temporal speed test to check a hand coded asm or c function against a f# implementation so I can understand how to multiply A * B and get C with Unicode characters as the input in both A and B, and the output into C..explanation answer to Anonymous post #1..?

Btw thanks for the useful post. :)