Thursday, November 13, 2008

Using an MGrammar parser from F#

In this post I'm going to show a small example of manipulating the output tree of an MGrammar parser using some F# features.

MGrammar



MGrammar is one of the components of the "Oslo" Modeling Platform; it allows the definition of domain-specific languages.

In this post I'm not going to give an introduction to MGrammar. There are several good articles about it, for example: MGrammar in a Nutshell, Parsing with Oslo’s MGrammar (Mg) and Creating a WatiN DSL using MGrammar. The Oslo: Building Textual DSLs talk also provides a very nice introduction to the topic.

The example



For this post I'm going to create an MGrammar definition of a small language that describes the structure of a binary file. Then I'm going to create an F# program that interprets these definitions to load binary files.

The following code shows an example of the language to be defined:


test = begin
byte == 0x22
int32 x
int16 w
bytes[16] k
end


This code describes binary files that start with a byte with the value 0x22, followed by an int32 value named "x", an int16 value named "w", and an array of 16 bytes named "k".
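To make the meaning of this definition concrete, here is a hand-written F# sketch of the checks and reads that a recognizer for `test` has to perform. It is not the post's recognizer (which is built from the parsed definition). `TestRecord` and `readTest` are hypothetical names, and the sketch assumes the multi-byte values are little-endian, which is what `System.IO.BinaryReader` reads.

```fsharp
open System.IO

// Hypothetical record holding the named values of the `test` definition
type TestRecord = { X: int32; W: int16; K: byte[] }

// Hand-written reader for: byte == 0x22, int32 x, int16 w, bytes[16] k
// BinaryReader reads multi-byte integers in little-endian order.
let readTest (data: byte[]) : TestRecord option =
    use reader = new BinaryReader(new MemoryStream(data))
    try
        if reader.ReadByte() <> 0x22uy then
            None // the literal byte does not match
        else
            let x = reader.ReadInt32()
            let w = reader.ReadInt16()
            let k = reader.ReadBytes(16) // returns fewer bytes if the data ends early
            if k.Length <> 16 then None else Some { X = x; W = w; K = k }
    with :? EndOfStreamException ->
        None // the data ended while reading the first byte, x or w
```

Returning an option keeps "this file does not match the definition" separate from the values that were read.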

The grammar



The grammar for this language is the following:


module LangexplTests
{
    language BinaryRecognizer
    {
        token Int32Type = "int32";
        token Int16Type = "int16";
        token ByteType = "byte";

        token Bytes = "bytes";

        token Integer = ("0".."9")+;
        token HexNumber = "0x" ("0".."9"|"A".."F")+;
        token Symbol = (("A".."Z") | ("a".."z"))+;

        token NewLine = "\n";
        token LineFeed = "\r";
        token LineBreak = LineFeed? NewLine;

        token Begin = "begin";
        token End = "end";

        syntax TypeName = Int32Type => Int32[] | Int16Type => Int16[] | ByteType => Byte[];

        syntax Expression = i:Integer => Int[i] | s:Symbol => Var[s] | h:HexNumber => Hex[h];

        syntax TypeDeclaration = TypeName Symbol;

        syntax MultipleBytesDeclaration = Bytes "[" n:Expression "]" s:Symbol
            => MultipleBytesDeclaration[n,s];
        syntax LiteralValueDeclaration = t:TypeName "==" e:Expression => LiteralValueDeclaration[t,e];

        syntax ItemDeclaration = TypeDeclaration | MultipleBytesDeclaration | LiteralValueDeclaration;

        syntax Sequence(G,Separator) =
            e:G => [e]
          | es:Sequence(G,Separator) Separator e:G => [valuesof(es),e];

        syntax Definition = name:Symbol "=" Begin LineBreak
            decs:Sequence(ItemDeclaration, LineBreak)
            LineBreak
            End
            LineBreak* => Recognizer[name,decs];

        syntax Main = Definition;

        interleave Whitespace = " ";
    }
}


Parsing the code



After compiling the grammar we can load it from F#. Based on the examples presented in the introductory articles, we can write:


open Microsoft.M.Grammar
open System.Dataflow

...

let runProgram() =
    printf "Start\n"
    let parser = MGrammarCompiler.LoadParserFromMgx(basepath + "BinaryFileRecognizer.mgx", null)
    let parsedDocument = parser.ParseObject(basepath + "sample.brg", ErrorReporter.Standard)
    let reco = buildRecognizer(parsedDocument)
    ...


The buildRecognizer function navigates the MGraph structure generated by the parser and builds a binary file recognizer based on the code from the "Using F# computation expressions to read binary files" post.

Active patterns for the parsed AST



As described in the C# samples provided in the "Programatic" section of the MGrammar in a Nutshell article, the GraphBuilder class provides a way to access parts of the parsed AST.

By defining some F# Active Patterns we can make navigating this tree structure easier. For example:


// Matches an MGraph sequence and returns its elements as an F# list
let (|SequenceElements|_|)(x) =
    let gb = new GraphBuilder()
    if gb.IsSequence(x) then
        Some(Seq.to_list <| gb.GetSequenceElements(x))
    else
        None

// Matches an MGraph entity and returns its label and the values of its members
let (|Entity|_|)(x) =
    let gb = new GraphBuilder()
    if gb.IsEntity(x) then
        Some(gb.GetEntityLabel(x),
             gb.GetEntityMembers(x)
             |> Seq.map (fun kvp -> kvp.Value) // key/value pair type inferred from GetEntityMembers
             |> Seq.to_list)
    else
        None

// Matches an MGraph entity and returns only its label
let (|EntityName|_|)(x) =
    let gb = new GraphBuilder()
    if gb.IsEntity(x) then
        Some(gb.GetEntityLabel(x))
    else
        None

// Matches an Identifier (used for entity labels) and returns its text
let (|Identifier|_|)(x: obj) =
    match x with
    | :? System.Dataflow.Identifier as identifier -> Some(identifier.Text)
    | _ -> None

// Matches a plain string value
let (|AString|_|)(x: obj) =
    match x with
    | :? System.String as s -> Some(s)
    | _ -> None


These active pattern definitions let us extract parts of an MGraph structure.

For example, given the following code:


test = begin
int32 x
end


The following tree structure is created by the parser.


Main[
  Recognizer[
    "test",
    [
      ItemDeclaration[
        TypeDeclaration[
          Int32[
          ],
          "x"
        ]
      ]
    ]
  ]
]



Based on this example we can look at the definition of the buildRecognizer function that was referenced above.


let buildRecognizer(ast: obj) =
    match ast with
    | Entity(Identifier("Main"),
             [Entity(Identifier("Recognizer"),
                     [AString(name);
                      SequenceElements(definitions)])]) ->
        printf "Processing %s \n" name
        buildingRecognizer(definitions, Map.empty, new BinParserBuilder())
    | _ -> raise (new System.InvalidOperationException("Invalid document"))


Notice that by combining the Entity and SequenceElements active patterns we can easily get the sequence of definitions for a given file.
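Running the code above requires the Oslo assemblies. The following self-contained sketch reproduces the same idea over a hypothetical `MockEntity` class that stands in for MGraph entities: plain strings replace `Identifier` labels, and an F# `obj list` stands in for a sequence. `MockEntity`, `entity` and `recognizerParts` are illustrative names, not part of the MGrammar API. The point is how nesting `Entity`, `AString` and `SequenceElements` in one pattern pulls the recognizer name and its declarations out of the tree printed above.

```fsharp
// Hypothetical stand-in for an MGraph entity: a label plus ordered member values
type MockEntity(label: string, members: obj list) =
    member _.Label = label
    member _.Members = members

// Entity(label, members): matches any MockEntity
let (|Entity|_|) (x: obj) =
    match x with
    | :? MockEntity as e -> Some(e.Label, e.Members)
    | _ -> None

// SequenceElements(items): an obj list plays the role of an MGraph sequence
let (|SequenceElements|_|) (x: obj) =
    match x with
    | :? (obj list) as items -> Some items
    | _ -> None

// AString(s): matches a plain string value
let (|AString|_|) (x: obj) =
    match x with
    | :? string as s -> Some s
    | _ -> None

// Same shape as buildRecognizer's pattern: returns the name and the declarations
let recognizerParts (ast: obj) =
    match ast with
    | Entity("Main", [ Entity("Recognizer", [ AString name; SequenceElements definitions ]) ]) ->
        name, definitions
    | _ -> raise (System.InvalidOperationException "Invalid document")

// Helper to build mock entities as obj values
let entity (label: string) (members: obj list) : obj = box (MockEntity(label, members))

// The tree printed above for "test = begin / int32 x / end"
let declaration = entity "ItemDeclaration" [ entity "TypeDeclaration" [ entity "Int32" []; box "x" ] ]
let sampleTree = entity "Main" [ entity "Recognizer" [ box "test"; box [ declaration ] ] ]
```

Calling `recognizerParts sampleTree` yields the name `"test"` and a one-element list holding the `ItemDeclaration` entity. Any other shape falls through to the wildcard case and raises, just as `buildRecognizer` does.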


In the buildingRecognizer function we can handle each kind of declaration specified in the grammar. For example, for the following code:


int32 x


We get the following tree structure:


ItemDeclaration[
  TypeDeclaration[
    Int32[
    ],
    "x"
  ]
]


And the case that processes this definition is the following:


let rec buildingRecognizer(definitions: obj list, variables: Map<int64,string>, builder: BinParserBuilder) =
    match definitions with
    | (Entity(Identifier("ItemDeclaration"), [declaration])) :: rest ->
        ...
        match declaration with
        | Entity(Identifier("TypeDeclaration"),
                 [EntityName(Identifier(typeName)); AString(name)]) ->
            let restParser = buildingRecognizer(rest, variables, builder)
            // Select the reader for the declared type
            let reader =
                match typeName with
                | "Int32" -> BRInt(name)
                | "Int16" -> BRShort(name)
                | "Byte" -> BRByte(name)
                | _ -> raise (new System.InvalidOperationException("Unknown type"))
            builder.Bind(reader, fun _ -> restParser)
        ...
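The dispatch on `typeName` can be tried out without Oslo or the BinParserBuilder from the earlier post. In this sketch, `scalarReader` and `readDeclarations` are hypothetical. Each type label is mapped to a `System.IO.BinaryReader` call, and a plain `List.fold` stands in for the chain of `builder.Bind` calls, collecting the values in a map keyed by variable name.

```fsharp
open System.IO

// Hypothetical stand-in for BRInt/BRShort/BRByte: maps a type label to a
// function that reads one value and widens it to int64
let scalarReader (typeName: string) : BinaryReader -> int64 =
    match typeName with
    | "Int32" -> fun (r: BinaryReader) -> int64 (r.ReadInt32())
    | "Int16" -> fun (r: BinaryReader) -> int64 (r.ReadInt16())
    | "Byte" -> fun (r: BinaryReader) -> int64 (r.ReadByte())
    | _ -> raise (System.InvalidOperationException "Unknown type")

// Reads (typeName, variableName) declarations in order; a fold replaces the
// computation expression and threads the map of values read so far
let readDeclarations (decls: (string * string) list) (data: byte[]) : Map<string, int64> =
    use reader = new BinaryReader(new MemoryStream(data))
    (Map.empty, decls)
    ||> List.fold (fun vars (typeName, name) -> Map.add name (scalarReader typeName reader) vars)
```

As in the post, an unknown type name is rejected as soon as the declaration is processed, before any data is read for it.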


In another example, for the following input code:


bytes[a] content


We get the following tree:


MultipleBytesDeclaration[
  Var[
    "a"
  ],
  "content"
]


Which is processed by the following case:


        | Entity(Identifier("MultipleBytesDeclaration"),
                 [Entity(Identifier("Var"), [AString(lengthVarName)]);
                  AString(varName)]) ->
            let restParser = buildingRecognizer(rest, variables, builder)
            builder.Bind(BoundFixedByteSequence(varName, lengthVarName, RByte), fun _ -> restParser)
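What sets this case apart is that the length `a` is not a literal. It is the value of a variable read earlier, as with `bytes[imageSize] bitmapData` in the BMP example below. The following sketch assumes, hypothetically, that earlier values are kept in a `Map<string, int64>` keyed by name. `readBoundBytes` is an illustrative function, not the `BoundFixedByteSequence` implementation from the post.

```fsharp
open System.IO

// Hypothetical sketch: the byte count comes from a previously read variable
let readBoundBytes (vars: Map<string, int64>) (lengthVarName: string) (reader: BinaryReader) : byte[] =
    match Map.tryFind lengthVarName vars with
    | Some length ->
        let bytes = reader.ReadBytes(int length)
        // ReadBytes returns a shorter array instead of throwing when the data runs out
        if int64 bytes.Length <> length then
            raise (EndOfStreamException(sprintf "expected %d bytes for %s" length lengthVarName))
        bytes
    | None ->
        raise (System.InvalidOperationException(sprintf "Unbound length variable %s" lengthVarName))
```

Because the length is looked up when the declaration is reached, the variable must be declared earlier in the definition, which is the order used in the BMP recognizer.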



Reading a BMP file



The following code shows a simple recognizer for reading 8-bit BMP files.


bmpFile = begin
byte == 0x42
byte == 0x4D
int32 fileSize
int16 resF
int16 resS
int32 pixelOffset
int32 headerSize
int32 width
int32 height
int16 colorPlanes
int16 == 8
int32 compression
int32 imageSize
int32 hResolution
int32 vResolution
int32 == 0x0
int32 == 0x0
bytes[1024] paletteColors
bytes[imageSize] bitmapData
end


Running this program with a 32x32 8-bit BMP file produces the following output (printing the values of the variables):


Start
Parsed
Processing bmpFile
bitmapData = Bytes ff , ff , ff , ff ...
colorPlanes = 1
compression = 0
fileSize = 2102
hResolution = 0
headerSize = 40
height = 32
imageSize = 1024
paletteColors = Bytes 0 , 0 , 0 , 0 ...
pixelOffset = 1078
resF = 0
resS = 0
vResolution = 0
width = 32
2102
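These values are consistent with the BMP layout. A 14-byte file header and a 40-byte info header (the `headerSize` value) are followed by a 1024-byte palette (256 colors of 4 bytes each), and each 8-bit pixel row is padded to a multiple of 4 bytes. The sketch below uses a hypothetical helper, `bmp8Layout`, and assumes an uncompressed image with a full 256-entry palette, as the recognizer does.

```fsharp
// File header: "BM" (2) + fileSize (4) + resF (2) + resS (2) + pixelOffset (4)
let fileHeaderSize = 14
// Info header size, read above as headerSize
let infoHeaderSize = 40

// Expected sizes and offsets for an uncompressed 8-bit BMP with a 256-color palette
let bmp8Layout (width: int) (height: int) =
    let paletteSize = 256 * 4                   // 256 entries: blue, green, red, reserved
    let rowStride = ((width * 8 + 31) / 32) * 4 // each pixel row is padded to a multiple of 4 bytes
    let imageSize = rowStride * abs height      // a negative height marks a top-down bitmap
    let pixelOffset = fileHeaderSize + infoHeaderSize + paletteSize
    {| PixelOffset = pixelOffset; ImageSize = imageSize; FileSize = pixelOffset + imageSize |}
```

For a 32x32 image this gives a pixel offset of 14 + 40 + 1024 = 1078, an image size of 32 * 32 = 1024 and a file size of 1078 + 1024 = 2102, matching the output above.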


Final words



Active Patterns can simplify the manipulation of the default AST generated by an MGrammar parser.

One drawback of this implementation is that it creates a new GraphBuilder instance every time one of the active patterns is used. This could probably be solved by creating a single GraphBuilder instance in the module where the active patterns are defined.
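Here is a minimal sketch of that idea. A hypothetical `TreeInspector` class stands in for GraphBuilder and only counts its instances. The instance is created once as a private value of the module, and the active pattern reuses it on every match. With the real library this would presumably become a module-level `let private gb = new GraphBuilder()`, assuming a GraphBuilder instance can safely be shared, which this post does not establish.

```fsharp
// Counts inspector instances, to show that only one is ever created
let mutable inspectorsCreated = 0

// Hypothetical stand-in for GraphBuilder
type TreeInspector() =
    do inspectorsCreated <- inspectorsCreated + 1
    member _.IsText(x: obj) = x :? string

module Patterns =
    // One inspector shared by every active pattern in the module,
    // instead of a new one per match attempt
    let private inspector = TreeInspector()

    let (|Text|_|) (x: obj) =
        if inspector.IsText x then Some(x :?> string) else None

open Patterns

let describe (x: obj) =
    match x with
    | Text s -> "text " + s
    | _ -> "something else"
```

However many values `describe` inspects, `inspectorsCreated` stays at 1.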

Code for this post can be found here.
