Sunday, March 15, 2009

Support for the LookupSwitch opcode in AbcExplorationLib

The LookupSwitch AVM2 opcode is used to represent the ActionScript switch statement.

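This is an interesting branch instruction because it has multiple targets. It is almost a direct translation of the switch statement: it has two parameters, a default target and an array of case targets. It jumps to the case target selected by an integer index popped from the stack, or to the default target when the index is out of range.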
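The AVM2 specification describes how the target is chosen. An int index is popped from the stack. If it is between 0 and case_count inclusive, execution continues at the matching case offset. Otherwise it continues at the default offset. All offsets are relative to the address of the lookupswitch instruction itself. The following Python sketch models only that selection rule. The function name and parameters are illustrative and are not part of AbcExplorationLib.

```python
def lookupswitch_target(base, default_offset, case_offsets, index):
    """Return the absolute address lookupswitch would jump to.

    base is the address of the lookupswitch instruction; case_offsets holds
    case_count + 1 relative offsets, so valid indices are 0..case_count.
    """
    if 0 <= index < len(case_offsets):
        return base + case_offsets[index]
    # Negative or too-large indices fall through to the default target.
    return base + default_offset


# Three case offsets (case_count = 2) plus a default offset.
print(lookupswitch_target(100, 50, [10, 20, 30], 1))   # 120
print(lookupswitch_target(100, 50, [10, 20, 30], 3))   # 150 (default)
print(lookupswitch_target(100, 50, [10, 20, 30], -1))  # 150 (default)
```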

Support for this opcode was recently added to AbcExplorationLib.

Given the following ActionScript code:


var x;
for (x = 1; x < 5; x++) {
    switch (x) {
        case 1:
            print("one");
            break;
        case 2:
            print("two");
            break;
        case 3:
            print("three");
            break;
        case 4:
            print("four");
            break;
    }
}


We compile this file to a .abc file using the following command:


$ java -jar /opt/flex3sdk/lib/asc.jar testswitch.as

testswitch.abc, 280 bytes written


Using a small IronPython example script included with AbcExplorationLib, we can see how the library interprets the LookupSwitch opcode:


$ mono /opt/IronPython-2.0/ipy.exe ../ipyexample/abccontents.py testswitch.abc
...
Instructions:
getlocal_0
pushscope
pushbyte 1
getglobalscope
swap
setslot 1
jump dest177
dest12:
label
jump dest74
dest17:
label
findpropertystrict M.print
pushstring "one"
callprop M.print
pop
jump dest165
dest30:
label
findpropertystrict M.print
pushstring "two"
callprop M.print
pop
jump dest165
dest43:
label
findpropertystrict M.print
pushstring "three"
callprop M.print
pop
jump dest165
dest56:
label
findpropertystrict M.print
pushstring "four"
callprop M.print
pop
jump dest165
dest69:
label
jump dest165
dest74:
getglobalscope
getslot 1
setlocal_1
pushbyte 1
getlocal_1
ifstrictneq dest91
pushshort 0
jump dest143
dest91:
pushbyte 2
getlocal_1
ifstrictneq dest104
pushshort 1
jump dest143
dest104:
pushbyte 3
getlocal_1
ifstrictneq dest117
pushshort 2
jump dest143
dest117:
pushbyte 4
getlocal_1
ifstrictneq dest130
pushshort 3
jump dest143
dest130:
pushfalse
iffalse SolvedReference dest141
pushshort 4
jump dest143
dest141:
pushshort 4
dest143:
kill
lookupswitch dest69 dest17,dest30,dest43,dest56,dest69
dest165:
getglobalscope
getslot 1
increment
setlocal_1
getlocal_1
getglobalscope
swap
setslot 1
kill
dest177:
getglobalscope
getslot 1
pushbyte 5
iflt dest12
returnvoid
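The listing shows that the compiler does not branch on x directly. A chain of ifstrictneq tests first turns the value of x into a case index (pushshort 0 to 3, or 4 when nothing matches). Only then does lookupswitch use that index to pick a label. Because dest69 holds just a jump to dest165, the default case does nothing. The Python model below is a hypothetical re-expression of that listing for illustration. Python's == stands in for ActionScript's strict equality, which is adequate for the integer values used here.

```python
# Hypothetical Python model of the compiled switch in the listing above.
CASE_VALUES = [1, 2, 3, 4]                 # pushbyte 1..4
DEFAULT_INDEX = 4                          # pushshort 4 when nothing matches
CASE_TARGETS = ["dest17", "dest30", "dest43", "dest56", "dest69"]
DEFAULT_TARGET = "dest69"


def case_index(x):
    """Mimic the pushbyte / getlocal_1 / ifstrictneq chain."""
    for index, value in enumerate(CASE_VALUES):
        if value == x:                     # stands in for strict equality
            return index                   # pushshort <index>
    return DEFAULT_INDEX


def switch_target(x):
    """Mimic: lookupswitch dest69 dest17,dest30,dest43,dest56,dest69."""
    index = case_index(x)
    if 0 <= index < len(CASE_TARGETS):
        return CASE_TARGETS[index]
    return DEFAULT_TARGET


for x in range(1, 6):
    print(x, switch_target(x))
# 1 dest17
# 2 dest30
# 3 dest43
# 4 dest56
# 5 dest69
```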


As described in the AVM2 documentation, the targets of the LookupSwitch instruction are specified as byte offsets relative to the address of the lookupswitch instruction itself. To give a higher-level representation of the code, these relative offsets are converted to symbolic references. This process is detailed in the "Using F# Active Patterns to encapsulate complex conditions" post.
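As a worked example, the relative offsets for the lookupswitch in the listing can be recovered from the labels. The kill at dest143 takes an opcode byte plus a one-byte local index, which would place lookupswitch at offset 145. With case_count = 4 and therefore five case entries, the instruction occupies 1 + 3 + 1 + 5 * 3 = 20 bytes. The next instruction then starts at 165, which matches dest165 and supports the assumption. The following Python computation (an illustration, not library code) derives the offsets the .abc file would contain:

```python
LOOKUPSWITCH_OFFSET = 145        # assumed: 'kill' at 143 is 2 bytes long


def lookupswitch_size(case_count):
    """opcode + s24 default + u30 case_count (1 byte while < 128)
    + (case_count + 1) s24 case offsets."""
    return 1 + 3 + 1 + 3 * (case_count + 1)


case_labels = [17, 30, 43, 56, 69]   # dest17 .. dest69
default_label = 69                   # dest69

next_offset = LOOKUPSWITCH_OFFSET + lookupswitch_size(len(case_labels) - 1)
case_offsets = [label - LOOKUPSWITCH_OFFSET for label in case_labels]
default_offset = default_label - LOOKUPSWITCH_OFFSET

print(next_offset)      # 165, i.e. dest165
print(case_offsets)     # [-128, -115, -102, -89, -76]
print(default_offset)   # -76
```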

This process starts in the following function:


static member ReadAndProcessInstructions(aInput:BinaryReader,
                                         count,
                                         constantPool:ConstantPoolInfo) =
    let instructionsAndOffsets =
        (AvmMethodBody.ReadingInstructions([],
                                           aInput,
                                           count,
                                           constantPool)) in
    let destinations =
        AvmMethodBody.CollectDestinations(instructionsAndOffsets,
                                          Map.empty,
                                          instructionsAndOffsets)
    in
    AvmMethodBody.UpdateCodeWithDestinations(
        destinations,
        instructionsAndOffsets,[]) |> List.to_array


The CollectDestinations method collects all the absolute offsets used by branch instructions and stores them in a map, each with a generated label. The UpdateCodeWithDestinations method then modifies the instruction list to use the generated labels.
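The two passes can be sketched in a few lines of Python. This is a simplified model, not the library's F# code. Each instruction is an (offset, size, name, operand) tuple. A single-target branch stores a relative offset counted from the end of the branch instruction, as AVM2 jump and if* instructions do. Only offsets that start an instruction receive a label, in the spirit of the library's IsDestinationDefined check. The "label_marker" tuples stand in for ArtificialCodeBranchLabel.

```python
def collect_destinations(instructions):
    """Pass 1: map every valid absolute branch target to a label name."""
    starts = {offset for offset, _, _, _ in instructions}
    destinations = {}
    for offset, size, _, relative in instructions:
        if relative is not None:
            target = offset + size + relative
            if target in starts:
                destinations[target] = "dest%d" % target
    return destinations


def update_with_destinations(instructions, destinations):
    """Pass 2: emit label markers and replace numeric branch operands."""
    result = []
    for offset, size, name, relative in instructions:
        if offset in destinations:
            result.append(("label_marker", destinations[offset]))
        if relative is None:
            result.append((name, None))
        else:
            target = offset + size + relative
            result.append((name, destinations.get(target, relative)))
    return result


# Offsets from the loop in the March 3 post, most non-branch code omitted.
code = [(14, 4, "jump", 21), (18, 1, "label", None),
        (39, 1, "getglobalscope", None), (44, 4, "iflt", -30),
        (48, 1, "returnvalue", None)]
labels = collect_destinations(code)
print(labels)  # {39: 'dest39', 18: 'dest18'}
for entry in update_with_destinations(code, labels):
    print(entry)
```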

For example, to add LookupSwitch support to CollectDestinations, the following code was added:


static member CheckLookupSwitchCase baseOffset
                                    (totalInstructions:(int64*AbcFileInstruction) list) =
    fun (destinations:Map<int64,string>) target ->
        match target with
        | UnSolvedReference(relativeOffset) when
              (AvmMethodBody.IsDestinationDefined(int(baseOffset+relativeOffset),
                                                  totalInstructions)) ->
            destinations.Add(int64(baseOffset+relativeOffset),
                             sprintf "dest%d" (baseOffset+relativeOffset))
        | _ -> destinations

static member CollectDestinations(instructions:(int64*AbcFileInstruction) list,
                                  destinations:Map<int64,string>,
                                  totalInstructions:(int64*AbcFileInstruction) list) =
    match instructions with
    ...
    | ((offset,(LookupSwitch(defaultBranch,cases)))::rest) ->
        let baseOffset = int(offset) in
        AvmMethodBody.CollectDestinations(rest,
                                          Seq.append [defaultBranch] cases |>
                                          Seq.fold (AvmMethodBody.CheckLookupSwitchCase baseOffset totalInstructions) destinations,
                                          totalInstructions)
    ...
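For LookupSwitch the same idea becomes a fold over the default target followed by the case targets, all resolved against the offset of the lookupswitch instruction itself. A rough Python equivalent using functools.reduce follows. The names mirror the F# code, but the data representation (a set of valid instruction offsets and a dict of labels) is an assumption of this sketch, and the set of valid offsets in the example is a hypothetical subset.

```python
from functools import reduce


def check_lookupswitch_case(base_offset, valid_offsets):
    """Return a fold step that labels base_offset + relative if it is valid."""
    def step(destinations, relative):
        target = base_offset + relative
        if target in valid_offsets:
            return {**destinations, target: "dest%d" % target}
        return destinations
    return step


def collect_lookupswitch(offset, default, cases, valid_offsets, destinations):
    """Fold the default target and then every case target into destinations."""
    return reduce(check_lookupswitch_case(offset, valid_offsets),
                  [default] + list(cases), destinations)


# Offsets consistent with the first listing, assuming lookupswitch is at 145.
valid = {12, 17, 30, 43, 56, 69, 74, 145, 165}
labels = collect_lookupswitch(145, -76, [-128, -115, -102, -89, -76], valid, {})
print(sorted(labels.items()))
# [(17, 'dest17'), (30, 'dest30'), (43, 'dest43'), (56, 'dest56'), (69, 'dest69')]
```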


Then the code is modified in the UpdateCodeWithDestinations method to use the generated labels.


static member UpdateCodeWithDestinations(destinations:Map<int64,string>,
                                         instructions,
                                         resultingInstructions) =
    let processedInstructions =
        match instructions with
        ...
        | ((offset,LookupSwitch(defaultCase,cases))::rest) ->
            (offset,
             LookupSwitch(AvmMethodBody.SolveSwitchCase defaultCase offset destinations,
                          Array.map (fun c -> AvmMethodBody.SolveSwitchCase c offset destinations) cases))::rest
        | _ -> instructions
    ...
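The body of SolveSwitchCase is not shown here. A plausible Python counterpart, offered only as a hypothetical sketch, looks up base offset plus relative offset in the destination map and keeps the numeric offset when no label exists. Applied to every target, it produces the symbolic form seen in the listing.

```python
def solve_switch_case(relative, base_offset, destinations):
    """Hypothetical: map a relative target to its label, if one was generated."""
    return destinations.get(base_offset + relative, relative)


def update_lookupswitch(offset, default, cases, destinations):
    """Rewrite both the default target and each case target."""
    return ("lookupswitch",
            solve_switch_case(default, offset, destinations),
            [solve_switch_case(c, offset, destinations) for c in cases])


labels = {17: "dest17", 30: "dest30", 43: "dest43", 56: "dest56", 69: "dest69"}
name, default, cases = update_lookupswitch(
    145, -76, [-128, -115, -102, -89, -76], labels)
print(name, default, ",".join(cases))
# lookupswitch dest69 dest17,dest30,dest43,dest56,dest69
```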

Tuesday, March 3, 2009

Manipulating AVM2 byte code with F#

In this post I'm going to show an example of using AbcExplorationLib to manipulate simple AVM2 byte code (ActionScript). This example shows how to load a .abc file, modify its code and write it back to disk.

AbcExplorationLib is a library for manipulating AVM2 byte code (described here). Although it is still incomplete, basic examples such as the ones presented in this post already work.

The following ActionScript code will be compiled to byte code.


var i = 0;
for (i = 0; i < 10; i++) {
    print("inside loop");
}
print("Done");


To generate the ".abc" file we type:

c:\test\> java -jar c:\flexsdk\lib\asc.jar test.as



Loading the compiled file



We're going to use the F# REPL (fsi.exe) to manipulate the file. We start by referencing the library.


> #r "abcexplorationlib.dll";;

--> Referenced 'C:\test\abcexplorationlib.dll'

> open Langexplr.Abc;;

Now we load the file:


> let abcFile = using (new System.IO.FileStream("test.abc",System.IO.FileMode.Open)) (
fun s -> AvmAbcFile.Create(s));;


Now abcFile contains the code of the compiled program.


> abcFile;;
val it : AvmAbcFile
= Langexplr.Abc.AvmAbcFile {Classes = [];
Scripts = [Langexplr.Abc.AvmScript];}




Inspecting the instructions



We're interested in the instructions of the top-level script for this .abc file. By typing the following expression we can get to this section:


> abcFile.Scripts.[0].InitMethod.Body.Value.Instructions;;
val it : AbcFileInstruction array
= [|GetLocal0; PushScope; PushByte 0uy; GetGlobalScope; Swap; SetSlot 1;
PushByte 0uy; GetGlobalScope; Swap; SetSlot 1;
Jump (SolvedReference "dest39"); ArtificialCodeBranchLabel "dest18"; Label
FindPropertyStrict
(MQualifiedName
([|Ns ("",CONSTANT_Namespace); Ns ("test.as$0",CONSTANT_PrivateNs)|],
"print")); PushString "inside loop";
CallProperty
(MQualifiedName
([|Ns ("",CONSTANT_Namespace); Ns ("test.as$0",CONSTANT_PrivateNs)|],
"print"),1); Pop; GetGlobalScope; GetSlot 1; Increment; SetLocal_2;
GetLocal2; GetGlobalScope; Swap; SetSlot 1; Kill 2;
ArtificialCodeBranchLabel "dest39"; GetGlobalScope; GetSlot 1;
PushByte 10uy; IfLt (SolvedReference "dest18");
FindPropertyStrict
(MQualifiedName
([|Ns ("",CONSTANT_Namespace); Ns ("test.as$0",CONSTANT_PrivateNs)|],
"print")); PushString "Done";
CallProperty
(MQualifiedName
([|Ns ("",CONSTANT_Namespace); Ns ("test.as$0",CONSTANT_PrivateNs)|],
"print"),1); CoerceA; SetLocal_1; GetLocal1; ReturnValue; Kill 1|]



We're going to define the following function to assist in the presentation of instruction listings.


> open Langexplr.Abc.InstructionPatterns;;
> let pr (i:AbcFileInstruction) =
- match i with
- | ArtificialCodeBranchLabel t -> printf "%s:\n" <| t.ToString()
- | i & UnsolvedSingleBranchInstruction(d,_) -> printf " %s %d\n" i.Name d
- | i & SolvedSingleBranchInstruction(l,_) -> printf " %s %s\n" i.Name l
- | _ -> printf " %s\n" i.Name;;

val pr : AbcFileInstruction -> unit


Now we can type:


> abcFile.Scripts.[0].InitMethod.Body.Value.Instructions |> Array.iter pr;;
getlocal_0
pushscope
pushbyte
getglobalscope
swap
setslot
pushbyte
getglobalscope
swap
setslot
jump dest39
dest18:
label
findpropertystrict
pushstring
callprop
pop
getglobalscope
getslot
increment
setlocal_2
getlocal_2
getglobalscope
swap
setslot
kill
dest39:
getglobalscope
getslot
pushbyte
iflt dest18
findpropertystrict
pushstring
callprop
coerce_a
setlocal_1
getlocal_1
returnvalue
kill
val it : unit = ()




A note on branch instructions



To make the code easier to manipulate and analyze, AbcExplorationLib adds a pseudo-instruction called ArtificialCodeBranchLabel that marks each position a branch instruction jumps to. When these labels are generated, the branch instructions are modified to refer to the label's name instead of a relative byte offset. This process is briefly described in "Using F# Active Patterns to encapsulate complex conditions".

Converting label references back to byte offsets is also necessary in order to write code back to a .abc file. This is done by a function called ConvertSymbolicLabelsToByteReferences, for example:
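The conversion needs two passes, because a branch can only be encoded once the byte position of every label is known. The Python sketch below is not the library's implementation. It first assigns byte offsets while skipping the label markers, then rewrites each branch as the label's offset minus the offset just past the branch. The size table is an assumption that holds for this small program, where every u30 operand fits in one byte. With it, the sketch reproduces the jump 21 and iflt -30 of the listing.

```python
# Assumed instruction sizes for this program (opcode + one-byte u30 operands;
# jump and iflt carry a 3-byte s24 offset).
SIZES = {
    "getlocal_0": 1, "pushscope": 1, "pushbyte": 2, "getglobalscope": 1,
    "swap": 1, "setslot": 2, "jump": 4, "label": 1,
    "findpropertystrict": 2, "pushstring": 2, "callprop": 3, "pop": 1,
    "getslot": 2, "increment": 1, "setlocal_2": 1, "getlocal_2": 1,
    "kill": 2, "iflt": 4, "coerce_a": 1, "setlocal_1": 1,
    "getlocal_1": 1, "returnvalue": 1,
}
BRANCHES = {"jump", "iflt"}


def to_byte_references(code):
    """code: (name, arg) pairs; ("LABEL", label_name) marks a branch target."""
    # Pass 1: record the byte offset of every label (markers take no space).
    label_offsets, offset = {}, 0
    for name, arg in code:
        if name == "LABEL":
            label_offsets[arg] = offset
        else:
            offset += SIZES[name]
    # Pass 2: branch offsets are relative to the end of the branch instruction.
    result, offset = [], 0
    for name, arg in code:
        if name == "LABEL":
            result.append((name, arg))
            continue
        offset += SIZES[name]
        if name in BRANCHES:
            result.append((name, label_offsets[arg] - offset))
        else:
            result.append((name, arg))
    return result


def plain(*names):
    return [(n, None) for n in names]


PROGRAM = (
    plain("getlocal_0", "pushscope", "pushbyte", "getglobalscope", "swap",
          "setslot", "pushbyte", "getglobalscope", "swap", "setslot")
    + [("jump", "dest39"), ("LABEL", "dest18")]
    + plain("label", "findpropertystrict", "pushstring", "callprop", "pop",
            "getglobalscope", "getslot", "increment", "setlocal_2",
            "getlocal_2", "getglobalscope", "swap", "setslot", "kill")
    + [("LABEL", "dest39")]
    + plain("getglobalscope", "getslot", "pushbyte")
    + [("iflt", "dest18")]
    + plain("findpropertystrict", "pushstring", "callprop", "coerce_a",
            "setlocal_1", "getlocal_1", "returnvalue", "kill")
)

for name, arg in to_byte_references(PROGRAM):
    if name in BRANCHES:
        print(name, arg)
# jump 21
# iflt -30
```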


> let c = AbcFileCreator();;

val c : AbcFileCreator

> abcFile.Scripts.[0].InitMethod.Body.Value.Instructions |>
- InstructionManipulation.ConvertSymbolicLabelsToByteReferences c |>
- Array.iter pr;;
getlocal_0
pushscope
pushbyte
getglobalscope
swap
setslot
pushbyte
getglobalscope
swap
setslot
jump 21
dest18:
label
findpropertystrict
pushstring
callprop
pop
getglobalscope
getslot
increment
setlocal_2
getlocal_2
getglobalscope
swap
setslot
kill
dest39:
getglobalscope
getslot
pushbyte
iflt -30
findpropertystrict
pushstring
callprop
coerce_a
setlocal_1
getlocal_1
returnvalue
kill
val it : unit = ()
>



Modifying the code



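Branch targets are adjusted automatically when new code is added. For example, let's add some code to print "Hola!" inside the loop. In the following session, instr holds the instruction array shown earlier (abcFile.Scripts.[0].InitMethod.Body.Value.Instructions).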
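The arithmetic behind the adjustment is simple. The four inserted instructions add 8 bytes, assuming each constant-pool index fits in a one-byte u30. Both branches span the insertion point, so the forward jump grows from 21 to 29 and the backward iflt goes from -30 to -38. A small Python check, with sizes that are assumptions of this sketch:

```python
# Assumed byte sizes: opcode plus one-byte u30 operands.
INSERTED = {"findpropertystrict": 2, "pushstring": 2, "callprop": 3, "pop": 1}


def adjust(relative, grown_by):
    """Widen a branch whose span contains the inserted bytes.

    Forward (positive) offsets grow, backward (negative) offsets shrink.
    """
    return relative + grown_by if relative >= 0 else relative - grown_by


growth = sum(INSERTED.values())
print(growth)                 # 8
print(adjust(21, growth))     # 29 (jump)
print(adjust(-30, growth))    # -38 (iflt)
```

Branches whose span does not contain the insertion point would keep their original offsets.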


> let printName = CQualifiedName(Ns("",NamespaceKind.CONSTANT_Namespace),"print"
- ) ;;

val printName : QualifiedName

> let printCode = [| FindPropertyStrict printName ;
- PushString "Hola!" ;
- CallProperty(printName,1);
- Pop |] ;;

val printCode : AbcFileInstruction array

> Seq.append instr.[0..16] <| Seq.append printCode instr.[17..] |>
- Seq.to_array |>
- InstructionManipulation.ConvertSymbolicLabelsToByteReferences c |>
- Array.iter pr;;
getlocal_0
pushscope
pushbyte
getglobalscope
swap
setslot
pushbyte
getglobalscope
swap
setslot
jump 29
dest18:
label
findpropertystrict
pushstring
callprop
pop
findpropertystrict
pushstring
callprop
pop

getglobalscope
getslot
increment
setlocal_2
getlocal_2
getglobalscope
swap
setslot
kill
dest39:
getglobalscope
getslot
pushbyte
iflt -38
findpropertystrict
pushstring
callprop
coerce_a
setlocal_1
getlocal_1
returnvalue
kill
val it : unit = ()



Writing the new file



We can write this code back to a .abc file as follows (here oldbody is the original method body, abcFile.Scripts.[0].InitMethod.Body.Value):


> let newCode = Seq.append instr.[0..16] <| Seq.append printCode instr.[17..] |>
- Seq.to_array;;

val newCode : AbcFileInstruction array

> let newBody = AvmMethodBody(oldbody.Method,
- oldbody.MaxStack,
- oldbody.LocalCount,
- oldbody.InitScopeDepth,
- oldbody.MaxScopeDepth,
- newCode,
- oldbody.Exceptions,
- oldbody.Traits);;

val newBody : AvmMethodBody

> let newFile = AvmAbcFile( [AvmScript( abcFile.Scripts.[0].InitMethod.CloneWithBody(newBody), abcFile.Scripts.[0].Members)], []);;

val newFile : AvmAbcFile
> open System.IO;;
> let c = AbcFileCreator();;

val c : AbcFileCreator

>
- using (new BinaryWriter(new FileStream("test_modified.abc",FileMode.Create)))
-
- (fun f -> let file = newFile.ToLowerIr(c) in file.WriteTo(f));;
val it : unit = ()


Running this program using Tamarin shows:


c:\test\>avmplus_sd.exe test_modified.abc
inside loop
Hola!
inside loop
Hola!
inside loop
Hola!
inside loop
Hola!
inside loop
Hola!
inside loop
Hola!
inside loop
Hola!
inside loop
Hola!
inside loop
Hola!
inside loop
Hola!
Done



AbcExplorationLib is still quite incomplete, and there is a lot to improve, for example name handling and instruction modification. Future posts will present new features and experiments.