Exploring Beautiful Languages

Saturday, June 10, 2023

A quick look at functors in OCaml

A couple of weeks ago I was working on a small program that required generating code in different ways depending on a user option. I was trying to make as few changes as possible. Because of the way the program was created and the language it was written in (not OCaml), it required changing several places in the code.

OCaml functors

I remembered reading a little bit about the concept of a functor in OCaml. Functors are a powerful mechanism that allow you to create modules parameterized by modules. In the case of the task I was working on, I can use it to write the code generation section using module parameterized by a module that provides the final implementation of the code generation.

The example: Code generation via method calls or operators

I’m going to use a simple example of using a functor. Say that we have a representation for a very simple language:

(* ast.ml *)
type expr =
        | Plus of expr * expr
        | Minus of expr * expr
        | Div of expr * expr
        | Times of expr * expr
        | Call of expr * (expr list)
        | Var of string
        | Dot of expr * string

I want to have the posibility of generating arithmetic expressions in this language in two ways:

Generating method calls of arithmetic operations for a Java-like language
Generate common operators

One example of option #1 is:

var1.multiply(10).plus(var2)

For option #2 is:

var1 * 10 + var2

The code that generates the expressions must not be aware of the strategy that we are using to generate the code. To do this in OCaml we define a signature for the module used to generate the code:

module type GeneratorFuncs = sig
    val (+) :  Ast.expr -> Ast.expr -> Ast.expr
    val (-) : Ast.expr -> Ast.expr -> Ast.expr
    val (/) : Ast.expr -> Ast.expr -> Ast.expr
    val ( * ) : Ast.expr -> Ast.expr -> Ast.expr
end

We use operators to make it easy to write the code generation. The module that makes the code generator is written as a functor with a parameter that is the module which implements the GeneratorFuncs signature.

The following code shows the “generator” which generates random arithmetic expressions using the provided module for emitting the code:

module Make_generator(Current_funcs : GeneratorFuncs) = struct
  let gen_single() = 
        if Random.int 10 > 5 then
           Ast.Var "x"
        else 
           Ast.Lit (1 + Random.int 5)


  let rec generate_sample (depth: int) =  
    let new_depth = depth - 1 in
    Current_funcs.(
        match depth, Random.int 10 with
        | 0, _ -> gen_single()
        | _, r when r >= 0 && r <= 3 -> 
              ((generate_sample new_depth) + (generate_sample new_depth))
        | _, r when r >= 4 && r <= 6 -> 
              ((generate_sample new_depth) - (generate_sample new_depth))
        | _, r when r >= 7 && r <= 10 -> 
              ((generate_sample new_depth) / (generate_sample new_depth))
        | _, _ -> Ast.Var "x")
end

We can write our emitter for generating code using method calls like this:

(* mgenerator.ml *)
module MGenerator = struct
        let simple_call (obj: Ast.expr) mmethod arg =
                Ast.Call(
                        Ast.Dot(
                                obj, 
                                mmethod), [arg])
        let (+) (a: Ast.expr) (b: Ast.expr) = simple_call a "plus" b
        let ( * ) a b = simple_call a "times" b
        let (-) a b = simple_call a "minus" b
        let (/) a b = simple_call a "div" b
end

module MGen = Generator.Make_generator(MGenerator)

An alternative module that generates arithmetic expression using operators:

module OGenerator = struct
        let (+) (a: Ast.expr) (b: Ast.expr) = Ast.Plus(a, b)
        let ( * ) a b = Ast.Times(a, b)
        let (-) a b = Ast.Minus(a, b)
        let (/) a b = Ast.Div(a, b)
end

module OGen = Generator.Make_generator(OGenerator)

We can use this modules to generate sample code snippets:

let main = 
        let _ = Random.self_init() in
        in let generat = Mgenerator.MGen.generate_sample 3 
        in let generat2 = Ogenerator.OGen.generate_sample 3 
        in  
           print_endline (Ast.pprint_string generat);
           print_endline (Ast.pprint_string generat2)

Output:

x.plus(2).plus(5.div(4)).div(x.plus(x).plus(5.div(1)))
x + 4 / x - 2 + 4 + x + 5 + x

Sunday, June 4, 2023

Haskell 'newtype' and the record syntax

While reading some Haskell code snippets I found something that seemed confusing. The snippet involved newtype and record syntax. A simplified example is the following:

newtype PersonName = PersonName { theName :: String }
...
let p1 = PersonName $ getName obj
in print $ theName

I was not able to find a place where the PersonName was created by specifying the value of theName explicitly for example (Person { theName = "xyz" }) . The reason is that record syntax allows an alternative way of specifying the field values. For example:

let p1 = PersonName "Luis" -- valid!
...
let p2 = PersonName { theName = "Luis" }  -- valid!

Another thing that I found interesting is the newtype restrictions. For example it only allows you to specify one field in our record:

newtype PersonName' = PersonName' { firstName :: String, lastName ::String }

Compiling this is going to generate the following error:

newtypeexp.hs:6:23: error:
    • The constructor of a newtype must have exactly one field
        but ‘PersonName'’ has two
    • In the definition of data constructor ‘PersonName'’
      In the newtype declaration for ‘PersonName'’
  |
6 | newtype PersonName' = PersonName' { firstName :: String, lastName ::String }
  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This seems to be related to the fact that newtype is used as a compile-type concept only. More info here https://wiki.haskell.org/Newtype#The_short_version .

Friday, April 7, 2023

Some considerations for using closures in Rust/WASM

Here are a couple of things I learned while trying to pass a Rust closure to a JavaScript function. Some of these notes are a result of my lack of experience with Rust and Rust/WASM.

Passing a closure that is going to outlive the current function call

Passing a Rust function that exists in the stack to JavaScript is easy for example, here is a call to Array.map:

#[wasm_bindgen]
pub fn square_elements(a : &js_sys::Array) -> js_sys::Array {
    a.map(&mut |value:JsValue, idx: u32, arr: js_sys::Array| {
        let value = value.as_f64().unwrap();
        JsValue::from_f64(value * value)
    })
}

In this case the function used as argument to map exists only for the time of the invocation of the square_elements function. You need to do something extra when passing a closure is going to outlive the current call. This section of the Rust WASM documentation: https://rustwasm.github.io/wasm-bindgen/reference/passing-rust-closures-to-js.html#heap-allocated-closures has details on how to call a JavaScript function that receives a closure that “survives” the current method or function call. A typical example is calling requestAnimationFrame .

It has a very important note that I overlooked at first:

Once a Closure is dropped, it will deallocate its internal memory and invalidate the corresponding JavaScript function so that any further attempts to invoke it raise an exception…https://rustwasm.github.io/wasm-bindgen/reference/passing-rust-closures-to-js.html#heap-allocated-closures

One thing that I want to do in Rust with WASM is to write a “requestAnimationFrame loop” which allows me to write code that performs a repetitive task without blocking the UI thread of the browser. Here is an example of how this looks in JavaScript

// JavaScript
let i = 0;
let action = () => {
   if (i < 3) {
      console.log(`Calling with ${i}`);
      // do something interesting...
      requestAnimationFrame(action);
   }
   i++;
};
requestAnimationFrame(action);

I found a nice example on how to do this loop in Rust here: https://rustwasm.github.io/docs/wasm-bindgen/examples/request-animation-frame.html . The example has a lot of documentation that explains its functionality in the comments. The code looks a little bit intimidating for a Rust newbie (like me). Here is a small reduced code with the most important parts of the example:

let f = Rc::new(RefCell::new(None));
let g = f.clone();

let mut i = 0;
*g.borrow_mut() = Some(Closure::wrap(Box::new(move || {
   if i > 300 {
       ...
       let _ = f.borrow_mut().take(); 
       return;
   }

   i += 1;
   ...
   request_animation_frame(f.borrow().as_ref().unwrap());
}))
...
request_animation_frame(g.borrow().as_ref().unwrap());

As described in the source of the example, this code uses Rc and RefCell to keep the Closure instance alive while the sequence of requestAnimationFrame calls do its work. In this case when the i counter reaches 300 the closure will return and finish the loop.

Each part in this code is very important. I did some mistakes that I’m going to detail in the next sections.

The closure is dropped before ‘requestAnimationFrame’ does its job

The goal of the two Rc/RefCell references to the same closure is to keep the Closure alive before finishing the call to the current function. This is an example of the error that is raised when you fail to do that:

// this example is incomplete
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    
    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
         log("At the end of the closure");
    }));
    log(&format!("Before quitting 'greet' {}", Rc::strong_count(&g)));
    request_animation_frame(g.borrow().as_ref().unwrap());
}

Notice that here we don’t pass f into the closure. Hence at the end of the function both g and f are going to be dropped along with the Closure instance. Running this code shows the following errors in the browser console:

...
Before quitting 'greet' 2 wasm_loop_bg.js:259:13
Uncaught Error: closure invoked recursively or after being dropped
...

To fix this issue in the incomplete example I just need to move the f instance to the closure so it will be captured:

// this example is incomplete
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    
    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
         let _ = f; /// Now 'f' is moved inside!
         log("At the end of the closure");
    }));
    log(&format!("Before quitting 'greet' {}", Rc::strong_count(&g)));
    request_animation_frame(g.borrow().as_ref().unwrap());
}

Resources not being released

After writing the complete loop, I also found that I was not doing the complete cleanup for the closure.

Here is an example that shows the problematic code:

#[derive(Debug)]
struct MyStruct {
    x: i32
}

impl Drop for MyStruct {
    fn drop(&mut self) {
        write_debug(format!("Calling `drop` on MyStruct: {}", self.x).as_str());
    }
}

#[wasm_bindgen]
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    let captured = MyStruct { x: 100 };

    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
        log(format!("Counter: {} captured value: {:?}", i, &captured).as_str());
        if i != 2 {
           request_animation_frame(f.borrow().as_ref().unwrap());
        }
        } else {
            log("Finished");                                                                                                        return;
        }
        i += 1;
    }));

    request_animation_frame(g.borrow().as_ref().unwrap());
    log("Before finishing 'greet'");
}

In this example I created a dummy struct called MyStruct . This structure implements the Drop trait to display a message in the console when the structured is being dropped. An instance of this structure is being captured by the closure passed to the requestAnimationFrame call.

When running this code we see the following messages in the console:

Before finishing 'greet' wasm_loop_bg.js:267:13
Counter: 0 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Counter: 1 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Counter: 2 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Finished

Notice that we don’t see the message logged in the drop method of MyStruct. The reason for this is that I forgot to release the value inside the Rc/RefCell wrapper.

Here is the corrected code:

...
#[wasm_bindgen]
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    let captured = MyStruct { x: 100 };

    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
        log(format!("Counter: {} captured value: {:?}", i, &captured).as_str());
        if i != 2 {
           request_animation_frame(f.borrow().as_ref().unwrap());
        } else {
            log("Finished");
            let _ = f.take();
            return;
        }
        i += 1;
    }));

    request_animation_frame(g.borrow().as_ref().unwrap());
    log("Before finishing 'greet'");
}

And now here is the output of the code:

Before finishing 'greet' wasm_loop_bg.js:267:13
Counter: 0 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Counter: 1 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Counter: 2 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Finished wasm_loop_bg.js:267:13
Calling `drop` on MyStruct: 100

As with the original example the take method is used to move the Closure out of the Rc/RefCell reference. The value will be dropped at the end of the call.

Not following the requirement of using FnMut for the closure

This is another case of not following the rules and not taking the time to read the error message. I did a small change to the code as follows:

…

fn my_dummy_function_requiring_move(ms: MyStruct) {
   log(format!("-- {:?}", &ms).as_str());
}
...

#[wasm_bindgen]
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    let captured = MyStruct { x: 100 };

    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
        my_dummy_function_requiring_move(captured);
        log(format!("Counter: {} captured value: {:?}", i, &captured).as_str());
        if i != 2 {
           request_animation_frame(f.borrow().as_ref().unwrap());
        } else {
            log("Finished");
            let _ = f.take();
            return;
        }
        i += 1;
    }));

    request_animation_frame(g.borrow().as_ref().unwrap());
    log("Before finishing 'greet'");
}

When introducing this code the compiles shows the following error:

error[E0525]: expected a closure that implements the `FnMut` trait, but this closure only implements `FnOnce`
   --> src\lib.rs:74:41
    |
74  |       *g.borrow_mut() = Some(Closure::new(move || {
    |                              ------------ -^^^^^^
    |                              |            |
    |  ____________________________|____________this closure implements `FnOnce`, not `FnMut`
    | |                            |
    | |                            required by a bound introduced by this call
75  | |
76  | |         my_dummy_function_requiring_move(captured);
    | |                                          -------- closure is `FnOnce` because it moves the variable `captured` out of its environment
77  | |         log(format!("Counter: {} captured value: {:?}", i, &captured).as_str());
...   |
85  | |         i += 1;
86  | |     }));
    | |_____- the requirement to implement `FnMut` derives from here
    |
    = note: required for `[closure@src\lib.rs:74:41: 74:48]` to implement `IntoWasmClosure<dyn FnMut()>`
note: required by a bound in `wasm_bindgen::prelude::Closure::<T>::new`
   --> C:\Users\ldfallasu2\.cargo\registry\src\github.com-1ecc6299db9ec823\wasm-bindgen-0.2.84\src\closure.rs:271:12
    |
271 |         F: IntoWasmClosure<T> + 'static,
    |            ^^^^^^^^^^^^^^^^^^ required by this bound in `wasm_bindgen::prelude::Closure::<T>::new`

It is really impressive to see how to compiler shows the location of the error and the related areas. As the error message says, the problem here is that we are moving a value out of the closure. This fact prevents the compiler from assuming that our closure implements the FnMut trait (More information here https://doc.rust-lang.org/stable/book/ch13-01-closures.html#moving-captured-values-out-of-closures-and-the-fn-traits ).

The solution in this case is simple, since the move was not really required I created an alternative version of the function that do not require a “move”:

fn my_dummy_function_requiring_ref(ms: &MyStruct) {
   write_debug(format!("-- {:?}", ms).as_str());
}

The call is changed to my_dummy_function_requiring_ref(&s); which removes the compilation error.

The examples created in this post use the following utility functions and declarations:

fn window() -> web_sys::Window {
    web_sys::window().expect("not global 'window'")
}

fn request_animation_frame(f: &Closure<dyn FnMut()>) {
    window()
        .request_animation_frame(f.as_ref().unchecked_ref())
        .expect("Call to request animation frame ");
}

#[wasm_bindgen]
extern {
    #[wasm_bindgen(js_namespace = console)]
    fn log(s: &str);
}

Conclusions

I think there are two main conclusions for this post:

Read the documentation carefully!
Take time to read the compiler errors. The Rust compiler team put a lot of effort explain the error and help you locate the origin of the problem.

Wednesday, November 2, 2022

Implementing WHILE in a toy BASIC interpreter

While working on a toy BASIC implementation, I ran into an unexpected challenge implementing the WHILE instruction.

The implementation of this instruction seems simple. Here is an example :

10 X = 1
20 WHILE X <> 10
30 PRINT X
40 X = X + 1
50 WEND

This program is going to print the numbers from 1 to 9. The WHILE statement is really a combination of WHILE and WEND. The WEND statement indicates the end of the 'block'.

The first challenge is to create a relation between these statements. This relation is going to be used by the interpreter to identify where to 'jump' when evaluating WHILE and WEND. One interesting challenge is that BASIC (GWBASIC) supports `nested` WHILE blocks:

10 X = 10
20 Y = 10
30 WHILE X <> 0
40 PRINT "X = ", X
50 Y = 10
60 WHILE Y <> 0
70 PRINT "y =", Y
80 Y = Y - 1
90 WEND
100 X = X - 1
110 WEND

Since the source of the original GW-BASIC implementation is published in GitHub, we can take a look at it here: https://github.com/microsoft/GW-BASIC. There is code in GWMAIN.ASM used to search for the correspoing WEND of a WHILE (a code block called WNDSCN). This also seems to be used to locate the FOR=/=NEXT pair of instructions. It seems that the interpreter tries to find the matching instruction by scanning the instructions that follow the WHILE. A counter is used to keep track of nested WHILE/WEND blocks:

..
PUBLIC	WNDSCN
WNDSCN: MOV	CL,LOW OFFSET ERRWH	;SCAN FOR MATCHING WEND THIS IS ERROR IF FAIL
.
.
.
FORINC: INC	CH		;INCREMENT THE COUNT WHENEVER "FOR" IS SEEN
.
.
.
        JZ	SHORT NXTLOK	;FOR/NEXT SEARCHING
        CMP	AL,LOW OFFSET $WHILE	;ANOTHER WHILE/WEND NEST?
        JZ	SHORT FORINC
        CMP	AL,LOW OFFSET $WEND
        JNZ	SHORT FNLOP
        DEC	CH
.
.
.

1.2 Strategy

One way to start implementing this feature in the interpreter was to add a table (pair_instruction_table) to the interpreter context. This table is going to keep the relation between instructions. In particular it will be used to relate a WHILE and WEND pair. In the near future it will keep the relation between FOR and NEXT.

The evaluation context will now look like this:

pub struct EvaluationContext<'a> {
    pub variables: HashMap<String, ExpressionEvalResult>,
    pub array_variables: HashMap<String, GwArray>,
    pub jump_table: HashMap<i16, i16>,
    pub underlying_program: Option<&'a mut GwProgram>,   
    pub pair_instruction_table: HashMap<i16, i16>,
}

With this information we can create a search function that emulates the same behavior as WNDSCN . Something like this:

fn find_wend(line: i16, real_lines: &Vec<&Box<dyn GwInstruction>>) -> i16{
    let mut curr_line = line + 1;
    let mut while_end_balance = 0;
    loop {
        if curr_line >= real_lines.len() as i16 {
            break;
        } else if let Some(ref instr) = real_lines.get(curr_line as usize) {

            if instr.is_while() {
                while_end_balance += 1;
            }
            if instr.is_wend() {
                if while_end_balance  == 0 {
                    return curr_line as i16;
                } else {
                    while_end_balance -= 1;
                }
            }
        }
        curr_line += 1;
    }
    return -1;
}

Notice that we solve the problem of nested WHILE/WEND blocks by keeping a counter while_end_balance.

With this utility we can create the implementation of the WHILE statement with the following code:

impl GwInstruction for GwWhile {
    fn eval (&self, line: i16, context : &mut EvaluationContext) -> InstructionResult {
        let mut wend_line : i16 = 0;

        // Find the cached corresponding line for this WHILE statement
        if let Some(corresponding_wend) =  context.pair_instruction_table.get(&line) {      
            wend_line = *corresponding_wend;
        } else if let Some(ref real_lines) = context.real_lines {
            // Try to look for the WEND statement in the program lines
            let index_of_wend = find_wend(line, real_lines);
            if index_of_wend == -1 {
                return InstructionResult::EvaluateToError(String::from("WHILE WITHOUT WEND"));
            } else {
                context.pair_instruction_table.insert(line, index_of_wend);
                context.pair_instruction_table.insert(index_of_wend, line);
            }       
            wend_line = index_of_wend;
        }

        // Evaluate the condition and move the following line
        let condition_evaluation = self.condition.eval(context);
        match condition_evaluation {
            ExpressionEvalResult::IntegerResult(result) if result == 0 => {
                InstructionResult::EvaluateLine(wend_line + 1)
            }
            ExpressionEvalResult::IntegerResult(_) => {
                InstructionResult::EvaluateNext
            }       
            _ => {
                InstructionResult::EvaluateToError(String::from("Type mismatch"))
            }
        }
    }

1.3 Additional problems

While working in this implementation I found an additional problem. In BASIC you can have several statements in one line separated by colons (:) . For example you can have a complete WHILE block in just one line:

10 X = 1 : WHILE X < 10 : PRINT X : X = X + 1 :WEND

Or you can have a WHILE in one line and the WEND inside another line:

10  x = 1
20 WHILE x <> 5
30 print x
40 x = x + 1 : print "a" : WEND
50 print "END"

To support his scenario the program is "flattened" before executing it. For example, the previous snippet becomes is converted to:

0  x = 1
1 WHILE x <> 5
2 print x
3 x = x + 1 
4 print "a" 
5 WEND
6 print "END"

This way it is easy to implement a jump between instructions that didn't exist in as explicit lines in the original program. This transformation is performed before running the program:

...
   let real_lines = &mut context.real_lines.as_mut().expect("instr vector");
   for e in self.lines.iter() {
      real_lines.push(&e.instruction);
      if let Some(ref rest) = e.rest_instructions {
         for nested in rest {
            real_lines.push(&nested);
         }
      }	    
   }
...

Something that needs to be improved is the way to identify existing WHILE and WEND instructions. Sadly, right now this is implemented using a pair of methods is_while and is_wend. These methods are defined in the GwInstruction trait and overriden on GwWhile and GwWend. This is ugly since these methods are very specific to this problem. It doesn't seem right to have them in GwInstruction.

One alternative to solve this problem is to redesign the code representation to include a way to retrieve the original instruction from a GwInstruction. This is one of the things that will be implemented next.

Code for the interpreter is here: https://github.com/ldfallas/rgwbasic .

Saturday, April 16, 2022

Executing code from a buffer with Rust on Windows

Creating and executing code at runtime is an intriguing topic. It is one of the pieces that makes it possible to write JIT compilers for things like Java or C#.

Creating a byte array with executable instructions and “casting” that array to a function pointer is not enough . For security reasons, modern operating systems require you to specify which region of memory of your program is executable. On Windows the VirtualAlloc and VirtualProtect functions are used to do this.

There is a nice StackOverflow answer: https://stackoverflow.com/questions/40936534/how-to-alloc-a-executable-memory-buffer by user Christian Hackl on how to use these API functions. In this post I’m going to try to replicate the C++ example from the SO post in Rust .

The first is to be able to call VirtualAlloc and VirtualProtect from Rust. There are several ways to call “C” style functions in Rust. However to call these Win32 API functions I am going to use Rust for Windows. This package provides an easy way to call into Win32 API .

First we start by adding the the windows crate to our dependencies. And we also specify that we need a couple of features:

//Cargo.toml
...
[dependencies.windows]
version="0.35.0"
features = [
    "alloc",
    "Win32_Foundation",
    "Win32_System_Memory",
]

Here, the most important feature is “Win32_System_Memory” which allows us to call VirtualAlloc and VirtualProtect. You can see that in the "Required features" section of the documentation entry here https://microsoft.github.io/windows-docs-rs/doc/windows/Win32/System/Memory/fn.VirtualAlloc.html

Now that we have this functions we can rewrite the example from the StackOverflow question:

use windows::{
    core::*,
    Win32::Foundation::*,
    Win32::System::Memory::*,
};

fn main() -> Result<()> {
    unsafe {
        let buffer = VirtualAlloc(
            std::ptr::null(),
            4096,
            MEM_COMMIT | MEM_RESERVE,
            PAGE_READWRITE,
        );
        let buff_arr = std::mem::transmute::<*mut ::core::ffi::c_void, &mut [u8; 6]>(buffer);
        buff_arr[0] = 0xb8; // MOV opcode
        buff_arr[1] = 0x05; // '5' value
        buff_arr[2] = 0x00;
        buff_arr[3] = 0x00;
        buff_arr[4] = 0x00;
        buff_arr[5] = 0xc3; // RET
        let mut dummy: [PAGE_PROTECTION_FLAGS; 1] = [PAGE_PROTECTION_FLAGS::default()];
        let vpResult = VirtualProtect(buffer, 6, PAGE_EXECUTE_READ, dummy.as_mut_ptr());
        if !vpResult.as_bool() {
            GetLastError().to_hresult().ok()?;
        }
        let new_function = std::mem::transmute::<*mut ::core::ffi::c_void, fn() -> i32>(buffer);
        let result = new_function();
        println!("The result is {}", result);
        VirtualFree(buffer, 0, MEM_RELEASE);
    }
    Ok(())
}

After compiling and running this example we can see:

The result is 5

I was very happy the first time I saw that running!. Here the buff_array buffer has real x86 instructions equivalent to something like:

mov eax, 0x5
ret

Encoding this instructions is a very complex process. The documentation contains dense tables explaining the format for example for MOV or RET.

Also it is clear that we need unsafe Rust here since we are dealing with low level code.

The process of encoding the instructions is very complex. We can take a shortcut using the iced-x86 crate. This really cool library has a complete x86 assembler and dissembler. It was very easy (with my limited Rust knowledge) to adapt it to this little example.

For we include it in the Cargo.toml file:

[dependencies.iced-x86]
version = "1.17.0"
features = ["code_asm"]

Now we can create the code using the nice API that iced-x86 provides. Here I’m adding a call to a function defined in the same program.

fn print_hello() -> u32 {
    println!("Hello!!!");
    1
}

fn encode_small_program() -> ::core::result::Result<Vec<u8>, asm::IcedError> {
    let mut assembler = asm::CodeAssembler::new(64)?;
    unsafe {
        let print_hello_addr = std::mem::transmute::<fn() -> u32, u64>(print_hello);
        assembler.sub(asm::rsp, 0x28)?;
        assembler.mov(asm::rax, print_hello_addr)?;
        assembler.call(asm::rax)?;
        assembler.mov(asm::eax, 0x7)?;
        assembler.add(asm::rsp, 0x28)?;
        assembler.ret()?;
    }

    let instr = assembler.take_instructions();
    let block = InstructionBlock::new(&instr, 0);
    let res = BlockEncoder::encode(64, block, BlockEncoderOptions::NONE)?;
    Ok(res.code_buffer)
}

We can modify the make program to use this new function:

    let encoded_program = encode_small_program().unwrap();
    let p = encoded_program.as_ptr();

    unsafe {
        let buffer = VirtualAlloc(
            std::ptr::null(),
            4096,
            MEM_COMMIT | MEM_RESERVE,
            PAGE_READWRITE,
        );
        let buff_arr = std::mem::transmute::<*mut ::core::ffi::c_void, *mut u8>(buffer);

        std::ptr::copy_nonoverlapping(p, buff_arr, encoded_program.len());

        let mut dummy: [PAGE_PROTECTION_FLAGS; 1] = [PAGE_PROTECTION_FLAGS::default()];
        let vpResult = VirtualProtect(buffer, 6, PAGE_EXECUTE_READ, dummy.as_mut_ptr());
        if !vpResult.as_bool() {
            GetLastError().to_hresult().ok()?;
        }
        let new_function = std::mem::transmute::<*mut ::core::ffi::c_void, fn() -> i32>(buffer);
        let result = new_function();
        println!("The result is {}", result);
        VirtualFree(buffer, 0, MEM_RELEASE);
    }

Running this program now shows:

Hello!!!
The result is 7

This experiment bring intriguing possibilities for future posts!.

Friday, April 15, 2022

Exploring a Webpack stats file with Prolog

A couple of days ago I was reading about the Webpack statistics file created using the following command line options:

npx webpack --profile --json

This file contains a lot of information collected by Webpack about the project being processed. The information in this file is used by nice visualization tools like Webpack Bundle Analyzer.

The dependency graph is included in this file. That is, all the dependencies between modules of the project. Being able to perform queries on this data could be useful to get insights into the code.

There are many tools to process JSON, but I wanted to try to use SWI-Prolog to see if I can get information from this file.

The information I am looking for is the module dependency information. By taking a look at the Module Object we can get this information using the reasons property.

We can start by parsing the stats.json file using SWI-Prolog builtin library for reading JSON:


:- use_module(library(http/json)).

read_json_file(FileName, Terms) :-
    open(FileName, read, Stream),
    json_read(Stream, Terms),
    close(Stream).

For convenience, I'm adding the loaded file to the Prolog database using assert/1:


?- read_json_file('c:\\smallexample\\stats.json',F), assert(testfile(F)).
F = json([hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json(...), ... = ...|...]).

Now that we can load the stats file we can start by performing simple queries. For example we can start by looking at top-level properties:


?- testfile(json(Contents)), member(Name=_,Contents).
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = hash ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = version ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = time ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = builtAt ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = publicPath ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
...

Here notice, that I'm using member/2 to get the name of the properties in the main file.

By the way, as a side note, yesterday I learned that you can exclude variables from Prolog results using the following (Stack overflow question here) goal:


set_prolog_flag(toplevel_print_anon, false).

With this nice tip, we can exclude variables that start with underscore from the results:


?- testfile(json(_Contents)), member(Name=_, _Contents).
Name = hash ;
Name = version ;
Name = time ;
Name = builtAt ;
Name = publicPath ;
Name = outputPath ;
Name = assetsByChunkName ;
Name = assets ;
Name = chunks ;
Name = modules ;
Name = entrypoints ;
Name = namedChunkGroups ;
Name = errors ;
Name = errorsCount ;
Name = warnings ;
Name = warningsCount ;
Name = children.

Now I can access the modules section to extract the reasons property. This property has information on modules that depend on the current module. For example say that we have a small TypeScript program that have the following structure:

We can start the exploration of this project by looking at the contents of the modules objects.


?- testfile(json(_Contents)),
|    member( ('modules'=_Modules), _Contents),
|    member( json(_ModulePropsList), _Modules),
|    member( ('name'=ModuleName), _ModulePropsList).
ModuleName = './src/index.ts' ;
ModuleName = './src/parser.ts' ;
ModuleName = './src/FuncApply.ts' ;
ModuleName = './src/NumLiteral.ts' ;
ModuleName = './src/SymbolObj.ts' ;
ModuleName = './src/BaseObject.ts' ;
ModuleName = 'webpack/runtime/define property getters' ;
ModuleName = 'webpack/runtime/hasOwnProperty shorthand' ;
ModuleName = 'webpack/runtime/make namespace object' ;

We can create a new goal with the code above which we can use later:


module_name(json(ContentsList), Name) :-
    member(('modules'=Modules), ContentsList),
    member(json(ModulePropertiesList), Modules),
    member('name'=Name, ModulePropertiesList).


module_properties_by_name(json(ContentsList), Name, ModulePropertiesList) :-
    member(('modules'=Modules), ContentsList),
    member(json(ModulePropertiesList), Modules),
    member('name'=Name, ModulePropertiesList).

Now that we located the modules, we can get the contents of the reasons property.


?- testfile(_Json),
|    module_properties_by_name(_Json, './src/BaseObject.ts', _Props),
|    member((reasons=_Reasons), _Props),
|    member(json([_|[RefModName|_]]), _Reasons).
RefModName =  (module='./src/FuncApply.ts') ;
RefModName =  (module='./src/FuncApply.ts') ;
RefModName =  (module='./src/NumLiteral.ts') ;
RefModName =  (module='./src/NumLiteral.ts') ;
RefModName =  (module='./src/SymbolObj.ts') ;
RefModName =  (module='./src/SymbolObj.ts') ;
false.

(Repeated results seem to indicate different "reasons")

With this data we can generate Graphviz representation (for example the one used in the graph above).


name_modules([], [], _).
name_modules([ModName|Rest], [ModNamePair|RestResult], Counter) :-
    number_string(Counter, CounterStr),
    string_concat('M', CounterStr, ModuleId),
    ModNamePair = ModName - ModuleId,
    NewCounter is Counter + 1,
    name_modules(Rest, RestResult, NewCounter).

module_dependencies_by_reason(File, (Name-Referencer)) :-
    module_name(File, Name),
    module_properties_by_name(File, Name,Props),
    member((reasons=R), Props),
    member(json([_|[(module=Referencer)|_]]),R).

generate_node_descriptions([],Result, Result).
generate_node_descriptions([(Name-Id)|Rest],TmpResult, OutStr) :-
    format(atom(Tmp3), '~a[label="~a"];\n', [Id, Name]),
    string_concat(TmpResult, Tmp3, OutStrTmp),
    generate_node_descriptions(Rest, OutStrTmp, OutStr).

generate_node_relations([], _, Result, Result).
generate_node_relations([(Target-Src)|Rest], NodeIds, TmpResult, Result) :-
    get_assoc(Src, NodeIds , SrcCode),
    get_assoc(Target, NodeIds , TargetCode),
    format(atom(RelationStr), '~a -> ~a;\n', [SrcCode, TargetCode]),
    string_concat(TmpResult, RelationStr, NewTmpResult),
    generate_node_relations(Rest, NodeIds, NewTmpResult, Result), !.
generate_node_relations([_|Rest], NodeIds, TmpResult, Result) :-
    generate_node_relations(Rest, NodeIds, TmpResult, Result),!.


dot_file_from_reasons(File, DotFileStr) :-
    findall(Name, module_name(File, Name), NameList),
    name_modules(NameList, CodedList, 0),
    list_to_assoc(CodedList, AssocNameModList),!,
    setof(Pairs, module_dependencies_by_reason(File, Pairs), PairList),
    generate_node_descriptions(CodedList, 'digraph G {\n', DotFileStrTmp1),
    generate_node_relations(PairList, AssocNameModList, DotFileStrTmp1, DotFileStrTmp2),
    string_concat(DotFileStrTmp2, '}', DotFileStr).

I am impressed by the power of Prolog. I have always admired the way it works differently dependending on how you use it. For example the way member/2 was used above to extract internal elements from terms. One would assume that this predicate is only used to test list membership. However by the power of Prolog unification and backtracking we can used to explore the contents of a list.

Sunday, December 26, 2021

A small programming exercise in APL #2: Combinations

In this post I'm going to continue my APL exploration by working in a possible solution for the problem of generating all the combinations of 'n' elements of an array.

Generating the 'n' combinations of an array means generating a sequence of all possible 'n' unique elements from the original array . For example, given the array 12, 34, 35, 65 all possible '2' combinations of this array are:

12, 34
12, 35
12, 65
34, 35
34, 65
35, 65

Notice that order is not important. That is "12, 34" is considered to be the same as "34, 12". Generating all 'n' permutations of the elements of an array may be an interesting problem for a future post.

One of my goals was to be as idiomatic as possible (with my limited APL knowledge!). Because of this, I will try avoid using explicit loops or conditionals and instead use array operators.

Strategy

The general strategy for solving this problem was to calculate all possible boolean arrays having 'n' bits and use Compress to extract the elements.

For example, in the array 12, 34, 35, 65 all possible boolean vectors having two bits on are:

1 1 0 0
1 0 1 0
1 0 0 1
0 1 1 0
0 1 0 1
0 0 1 1

Using these vectors in APL we get the elements we need:


      a
12 34 35 65
      1 1 0 0 / a
12 34
      1 0 1 0 / a
12 35
      1 0 0 1 / a
12 65
      0 1 1 0 / a
34 35
      0 1 0 1 / a
34 65
      0 0 1 1 / a
35 65

This strategy is not efficient in space or time but it allowed me to explore a solution without using imperative constructs or recursion.

Generating the boolean arrays

To generate the boolean arrays we start by generating all integer values required to encode n-bit values:


      a ← 12 34 35 65
      ⍳ ( ¯1 + 2 * ⍴ a)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

When converting these numbers to binary we get all possible boolean vectors for 4-bit values. This is same as the same as the length of our sample array. We can see these arrays if we use Encode and Each:


     { ((⍴ a) ⍴ 2) ⊤ ⍵ } ¨ ⍳ ( ¯1 + 2 * ⍴ a)
 0 0 0 1  0 0 1 0  0 0 1 1  0 1 0 0  0 1 0 1  0 1 1 0  0 1 1 1  1 0 0 0\
  1 0 0 1  1 0 1 0  1 0 1 1  1 1 0 0  1 1 0 1  1 1 1 0  1 1 1 1

I can reshape this array to get a better representation:


      15 1 ⍴ { ((⍴ a) ⍴ 2) ⊤ ⍵ } ¨ ⍳ ( ¯1 + 2 * ⍴ a)
 0 0 0 1
 0 0 1 0
 0 0 1 1
 0 1 0 0
 0 1 0 1
 0 1 1 0
 0 1 1 1
 1 0 0 0
 1 0 0 1
 1 0 1 0
 1 0 1 1
 1 1 0 0
 1 1 0 1
 1 1 1 0
 1 1 1 1

Now we are only interested in boolean arrays with 'n' number of '1' bits. For n = 2 we can get only those elements using the Compress operator:


      seq ← ⍳ ( ¯1 + 2 * ⍴ a)
      n ← 2
      ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
3 5 6 9 10 12

Again we can visualize these values using reshape and encode:


      6 1 ⍴ { ((⍴ a) ⍴ 2) ⊤ ⍵ } ¨ ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
 0 0 1 1
 0 1 0 1
 0 1 1 0
 1 0 0 1
 1 0 1 0
 1 1 0 0

Selecting desired elements

Now that we have our boolean arrays we can pick all the values using Compress:


     {(((⍴ a) ⍴ 2) ⊤ ⍵)/a} ¨ ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
 35 65  34 65  34 35  12 65  12 35  12 34

And again we can reshape this array to see it better:


      6 1 ⍴ {(((⍴ a) ⍴ 2) ⊤ ⍵)/a} ¨ ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
 35 65
 34 65
 34 35
 12 65
 12 35
 12 34

Finally we can pack these expressions in a function:


∇r←a ncombinations1 n;seq
  seq ← ⍳ ( ¯1 + 2 * ⍴ a)
  r ← {(((⍴ a) ⍴ 2) ⊤ ⍵)/a} ¨ (({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq\
)
∇

We can use it like this:


      1 2 3 4 5 ncombinations1 3
 3 4 5  2 4 5  2 3 5  2 3 4  1 4 5  1 3 5  1 3 4  1 2 5  1 2 4  1 2 3

APL has a build-in Binomal operator which allows us to calculate the number of combinations. For example:


      a ← 1 2 3 4 5
      (3 ! ⍴ a) = ⍴ a ncombinations3 3
1

Final words

It was very interesting (and difficult) to try to write a solution of this problem using only array operations. Reading the definion of the ncombinations1 I noticed that there are many expressions nested in parenthesis (maybe related to my limited APL knowledge). I know that APL developers value terseness, so I tried to reduce the size of the expressions. Here is a new version:


∇r←a ncombinations3 n;seq
  s ← ⍳ ( ¯1 + 2 * ⍴ a)
  b ← 2⍴⍨⍴a
  r ← {a/⍨b⊤⍵}¨s/⍨{n=+/b⊤⍵}¨s
∇

I was able to remove a couple of parenthesis by taking advantage of the Commute operator.

GNU APL was used to create the examples in this post.