Exploring Beautiful Languages

A quick look at functors in OCaml

2023-06-10T13:16:00.000-06:00

A couple of weeks ago I was working on a small program that required generating code in different ways depending on a user option. I was trying to make as few changes as possible. Because of the way the program was created and the language it was written in (not OCaml), it required changing several places in the code.

OCaml functors

I remembered reading a little bit about the concept of a functor in OCaml. Functors are a powerful mechanism that allows you to create modules parameterized by other modules. In the case of the task I was working on, I could use a functor to write the code generation section as a module parameterized by another module that provides the final implementation of the code generation.

The example: Code generation via method calls or operators

I’m going to use a simple example of using a functor. Say that we have a representation for a very simple language:

(* ast.ml *)
type expr =
        | Plus of expr * expr
        | Minus of expr * expr
        | Div of expr * expr
        | Times of expr * expr
        | Call of expr * (expr list)
        | Var of string
        | Lit of int
        | Dot of expr * string

I want to have the possibility of generating arithmetic expressions in this language in two ways:

Generating method calls of arithmetic operations for a Java-like language
Generating common operators

One example of option #1 is:

var1.multiply(10).plus(var2)

An example of option #2 is:

var1 * 10 + var2

The code that generates the expressions must not be aware of the strategy that we are using to generate the code. To do this in OCaml we define a signature for the module used to generate the code:

module type GeneratorFuncs = sig
    val (+) :  Ast.expr -> Ast.expr -> Ast.expr
    val (-) : Ast.expr -> Ast.expr -> Ast.expr
    val (/) : Ast.expr -> Ast.expr -> Ast.expr
    val ( * ) : Ast.expr -> Ast.expr -> Ast.expr
end

We use operators to make it easy to write the code generation. The module that makes the code generator is written as a functor with a parameter that is the module which implements the GeneratorFuncs signature.

The following code shows the “generator” which generates random arithmetic expressions using the provided module for emitting the code:

module Make_generator(Current_funcs : GeneratorFuncs) = struct
  let gen_single() = 
        if Random.int 10 > 5 then
           Ast.Var "x"
        else 
           Ast.Lit (1 + Random.int 5)


  let rec generate_sample (depth: int) =  
    let new_depth = depth - 1 in
    Current_funcs.(
        match depth, Random.int 10 with
        | 0, _ -> gen_single()
        | _, r when r >= 0 && r <= 3 -> 
              ((generate_sample new_depth) + (generate_sample new_depth))
        | _, r when r >= 4 && r <= 6 -> 
              ((generate_sample new_depth) - (generate_sample new_depth))
        | _, r when r >= 7 && r <= 10 -> 
              ((generate_sample new_depth) / (generate_sample new_depth))
        | _, _ -> Ast.Var "x")
end

We can write our emitter for generating code using method calls like this:

(* mgenerator.ml *)
module MGenerator = struct
        let simple_call (obj: Ast.expr) mmethod arg =
                Ast.Call(
                        Ast.Dot(
                                obj, 
                                mmethod), [arg])
        let (+) (a: Ast.expr) (b: Ast.expr) = simple_call a "plus" b
        let ( * ) a b = simple_call a "times" b
        let (-) a b = simple_call a "minus" b
        let (/) a b = simple_call a "div" b
end

module MGen = Generator.Make_generator(MGenerator)

An alternative module that generates arithmetic expressions using operators:

module OGenerator = struct
        let (+) (a: Ast.expr) (b: Ast.expr) = Ast.Plus(a, b)
        let ( * ) a b = Ast.Times(a, b)
        let (-) a b = Ast.Minus(a, b)
        let (/) a b = Ast.Div(a, b)
end

module OGen = Generator.Make_generator(OGenerator)

We can use these modules to generate sample code snippets:

let main = 
        let _ = Random.self_init()
        in let generat = Mgenerator.MGen.generate_sample 3 
        in let generat2 = Ogenerator.OGen.generate_sample 3 
        in  
           print_endline (Ast.pprint_string generat);
           print_endline (Ast.pprint_string generat2)

Output:

x.plus(2).plus(5.div(4)).div(x.plus(x).plus(5.div(1)))
x + 4 / x - 2 + 4 + x + 5 + x
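
For readers coming from Rust, the same separation between the generator and the emitter can be approximated with a trait and a generic function. This is only a rough analogue and not a functor. The `Expr` enum, the `GeneratorFuncs` trait, and the `MethodCalls` and `Operators` types are hypothetical names that mirror the OCaml modules above, and `render` is a naive printer that ignores operator precedence.

```rust
// A reduced version of the AST, just enough for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Plus(Box<Expr>, Box<Expr>),
    Times(Box<Expr>, Box<Expr>),
    Call(Box<Expr>, Vec<Expr>),
    Dot(Box<Expr>, String),
    Var(String),
    Lit(i64),
}

// Plays the role of the GeneratorFuncs signature.
trait GeneratorFuncs {
    fn plus(a: Expr, b: Expr) -> Expr;
    fn times(a: Expr, b: Expr) -> Expr;
}

struct MethodCalls; // emits obj.method(arg)
struct Operators; // emits a + b

fn simple_call(obj: Expr, method: &str, arg: Expr) -> Expr {
    Expr::Call(Box::new(Expr::Dot(Box::new(obj), method.to_string())), vec![arg])
}

impl GeneratorFuncs for MethodCalls {
    fn plus(a: Expr, b: Expr) -> Expr { simple_call(a, "plus", b) }
    fn times(a: Expr, b: Expr) -> Expr { simple_call(a, "times", b) }
}

impl GeneratorFuncs for Operators {
    fn plus(a: Expr, b: Expr) -> Expr { Expr::Plus(Box::new(a), Box::new(b)) }
    fn times(a: Expr, b: Expr) -> Expr { Expr::Times(Box::new(a), Box::new(b)) }
}

// The "generator": it builds var1 * 10 + var2 without knowing the strategy.
fn sample<G: GeneratorFuncs>() -> Expr {
    G::plus(
        G::times(Expr::Var("var1".into()), Expr::Lit(10)),
        Expr::Var("var2".into()),
    )
}

// Naive printer (no parentheses, no precedence handling).
fn render(e: &Expr) -> String {
    match e {
        Expr::Plus(a, b) => format!("{} + {}", render(a), render(b)),
        Expr::Times(a, b) => format!("{} * {}", render(a), render(b)),
        Expr::Call(target, args) => {
            let args: Vec<String> = args.iter().map(render).collect();
            format!("{}({})", render(target), args.join(", "))
        }
        Expr::Dot(obj, name) => format!("{}.{}", render(obj), name),
        Expr::Var(name) => name.clone(),
        Expr::Lit(n) => n.to_string(),
    }
}
```

Calling `render(&sample::<MethodCalls>())` and `render(&sample::<Operators>())` produces the two styles from the same generator code, which is the property the functor gives us in OCaml.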

Haskell 'newtype' and the record syntax

2023-06-04T18:03:00.002-06:00

While reading some Haskell code snippets I found something that seemed confusing. The snippet involved newtype and record syntax. A simplified example is the following:

newtype PersonName = PersonName { theName :: String }
...
let p1 = PersonName $ getName obj
in print $ theName p1

I was not able to find a place where the PersonName was created by specifying the value of theName explicitly, for example (PersonName { theName = "xyz" }). The reason is that record syntax allows an alternative way of specifying the field values. For example:

let p1 = PersonName "Luis" -- valid!
...
let p2 = PersonName { theName = "Luis" }  -- valid!

Another thing that I found interesting is the set of newtype restrictions. For example, it only allows you to specify one field in the record:

newtype PersonName' = PersonName' { firstName :: String, lastName ::String }

Compiling this is going to generate the following error:

newtypeexp.hs:6:23: error:
    • The constructor of a newtype must have exactly one field
        but ‘PersonName'’ has two
    • In the definition of data constructor ‘PersonName'’
      In the newtype declaration for ‘PersonName'’
  |
6 | newtype PersonName' = PersonName' { firstName :: String, lastName ::String }
  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This seems to be related to the fact that newtype is used as a compile-time concept only. More info here: https://wiki.haskell.org/Newtype#The_short_version .

Some considerations for using closures in Rust/WASM

2023-04-07T22:02:00.000-06:00

Here are a couple of things I learned while trying to pass a Rust closure to a JavaScript function. Some of these notes are a result of my lack of experience with Rust and Rust/WASM.

Passing a closure that is going to outlive the current function call

Passing a Rust function that lives on the stack to JavaScript is easy. For example, here is a call to Array.map:

#[wasm_bindgen]
pub fn square_elements(a : &js_sys::Array) -> js_sys::Array {
    a.map(&mut |value:JsValue, idx: u32, arr: js_sys::Array| {
        let value = value.as_f64().unwrap();
        JsValue::from_f64(value * value)
    })
}

In this case the function used as argument to map exists only for the time of the invocation of the square_elements function. You need to do something extra when passing a closure that is going to outlive the current call. This section of the Rust WASM documentation: https://rustwasm.github.io/wasm-bindgen/reference/passing-rust-closures-to-js.html#heap-allocated-closures has details on how to call a JavaScript function that receives a closure that “survives” the current method or function call. A typical example is calling requestAnimationFrame.

It has a very important note that I overlooked at first:

Once a Closure is dropped, it will deallocate its internal memory and invalidate the corresponding JavaScript function so that any further attempts to invoke it raise an exception… (from https://rustwasm.github.io/wasm-bindgen/reference/passing-rust-closures-to-js.html#heap-allocated-closures)

One thing that I want to do in Rust with WASM is to write a “requestAnimationFrame loop”, which allows me to write code that performs a repetitive task without blocking the UI thread of the browser. Here is an example of how this looks in JavaScript:

// JavaScript
let i = 0;
let action = () => {
   if (i < 3) {
      console.log(`Calling with ${i}`);
      // do something interesting...
      requestAnimationFrame(action);
   }
   i++;
};
requestAnimationFrame(action);

I found a nice example of how to do this loop in Rust here: https://rustwasm.github.io/docs/wasm-bindgen/examples/request-animation-frame.html . The example has a lot of documentation that explains its functionality in the comments. The code looks a little bit intimidating for a Rust newbie (like me). Here is a reduced version of the code with the most important parts of the example:

let f = Rc::new(RefCell::new(None));
let g = f.clone();

let mut i = 0;
*g.borrow_mut() = Some(Closure::wrap(Box::new(move || {
   if i > 300 {
       ...
       let _ = f.borrow_mut().take(); 
       return;
   }

   i += 1;
   ...
   request_animation_frame(f.borrow().as_ref().unwrap());
}) as Box<dyn FnMut()>));
...
request_animation_frame(g.borrow().as_ref().unwrap());

As described in the source of the example, this code uses Rc and RefCell to keep the Closure instance alive while the sequence of requestAnimationFrame calls does its work. In this case, when the i counter exceeds 300 the closure will return and finish the loop.

Each part of this code is very important. I made some mistakes that I’m going to detail in the next sections.

The closure is dropped before ‘requestAnimationFrame’ does its job

The goal of the two Rc/RefCell references to the same closure is to keep the Closure alive after the call to the current function finishes. This is an example of the error that is raised when you fail to do that:

// this example is incomplete
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    
    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
         log("At the end of the closure");
    }));
    log(&format!("Before quitting 'greet' {}", Rc::strong_count(&g)));
    request_animation_frame(g.borrow().as_ref().unwrap());
}

Notice that here we don’t pass f into the closure. Hence at the end of the function both g and f are going to be dropped along with the Closure instance. Running this code shows the following errors in the browser console:

...
Before quitting 'greet' 2 wasm_loop_bg.js:259:13
Uncaught Error: closure invoked recursively or after being dropped
...

To fix this issue in the incomplete example I just need to move the f instance to the closure so it will be captured:

// this example is incomplete
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    
    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
         let _ = &f; // Now 'f' is moved inside!
         log("At the end of the closure");
    }));
    log(&format!("Before quitting 'greet' {}", Rc::strong_count(&g)));
    request_animation_frame(g.borrow().as_ref().unwrap());
}
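
The difference between the two versions can be reproduced without a browser. The following is a minimal sketch that uses only the standard library: a `Box<dyn FnMut()>` stands in for the wasm-bindgen `Closure`, and `closure_survives_scope` is a hypothetical helper that observes the allocation through a `Weak` reference after `f` and `g` have gone out of scope.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Stand-in for Rc<RefCell<Option<Closure<dyn FnMut()>>>> (no wasm-bindgen here).
type Slot = Rc<RefCell<Option<Box<dyn FnMut()>>>>;

/// Returns true if the stored closure is still alive after the scope that
/// created `f` and `g` has ended.
fn closure_survives_scope(capture_f: bool) -> bool {
    let observer: Weak<RefCell<Option<Box<dyn FnMut()>>>>;
    {
        let f: Slot = Rc::new(RefCell::new(None));
        let g = f.clone();
        observer = Rc::downgrade(&g);
        if capture_f {
            // `&f` is a real use of `f`, so the `move` closure takes ownership of it.
            *g.borrow_mut() = Some(Box::new(move || {
                let _ = &f;
            }));
        } else {
            // `f` is not mentioned, so it stays in the enclosing scope.
            *g.borrow_mut() = Some(Box::new(|| {}));
        }
    } // `g` is dropped here, and so is `f` unless the closure owns it
    observer.upgrade().is_some()
}
```

`closure_survives_scope(false)` returns false because both references are gone when the scope ends, which corresponds to the "closure invoked recursively or after being dropped" error. `closure_survives_scope(true)` returns true because the closure itself holds one of the references.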

Resources not being released

After writing the complete loop, I also found that I was not doing the complete cleanup for the closure.

Here is an example that shows the problematic code:

#[derive(Debug)]
struct MyStruct {
    x: i32
}

impl Drop for MyStruct {
    fn drop(&mut self) {
        write_debug(format!("Calling `drop` on MyStruct: {}", self.x).as_str());
    }
}

#[wasm_bindgen]
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    let captured = MyStruct { x: 100 };

    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
        log(format!("Counter: {} captured value: {:?}", i, &captured).as_str());
        if i != 2 {
           request_animation_frame(f.borrow().as_ref().unwrap());
        } else {
            log("Finished");
            return;
        }
        i += 1;
    }));

    request_animation_frame(g.borrow().as_ref().unwrap());
    log("Before finishing 'greet'");
}

In this example I created a dummy struct called MyStruct. This structure implements the Drop trait to display a message in the console when the structure is being dropped. An instance of this structure is captured by the closure passed to the requestAnimationFrame call.

When running this code we see the following messages in the console:

Before finishing 'greet' wasm_loop_bg.js:267:13
Counter: 0 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Counter: 1 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Counter: 2 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Finished

Notice that we don’t see the message logged in the drop method of MyStruct. The reason for this is that I forgot to release the value inside the Rc/RefCell wrapper.

Here is the corrected code:

...
#[wasm_bindgen]
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    let captured = MyStruct { x: 100 };

    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
        log(format!("Counter: {} captured value: {:?}", i, &captured).as_str());
        if i != 2 {
           request_animation_frame(f.borrow().as_ref().unwrap());
        } else {
            log("Finished");
            let _ = f.take();
            return;
        }
        i += 1;
    }));

    request_animation_frame(g.borrow().as_ref().unwrap());
    log("Before finishing 'greet'");
}

And now here is the output of the code:

Before finishing 'greet' wasm_loop_bg.js:267:13
Counter: 0 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Counter: 1 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Counter: 2 captured value: MyStruct { x: 100 } wasm_loop_bg.js:267:13
Finished wasm_loop_bg.js:267:13
Calling `drop` on MyStruct: 100

As with the original example the take method is used to move the Closure out of the Rc/RefCell reference. The value will be dropped at the end of the call.
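
The effect of take can also be checked outside the browser. This is a sketch under simplifying assumptions: a `Box<dyn FnMut()>` replaces the wasm-bindgen `Closure`, `Tracked` is a hypothetical struct that records its own drop in a flag, and take is called from outside the closure instead of from inside it.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

type Slot = Rc<RefCell<Option<Box<dyn FnMut()>>>>;

// Sets a shared flag when dropped, so the drop can be observed.
struct Tracked {
    dropped: Rc<Cell<bool>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Builds a slot whose closure owns a clone of the slot itself (a cycle)
/// plus a `Tracked` value.
fn build_cycle(dropped: Rc<Cell<bool>>) -> Slot {
    let f: Slot = Rc::new(RefCell::new(None));
    let g = f.clone();
    let captured = Tracked { dropped };
    *g.borrow_mut() = Some(Box::new(move || {
        // Use both values so the `move` closure owns them.
        let _ = (&f, &captured);
    }));
    g
}

/// Drops the outer handle only: the cycle keeps the closure alive.
fn dropped_without_take() -> bool {
    let dropped = Rc::new(Cell::new(false));
    let slot = build_cycle(dropped.clone());
    drop(slot);
    dropped.get()
}

/// Moves the closure out of the cell first, which breaks the cycle.
fn dropped_with_take() -> bool {
    let dropped = Rc::new(Cell::new(false));
    let slot = build_cycle(dropped.clone());
    let closure = slot.borrow_mut().take();
    drop(closure); // the captured values are released here
    dropped.get()
}
```

`dropped_without_take()` returns false (the captured value is leaked), while `dropped_with_take()` returns true.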

Not following the requirement of using FnMut for the closure

This is another case of not following the rules and not taking the time to read the error message. I made a small change to the code as follows:

…

fn my_dummy_function_requiring_move(ms: MyStruct) {
   log(format!("-- {:?}", &ms).as_str());
}
...

#[wasm_bindgen]
pub fn greet() {
    let f = Rc::new(RefCell::new(None));
    let g = f.clone();
    let captured = MyStruct { x: 100 };

    let mut i  = 0;
    *g.borrow_mut() = Some(Closure::new(move || {
        my_dummy_function_requiring_move(captured);
        log(format!("Counter: {} captured value: {:?}", i, &captured).as_str());
        if i != 2 {
           request_animation_frame(f.borrow().as_ref().unwrap());
        } else {
            log("Finished");
            let _ = f.take();
            return;
        }
        i += 1;
    }));

    request_animation_frame(g.borrow().as_ref().unwrap());
    log("Before finishing 'greet'");
}

When introducing this code the compiler shows the following error:

error[E0525]: expected a closure that implements the `FnMut` trait, but this closure only implements `FnOnce`
   --> src\lib.rs:74:41
    |
74  |       *g.borrow_mut() = Some(Closure::new(move || {
    |                              ------------ -^^^^^^
    |                              |            |
    |  ____________________________|____________this closure implements `FnOnce`, not `FnMut`
    | |                            |
    | |                            required by a bound introduced by this call
75  | |
76  | |         my_dummy_function_requiring_move(captured);
    | |                                          -------- closure is `FnOnce` because it moves the variable `captured` out of its environment
77  | |         log(format!("Counter: {} captured value: {:?}", i, &captured).as_str());
...   |
85  | |         i += 1;
86  | |     }));
    | |_____- the requirement to implement `FnMut` derives from here
    |
    = note: required for `[closure@src\lib.rs:74:41: 74:48]` to implement `IntoWasmClosure<dyn FnMut()>`
note: required by a bound in `wasm_bindgen::prelude::Closure::<T>::new`
   --> C:\Users\ldfallasu2\.cargo\registry\src\github.com-1ecc6299db9ec823\wasm-bindgen-0.2.84\src\closure.rs:271:12
    |
271 |         F: IntoWasmClosure<T> + 'static,
    |            ^^^^^^^^^^^^^^^^^^ required by this bound in `wasm_bindgen::prelude::Closure::<T>::new`

It is really impressive to see how the compiler shows the location of the error and the related areas. As the error message says, the problem here is that we are moving a captured value out of the closure. Once that value has been moved out, the closure cannot be called a second time, so it only implements FnOnce and not FnMut (more information here: https://doc.rust-lang.org/stable/book/ch13-01-closures.html#moving-captured-values-out-of-closures-and-the-fn-traits ).

The solution in this case is simple: since the move was not really required, I created an alternative version of the function that does not require a “move”:

fn my_dummy_function_requiring_ref(ms: &MyStruct) {
   write_debug(format!("-- {:?}", ms).as_str());
}

The call is changed to my_dummy_function_requiring_ref(&captured); which removes the compilation error.
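
The same rule can be seen in plain Rust without wasm-bindgen. In this sketch `call_twice` is a hypothetical function that, like `Closure::new`, requires an FnMut closure, and `describe` plays the role of my_dummy_function_requiring_ref.

```rust
#[derive(Debug)]
struct MyStruct {
    x: i32,
}

// Takes a reference, so the captured value stays inside the closure.
fn describe(ms: &MyStruct) -> String {
    format!("-- {:?}", ms)
}

// Requires FnMut because the closure is invoked more than once.
fn call_twice<F: FnMut() -> String>(mut f: F) -> Vec<String> {
    vec![f(), f()]
}

fn run() -> Vec<String> {
    let captured = MyStruct { x: 100 };
    // Passing `captured` by value to a function here would make the
    // closure FnOnce, and this call would fail with error E0525.
    call_twice(move || describe(&captured))
}
```

`run()` returns the same formatted line twice, which is only possible because the closure borrows `captured` on each call instead of giving it away.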

The examples created in this post use the following utility functions and declarations:

fn window() -> web_sys::Window {
    web_sys::window().expect("no global 'window'")
}

fn request_animation_frame(f: &Closure<dyn FnMut()>) {
    window()
        .request_animation_frame(f.as_ref().unchecked_ref())
        .expect("Call to request animation frame ");
}

#[wasm_bindgen]
extern {
    #[wasm_bindgen(js_namespace = console)]
    fn log(s: &str);
}

Conclusions

I think there are two main conclusions for this post:

Read the documentation carefully!
Take time to read the compiler errors. The Rust compiler team puts a lot of effort into explaining the error and helping you locate the origin of the problem.

Implementing WHILE in a toy BASIC interpreter

2022-11-02T20:20:00.000-06:00

While working on a toy BASIC implementation, I ran into an unexpected challenge implementing the WHILE instruction.

The implementation of this instruction seems simple. Here is an example:

10 X = 1
20 WHILE X <> 10
30 PRINT X
40 X = X + 1
50 WEND

This program is going to print the numbers from 1 to 9. The WHILE statement is really a combination of WHILE and WEND. The WEND statement indicates the end of the 'block'.

The first challenge is to create a relation between these statements. This relation is going to be used by the interpreter to identify where to 'jump' when evaluating WHILE and WEND. One interesting challenge is that BASIC (GWBASIC) supports `nested` WHILE blocks:

10 X = 10
20 Y = 10
30 WHILE X <> 0
40 PRINT "X = ", X
50 Y = 10
60 WHILE Y <> 0
70 PRINT "y =", Y
80 Y = Y - 1
90 WEND
100 X = X - 1
110 WEND

Since the source of the original GW-BASIC implementation is published on GitHub, we can take a look at it here: https://github.com/microsoft/GW-BASIC. There is code in GWMAIN.ASM used to search for the corresponding WEND of a WHILE (a code block called WNDSCN). This also seems to be used to locate the FOR/NEXT pair of instructions. It seems that the interpreter tries to find the matching instruction by scanning the instructions that follow the WHILE. A counter is used to keep track of nested WHILE/WEND blocks:

..
PUBLIC	WNDSCN
WNDSCN: MOV	CL,LOW OFFSET ERRWH	;SCAN FOR MATCHING WEND THIS IS ERROR IF FAIL
.
.
.
FORINC: INC	CH		;INCREMENT THE COUNT WHENEVER "FOR" IS SEEN
.
.
.
        JZ	SHORT NXTLOK	;FOR/NEXT SEARCHING
        CMP	AL,LOW OFFSET $WHILE	;ANOTHER WHILE/WEND NEST?
        JZ	SHORT FORINC
        CMP	AL,LOW OFFSET $WEND
        JNZ	SHORT FNLOP
        DEC	CH
.
.
.

Strategy

One way to start implementing this feature in the interpreter was to add a table (pair_instruction_table) to the interpreter context. This table is going to keep the relation between instructions. In particular it will be used to relate a WHILE and WEND pair. In the near future it will keep the relation between FOR and NEXT.

The evaluation context will now look like this:

pub struct EvaluationContext<'a> {
    pub variables: HashMap<String, ExpressionEvalResult>,
    pub array_variables: HashMap<String, GwArray>,
    pub jump_table: HashMap<i16, i16>,
    pub underlying_program: Option<&'a mut GwProgram>,   
    pub pair_instruction_table: HashMap<i16, i16>,
}

With this information we can create a search function that emulates the same behavior as WNDSCN . Something like this:

fn find_wend(line: i16, real_lines: &Vec<&Box<dyn GwInstruction>>) -> i16{
    let mut curr_line = line + 1;
    let mut while_end_balance = 0;
    loop {
        if curr_line >= real_lines.len() as i16 {
            break;
        } else if let Some(ref instr) = real_lines.get(curr_line as usize) {

            if instr.is_while() {
                while_end_balance += 1;
            }
            if instr.is_wend() {
                if while_end_balance  == 0 {
                    return curr_line as i16;
                } else {
                    while_end_balance -= 1;
                }
            }
        }
        curr_line += 1;
    }
    return -1;
}

Notice that we solve the problem of nested WHILE/WEND blocks by keeping a counter while_end_balance.
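
To see the counter at work in isolation, here is a simplified sketch of the same scan. It is not the interpreter's code: the `Instr` enum is a hypothetical replacement for the GwInstruction trait objects, and the function returns an Option instead of -1.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    While,
    Wend,
    Other,
}

/// Returns the index of the WEND that closes the WHILE at `while_index`.
fn find_wend(while_index: usize, program: &[Instr]) -> Option<usize> {
    let mut depth = 0usize; // number of nested WHILEs still open
    for (index, instr) in program.iter().enumerate().skip(while_index + 1) {
        match instr {
            Instr::While => depth += 1,
            Instr::Wend if depth == 0 => return Some(index),
            Instr::Wend => depth -= 1,
            Instr::Other => {}
        }
    }
    None // WHILE without WEND
}

/// The nested WHILE program shown earlier, one entry per line (10 to 110).
fn nested_example() -> Vec<Instr> {
    use Instr::*;
    vec![Other, Other, While, Other, Other, While, Other, Other, Wend, Other, Wend]
}
```

For the nested program, the outer WHILE at index 2 is matched with the WEND at index 10, and the inner WHILE at index 5 with the WEND at index 8.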

With this utility we can create the implementation of the WHILE statement with the following code:

impl GwInstruction for GwWhile {
    fn eval (&self, line: i16, context : &mut EvaluationContext) -> InstructionResult {
        let mut wend_line : i16 = 0;

        // Find the cached corresponding line for this WHILE statement
        if let Some(corresponding_wend) =  context.pair_instruction_table.get(&line) {      
            wend_line = *corresponding_wend;
        } else if let Some(ref real_lines) = context.real_lines {
            // Try to look for the WEND statement in the program lines
            let index_of_wend = find_wend(line, real_lines);
            if index_of_wend == -1 {
                return InstructionResult::EvaluateToError(String::from("WHILE WITHOUT WEND"));
            } else {
                context.pair_instruction_table.insert(line, index_of_wend);
                context.pair_instruction_table.insert(index_of_wend, line);
            }       
            wend_line = index_of_wend;
        }

        // Evaluate the condition and move the following line
        let condition_evaluation = self.condition.eval(context);
        match condition_evaluation {
            ExpressionEvalResult::IntegerResult(result) if result == 0 => {
                InstructionResult::EvaluateLine(wend_line + 1)
            }
            ExpressionEvalResult::IntegerResult(_) => {
                InstructionResult::EvaluateNext
            }       
            _ => {
                InstructionResult::EvaluateToError(String::from("Type mismatch"))
            }
        }
    }
}

Additional problems

While working on this implementation I found an additional problem. In BASIC you can have several statements in one line separated by colons (:). For example, you can have a complete WHILE block in just one line:

10 X = 1 : WHILE X < 10 : PRINT X : X = X + 1 :WEND

Or you can have a WHILE in one line and the WEND inside another line:

10  x = 1
20 WHILE x <> 5
30 print x
40 x = x + 1 : print "a" : WEND
50 print "END"

To support this scenario the program is "flattened" before executing it. For example, the previous snippet is converted to:

0  x = 1
1 WHILE x <> 5
2 print x
3 x = x + 1 
4 print "a" 
5 WEND
6 print "END"

This way it is easy to implement a jump between instructions that didn't exist as explicit lines in the original program. This transformation is performed before running the program:

...
   let real_lines = &mut context.real_lines.as_mut().expect("instr vector");
   for e in self.lines.iter() {
      real_lines.push(&e.instruction);
      if let Some(ref rest) = e.rest_instructions {
         for nested in rest {
            real_lines.push(&nested);
         }
      }	    
   }
...
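
The idea behind the flattening can be illustrated on plain source text. This is a naive sketch and not the interpreter's implementation (which works on parsed instructions): `flatten` is a hypothetical helper that splits on every colon, so a colon inside a string literal would be split too.

```rust
/// Splits numbered BASIC lines into one statement per entry.
fn flatten(source: &[&str]) -> Vec<String> {
    let mut flat = Vec::new();
    for line in source {
        // The text before the first whitespace is the BASIC line number.
        if let Some((_number, rest)) = line.trim().split_once(char::is_whitespace) {
            for statement in rest.split(':') {
                let statement = statement.trim();
                if !statement.is_empty() {
                    flat.push(statement.to_string());
                }
            }
        }
    }
    flat
}

/// The WHILE/WEND program from the text, before flattening.
fn sample_program() -> Vec<&'static str> {
    vec![
        "10  x = 1",
        "20 WHILE x <> 5",
        "30 print x",
        "40 x = x + 1 : print \"a\" : WEND",
        "50 print \"END\"",
    ]
}
```

The position of each statement in the resulting vector is the new "line" index, so the WEND that shared line 40 with two other statements ends up with an index of its own.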

Something that needs to be improved is the way to identify existing WHILE and WEND instructions. Sadly, right now this is implemented using a pair of methods, is_while and is_wend. These methods are defined in the GwInstruction trait and overridden on GwWhile and GwWend. This is ugly since these methods are very specific to this problem. It doesn't seem right to have them in GwInstruction.

One alternative to solve this problem is to redesign the code representation to include a way to retrieve the original instruction from a GwInstruction. This is one of the things that will be implemented next.

Code for the interpreter is here: https://github.com/ldfallas/rgwbasic .

Executing code from a buffer with Rust on Windows

2022-04-16T08:44:00.000-06:00

Creating and executing code at runtime is an intriguing topic. It is one of the pieces that makes it possible to write JIT compilers for things like Java or C#.

Creating a byte array with executable instructions and “casting” that array to a function pointer is not enough. For security reasons, modern operating systems require you to specify which region of memory of your program is executable. On Windows the VirtualAlloc and VirtualProtect functions are used to do this.

There is a nice Stack Overflow answer by user Christian Hackl on how to use these API functions: https://stackoverflow.com/questions/40936534/how-to-alloc-a-executable-memory-buffer . In this post I’m going to try to replicate the C++ example from that answer in Rust.

The first step is to be able to call VirtualAlloc and VirtualProtect from Rust. There are several ways to call “C” style functions in Rust. However, to call these Win32 API functions I am going to use Rust for Windows. This package provides an easy way to call into the Win32 API.

First we start by adding the windows crate to our dependencies. We also specify that we need a couple of features:

# Cargo.toml
...
[dependencies.windows]
version="0.35.0"
features = [
    "alloc",
    "Win32_Foundation",
    "Win32_System_Memory",
]

Here, the most important feature is “Win32_System_Memory” which allows us to call VirtualAlloc and VirtualProtect. You can see that in the "Required features" section of the documentation entry here https://microsoft.github.io/windows-docs-rs/doc/windows/Win32/System/Memory/fn.VirtualAlloc.html

Now that we have these functions we can rewrite the example from the Stack Overflow question:

use windows::{
    core::*,
    Win32::Foundation::*,
    Win32::System::Memory::*,
};

fn main() -> Result<()> {
    unsafe {
        let buffer = VirtualAlloc(
            std::ptr::null(),
            4096,
            MEM_COMMIT | MEM_RESERVE,
            PAGE_READWRITE,
        );
        let buff_arr = std::mem::transmute::<*mut ::core::ffi::c_void, &mut [u8; 6]>(buffer);
        buff_arr[0] = 0xb8; // MOV opcode
        buff_arr[1] = 0x05; // '5' value
        buff_arr[2] = 0x00;
        buff_arr[3] = 0x00;
        buff_arr[4] = 0x00;
        buff_arr[5] = 0xc3; // RET
        let mut dummy: [PAGE_PROTECTION_FLAGS; 1] = [PAGE_PROTECTION_FLAGS::default()];
        let vpResult = VirtualProtect(buffer, 6, PAGE_EXECUTE_READ, dummy.as_mut_ptr());
        if !vpResult.as_bool() {
            GetLastError().to_hresult().ok()?;
        }
        let new_function = std::mem::transmute::<*mut ::core::ffi::c_void, fn() -> i32>(buffer);
        let result = new_function();
        println!("The result is {}", result);
        VirtualFree(buffer, 0, MEM_RELEASE);
    }
    Ok(())
}

After compiling and running this example we can see:

The result is 5

I was very happy the first time I saw that running! Here the buff_arr buffer has real x86 instructions equivalent to something like:

mov eax, 0x5
ret

Encoding these instructions is a very complex process. The documentation contains dense tables explaining the format, for example for MOV or RET.
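
For this tiny program the encoding can still be done by hand. The sketch below builds the same six bytes for any 32-bit value. `encode_mov_eax_ret` is a hypothetical helper and not part of the program above. It relies on 0xB8 being the opcode for `mov eax, imm32` and 0xC3 the opcode for a near `ret`, with the immediate stored in little-endian order.

```rust
/// Encodes `mov eax, <value>` followed by `ret`.
fn encode_mov_eax_ret(value: u32) -> [u8; 6] {
    // x86 stores immediates least significant byte first.
    let imm = value.to_le_bytes();
    [0xb8, imm[0], imm[1], imm[2], imm[3], 0xc3]
}
```

`encode_mov_eax_ret(5)` gives the bytes written one by one into buff_arr in the example.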

Also it is clear that we need unsafe Rust here since we are dealing with low level code.

Since encoding instructions by hand is so involved, we can take a shortcut using the iced-x86 crate. This really cool library has a complete x86 assembler and disassembler. It was very easy (with my limited Rust knowledge) to adapt it to this little example.

First we include it in the Cargo.toml file:

[dependencies.iced-x86]
version = "1.17.0"
features = ["code_asm"]

Now we can create the code using the nice API that iced-x86 provides. Here I’m adding a call to a function defined in the same program.

fn print_hello() -> u32 {
    println!("Hello!!!");
    1
}

fn encode_small_program() -> ::core::result::Result<Vec<u8>, asm::IcedError> {
    let mut assembler = asm::CodeAssembler::new(64)?;
    unsafe {
        let print_hello_addr = std::mem::transmute::<fn() -> u32, u64>(print_hello);
        assembler.sub(asm::rsp, 0x28)?;
        assembler.mov(asm::rax, print_hello_addr)?;
        assembler.call(asm::rax)?;
        assembler.mov(asm::eax, 0x7)?;
        assembler.add(asm::rsp, 0x28)?;
        assembler.ret()?;
    }

    let instr = assembler.take_instructions();
    let block = InstructionBlock::new(&instr, 0);
    let res = BlockEncoder::encode(64, block, BlockEncoderOptions::NONE)?;
    Ok(res.code_buffer)
}

We can modify the main program to use this new function:

    let encoded_program = encode_small_program().unwrap();
    let p = encoded_program.as_ptr();

    unsafe {
        let buffer = VirtualAlloc(
            std::ptr::null(),
            4096,
            MEM_COMMIT | MEM_RESERVE,
            PAGE_READWRITE,
        );
        let buff_arr = std::mem::transmute::<*mut ::core::ffi::c_void, *mut u8>(buffer);

        std::ptr::copy_nonoverlapping(p, buff_arr, encoded_program.len());

        let mut dummy: [PAGE_PROTECTION_FLAGS; 1] = [PAGE_PROTECTION_FLAGS::default()];
        let vpResult = VirtualProtect(buffer, encoded_program.len(), PAGE_EXECUTE_READ, dummy.as_mut_ptr());
        if !vpResult.as_bool() {
            GetLastError().to_hresult().ok()?;
        }
        let new_function = std::mem::transmute::<*mut ::core::ffi::c_void, fn() -> i32>(buffer);
        let result = new_function();
        println!("The result is {}", result);
        VirtualFree(buffer, 0, MEM_RELEASE);
    }

Running this program now shows:

Hello!!!
The result is 7

This experiment brings intriguing possibilities for future posts!

Exploring a Webpack stats file with Prolog

2022-04-15T07:54:00.001-06:00

A couple of days ago I was reading about the Webpack statistics file created using the following command line options:

npx webpack --profile --json

This file contains a lot of information collected by Webpack about the project being processed. The information in this file is used by nice visualization tools like Webpack Bundle Analyzer.

The dependency graph is included in this file. That is, all the dependencies between modules of the project. Being able to perform queries on this data could be useful to get insights into the code.

There are many tools to process JSON, but I wanted to try to use SWI-Prolog to see if I can get information from this file.

The information I am looking for is the module dependency information. By taking a look at the Module Object we can get this information using the reasons property.

We can start by parsing the stats.json file using SWI-Prolog's built-in library for reading JSON:


:- use_module(library(http/json)).

read_json_file(FileName, Terms) :-
    open(FileName, read, Stream),
    json_read(Stream, Terms),
    close(Stream).

For convenience, I'm adding the loaded file to the Prolog database using assert/1:


?- read_json_file('c:\\smallexample\\stats.json',F), assert(testfile(F)).
F = json([hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json(...), ... = ...|...]).

Now that we can load the stats file we can start by performing simple queries. For example we can start by looking at top-level properties:


?- testfile(json(Contents)), member(Name=_,Contents).
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = hash ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = version ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = time ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = builtAt ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
Name = publicPath ;
Contents = [hash='71029f66a2fb5a3de779', version='5.70.0', time=3694, builtAt=1649775116882, publicPath=auto, outputPath='C:\\smallexample\\public', assetsByChunkName=json([...]), assets=[...], ... = ...|...],
...

Notice that I'm using member/2 to get the names of the top-level properties of the file.

As a side note, yesterday I learned (from a Stack Overflow question) that you can exclude variables from the results printed by the Prolog toplevel using the following goal:


set_prolog_flag(toplevel_print_anon, false).

With this nice tip, we can exclude variables that start with underscore from the results:


?- testfile(json(_Contents)), member(Name=_, _Contents).
Name = hash ;
Name = version ;
Name = time ;
Name = builtAt ;
Name = publicPath ;
Name = outputPath ;
Name = assetsByChunkName ;
Name = assets ;
Name = chunks ;
Name = modules ;
Name = entrypoints ;
Name = namedChunkGroups ;
Name = errors ;
Name = errorsCount ;
Name = warnings ;
Name = warningsCount ;
Name = children.

Now I can access the modules section to extract the reasons property. This property has information on the modules that depend on the current module. For example, say that we have a small TypeScript program that has the following structure:

We can start the exploration of this project by looking at the contents of the modules objects.


?- testfile(json(_Contents)),
|    member( ('modules'=_Modules), _Contents),
|    member( json(_ModulePropsList), _Modules),
|    member( ('name'=ModuleName), _ModulePropsList).
ModuleName = './src/index.ts' ;
ModuleName = './src/parser.ts' ;
ModuleName = './src/FuncApply.ts' ;
ModuleName = './src/NumLiteral.ts' ;
ModuleName = './src/SymbolObj.ts' ;
ModuleName = './src/BaseObject.ts' ;
ModuleName = 'webpack/runtime/define property getters' ;
ModuleName = 'webpack/runtime/hasOwnProperty shorthand' ;
ModuleName = 'webpack/runtime/make namespace object' ;

We can package the query above into predicates that we can reuse later:


module_name(json(ContentsList), Name) :-
    member(('modules'=Modules), ContentsList),
    member(json(ModulePropertiesList), Modules),
    member('name'=Name, ModulePropertiesList).


module_properties_by_name(json(ContentsList), Name, ModulePropertiesList) :-
    member(('modules'=Modules), ContentsList),
    member(json(ModulePropertiesList), Modules),
    member('name'=Name, ModulePropertiesList).

Now that we can locate the modules, we can get the contents of the reasons property.


?- testfile(_Json),
|    module_properties_by_name(_Json, './src/BaseObject.ts', _Props),
|    member((reasons=_Reasons), _Props),
|    member(json([_|[RefModName|_]]), _Reasons).
RefModName =  (module='./src/FuncApply.ts') ;
RefModName =  (module='./src/FuncApply.ts') ;
RefModName =  (module='./src/NumLiteral.ts') ;
RefModName =  (module='./src/NumLiteral.ts') ;
RefModName =  (module='./src/SymbolObj.ts') ;
RefModName =  (module='./src/SymbolObj.ts') ;
false.

(Repeated results seem to indicate different "reasons")
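For readers who do not use Prolog, here is a rough Python sketch of the same query. It assumes the stats file was already loaded into nested dicts and lists (for example with json.load); sample_stats is hand-written, hypothetical data that contains only the name, reasons and module properties used above, and referencing_modules is a name made up for this illustration.

```python
# Hypothetical sample shaped like the relevant part of stats.json
# (the real file carries many more properties per module and per reason).
sample_stats = {
    "modules": [
        {
            "name": "./src/BaseObject.ts",
            "reasons": [
                {"module": "./src/FuncApply.ts"},
                {"module": "./src/FuncApply.ts"},
                {"module": "./src/NumLiteral.ts"},
            ],
        },
        {"name": "./src/FuncApply.ts", "reasons": [{"module": "./src/parser.ts"}]},
    ]
}


def referencing_modules(stats, module_name):
    """Return the 'module' value of every reason attached to module_name."""
    return [
        reason.get("module")
        for module in stats.get("modules", [])
        if module.get("name") == module_name
        for reason in module.get("reasons", [])
    ]


print(referencing_modules(sample_stats, "./src/BaseObject.ts"))
```

As in the Prolog query, repeated referencers are kept, one per reason.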

With this data we can generate a Graphviz representation (for example the one used in the graph above).


name_modules([], [], _).
name_modules([ModName|Rest], [ModNamePair|RestResult], Counter) :-
    number_string(Counter, CounterStr),
    string_concat('M', CounterStr, ModuleId),
    ModNamePair = ModName - ModuleId,
    NewCounter is Counter + 1,
    name_modules(Rest, RestResult, NewCounter).

module_dependencies_by_reason(File, (Name-Referencer)) :-
    module_name(File, Name),
    module_properties_by_name(File, Name,Props),
    member((reasons=R), Props),
    member(json([_|[(module=Referencer)|_]]),R).

generate_node_descriptions([],Result, Result).
generate_node_descriptions([(Name-Id)|Rest],TmpResult, OutStr) :-
    format(atom(Tmp3), '~a[label="~a"];\n', [Id, Name]),
    string_concat(TmpResult, Tmp3, OutStrTmp),
    generate_node_descriptions(Rest, OutStrTmp, OutStr).

generate_node_relations([], _, Result, Result).
generate_node_relations([(Target-Src)|Rest], NodeIds, TmpResult, Result) :-
    get_assoc(Src, NodeIds , SrcCode),
    get_assoc(Target, NodeIds , TargetCode),
    format(atom(RelationStr), '~a -> ~a;\n', [SrcCode, TargetCode]),
    string_concat(TmpResult, RelationStr, NewTmpResult),
    generate_node_relations(Rest, NodeIds, NewTmpResult, Result), !.
generate_node_relations([_|Rest], NodeIds, TmpResult, Result) :-
    generate_node_relations(Rest, NodeIds, TmpResult, Result),!.


dot_file_from_reasons(File, DotFileStr) :-
    findall(Name, module_name(File, Name), NameList),
    name_modules(NameList, CodedList, 0),
    list_to_assoc(CodedList, AssocNameModList),!,
    setof(Pairs, module_dependencies_by_reason(File, Pairs), PairList),
    generate_node_descriptions(CodedList, 'digraph G {\n', DotFileStrTmp1),
    generate_node_relations(PairList, AssocNameModList, DotFileStrTmp1, DotFileStrTmp2),
    string_concat(DotFileStrTmp2, '}', DotFileStr).
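The same idea can be sketched in Python to make the steps easier to follow: give every module a short id, emit one labeled node per module, and emit one edge per distinct (module, referencer) pair. The function name dot_from_pairs and the sample data are assumptions made for this illustration; the pairs are (name, referencer) tuples like the ones produced by module_dependencies_by_reason.

```python
def dot_from_pairs(names, pairs):
    """Build a Graphviz digraph from module names and (name, referencer) pairs."""
    # M0, M1, ... play the role of the ids created by name_modules/3.
    ids = {name: f"M{index}" for index, name in enumerate(names)}
    lines = ["digraph G {"]
    for name, node_id in ids.items():
        lines.append(f'{node_id}[label="{name}"];')
    # sorted(set(...)) mimics setof/3: duplicates removed, results ordered.
    for target, source in sorted(set(pairs)):
        # Pairs that mention an unknown module are skipped, like the last
        # clause of generate_node_relations/4 does.
        if source in ids and target in ids:
            lines.append(f"{ids[source]} -> {ids[target]};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical data: index.ts references parser.ts for two different reasons.
sample_names = ["./src/index.ts", "./src/parser.ts"]
sample_pairs = [
    ("./src/parser.ts", "./src/index.ts"),
    ("./src/parser.ts", "./src/index.ts"),
]
print(dot_from_pairs(sample_names, sample_pairs))
```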

I am impressed by the power of Prolog. I have always admired the way it works differently depending on how you use it. One example is the way member/2 was used above to extract internal elements from terms. One would assume that this predicate is only used to test list membership. However, thanks to Prolog's unification and backtracking, we can also use it to explore the contents of a list.

A small programming exercise in APL #2: Combinations

2021-12-26T16:03:00.000-06:00

In this post I'm going to continue my APL exploration by working on a possible solution for the problem of generating all the combinations of 'n' elements of an array.

Generating the 'n' combinations of an array means generating all possible selections of 'n' distinct elements from the original array. For example, given the array 12, 34, 35, 65 all possible '2' combinations of this array are:

12, 34
12, 35
12, 65
34, 35
34, 65
35, 65

Notice that order is not important. That is, "12, 34" is considered to be the same as "34, 12". Generating all 'n' permutations of the elements of an array may be an interesting problem for a future post.

One of my goals was to be as idiomatic as possible (with my limited APL knowledge!). Because of this, I will try to avoid using explicit loops or conditionals and instead use array operators.

Strategy

The general strategy for solving this problem was to calculate all possible boolean arrays having exactly 'n' bits set and use Compress to extract the elements.

For example, in the array 12, 34, 35, 65 all possible boolean vectors having two bits on are:

1 1 0 0
1 0 1 0
1 0 0 1
0 1 1 0
0 1 0 1
0 0 1 1

Using these vectors in APL we get the elements we need:


      a
12 34 35 65
      1 1 0 0 / a
12 34
      1 0 1 0 / a
12 35
      1 0 0 1 / a
12 65
      0 1 1 0 / a
34 35
      0 1 0 1 / a
34 65
      0 0 1 1 / a
35 65

This strategy is not efficient in space or time but it allowed me to explore a solution without using imperative constructs or recursion.
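For readers who do not know APL, Compress behaves much like itertools.compress in Python. The following small sketch (not part of the original solution) reproduces the selections shown above:

```python
from itertools import compress

a = [12, 34, 35, 65]

# The six boolean vectors with exactly two bits on.
masks = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

# compress keeps the items whose mask position is truthy, like "mask / a".
selected = [list(compress(a, mask)) for mask in masks]
print(selected)
```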

Generating the boolean arrays

To generate the boolean arrays we start by generating all integer values required to encode n-bit values:


      a ← 12 34 35 65
      ⍳ ( ¯1 + 2 * ⍴ a)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Converting these numbers to binary gives all the non-zero boolean vectors for 4-bit values, 4 being the length of our sample array. We can see these vectors if we use Encode and Each:


     { ((⍴ a) ⍴ 2) ⊤ ⍵ } ¨ ⍳ ( ¯1 + 2 * ⍴ a)
 0 0 0 1  0 0 1 0  0 0 1 1  0 1 0 0  0 1 0 1  0 1 1 0  0 1 1 1  1 0 0 0  1 0 0 1  1 0 1 0  1 0 1 1  1 1 0 0  1 1 0 1  1 1 1 0  1 1 1 1

I can reshape this array to get a better representation:


      15 1 ⍴ { ((⍴ a) ⍴ 2) ⊤ ⍵ } ¨ ⍳ ( ¯1 + 2 * ⍴ a)
 0 0 0 1
 0 0 1 0
 0 0 1 1
 0 1 0 0
 0 1 0 1
 0 1 1 0
 0 1 1 1
 1 0 0 0
 1 0 0 1
 1 0 1 0
 1 0 1 1
 1 1 0 0
 1 1 0 1
 1 1 1 0
 1 1 1 1
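As a point of comparison, the Encode step can be sketched in Python with bit shifts. The encode helper below is a name chosen for this illustration; it only mimics what ((⍴ a) ⍴ 2) ⊤ ⍵ does for non-negative integers that fit in the given width.

```python
def encode(value, width):
    """Return the bits of value, most significant first, padded to width."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


a = [12, 34, 35, 65]

# One vector per integer from 1 to (2 ** length) - 1, as in the APL session.
vectors = [encode(value, len(a)) for value in range(1, 2 ** len(a))]
for vector in vectors:
    print(vector)
```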

Now we are only interested in the boolean arrays with exactly 'n' bits set to 1. For n = 2 we can keep only those elements using Compress:


      seq ← ⍳ ( ¯1 + 2 * ⍴ a)
      n ← 2
      ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
3 5 6 9 10 12

Again we can visualize these values using reshape and encode:


      6 1 ⍴ { ((⍴ a) ⍴ 2) ⊤ ⍵ } ¨ ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
 0 0 1 1
 0 1 0 1
 0 1 1 0
 1 0 0 1
 1 0 1 0
 1 1 0 0

Selecting desired elements

Now that we have our boolean arrays we can pick all the values using Compress:


     {(((⍴ a) ⍴ 2) ⊤ ⍵)/a} ¨ ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
 35 65  34 65  34 35  12 65  12 35  12 34

And again we can reshape this array to see it better:


      6 1 ⍴ {(((⍴ a) ⍴ 2) ⊤ ⍵)/a} ¨ ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
 35 65
 34 65
 34 35
 12 65
 12 35
 12 34

Finally we can pack these expressions in a function:


∇r←a ncombinations1 n;seq
  seq ← ⍳ ( ¯1 + 2 * ⍴ a)
  r ← {(((⍴ a) ⍴ 2) ⊤ ⍵)/a} ¨ (({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq)
∇

We can use it like this:


      1 2 3 4 5 ncombinations1 3
 3 4 5  2 4 5  2 3 5  2 3 4  1 4 5  1 3 5  1 3 4  1 2 5  1 2 4  1 2 3

APL has a built-in Binomial function (dyadic !) which allows us to calculate the number of combinations. For example:


      a ← 1 2 3 4 5
      (3 ! ⍴ a) = ⍴ a ncombinations1 3
1
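To double check the whole strategy outside APL, here is a Python sketch of it (an illustration only, with names of my own choosing): enumerate the integers from 1 to 2^length - 1, keep the ones with exactly n bits set, and use each one as a mask. math.comb plays the role of the Binomial check.

```python
from itertools import compress
from math import comb


def ncombinations(items, n):
    """Return the n-combinations of items using the bit mask strategy."""
    width = len(items)
    result = []
    for value in range(1, 2 ** width):
        # Bits of value, most significant first (the Encode step).
        bits = [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]
        if sum(bits) == n:
            result.append(list(compress(items, bits)))
    return result


combos = ncombinations([1, 2, 3, 4, 5], 3)
print(combos)
print(len(combos) == comb(5, 3))
```

Like the APL version, this visits every one of the 2^length - 1 masks, so it is only reasonable for small arrays.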

Final words

It was very interesting (and difficult) to try to write a solution to this problem using only array operations. Reading the definition of ncombinations1 I noticed that there are many expressions nested in parentheses (maybe related to my limited APL knowledge). I know that APL developers value terseness, so I tried to reduce the size of the expressions. Here is a new version:


∇r←a ncombinations3 n;s;b
  s ← ⍳ ( ¯1 + 2 * ⍴ a)
  b ← 2⍴⍨⍴a
  r ← {a/⍨b⊤⍵}¨s/⍨{n=+/b⊤⍵}¨s
∇

I was able to remove a couple of parentheses by taking advantage of the Commute operator.

GNU APL was used to create the examples in this post.

A small programming exercise in APL #1

2021-07-26T17:55:00.000-06:00

This post shows a possible solution written in APL for the following programming problem:

Given an array, determine if a value 'number' is found in every consecutive segment of 'size' elements.

For example, given this array:


1,2,1,3,4,1

This predicate is true for number = 1 and size = 2, because '1' is found in (1,2), (1,3) and (4,1).

A possible APL solution

This is a possible solution to this problem (using my limited APL knowledge).


∇ z←array arraysegments args
  number ← args[1]
  size ← args[2]
  z ← ∧/ ({number∊⍵} ⍤1) (((⍴ array) ÷ size) , size) ⍴ array
∇

This function is called arraysegments and is a dyadic function. It receives the array to validate as the left argument and, as the right argument, a two-element array with the value to find inside each segment ('number') and the length of the segments ('size').

An example of using this function:


      1 2 1 3 4 1 arraysegments 1 2
1
      1 2 1 3 4 1 arraysegments 2 2
0
      1 2 1 3 4 1 arraysegments 1 3
1
      300 200 300 300 400 500 arraysegments 300 3
1
      300 200 300 300 400 500 arraysegments 400 3
0

To see how this function works I'm going to start by defining some sample data:


      array ← 1 2 1 3 4 1 
      number ← 1
      size ← 3

1. First, start by reshaping the input array into a matrix with each row being one segment of 'size' elements:


      (((⍴ array) ÷ size) , size)
2 3      
      (((⍴ array) ÷ size) , size) ⍴ array
1 2 1
3 4 1

2. Now that we have the 'groups' as the rows of the new matrix, we apply the membership operation to each row of the matrix. To do this, we use the rank operator (⍤) in conjunction with an inline function that tests the membership:


      ({number∊⍵} ⍤1) (((⍴ array) ÷ size) , size) ⍴ array
1 1

3. As a final step, we apply the reduce operator '/' with the And function (∧) to see if the input number exists in every group:


      ∧/ ({number∊⍵} ⍤1) (((⍴ array) ÷ size) , size) ⍴ array
1

There are some problems with this solution. For example, if the array cannot be split into groups of the specified size you get a runtime error.
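To make the three steps concrete outside APL, here is a small Python sketch (function and parameter names are my own). It assumes that rejecting arrays that cannot be split evenly is the desired behavior and raises an explicit error for them instead of failing obscurely.

```python
def array_segments(array, number, size):
    """True if number appears in every consecutive segment of size elements."""
    if size <= 0 or len(array) % size != 0:
        # Explicit version of the runtime error mentioned above.
        raise ValueError("array length must be a multiple of the segment size")
    # Step 1: split into rows of 'size' elements (the reshape).
    segments = [array[start:start + size] for start in range(0, len(array), size)]
    # Steps 2 and 3: membership test per row, then "and" over all rows.
    return all(number in segment for segment in segments)


print(array_segments([1, 2, 1, 3, 4, 1], 1, 2))
print(array_segments([1, 2, 1, 3, 4, 1], 2, 2))
```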

Reading APL #1: Roman numerals to decimal

2021-06-13T17:26:00.000-06:00

As part of a new recurring series I’m going to take a small APL function or program and try to break it down into its parts. I hope this is going to help me get a better understanding of the language.

Reading APL

This article: The APL Programming Language Source Code has a nice introduction to the language and its history. One sentence in this article is key to understanding how to read APL code:

Order of evaluation: Expressions in APL are evaluated right-to-left, and there is no hierarchy of function precedence

Two additional concepts are useful to understand when trying to read code:

A monadic function: a function applied to a single argument, the expression on its right
A dyadic function: a function applied to two arguments (left and right)

One example of these elements is the following:

      2 2 ⍴⍳⍴ 8 8 8 8
1 2
3 4

Here is an example of evaluating this expression from right to left.

      myarray←8 8 8 8
      myarray
8 8 8 8
      length_of_my_array← ⍴ myarray
      length_of_my_array
4
      new_array← ⍳ length_of_my_array
      new_array
1 2 3 4
      2 2 ⍴ new_array
1 2
3 4

Note that here we use ⍴ both as a monadic function (Shape) and as a dyadic function (Reshape).

The example

For this post I’m going to use a small function I found in the APL Utils repository by Blake McBride. This function takes a string representing a Roman numeral and converts it to an integer:

∇z←Romanu a
 z←+/(¯1+2×z≥1↓z,0)×z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]
∇

An example of executing this function:

      Romanu 'XVII'
17
      Romanu 'IX'
9

Reading the code

Let’s start by reading the function definition:

∇z←Romanu a
   ...
∇

This code represents the definition of the Romanu function with an a argument.

Now, the interesting part is reading the body of the function which performs the actual calculations.

z←+/(¯1+2×z≥1↓z,0)×z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]

As described in the previous section we need to read this code from right to left. To do this we need to ‘parse’ it and extract the largest complete expression from the right:

(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]

This expression is a particular example of a bracket indexing expression (https://aplwiki.com/wiki/Bracket_indexing). A simpler example of this is:

      (10 20 30)[1]
10
      (10 20 30)[2]
20
      (10 20 30)[3]
30
      (10 20 30)[3 1 1 2]
30 10 10 20
      a←10 20 30
      a[3 1 1 2]
30 10 10 20

As shown above this expression is used to select elements of an array into a new array.

In the Romanu function this expression is used to assign a value to each of the Roman numeral 'digits'. The indices come from this expression:

'IVXLCDM'⍳a

Here the dyadic form of ⍳, or ‘Index of’ (https://aplwiki.com/wiki/Index_Of), is used. For each element on the right it returns its index in the left-side array. Here is an example of this expression in action:

      (10 20 30) ⍳ (20 20 10 20 30 10)
2 2 1 2 3 1

When using this expression with the roman digits string we get the equivalent positions:

      'IVXLCDM' ⍳ 'XVII'
3 2 1 1

As shown above we get the positions in the roman digits string. Returning to our original expression we use the selection expression to get the value of each roman digit in decimal form:

      (1 5 10 50 100 500 1000)['IVXLCDM' ⍳ 'XVII']
10 5 1 1

As you can see here we almost have the result for the conversion of XVII. We just have to sum these numbers up to get 17.

Going back to the example and continuing to read from right to left, we find an assignment to the z variable.

z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]

The rest of the expression is going to take care of subtracting the value required for converting numbers like: ‘IX’.

Reading the expression from right to left we find:

(¯1+2×z≥1↓z,0)×z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]

This expression is a multiplication (×). For arrays it applies the operation element-wise:

      2 3 × 4 5
8 15

The interesting part here is the expression being used for the left operand of the multiplication:

(¯1+2×z≥1↓z,0)

Again we are going to analyze this expression from right to left.

z,0

This expression returns the array with a zero appended at the end:

     (10 20 30),0
10 20 30 0
      z←(1 5 10 50 100 500 1000)['IVXLCDM' ⍳ 'IX']
      z,0
1 10 0

Then the following operation drops the first element of the array:

      1↓z,0
10 0

The drop expression returns an array without the first ‘n’ elements:

      3 ↓ 20 30 40 30 20 10
30 20 10

The following step is very interesting. We use the greater-or-equal function ≥ to determine, element by element, which elements of the left array are greater than or equal to the corresponding elements of the right array:

       10 20 30 ≥ 5 21 30
1 0 1

Applying this to our example for ‘IX’:

      z←(1 5 10 50 100 500 1000)['IVXLCDM' ⍳ 'IX']
      z≥1↓z,0
0 1

This is interesting: we get an array where each 0 marks an entry that is lower than its successor. In the case of IX we get 0 1, saying that the first digit needs to be subtracted from the second. The final part performs this operation: multiplying by 2 and adding ¯1 turns each 0 into ¯1 and each 1 into 1, and these signs are then multiplied by the values.

      z←(1 5 10 50 100 500 1000)['IVXLCDM' ⍳ 'IX']
      z≥1↓z,0
0 1
      ¯1+2×z≥1↓z,0
¯1 1
      (¯1+2×z≥1↓z,0)×z
¯1 10

Now we can finally sum all the numbers in the resulting array and get the decimal number. This is performed with the reduce operator / (https://aplwiki.com/wiki/Reduce) applied to the + function.

      +/ 10 20 30
60
      10 + 20 + 30
60

Applying this operation to our partial expression returns the expected value:

      (¯1+2×z≥1↓z,0)×z
¯1 10

      +/(¯1+2×z≥1↓z,0)×z
9

We can see this process in action by executing all the parts with an interesting number like MMCXXIX (2129).

      a←'MMCXXIX'
      z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]
      z
1000 1000 100 10 10 1 10
      z,0
1000 1000 100 10 10 1 10 0
      1↓z,0
1000 100 10 10 1 10 0
      z≥1↓z,0
1 1 1 1 1 0 1
      ¯1+2×z≥1↓z,0
1 1 1 1 1 ¯1 1
      +/(¯1+2×z≥1↓z,0)×z
2129
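The same steps translate almost one to one to Python. The following is a sketch for comparison only; roman_to_int is a name made up for it, and it accepts only the seven uppercase digits used by Romanu.

```python
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    """Convert a Roman numeral using the same sign trick as Romanu."""
    z = [VALUES[digit] for digit in text]      # (1 5 10 ...)['IVXLCDM'⍳a]
    successors = z[1:] + [0]                   # 1↓z,0
    # ¯1+2×z≥1↓z,0 : 1 when the digit is >= its successor, otherwise -1
    signs = [1 if current >= following else -1
             for current, following in zip(z, successors)]
    return sum(sign * value for sign, value in zip(signs, z))  # +/ ... ×z


print(roman_to_int("XVII"), roman_to_int("IX"), roman_to_int("MMCXXIX"))
```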

A small experiment with fragment shaders

2021-06-05T16:11:00.000-06:00

I wanted to work on an experiment that allowed me to learn a little bit about modern graphics programming with OpenGL (with shaders, that is). A nice choice is the 'hello world' of graphics programming: rendering the Mandelbrot set.

In this post I'm going to record my experiences from zero to a basic rendering of the Mandelbrot set. Here is what the final product looks like:

1.1 The experiment

To get the "native" feel I decided to create the example as a C++ Windows program (but using mostly C). There are many options to do this, but the following combination worked for me:

A C++ Compiler : Visual Studio build tools
An IDE: VSCode C++ integration (https://code.visualstudio.com/docs/cpp/config-msvc)
OpenGL libraries and loader: Windows SDK (https://developer.microsoft.com/en-us/windows/downloads/windows-10-sdk/) + GLAD (https://glad.dav1d.de/)
A build system (although the experiment is quite small): CMake
A simple GUI library: GLFW (https://www.glfw.org/)

I'm quite impressed with the VSCode C++ support.

1.2 The program

The structure of the program is very simple since it is a hello world program. We start with some initialization boilerplate (quite short since GLFW abstracts away many windowing system details):

GLFWwindow *window;

glfwSetErrorCallback(error_callback);

if (!glfwInit())
{
  return -1;
}

window = glfwCreateWindow(
    800,
    600,
    "Shader example",
    NULL,
    NULL);

if (!window)
{
  glfwTerminate();
  return -1;
}
glfwMakeContextCurrent(window);
gladLoadGL();

This code creates an 800x600 window ready to use with OpenGL. To access OpenGL I'm using a loader called GLAD.

1.3 Strategy for rendering the Mandelbrot set

In this experiment I am going to use the "escape time algorithm" to render the image of the Mandelbrot set. This algorithm is very simple and easy to implement using fragment shaders.

Before we start implementing this algorithm we are going to define some geometry and apply a fragment shader to it. Since we are going to render a 2D image, the easiest way to do this is to create two triangles that fill the viewport. Since OpenGL uses normalized coordinates we can define these triangles using values between -1.0 and 1.0 with the following code:


GLfloat vertices[] = {
    -1.0f, 1.0f,
    1.0f, 1.0f,
    1.0f, -1.0f,

    1.0f, -1.0f,
    -1.0f, -1.0f,
    -1.0f, 1.0f};

GLuint vertex_buffer;
glGenBuffers(1, &vertex_buffer);
glBindBuffer(GL_ARRAY_BUFFER, vertex_buffer);
glBufferData(
    GL_ARRAY_BUFFER,
    sizeof(vertices),
    vertices,
    GL_STATIC_DRAW);

This code is going to define two triangles that are highlighted here:

This code also makes this vertex buffer the active one (glBindBuffer).

Now we need both a vertex shader to apply transformations to the vertex data and a fragment shader to provide color to our pixels.

Our vertex shader is really simple since we are not performing transformations to the triangles:


#version 110
attribute vec2 pos;
void main() {
  gl_Position = vec4(pos, 0.0, 1.0); 
}

In this vertex shader we can manipulate the vertices of the triangles that we are going to draw. The X and Y values of the vertices are passed using the pos attribute. This is not automatic: in the C++ program we need to specify the data that is passed to the vertex shader for the pos attribute. This is accomplished by using the following code:

GLuint vpos_location;
vpos_location = glGetAttribLocation(program, "pos");
glEnableVertexAttribArray(vpos_location);
glVertexAttribPointer(
    vpos_location,
    2,
    GL_FLOAT,
    GL_FALSE,
    sizeof(float) * 2, 0);

One of the most interesting aspects of this snippet is the call to glVertexAttribPointer (https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glVertexAttribPointer.xhtml). This function specifies the way the values are going to be extracted from the array data. Here we specify:

vpos_location the attribute that we are configuring
2 the number of components (remember that pos is vec2)
GL_FLOAT the data type of the data element
GL_FALSE the data is not normalized. This seems to be important for integer data (here we use floating point data, more info here: https://gamedev.stackexchange.com/questions/10859/glvertexattribpointer-normalization).
sizeof(float) * 2 the stride, that is, the byte offset between consecutive elements. This is useful when the array mixes different kinds of data, for example vertices and normals in the same array; the stride is then used to skip the undesired data elements. In our case 0 is also a valid value since we only have tightly packed vertex data.
0 the offset of the first element inside the buffer
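To build some intuition about the stride and offset parameters, here is a conceptual Python model of how an attribute could be read from a packed buffer. This is only an illustration under my own assumptions (it is not how a driver is implemented, and read_attribute is a made-up helper): the buffer holds 32-bit floats, and element i starts at offset + i * stride bytes.

```python
import struct

# Three of the vertices used above, packed as 32-bit floats.
vertices = [-1.0, 1.0, 1.0, 1.0, 1.0, -1.0]
buffer = struct.pack(f"{len(vertices)}f", *vertices)

components = 2                                # pos is a vec2
stride = components * struct.calcsize("f")    # same as sizeof(float) * 2


def read_attribute(data, index, components, stride, offset=0):
    """Return the components of element 'index' of a packed float buffer."""
    start = offset + index * stride
    return struct.unpack_from(f"{components}f", data, start)


for index in range(3):
    print(index, read_attribute(buffer, index, components, stride))
```

With interleaved data (for example two position floats followed by three normal floats) the stride would grow to five floats while the number of components for pos stays at 2.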

The boilerplate code to compile this shader is the following:

GLint shader_compiled;
GLuint vertex_shader;
vertex_shader = glCreateShader(GL_VERTEX_SHADER);

glShaderSource(vertex_shader, 1, &vertex_shader_src, NULL);
glCompileShader(vertex_shader);

glGetShaderiv(vertex_shader, GL_COMPILE_STATUS, &shader_compiled);
if (shader_compiled != GL_TRUE)
{
  std::cout << "Vertex shader not compiled";
  GLchar message[1024];
  GLsizei log_size;
  // The size passed here must not exceed the size of the buffer.
  glGetShaderInfoLog(vertex_shader, sizeof(message), &log_size, message);
  std::cout << message;
}

When you are starting with shaders, the calls to glGetShaderiv and glGetShaderInfoLog are useful: the first one reports whether the compilation succeeded and the second one returns the error information produced when compiling the shader.

There is an important part of the process that is always confusing to me. Many things in the OpenGL API change a global state aspect of the functionality. For example the glBindBuffer function used above is going to determine the set of vertices used by the glDrawArrays function.

Here's the code of the fragment shader:

#version 110
uniform vec4 boundaries;  
void main()  {
   float x0, y0, x, y;
   // Map the pixel position to a point of the complex plane.
   x0 = gl_FragCoord.x*((boundaries.z - boundaries.x)/800.0) + boundaries.x ;
   y0 = gl_FragCoord.y*((boundaries.w - boundaries.y)/600.0) + boundaries.y ;
   // The iteration starts at zero; uninitialized values are undefined in GLSL.
   x = 0.0;
   y = 0.0;
   int maxIteration, iteration;
   maxIteration = 256;
   iteration = 0;
   while((x*x + y*y <= 4.0) && (iteration < maxIteration)) {
      float tmp;
      tmp = x*x - y*y + x0;
      y = 2.0*x*y + y0;
      x = tmp;
      iteration = iteration + 1;
   }
   gl_FragColor = vec4(vec3(0.0, float(iteration)/256.0, 0.0 ),1.0);
}

This code contains a simple implementation of the escape time algorithm described in Wikipedia here: https://en.wikipedia.org/wiki/Plotting_algorithms_for_the_Mandelbrot_set#Unoptimized_naïve_escape_time_algorithm .

For simplicity this shader chooses only between shades of green for the colors.
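The per-pixel work of the shader can also be tried on the CPU. Here is a small Python sketch of the same escape time loop and of the green shade calculation (function names are my own; this is not part of the original program):

```python
def escape_iterations(x0, y0, max_iteration=256):
    """Count the iterations of z = z*z + c before |z| exceeds 2."""
    x = y = 0.0
    iteration = 0
    while x * x + y * y <= 4.0 and iteration < max_iteration:
        # Both new values are computed from the old x and y.
        x, y = x * x - y * y + x0, 2.0 * x * y + y0
        iteration += 1
    return iteration


def green_shade(iteration, max_iteration=256):
    """Green component used for the pixel, between 0.0 and 1.0."""
    return iteration / max_iteration


for point in [(0.0, 0.0), (-1.0, 0.0), (2.0, 2.0)]:
    count = escape_iterations(*point)
    print(point, count, green_shade(count))
```

Points that never escape, such as (0, 0), reach the iteration limit and get the brightest green.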

1.4 Zooming in

As part of this experiment I wanted to be able to zoom into a particular area of the Mandelbrot set. To add support for this we need to pass the coordinates of two opposite corners of the area we want to render. This is accomplished by declaring a uniform to pass these values from the C++ program.

Here is the code that uses this parameter in the fragment shader:

uniform vec4 boundaries;  
...
   x0 = gl_FragCoord.x*((boundaries.z - boundaries.x)/800.0) + boundaries.x ;
   y0 = gl_FragCoord.y*((boundaries.w - boundaries.y)/600.0) + boundaries.y ;

And here is the code that specifies the value of the boundaries uniform in the C++ program:

int coordinatesUniformLocation = glGetUniformLocation(program, "boundaries");
glUniform4f(coordinatesUniformLocation, boundaries.x1, boundaries.y1, boundaries.x2, boundaries.y2);

We can manipulate the values of boundaries using a mouse click handler like this:

glfwSetMouseButtonCallback(window, mouseCallback);
...
void mouseCallback(GLFWwindow* window, int button, int action, int mods) {
  if (action == GLFW_PRESS) {
     double xpos, ypos;
     glfwGetCursorPos(window, &xpos, &ypos);

     ypos = 600 - ypos;
     double xclick, yclick;
     xclick = xpos*((boundaries.x2 - boundaries.x1)/800.0) + boundaries.x1;
     yclick = ypos*((boundaries.y2 - boundaries.y1)/600.0) + boundaries.y1;

     double currentWidth = (boundaries.x2 - boundaries.x1) - (boundaries.x2 - boundaries.x1)/10;
     double currentHeight = (boundaries.y2 - boundaries.y1) - (boundaries.y2 - boundaries.y1)/10;
     boundaries.x1 = (float)( xclick - currentWidth/2);
     boundaries.x2 = (float)( xclick + currentWidth/2);

     boundaries.y1 = (float)( yclick - currentHeight/2);
     boundaries.y2 = (float)( yclick + currentHeight/2);
   }
}
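The arithmetic of the click handler is easier to check in isolation. This Python sketch repeats the same calculation under the assumptions of the example (an 800x600 window and a 10% reduction per click); zoom_at_click is a name chosen for the illustration and the bounds tuple stands for (x1, y1, x2, y2).

```python
def zoom_at_click(bounds, xpos, ypos, width=800, height=600):
    """Return new (x1, y1, x2, y2) bounds centered on the clicked point."""
    x1, y1, x2, y2 = bounds
    # Window coordinates grow downwards, gl_FragCoord grows upwards.
    ypos = height - ypos
    # Same pixel-to-plane mapping used by the fragment shader.
    xclick = xpos * ((x2 - x1) / width) + x1
    yclick = ypos * ((y2 - y1) / height) + y1
    # The new area is 10% smaller than the current one.
    new_width = (x2 - x1) - (x2 - x1) / 10
    new_height = (y2 - y1) - (y2 - y1) / 10
    return (
        xclick - new_width / 2,
        yclick - new_height / 2,
        xclick + new_width / 2,
        yclick + new_height / 2,
    )


# Hypothetical starting area; clicking the center keeps it centered.
print(zoom_at_click((-2.0, -1.5, 2.0, 1.5), 400, 300))
```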

Finally we define the draw loop like this:

while (!glfwWindowShouldClose(window))
{
  int width, height;
  glfwGetFramebufferSize(window, &width, &height);
  glViewport(0, 0, width, height);
  glClear(GL_COLOR_BUFFER_BIT);
  glUseProgram(program);
  int coordinatesUniformLocation = glGetUniformLocation(program, "boundaries");

  glUniform4f(coordinatesUniformLocation, boundaries.x1, boundaries.y1, boundaries.x2, boundaries.y2);

  glDrawArrays(GL_TRIANGLES, 0, 6);
  glfwSwapBuffers(window);
  glfwPollEvents();
}

1.5 Conclusion

This was a fun experiment! I got a small glimpse of how to work with OpenGL. Future posts are going to explore even further and maybe use other programming languages to explore OpenGL.

Code for this experiment can be found here: https://github.com/ldfallas/openglmandelbrot .

First programming language: GW-BASIC

2020-02-23T16:56:00.002-06:00

Like many developers my age, the first programming language I ever used was BASIC. Specifically, GW-BASIC on a nice Epson "Abacus" XT machine with a green-on-black monitor back in primary school.

For me, the experience of writing my first program was magical. Having feedback from the computer with just a few keystrokes was an essential part of it.

10 PRINT "hola"
RUN
hola

Writing a small/toy implementation of GW-BASIC may be a good exercise and a homage to this programming language. In the following posts I'm going to write about the ongoing experience of doing this.

As with other posts in this blog I'm going to try to write this program using a language I'm learning. For this task I'll be using Rust. Rust is a very interesting language with many concepts to learn. For me the only way of learning a new programming language is trying to use it to implement something.

The repo for this small project is located here: https://github.com/ldfallas/rgwbasic .

The current implementation still lacks most of the features needed to make it a useful (or usable) implementation. But at least you can write:

$ cargo run
...
    Finished dev [unoptimized + debuginfo] target(s) in 0.05s
     Running `target/debug/rgwbasic`
Ok
10 print "hello"
Ok
20 goto 10
Ok
run
"HELLO"
"HELLO"
"HELLO"
"HELLO"
"HELLO"
...

A quick note about programming in Prolog

2019-03-10T17:54:00.000-06:00

Prolog predicates work in different ways depending on how you use them. A simple example is the append predicate. One may say that append is used to get the result of appending two lists. For example:

?- append([1,2,3], [4,5,6], NewList).
NewList = [1, 2, 3, 4, 5, 6].

But we can also use append to get the prefix of a list given a suffix.

?- append(Prefix, [5,6], [1,2,3,4,5,6]).
Prefix = [1, 2, 3, 4] ;

Or we can get all the possible pairs of lists that, when concatenated, produce a specified list:

?- append(First, Second, [1,2,3,4,5,6]).
First = [],
Second = [1, 2, 3, 4, 5, 6] ;
First = [1],
Second = [2, 3, 4, 5, 6] ;
First = [1, 2],
Second = [3, 4, 5, 6] ;
First = [1, 2, 3],
Second = [4, 5, 6] ;
First = [1, 2, 3, 4],
Second = [5, 6] ;
First = [1, 2, 3, 4, 5],
Second = [6] ;
First = [1, 2, 3, 4, 5, 6],
Second = [] ;

This is a powerful concept that is accomplished by Prolog's execution mechanism, which involves unification, resolution and backtracking. This mechanism tries to find a way to bind free variables so that the goal holds.
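To appreciate what Prolog gives us for free, compare it with a language without unification. In the following Python sketch (an illustration with made-up names) each "mode" of append has to be emulated explicitly: one function enumerates the splits and another filters them to find a prefix.

```python
def splits(items):
    """All (first, second) pairs such that first + second == items."""
    return [(items[:cut], items[cut:]) for cut in range(len(items) + 1)]


def prefixes_for_suffix(items, suffix):
    """Emulates append(Prefix, Suffix, Items) with only Prefix unbound."""
    return [first for first, second in splits(items) if second == suffix]


numbers = [1, 2, 3, 4, 5, 6]
for first, second in splits(numbers):
    print(first, second)
print(prefixes_for_suffix(numbers, [5, 6]))
```

In Prolog the single definition of append covers all of these uses.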

Recently, I found an example of this behavior while writing some quick experiments with a JavaScript parser I'm working on (which deserves a separate post).

It all started while trying to define a predicate to translate a JavaScript switch statement to an equivalent sequence of if/else statements. An example of this conversion is to take the following statement:

switch(myVar) {
  case 1: 
     print('one');
     break;
  case 2:
     print('two');
     break;
}

And convert it to something like:

if (myVar == 1) {
   print('one');
} else if (myVar == 2) {
   print('two');
}

A simple version of a predicate that performs this conversion is the following:

switch_if(js_switch(js_identifier(Variable, _),
                    Cases,
                   _),
          IfStat) :-
       switch_if_cases(Variable, Cases, IfStat).

switch_if_cases(Variable,
                [js_case(Value, Body, _)|Rest],
                js_if(Comparison,
                      js_block(NoBreakBody, _),
                      RestIf,
                      _)) :-
     value_comparison(Comparison, Variable, Value),
     body_with_no_break(Body, NoBreakBody),
     switch_if_cases(Variable, Rest, RestIf).

switch_if_cases(_Variable,
                [js_default( Body, _)],
                js_block(NoBreakBody, _)) :-
     body_with_no_break(Body, NoBreakBody).

value_comparison(js_binary_operation(
                          equals_op,
                          js_identifier(Variable, _),
                          Value,
                          _),
           Variable, Value).

body_with_no_break([js_break(_)], []).
body_with_no_break([Stat|Rest1], [Stat|Rest2]) :-
    body_with_no_break(Rest1, Rest2).

Here's an example of using the switch_if predicate to convert a small code snippet. The input contains a switch statement. The result is unified in the IfString variable.

?- parse_js_stat_string("switch(w) {
case 1:
   print('first');
   break;
case 2:
   print('second');
   break;
default:
   print('Nothing');
   break;
}", SwitchAst), switch_if(SwitchAst, IfAst), print_js_ast_to_string(IfAst, IfString), writef(IfString).
if (w == 1) {
 print('first', );
} else if (w == 2) {
 print('second', );
} else {
 print('Nothing', );
}
SwitchAst = js_switch(...),
IfString = "if (w == 1) {\n print('first', );\n} else if (w == 2) {\n print('second', );\n} else {\n print('Nothing', );\n}"

The parse_js_stat_string predicate parses a string containing a statement. On success this predicate generates a simple AST representation of the input element as a Prolog term. For example:

?- parse_js_stat_string("switch(w) {
       case 1:
         print('first');
         break;
       case 2:
         print('second');
         break;
       default:
         print('Nothing');
         break;
    }", SwitchAst).
SwitchAst = js_switch(js_identifier([119], lex_info(1, [])), [js_case(js_literal(number, [49], lex_info(2, [ws(19, false)])), [js_expr_stat(js_call(js_identifier([112, 114|...], lex_info(3, [ws(..., ...)])), js_arguments([js_literal(string, [...|...], lex_info(..., ...))], lex_info(3, [], 3, [])), null)), js_break(lex_info(4, [ws(43, true)]))], lex_info(2, [ws(11, true)])), js_case(js_literal(number, [50], lex_info(5, [ws(63, false)])), [js_expr_stat(js_call(js_identifier([112|...], lex_info(6, [...])), js_arguments([js_literal(..., ..., ...)], lex_info(6, [], 6, [])), null)), js_break(lex_info(7, [ws(..., ...)]))], lex_info(5, [ws(55, true)])), js_default([js_expr_stat(js_call(js_identifier([...|...], lex_info(..., ...)), js_arguments([...], lex_info(..., ..., ..., ...)), null)), js_break(lex_info(10, [...]))], lex_info(8, [ws(100, true)]))], lex_info(1, []))

Going the other way

An awesome surprise is that the switch_if predicate can be used in "the other direction" without any modification. That is, we can specify an if AST and get a switch version of it. Of course this can only be done if the conversion makes sense. For example:

?- parse_js_stat_string("
if (x == 10) {
    print('10 val');
} else if (x == 20) {
    print('20 val');
} else {
    print('None');
}", IfAst), switch_if(SwitchAst, IfAst), print_js_ast_to_string(SwitchAst, SwitchString), writef(SwitchString).
switch(x) {
 case 10:
  print('10 val', );
  break;
 case 20:
  print('20 val', );
  break;
 default:
   print('None', );
  break;
}
IfAst = js_if(js_binary_operation(equals_op,...),
SwitchAst = js_switch(...),
SwitchString = "switch(x) {\n case 10:\n  print('10 val', );\n  break;\n case 20:\n  print('20 val', );\n  break;\n default:\n   print('None', );\n  break;\n}\n"

Several alternatives for converting between `switch` and `if`

One of the most intriguing characteristics of Prolog is the possibility of giving different valid results for a given goal.

For example, we can add a second clause to the value_comparison predicate:

value_comparison(js_binary_operation(
                          equals_op,
                          js_identifier(Variable, _),
                          Value,
                          _),
           Variable, Value).
value_comparison(js_binary_operation(
                          equals_op,
                          Value,
                          js_identifier(Variable, _),
                          _),
           Variable, Value).

By doing this we are saying that both x == 1 and 1 == x are valid possibilities for the condition of the if statement that replaces a case section.

If we do this we can get more alternatives:

?- quick_switch_if_test("switch(w) {
case 1:
   print('first');
   break;
case 2:
   print('second');
   break;
default:
   print('Nothing');
   break;
}").
if (w == 1) {
 print('first', );
} else if (w == 2) {
 print('second', );
} else {
 print('Nothing', );
}
true ;
if (w == 1) {
 print('first', );
} else if (2 == w) {
 print('second', );
} else {
 print('Nothing', );
}
true ;
if (1 == w) {
 print('first', );
} else if (w == 2) {
 print('second', );
} else {
 print('Nothing', );
}
true ;
if (1 == w) {
 print('first', );
} else if (2 == w) {
 print('second', );
} else {
 print('Nothing', );
}
true ;
false.
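
The four answers above are the Cartesian product of two independent choices, one per case. A minimal Python sketch of that enumeration follows; the function names are made up for this illustration, and strings stand in for the AST terms. The order matches Prolog's backtracking, which revisits the most recent choice point first.

```python
from itertools import product


def comparison_alternatives(variable, value):
    """Both spellings accepted by the two value_comparison clauses."""
    return [f"{variable} == {value}", f"{value} == {variable}"]


def if_condition_alternatives(variable, values):
    """Yield one tuple of conditions per solution."""
    choices = [comparison_alternatives(variable, v) for v in values]
    # product() varies the last position fastest, like backtracking into
    # the most recently created choice point.
    yield from product(*choices)


solutions = list(if_condition_alternatives("w", [1, 2]))
for conditions in solutions:
    print(conditions)
```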

Code for this post can be found here.

Using Racklog for parsing

2018-12-29T09:18:00.001-06:00

This post shows a small experiment of creating parsing predicates using Racket's Racklog package. This package enables the use of logic programming features inside Racket.

Logic programming languages like Prolog use the definite clause grammar (DCG) syntax for this purpose. This post does not use that technique directly; the goal is to express the grammar in the same way the DCG syntax is expanded. A nice experiment for a future post is to create macros for hiding the state change variables.

As always the goal is to experiment with a concept without worrying too much about performance!

1.1 How it looks

Here's an example of a predicate for parsing an if statement for a fictional language.

(define %if-stat
  (%rel (ctxt new-ctxt)
        [(ctxt new-ctxt)
         (%let (ifkw-ctxt lpar-ctxt rpar-ctxt
                cond-ctxt then-ctxt
                stats-ctxt else-ctxt end-ctxt)
               (%and
                ((%literal-id "if")   ctxt ifkw-ctxt) 
                (%lpar                ifkw-ctxt lpar-ctxt)
                (%expression          lpar-ctxt cond-ctxt)
                (%rpar                cond-ctxt rpar-ctxt)
                ((%literal-id "then") rpar-ctxt then-ctxt)
                (%stat                then-ctxt stats-ctxt)
                ((%opt %else-block)   stats-ctxt else-ctxt)
                (%is new-ctxt (with-result
                                `(-if ,(parse-context-result cond-ctxt)
                                      ,(parse-context-result stats-ctxt)
                                      ,(parse-context-result else-ctxt))
                                else-ctxt))))]))

Running this predicate from the REPL using a string with sample code produces the following output:

racklogexperiments.rkt> (parse-context-result
 (cdar
   (%which (result-ctxt)
           (%if-stat (string-context "if (c) then
                                        foo(c);
                                      else
                                         moo(d);")
                 result-ctxt))))
'(-if
  (-id "c")
  (-call-stat (-call "foo" ((-id "c"))))
  (-else (-call-stat (-call "moo" ((-id "d"))))))
racklogexperiments.rkt>

1.2 Creating parsing predicates

An important piece that we need to define our parsers is the "parsing context". This element is represented as a simple structure which keeps track of:

The result or output of the last parser.
The text used as input for the parser.
The current position inside the string as a zero-based index.

(struct
  parse-context
  (result text position))

(define (string-context str)
        (parse-context '() str 0))

(define (with-result value ctxt)
  (struct-copy parse-context ctxt [result value]))

The way we define parsers using Racklog predicates is that we transform an input parsing context into a new context. The new context is going to have the result of the last parsing operation, and its position will be advanced by as many characters as that operation consumed.

We use a couple of parsers as the foundation for almost all other parsers. The first one, %a-char, has the next available character as its result.

(define %a-char
  (%rel (a-char ctxt new-ctxt)
        [(a-char ctxt new-ctxt)
            (%is #t (has-more-chars? ctxt))
            (%is new-ctxt (get-char-from-context ctxt))
            (%is a-char (parse-context-result new-ctxt))
            ]))

The code for the utility functions used in these parsers is the following:

(define (get-char-from-context ctxt)
  (struct-copy
     parse-context
     ctxt
     [result (string-ref (parse-context-text ctxt) (parse-context-position ctxt))]
     [position (+ 1 (parse-context-position ctxt))]))

(define (has-more-chars? ctx)
   (<
      (parse-context-position ctx)
      (string-length (parse-context-text ctx))))

The get-char-from-context function creates a new context with the current character as result and advances the context to the next position. We use Racklog unification and %and to chain together the parsing contexts. The following interaction illustrates this:

racklogexperiments.rkt> (define result (%which (c1 result-ctxt) 
                                               (%and (%a-char #\a (string-context "abc") c1)
                                                     (%a-char #\b c1 result-ctxt))))
racklogexperiments.rkt> result
'((c1 . #<parse-context>) (result-ctxt . #<parse-context>))
racklogexperiments.rkt> (assoc 'result-ctxt result)
'(result-ctxt . #<parse-context>)
racklogexperiments.rkt> (parse-context-result (cdr (assoc 'result-ctxt result)))
#\b
racklogexperiments.rkt> (parse-context-position (cdr (assoc 'result-ctxt result)))
2
racklogexperiments.rkt>

Here we create a very simple parser that recognizes the sequence "ab". As presented above, the position of the resulting parsing context is 2, which is the zero-based position inside the string after 'b'.
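
The context-threading idea does not depend on Racklog. Here is a rough Python sketch of the same "ab" interaction, with the assumption that a failed parse is signalled by returning None instead of by logical failure, so there is no backtracking. The names mirror the Racket ones but are otherwise invented for this example.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class ParseContext:
    result: Any      # output of the last parser
    text: str        # full input text
    position: int    # zero-based index of the next character


def string_context(text):
    return ParseContext(result=(), text=text, position=0)


def a_char(expected, ctxt):
    """Consume `expected` at the current position, or return None."""
    if ctxt.position >= len(ctxt.text):
        return None
    ch = ctxt.text[ctxt.position]
    if ch != expected:
        return None
    # Build a new context instead of mutating the old one.
    return replace(ctxt, result=ch, position=ctxt.position + 1)


c1 = a_char("a", string_context("abc"))
result_ctxt = a_char("b", c1)
print(result_ctxt.result, result_ctxt.position)
```

Because every parser returns a fresh context, the original context is still available afterwards, which is what makes trying an alternative from the same starting point cheap.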

1.3 Sequences

A very useful parser is one that lets you apply another parser zero or more times. The parser that achieves this is the following:

(define %p-simple-sequence
  (λ (pred)
    (%rel (ctxt new-ctxt)
          [(ctxt new-ctxt)
           (%let (tmp-ctxt tmp-result)
                 (%or
                  (%and
                   (pred ctxt tmp-ctxt)
                   (%is tmp-result (with-result                                     
                                     (cons (parse-context-result tmp-ctxt)
                                           (parse-context-result ctxt))
                                     tmp-ctxt))
                   ((%p-simple-sequence pred) tmp-result new-ctxt))
                  (%is new-ctxt ctxt)))])))

(define %p-sequence
  (λ (pred)
    (%rel ( ctxt new-ctxt)
          [(ctxt new-ctxt)
           (%let (tmp-ctxt)
               (%and
                 (%is tmp-ctxt (with-result '() ctxt))
                 ((%p-simple-sequence pred)
                  tmp-ctxt new-ctxt)
                 ))]
          )))

We can apply this parser as follows:

racklogexperiments.rkt> (parse-context-result 
       (cdar (%which (result-ctxt) 
               ((%p-sequence 
                    (%rel (c r) [(c r) (%a-char #\a c r)])) 
                (string-context "aaaa") result-ctxt)))  )
'(#\a #\a #\a #\a)
racklogexperiments.rkt>
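
As a point of comparison, here is a hedged sketch of a "zero or more" combinator in Python. It assumes a simplified parser shape, a function from (text, position) to either (value, new_position) or None, which is not the representation used in the Racket code. It is also greedy only: the Racklog version can additionally offer shorter matches on backtracking because of its %or.

```python
def char_parser(expected):
    """Parser that accepts exactly one given character."""
    def parse(text, pos):
        if pos < len(text) and text[pos] == expected:
            return expected, pos + 1
        return None
    return parse


def zero_or_more(parser):
    """Apply `parser` repeatedly and collect the results in a list."""
    def parse(text, pos):
        results = []
        while True:
            step = parser(text, pos)
            if step is None:
                return results, pos
            value, new_pos = step
            if new_pos == pos:
                # A parser that consumes nothing would loop forever.
                return results, pos
            results.append(value)
            pos = new_pos
    return parse


many_a = zero_or_more(char_parser("a"))
print(many_a("aaaab", 0))
```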

1.4 Optionals

Another useful element we need is a way to parse optional elements. We used this in our if example above for the else section.

To implement this we use %or to try to parse the optional parser first or succeed with an empty result. Using this technique will enable multiple solutions (see an example of this below).

(define %opt
  (λ (parser)
    (%rel (ctxt new-ctxt)
          [(ctxt new-ctxt)
           (%let (tmp-ctxt)
                 (%and
                  (%or (parser ctxt new-ctxt)
                       (%is new-ctxt (with-result '() ctxt)))))])))

1.5 Multiple possible ASTs

One interesting possibility of using Racklog (or Prolog's DCGs) is that you can get many possible interpretations of the grammar. Although it may not be of practical use, it looks rather interesting.

An example of this shows up when parsing an if statement with a dangling else (https://en.wikipedia.org/wiki/Dangling_else).

(define sample-code
  "if (x) then if (y) then foo(); else goo();")

Here there are two possible valid interpretations of this if statement. The default one:

racklogexperiments.rkt> (parse-context-result
 (cdar
  (%which (result-ctxt)
          (%if-stat (string-context sample-code)
                    result-ctxt))))

'(-if
  (-id "x")
  (-if
   (-id "y")
   (-call-stat (-call "foo" ()))
   (-else (-call-stat (-call "goo" ()))))
  ())
racklogexperiments.rkt>

Here's an alternative visualization of this tree:

We can now ask Racklog for another solution using the %more function. See the result here:

racklogexperiments.rkt> (parse-context-result (cdar (%more)))
'(-if
  (-id "x")
  (-if (-id "y") (-call-stat (-call "foo" ())) ())
  (-else (-call-stat (-call "goo" ()))))
racklogexperiments.rkt>

Here's the other alternative visualization:
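
The same two readings can be reproduced outside Racklog with generators, which give a simple form of backtracking. The following Python sketch is an illustration for a tiny grammar invented for this example (if statements and argument-less calls only); it is not a port of the Racket code. Each parser yields every (ast, next_position) pair it can produce, and the optional else is tried before the empty alternative, as in %opt.

```python
import re

KEYWORDS = {"if", "then", "else"}


def tokenize(source):
    return re.findall(r"[A-Za-z_]\w*|[();]", source)


def parse_stat(tokens, pos):
    yield from parse_if(tokens, pos)
    yield from parse_call(tokens, pos)


def parse_call(tokens, pos):
    # call := ID "(" ")" ";"
    if (pos < len(tokens) and tokens[pos].isidentifier()
            and tokens[pos] not in KEYWORDS
            and tokens[pos + 1:pos + 4] == ["(", ")", ";"]):
        yield ("call", tokens[pos]), pos + 4


def parse_if(tokens, pos):
    # if := "if" "(" ID ")" "then" stat [ "else" stat ]
    if tokens[pos:pos + 2] != ["if", "("] or tokens[pos + 3:pos + 5] != [")", "then"]:
        return
    cond = ("id", tokens[pos + 2])
    for then_stat, after_then in parse_stat(tokens, pos + 5):
        if tokens[after_then:after_then + 1] == ["else"]:
            for else_stat, after_else in parse_stat(tokens, after_then + 1):
                yield ("if", cond, then_stat, ("else", else_stat)), after_else
        # The empty alternative: no else block attached to this if.
        yield ("if", cond, then_stat, None), after_then


tokens = tokenize("if (x) then if (y) then foo(); else goo();")
# Keep only the parses that consume the whole input.
parses = [ast for ast, end in parse_stat(tokens, 0) if end == len(tokens)]
for ast in parses:
    print(ast)
```

The first parse attaches the else to the inner if and the second attaches it to the outer one, in the same order as the Racklog answers.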

1.6 Code

The code for this post can be found here.

A small Tetris-like clone using J and ncurses. Part 2

2017-05-11T20:53:00.001-06:00

This is part two of our Tetris-like clone in J series. In this part we're going to see how the ncurses interface was created.

Creating the ncurses UI

To create the user interface we used ncurses through the api/ncurses package. Sadly, all interactions with this API make the code look like straightforward imperative code.

Since we represented the game field using a matrix, we need a way to visualize this matrix. The next snippet shows how we used the wattr_on and mvwprintw functions to print each cell of the game field with the given color.

drawGame=: 3 : 0
matrix =. >{.}.y
win =. >{.y
cols =. }. $ matrix
rows =. {. $ matrix
for_row. i.rows do.
  for_col. i.cols do.
     value =. (<row, col) { matrix
     wattr_on_ncurses_ win;(COLOR_PAIR_ncurses_ (value+1));0
     mvwprintw_1_ncurses_ win; row; (2*col); (('  '))
  end.
end.
)

The game loop handles user interactions and game rules. Here's how it looks:

NB. Game loop
while. 1 do.
   c =. wgetch_ncurses_ vin 
   if. c = KEY_UP_ncurses_ do.
      game =: put_in_matrix (current*_1);k;j;game
      current =. rotate current
      needs_refresh =. 1
   elseif. c = KEY_RIGHT_ncurses_ do.
      game_tmp =. put_in_matrix (current*_1);k;j;game
      if. can_put_in_matrix current;k;(j + 1);game_tmp do.
         game =: game_tmp
         j =. j + 1
         needs_refresh =. 1
      end.
   elseif. c = KEY_LEFT_ncurses_ do.
      game_tmp =. put_in_matrix (current*_1);k;j;game
      if. can_put_in_matrix (current);k;(j - 1);game_tmp do.
         game =: game_tmp
         j =. j - 1
         needs_refresh =. 1
      end.
   elseif. 1 do. 
      if. ((seconds_from_start'') - timestamp) < 0.1 do.
         continue.
      else.
         timestamp =. seconds_from_start'' 
      end.
   
      if. automove = 0 do.
         game =: put_in_matrix (current*_1);k;j;game
         if. can_put_in_matrix (current);(k+1);j;game do.
            k =. k + 1
         else.
           game =: put_in_matrix (current);k;j;game
           k =. 0
           j =. 0
           if. can_put_in_matrix current;k;j;game do.
              current =. (?@$ tetriminos) {:: tetriminos
           else.
             mvwprintw_1_ncurses_ vin; 0; 0; ' Game over '
             nodelay_ncurses_ vin ;'0'
             wgetch_ncurses_ vin 
             exit''
          
           end.
         end.
         automove =. 2
         needs_refresh =. 1
      else.
          automove =. automove - 1
      end.
   end.
   unget_wch_ncurses_ c
   if. needs_refresh do.
      game =: put_in_matrix (current);k;j;game
      game =: remove_full_rows game
      drawGame vin; game
      wrefresh_ncurses_  vin
      needs_refresh =. 0
   end.
end.

The rest of the code is pure ncurses initialization, which is not that interesting. Code for this post can be found here: https://github.com/ldfallas/jcurtris.

A small Tetris-like clone using J and ncurses. Part 1

2017-04-30T11:02:00.001-06:00

For me, the J programming language is very intriguing. It is full of ideas and concepts that I'm not familiar with. Getting to know a programming language is not only about learning the syntax. Learning the techniques that people use to take advantage of it is what gives you more insight. This is particularly true for J.

For me the best way to learn more about a programming language is to try to solve a small problem with it. In this post I'm going to describe an attempt to write a small and incomplete Tetris-like clone using J and the ncurses library. Here's a preview of how it looks:

Tetriminos

According to the Wikipedia page for Tetris, the pieces are named Tetriminos (https://en.wikipedia.org/wiki/Tetris#Gameplay). Each piece is composed of blocks. In J we can represent these pieces as matrices.

To create these matrices we use the Shape verb ($). For example:

The "L" tetrimino:

   ] l_tetrimino =. 2 3 $ 1 1 1 1 0 0 
1 1 1
1 0 0

The "O" tetrimino:

   ] b_tetrimino =. 2 2 $ 4 4 4 4 
4 4
4 4

The "S" tetrimino:

   ] s_tetrimino =. 2 3 $ 0 5 5 5 5 0 
0 5 5
5 5 0

Tetrimino rotation

In Tetris pressing the 'up' arrow is going to rotate the current piece. We can use matrix Transpose (|:) and Reverse (|.) verbs and compose them together using the Atop conjunction (@). Here's the definition:

rotate =: |.@|:

Here we can see how this verb works:

   l_tetrimino
1 1 1
1 0 0
   rotate l_tetrimino
1 0
1 0
1 1
   rotate rotate l_tetrimino
0 0 1
1 1 1
   rotate rotate rotate l_tetrimino
1 1
0 1
0 1
   rotate rotate rotate rotate l_tetrimino
1 1 1
1 0 0
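
For readers who do not read J, the composition |.@|: (transpose, then reverse the order of the rows) can be sketched in Python with lists of lists. This is just an illustration of the same transformation, not part of the game code.

```python
def rotate(matrix):
    """Transpose the matrix, then reverse the order of its rows."""
    transposed = [list(column) for column in zip(*matrix)]
    return transposed[::-1]


l_tetrimino = [[1, 1, 1],
               [1, 0, 0]]

once = rotate(l_tetrimino)
twice = rotate(once)
print(once)
print(twice)
```

Applying rotate four times gives back the original matrix, as in the J session above.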

We can apply this transformation to the other tetriminos for example:

   ] s_tetrimino =. 2 3 $ 0 5 5 5 5 0 
0 5 5
5 5 0
   rotate s_tetrimino
5 0
5 5
0 5
   rotate rotate s_tetrimino
0 5 5
5 5 0

We use different numbers for each tetrimino so we can use different colors to paint them.

Tetrimino placement

A natural way to model the game is to use a matrix representing the playing field. We use a matrix of 20 rows by 10 columns for this purpose. We use the Shape verb ($) to create it:

   ] game =:  20 10 $ 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

A fundamental piece of functionality that we need is a way to put a tetrimino inside this matrix. This proved to be tricky (maybe because of lack of J knowledge). We're going to incrementally create this verb.

Reading the J documentation, it seems that we can use Amend (m } _ _ _) to change just a set of cells of the game matrix. Here's an example of how to use it:

   ] sample =. 5 5 $ 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

   1 2 3 4 (1 1;2 1;1 2;2 2) } sample
0 0 0 0 0
0 1 3 0 0
0 2 4 0 0
0 0 0 0 0
0 0 0 0 0

What we are saying here is that we want to change the following cells in sample:

row 1 column 1 with value 1
row 2 column 1 with value 2
row 1 column 2 with value 3
row 2 column 2 with value 4
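
To make the pairing of values and coordinates explicit, here is a small Python sketch of the same update. The amend function is a name chosen for this example; it copies the matrix and writes each value at its (row, column) coordinate.

```python
def amend(values, coords, matrix):
    """Return a copy of `matrix` with values[k] written at coords[k]."""
    result = [row[:] for row in matrix]
    for value, (row, col) in zip(values, coords):
        result[row][col] = value
    return result


sample = [[0] * 5 for _ in range(5)]
amended = amend([1, 2, 3, 4], [(1, 1), (2, 1), (1, 2), (2, 2)], sample)
for row in amended:
    print(row)
```

Like the J expression, this leaves sample itself unchanged and produces a new matrix.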

Now to take advantage of this verb we need to calculate the target coordinates to change the value of a tetrimino. First we start by generating coordinates for each of the cells of the tetrimino.

We're going to use the following predefined values:

   ] l_tetrimino =. 2 3 $ 1 1 1 1 0 0 
1 1 1
1 0 0
   sample
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

We start by determining how many rows and columns the tetrimino has. We use the Shape of verb ($) to do this:

   $ l_tetrimino
2 3

Now we want to generate (0, 0);(0, 1);...;(1, 1);(1, 2). With the result of Shape of we generate a sequence of numbers for each axis. To do this we use the Integers (i.) verb with the number of rows and the number of columns. For example:

   NB. Get the number of rows:
   (0&{@$) l_tetrimino
2
   NB. Get the number of columns:
   (1&{@$) l_tetrimino
3
   NB. Get an integer sequence from zero to number of rows or columns
   i.(1&{@$) l_tetrimino
0 1 2
   i.(0&{@$) l_tetrimino
0 1

Now this is very cool: we can use the Table adverb (/) to pair these two arrays. From the documentation:

In general, each cell of x is applied to the entire of y . Thus x u/ y is equivalent to x u"(lu,_) y where lu is the left rank of u .

This is very important! To take advantage of this we need to use the Append verb (,) but change its rank so that it operates on each of the elements of the right argument. See this example:

   0 (,"0) 0 1 2
0 0
0 1
0 2

Now we can take advantage of this and write:

   (  (i.@(0&{@$)) (<@,"0)/ (i.@(1&{@$))) l_tetrimino 
+---+---+---+
|0 0|0 1|0 2|
+---+---+---+
|1 0|1 1|1 2|
+---+---+---+

Now this is almost what we need. We can use the Ravel verb (,) to flatten this table of boxes:


, (  (i.@(0&{@$)) (<@,"0)/ (i.@(1&{@$))) l_tetrimino
+---+---+---+---+---+---+
|0 0|0 1|0 2|1 0|1 1|1 2|
+---+---+---+---+---+---+
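
In Python terms, the table of row and column indices followed by Ravel is the Cartesian product of the two index ranges. A minimal sketch, with cell_positions as a name chosen for this example:

```python
from itertools import product


def cell_positions(matrix):
    """All (row, column) coordinates of a matrix, in row-major order."""
    rows, cols = len(matrix), len(matrix[0])
    return list(product(range(rows), range(cols)))


l_tetrimino = [[1, 1, 1],
               [1, 0, 0]]
positions = cell_positions(l_tetrimino)
print(positions)
```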

With these positions we can use Amend to change our game matrix:

   (,l_tetrimino) positions } sample
1 1 1 0 0
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

We need something else, since this technique only allows us to put the tetrimino at the top-left corner of the matrix. To place it elsewhere we need to add the target position to the coordinates that we generated. We can use the powerful Under conjunction (&.), which allows us to apply an operation to the contents of each box.

   (3 2)&+ &.> positions
+---+---+---+---+---+---+
|3 2|3 3|3 4|4 2|4 3|4 4|
+---+---+---+---+---+---+

We construct this operation by:

using the Bond conjunction (&) to tie together the position (3 2) with the plus operation (+). That is (3 2)&+.
applying this operation to the contents of each element of the box and then assembling the box again. That is &.>

Now we can put the tetrimino in row 3, column 2.

   target_position =. 3 2
   target_position =. 2 1
   (,l_tetrimino) (target_position&+&.>positions) } sample
0 0 0 0 0
0 0 0 0 0
0 1 0 0 0
0 1 1 1 0
0 0 0 0 0

We cannot just put the tetrimino in the target position. It may also "blend" with existing values. For example, take the following game field and tetrimino:

   ] game
0 0 0 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 1 0
0 0 0 1 1

positions =., (  (i.@(0&{@$)) (<@,"0)/ (i.@(1&{@$))) tetrimino

   (,tetrimino) ((1 2)&+&.> positions) } game
0 0 0 0 0
0 0 1 1 0
0 0 1 0 0
0 0 1 0 0
0 0 0 1 1

Because of this we need to mix the tetrimino with the current values of the target region. We do this by extracting the values of the target position:

   ($tetrimino) $ ((1 2)&+&.> positions) { game
0 0
0 1
0 1

We can combine this array with the tetrimino and we get the desired target value:

   ] target_tetrimino =. +&tetrimino ($tetrimino) $ ((1 2)&+&.> positions) { game
1 1
1 1
1 1

   (,target_tetrimino) ((1 2)&+&.> positions) } game
0 0 0 0 0
0 0 1 1 0
0 0 1 1 0
0 0 1 1 0
0 0 0 1 1

The final verb definition looks like this:

put_in_matrix =: 3 : 0
 NB. unpack arguments
 tetrimino =. > 0 { y
 i =. > 1 { y
 j =. > 2 { y
 game =. > 3 { y

 NB. calculate target positions 
 positions =. , ((i.@(0&{)@$)(<@,"0)/(i.@(1&{)@$)) tetrimino

 NB. combine tetrimino with target section
 tetrimino =. +&tetrimino ($tetrimino) $ ((+&(i,j))&.> positions) { game
  
 NB. change matrix
 (,tetrimino) ((+&(i,j))&.> positions)} game
)
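
The same logic can be written with explicit loops, which may help when reading the J verb. The following Python sketch assumes lists of lists and adds each tetrimino cell to the current value of the target cell. Like the J definition it does no bounds checking, so it is meant to be called only after the space check described in the next section.

```python
def put_in_matrix(tetrimino, i, j, game):
    """Return a copy of `game` with `tetrimino` added at row i, column j."""
    result = [row[:] for row in game]
    for r, row in enumerate(tetrimino):
        for c, value in enumerate(row):
            # Adding (not overwriting) keeps the cells already occupied.
            result[i + r][j + c] += value
    return result


game = [[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 1, 1]]
tetrimino = [[1, 1],
             [1, 0],
             [1, 0]]

placed = put_in_matrix(tetrimino, 1, 2, game)
# Adding the negated piece removes it again.
negated = [[-v for v in row] for row in tetrimino]
restored = put_in_matrix(negated, 1, 2, placed)
```

Adding the negated piece to undo a placement is the same trick the game loop in part 2 uses with current*_1.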

Checking if space is available

The other piece of functionality that we need is a way to verify if we can put the tetrimino in a target position. We need to verify two conditions: 1. We can put the tetrimino inside the game field. 2. There's space available in the target position.

To check the boundaries we use common comparison operators:

 NB. Verify field boundaries
 is_inside =. (xpos >: 0) *. (ypos >: 0) *. ( (xpos+tetrimino_width - 1) < game_width) *. ((ypos+tetrimino_height - 1) < game_height)

The second criterion is more interesting. To illustrate how we did the detection we're going to start with a predefined game field:

   game
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 1 0
0 0 0 0 0
   ] tetrimino =. 2 3 $ 2 0 0 2 2 2
2 0 0
2 2 2

The first step is to reset the values of the tetrimino to be either zero or one:

   ] tetrimino =. 0&< tetrimino
1 0 0
1 1 1

Now we extract the elements of the target position (in this example column 1, row 3)

   ypos =. 3
   xpos =. 1
   positions =. , ((i.@(0&{)@$)(<@,"0)/(i.@(1&{)@$)) tetrimino
   ($tetrimino) $ ((+&(ypos,xpos))&.> positions){ game
   
0 0 1
0 0 0

   xpos =. 1
   ypos =. 2
   positions =. , ((i.@(0&{)@$)(<@,"0)/(i.@(1&{)@$)) tetrimino
   ($tetrimino) $ ((+&(ypos,xpos))&.> positions){ game
0 0 0
0 0 1

Now we can multiply the tetrimino by the target value:

   target_segment =. ($tetrimino) $ ((+&(ypos,xpos))&.> positions){ game
   ] hits =. +/ , *&tetrimino target_segment
1

Now the variable hits is greater than zero whenever the tetrimino overlaps an occupied cell of the game field. The final predicate looks like this:

can_put_in_matrix =: 3 : 0
 NB. Unpack the arguments
 tetrimino =. 0&< > 0 { y
 tetrimino_width =. 1 { $ tetrimino
 tetrimino_height =. 0 { $ tetrimino
 ypos =. > 1 { y
 xpos =. > 2 { y
 game =. > 3 { y
 game_width  =. 1 { $ game
 game_height =. 0 { $ game

 NB. Verify field boundaries
 is_inside =. (xpos >: 0) *. (ypos >: 0) *. ( (xpos+tetrimino_width - 1) < game_width) *. ((ypos+tetrimino_height - 1) < game_height)

 NB. Check if we hit an occupied cell
 hits =. 0
 if. is_inside do.
   positions =. , ((i.@(0&{)@$)(<@,"0)/(i.@(1&{)@$)) tetrimino
   hits =. +/ , *&tetrimino ($tetrimino) $ ((+&(ypos,xpos))&.> positions){ game
 end.

 is_inside *. (hits = 0)
)
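
Here is a comparable check sketched in Python, again assuming lists of lists. One difference from the J verb is called out in the comments: this version counts overlapping cells, while the J version sums the products. Both are zero exactly when there is no overlap.

```python
def can_put_in_matrix(tetrimino, ypos, xpos, game):
    """True if the piece fits inside the field and hits no occupied cell."""
    height, width = len(tetrimino), len(tetrimino[0])
    inside = (xpos >= 0 and ypos >= 0
              and xpos + width - 1 < len(game[0])
              and ypos + height - 1 < len(game))
    if not inside:
        return False
    # Count overlapping cells (the J verb sums products instead).
    hits = sum(1
               for r in range(height)
               for c in range(width)
               if tetrimino[r][c] > 0 and game[ypos + r][xpos + c] != 0)
    return hits == 0


game = [[0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0]]
tetrimino = [[2, 0, 0],
             [2, 2, 2]]

print(can_put_in_matrix(tetrimino, 2, 1, game))
print(can_put_in_matrix(tetrimino, 3, 1, game))
```

With ypos 2 the bottom row of the piece lands on the occupied cell, so the check fails. With ypos 3 the occupied cell lines up with an empty cell of the piece, so it succeeds.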

End words

As was said in the beginning, J is very interesting. For me there are many things to learn (you can tell that by looking at all those parentheses in some expressions). Also, there are many strategies in array languages that one needs to understand in order to write idiomatic code.

The ncurses interface will be discussed in part 2. For future posts it will be interesting to talk about concepts like the obverse (http://www.jsoftware.com/help/jforc/u_1_uv_uv_and_u_v.htm#_Toc191734413) and state machines (http://www.jsoftware.com/help/jforc/loopless_code_vii_sequential.htm#_Toc191734470) .

Code for this post can be found here: https://github.com/ldfallas/jcurtris

A simple language with indentation based blocks. Part 4: Improving error messages

2017-04-10T20:51:00.002-06:00

Going back to our first post, we defined the type of a parser function as:

ReaderState ->  (('a * ReaderState) option )

Which means that a parser is a function from a reader state to an Option<'a> of a value and a new reader state.

Using Option<'a> is handy for writing code that may result in a value or a failure indicator. In our case it's possible that the parser fails to recognize its input. The most common example is a program with syntax errors. For example, the following program is incorrect:

if cond1:
   if x y z:
      print(x)

A parser failure may also be something we expected. This is the case when we need to use failure to choose a parser from a list of possible parsers. For example, the statement parser:

   pStatements := [pReturn; ifParser; pCallStatement; pAssignStatement; whileParser]

   let pStatement =
       fun state -> (List.reduce disjParser !pStatements) state

Using the built-in Option<'a> type comes in handy, but it has the problem that it doesn't provide information about the parsing failure. When a parser cannot recognize some input string, the only response we get is None. That's why we introduced the ParsingResult<'a> type to replace the Option<'a> type. Here's the definition:

type ReaderState = { Data:        string;
                     Position:    int;
                     Line:        int;
                     Indentation: int list }

type ParserFailure =
    | Fatal of string * int
    | Fail

   
type ParsingResult<'a> =
     | Success of ('a * ReaderState)
     | Failure of ParserFailure

We still have two possible states: Success and Failure. However, now we have a place to add more information about a syntax error. In this case Fatal has a couple of slots to specify an error message and a line number.

Failure information

The Failure case has a parameter of type ParserFailure. As shown above, this type is defined as follows:

   type ParserFailure =
       | Fatal of string * int
       | Fail

We use these two alternatives to represent the scenarios described at the beginning of this post:

Fatal : a syntax error in the input string
Fail : a failure to completely recognize something

We use Fatal for scenarios such as the following program which has a syntax error in the while condition ( x y ):

while x y:
   print(1)

This is the definition of the whileParser:

   let whileParser =
        whileKeyword    >>
        pExpression     >>= (fun condition ->
        colon           >>
        pBlock          >>= (fun block ->
        preturn (PWhile(condition, block))))

We can say that any failure after the while keyword is a fatal failure. However, a failure to detect the while keyword itself is a simple failure. To represent this situation we're going to introduce the +>> operator, which sequences two parsers and produces a fatal failure if the first parser fails. The definition of this operation, together with its +>>= variant, looks like this:

   let inline (+>>)  (parser1 : (ReaderState ->  ParsingResult<'a>))
                     (parser2 : (ReaderState ->  ParsingResult<'b>)) =
       fun input -> match (parser1 input) with
                    | Success (_, restState) -> parser2 restState
                    | Failure(f & Fatal(_, _)) -> Failure(f)
                    | Failure(_) -> Failure(Fatal("Parsing error ", input.Line))

   let inline (+>>=) (parser1 : (ReaderState ->  ParsingResult<'a>))
                     (parser2N : ('a ->  (ReaderState ->  ParsingResult<'b>))) =
       fun input -> match (parser1 input) with
                    | Success (matchedTxt, restState) -> (parser2N matchedTxt) restState
                    | Failure(f & Fatal(_)) -> Failure(f)
                    | Failure(_) -> Failure(Fatal("Parse problem ", input.Line))

Now with this operator we can modify the definition of whileParser to be as follows:

   let whileParser =
        whileKeyword    >>
        pExpression     +>>= (fun condition ->
        colon           +>>
        pBlock          +>>= (fun block ->
        preturn (PWhile(condition, block))))
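
To see the difference between the two kinds of sequencing in isolation, here is a rough Python sketch. All names (State, keyword, then, then_fatal) are invented for this illustration and the parsers are much simpler than the F# ones: then plays the role of >> and then_fatal plays the role of +>>.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class State:
    text: str
    position: int
    line: int


@dataclass(frozen=True)
class Success:
    value: Any
    state: State


@dataclass(frozen=True)
class Fail:
    pass


@dataclass(frozen=True)
class Fatal:
    message: str
    line: int


def keyword(word):
    def parse(state):
        if state.text.startswith(word, state.position):
            new_state = State(state.text, state.position + len(word), state.line)
            return Success(word, new_state)
        return Fail()
    return parse


def then(parser1, parser2):
    """Plain sequencing: any failure of parser1 is passed on unchanged."""
    def parse(state):
        first = parser1(state)
        return parser2(first.state) if isinstance(first, Success) else first
    return parse


def then_fatal(parser1, parser2):
    """Sequencing where a plain Fail from parser1 is promoted to Fatal."""
    def parse(state):
        first = parser1(state)
        if isinstance(first, Success):
            return parser2(first.state)
        if isinstance(first, Fatal):
            return first
        return Fatal("Parsing error", state.line)
    return parse


# "while" may fail softly; anything after it must be present.
while_x = then(keyword("while "), then_fatal(keyword("x"), keyword(":")))

ok = while_x(State("while x:", 0, 1))
broken = while_x(State("while y:", 0, 1))
other = while_x(State("if x:", 0, 1))
```

Here ok is a Success, broken is a Fatal because the failure happens after the while keyword, and other is a plain Fail that leaves room for another statement parser to try.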

Example

Now we can see the result of parsing a file:

> parse "if x:
- 
-       while x y:
-           print(1)
- " pStatement;;
val it : ParsingResult<PStat> = Failure (Fatal ("Parsing error ",3))

Expected failure

We also use parser failures to choose from a set of possible parsers. We do that in disjParser, which chooses between two parsers. The old definition of disjParser looks like this:

let disjParser (parser1 : (ReaderState ->  (('a * ReaderState) option )))
               (parser2 : (ReaderState ->  (('a * ReaderState) option ))) =
    fun input -> match (parser1 input) with
                 | result & Some _ -> result
                 | _ -> parser2 input

Notice that this definition uses the Option<'a> cases (Some and None) to determine if it succeeds or if it needs to give a chance to parser2. With our new ParsingResult<'a> type we need to make a distinction between a fatal failure and a "controlled" failure. We can now change the definition of this parser to be:

let disjParser (parser1 : (ReaderState ->  ParsingResult<'a>))
               (parser2 : (ReaderState ->  ParsingResult<'a>)) =
    fun input -> match (parser1 input) with
                 | success & Success(_) -> success
                 | Failure(fatal & Fatal(_, _)) -> Failure(fatal)
                 | _ -> parser2 input
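
The three-way decision can also be sketched in a few lines of Python. This is an illustration only, using tagged tuples ("ok", "fail", "fatal") as an assumed result representation and two throwaway parsers, literal and always_fatal, that are not part of the F# code.

```python
def literal(word):
    """Parser over a plain string: returns the rest of the text on success."""
    def parse(text):
        if text.startswith(word):
            return ("ok", word, text[len(word):])
        return ("fail",)
    return parse


def always_fatal(message):
    """Parser that always reports a fatal error on line 1 (for the demo)."""
    return lambda text: ("fatal", message, 1)


def disj(parser1, parser2):
    """Try parser1; fall back to parser2 only on a non-fatal failure."""
    def parse(text):
        first = parser1(text)
        if first[0] in ("ok", "fatal"):
            return first
        return parser2(text)
    return parse


statement = disj(literal("if"), literal("while"))
print(statement("while x"))
print(disj(always_fatal("bad input"), literal("while"))("while x"))
```

The second call shows why the distinction matters: a fatal error stops the search even though the second alternative would have matched.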

Next steps

In the next post we're going to deal with code comments.

This is part #4 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html and a GitHub repository with the code can be found there: https://github.com/ldfallas/fsharpparsing.

A simple language with indentation based blocks. Part 3: Blocks

2017-03-25T10:25:00.000-06:00

In this post we're going to add support for statements containing blocks, which is the main goal of this series of posts.

Identifying blocks by using indentation

The technique we're going to use to identify blocks is based on the following description from the (excellent) Python documentation:

Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line's indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.

This section was taken from: https://docs.python.org/3/reference/lexical_analysis.html#indentation
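The stack discipline described in the quote can be sketched as a small pure function. This is only an illustration under stated assumptions: the IndentToken type and the updateIndentation function are hypothetical names that are not part of the parser built in this post (which has no tokenizer), and the stack is represented as a list with the top element first.

```fsharp
// Hypothetical token type for this sketch only.
type IndentToken =
    | Indent
    | Dedent

// Given the current stack (top first) and the indentation of a new logical line,
// return the new stack and the generated tokens, or None when the indentation
// does not match any level on the stack.
let updateIndentation (stack : int list) (level : int) : (int list * IndentToken list) option =
    match stack with
    | top :: _ when level = top -> Some (stack, [])
    | top :: _ when level > top -> Some (level :: stack, [Indent])
    | _ ->
        // Pop every level that is larger than the new one, one Dedent per pop.
        let popped = List.takeWhile (fun n -> n > level) stack
        let rest = List.skipWhile (fun n -> n > level) stack
        match rest with
        | top :: _ when top = level -> Some (rest, List.map (fun _ -> Dedent) popped)
        | _ -> None
```

For example, updateIndentation [0] 2 pushes a new level and produces one Indent, while updateIndentation [5; 2; 0] 0 pops two levels and produces two Dedent tokens.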

The INDENT and DEDENT tokens act as begin/end block indicators. They have the same function as '{' and '}' in C-based languages. The following Python grammar fragment shows how these tokens are used.

...
suite: simple_stmt | NEWLINE INDENT stmt+ DEDENT
...
if_stmt: 'if' test ':' suite ...
...

Since our little parser uses parsing combinators without a tokenizer, we need to adjust this strategy a little bit. We're going to jump right ahead to show how we define the parser for the if statement.

   let pBlock =
       newlines   >>
       indent     >>
       pStatement >>= (fun firstStat ->
       (zeroOrMore
           (newlines  >>
            indented  >>
            pStatement) [])  >>= (fun restStats ->
       dedent      >> preturn (firstStat::restStats)))

   let ifParser =
       ifKeyword   >>
       pExpression >>= (fun expr ->
       colon       >>
       pBlock      >>= (fun block ->
       (optionalP pElse []) >>= (fun optElseBlockStats ->
                     preturn (PIf(expr,
                                  block,
                                  (match optElseBlockStats with
                                   | [] -> None
                                   | elements -> Some elements))))))

Here's an example of the generated AST for a simple if statement:

> parse "if x:
-           print(x)
-           return x*x
- " ifParser;;
val it : (PStat * ReaderState) option =
  Some
    (PIf
       (PSymbol "x",
        [PCallStat (PCall ("print",[PSymbol "x"]));
         PReturn (PBinaryOperation (Times,PSymbol "x",PSymbol "x"))],null),
     {Data = "if x:
          print(x)
          return x*x
";
      Position = 45;
      Indentation = [0];})

The key element here is the definition of the pBlock parser. In the definition of this parser we try to emulate the same strategy as the Python grammar definition:

   let pBlock =
       newlines   >>
       indent     >>
       pStatement >>= (fun firstStat ->
       (zeroOrMore
           (newlines  >>
            indented  >>
            pStatement) [])  >>= (fun restStats ->
       dedent      >> preturn (firstStat::restStats)))

We define indent, dedent and indented as follows:

   let indentation =
          pGetIndentation >>=(fun indentation ->                          
                (readZeroOrMoreChars (fun c -> c = ' '))
                >>=
                (fun spaces ->
                   match (spaces.Length,
                          indentation) with
                   | (lessThan, top::_) when lessThan > top ->
                       (pSetIndentation lessThan) >> (preturn "INDENT")
                   | (lessThan, top::_) when lessThan = top ->
                       preturn "INDENTED"
                   | (identifiedIndentation, top::rest) ->
                          (pSetFullIndentation rest) >> (preturn "DEDENT")
                   | _ -> pfail))

   let indent = indentation >>= (fun result -> if result = "INDENT" then preturn result else pfail)
   let dedent = indentation >>= (fun result -> if result = "DEDENT" then preturn result else pfail)
   let indented = indentation >>= (fun result -> if result = "INDENTED" then preturn result else pfail)

We define each of these indentation elements as:

INDENTED verifies that the expected indentation level is present. It consumes this indentation. For example, if expecting 3 space indentation, it will consume and verify that there are three spaces from the current position.
INDENT verifies that a new, deeper indentation level can be found from the current position. It changes the 'Indentation' value in the current reader state.
DEDENT decreases the expected indentation level from the reader.

We can test these combinators to see how they affect the state of the parser.

We start with the following fragment:

> let code = "if x:
  if y:
     return 1
  else:
     return 2
"

We can test parts of our complex parsers by specifying just pieces of others. For example let's get to the then part of the parser.

> parse code (ifKeyword >> pExpression >> colon >> newlines >> indent);;
val it : (string * ReaderState) option =
  Some
    ("INDENT", {Data = "if x:
  if y:
     return 1
  else:
     return 2
";
                Position = 8;
                Indentation = [2; 0];})

As you can see, after executing these parsers we get to position 8. For example:

> code.Substring(8);;
val it : string = "if y:
     return 1
  else:
     return 2
"

Also, the indentation stack now has 2 and 0. If we execute this series of parsers again to get to the then section of the nested if we get:

> parse code (ifKeyword >> pExpression >> colon >> newlines >> indent >>
-             ifKeyword >> pExpression >> colon >> newlines >> indent);;
val it : (string * ReaderState) option =
  Some
    ("INDENT", {Data = "if x:
  if y:
     return 1
  else:
     return 2
";
                Position = 19;
                Indentation = [5; 2; 0];})

Now we have three indentation levels on our stack and we got to position 19.

Blocks

Just as the Python grammar reuses the suite production, we can reuse the pBlock parser to define other kinds of statements. For example, we can define the while statement as follows:

   let whileParser =
        whileKeyword    >>
        pExpression     >>= (fun condition ->
        colon           >>
        pBlock          >>= (fun block ->
        preturn (PWhile(condition, block))))

Now we can define more interesting statements such as:

val code : string = "if x > 0:
  while x < 10:
    print(x)
    x := x + 1
"

> parse code pStatement;;
val it : (PStat * ReaderState) option =
  Some
    (PIf
       (PBinaryOperation (Gt,PSymbol "x",PNumber "0"),
        [PWhile
           (PBinaryOperation (Lt,PSymbol "x",PNumber "10"),
            [PCallStat (PCall ("print",[PSymbol "x"]));
             PAssignStat
               (PSymbol "x",PBinaryOperation (Plus,PSymbol "x",PNumber "1"))])],
        null),
     {Data = "if x > 0:
  while x < 10:
    print(x)
    x := x + 1
";
      Position = 53;
      Indentation = [0];})

Next steps

Now that we can define blocks we can focus on other parts of our little language parser. One thing we can definitely improve is error messages. For example, let's try to parse the following fragment:

> parse "notanif x y z" ifParser;;
val it : (PStat * ReaderState) option = None

Here's where the Option<'a> type is not that useful, since it doesn't allow you to specify a failure value such as an error message or location.

Another thing that we need to implement is comments. Our parser combinators work directly with the text. Because of this it will be interesting to see how to introduce comment support without changing all our parsers.

This is part #3 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html.

A simple language with indentation based blocks. Part 2: Expressions

2017-02-26T19:41:00.001-06:00

The focus of the second part will be expression parsing. The desired grammar for expressions in this small language experiment is the following (here in pseudo BNF):

NESTED_EXPRESSION = '(' EXPRESSION ')'

PRIMARY_EXPRESSION = SYMBOL | NUMBER | STRING | NESTED_EXPRESSION

UNARY_EXPRESSION =  NOT_EXPRESSION | CALL_EXPRESSION | ARRAY_ACCESS_EXPRESSION |
                    PRIMARY_EXPRESSION

MULTIPLICATIVE_EXPRESSION = UNARY_EXPRESSION ( ('*' | '/') UNARY_EXPRESSION)*

ADDITIVE_EXPRESSION = MULTIPLICATIVE_EXPRESSION ( ('+' | '-') MULTIPLICATIVE_EXPRESSION)*

RELATIONAL_EXPRESSION = ADDITIVE_EXPRESSION ( ('<' | '>') ADDITIVE_EXPRESSION)*

EQUALITY_EXPRESSION = RELATIONAL_EXPRESSION ( ('=' | '<>') RELATIONAL_EXPRESSION)*

LOGICAL_OR_EXPRESSION = EQUALITY_EXPRESSION  ('||'  EQUALITY_EXPRESSION)* 

LOGICAL_AND_EXPRESSION = LOGICAL_OR_EXPRESSION ('&&'  LOGICAL_OR_EXPRESSION)* 

EXPRESSION = LOGICAL_AND_EXPRESSION

In this post we're going to present some key techniques for creating the code that parses a subset of this grammar. Not every element of the expression grammar will be presented since it gets repetitive at some point.

For this small experiment we're going to use the following F# types as the AST of the program:

type Operator =
    | Plus
    | Minus
    | Times
    | Div
    | And
    | Or
    | Equal
    | NotEqual
    | Assign
    | Lt
    | Gt
    

type PExpr =
    | PSymbol  of string
    | PString  of string
    | PNumber  of string
    | PBoolean of bool
    | PNot     of PExpr
    | PCall    of string * (PExpr list)
    | PNested  of PExpr
    | PArrayAccess of PExpr * PExpr
    | PBinaryOperation of Operator * PExpr * PExpr

Simple atomic expressions

We're going to start with the atomic expressions:

Symbols or variables (ex. x,y,z)
Number literals (ex 1, 1.2)
String literals (ex. "Hello")

We already got these parsers from the previous post:

   let pSymbol =
       concatParsers2
          (readWithConditionOnChar  (fun c -> System.Char.IsLetter(c, 0)))
          (fun initialChar ->
               concatParsers2
                  (readZeroOrMoreChars (fun c -> System.Char.IsLetter(c) || System.Char.IsDigit(c)))
                  (fun suffixString -> (preturn (PSymbol (initialChar + suffixString))))
           )


   let pNumber =
              ( (optionalP (readSpecificChar '-') "") >>= (fun neg -> 
                digitP  >>= (fun firstChar -> 
                (readZeroOrMoreChars (fun c ->  System.Char.IsDigit(c))) >>= (fun chars ->
                (optionalP decimalPartP "") >>= (fun dec ->                                                                               
                preturn (PNumber (neg + firstChar + chars + dec)))))))


   let pString =
       whitespace >>
       readSpecificChar '"' >>
       readZeroOrMoreStringChars (fun previous current ->
                 (previous = '\\' && current = '"') || current <> '"')
       >>= (fun stringContents ->
            readSpecificChar '"' >> (preturn (PString stringContents)))

We can call these expressions "primary expressions", which are used in conjunction with operators to create more complex elements. We need a parser that recognizes any one of these elements. We can use the disjParser from the previous post:

> let myPrimaryExpression = (disjParser (disjParser pSymbol pNumber) pString);;

val myPrimaryExpression : (ReaderState -> (PExpr * ReaderState) option)

> parse "10" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
  Some (PNumber "10", {Data = "10";
                       Position = 2;
                       Indentation = [0];})
> parse "x" myPrimaryExpression;; 
val it : (PExpr * ReaderState) option =
  Some (PSymbol "x", {Data = "x";
                      Position = 1;
                      Indentation = [0];})
> parse "\"hello\"" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
  Some (PString "hello", {Data = ""hello"";
                          Position = 7;
                          Indentation = [0];})

We can use the List.reduce function to improve this code, since we could have several parsers as the primary expression.

> let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString]  ;;

val myPrimaryExpression : (ReaderState -> (PExpr * ReaderState) option)

> parse "10" myPrimaryExpression
- ;;
val it : (PExpr * ReaderState) option =
  Some (PNumber "10", {Data = "10";
                       Position = 2;
                       Indentation = [0];})
> parse "x1" myPrimaryExpression 
- ;; 
val it : (PExpr * ReaderState) option =
  Some (PSymbol "x1", {Data = "x1";
                       Position = 2;
                       Indentation = [0];})

Binary operations

Binary expressions are composed of two expressions and an operator which connects them. For example: a + b. For these operations we can define the following utility parser creation function:

   let buildExpressions (leftExpr:PExpr) (rightExprs:(Operator * PExpr) list) =
       (List.fold (fun left (op,right) -> PBinaryOperation(op, left, right)) leftExpr rightExprs)

   let pBinaryExpression operators lowerLevelElementParser  =
       lowerLevelElementParser
         >>= (fun leftTerm ->
                 (zeroOrMore
                     (operators
                        >>= (fun op ->
                               lowerLevelElementParser >>= (fun expr -> preturn (op, expr))))
                     [])
                   >>= (fun acc -> preturn (buildExpressions leftTerm acc) ))

This function allows us to build parsers for binary expressions. The same expressed in pseudo BNF may look like this:

   pBinaryExpression = lowerLevelElementParser ( operators  lowerLevelElementParser )*

This means that we'd like to recognize an element with lowerLevelElementParser followed by zero or more pairs, each made of an operator recognized by operators and another element recognized by lowerLevelElementParser. What looks strange at first is the use of zero or more, which implies that this parser can also recognize just one element with lowerLevelElementParser. This will be useful in the following section when we implement operator precedence.
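To see what buildExpressions does with the pairs collected by zeroOrMore, here is a small self-contained sketch. It repeats the function with trimmed-down copies of the Operator and PExpr types (only the cases needed here, which is a simplification for this example) and applies it by hand to the pieces of "1 - 2 + 3".

```fsharp
// Reduced copies of the AST types, just enough for this sketch.
type Operator =
    | Plus
    | Minus

type PExpr =
    | PNumber of string
    | PBinaryOperation of Operator * PExpr * PExpr

// Folds from the left, so each new operator wraps everything parsed so far.
let buildExpressions (leftExpr : PExpr) (rightExprs : (Operator * PExpr) list) =
    List.fold (fun left (op, right) -> PBinaryOperation (op, left, right)) leftExpr rightExprs

// "1 - 2 + 3": the first term plus the (operator, term) pairs.
let tree =
    buildExpressions (PNumber "1") [ (Minus, PNumber "2"); (Plus, PNumber "3") ]

// tree is:
// PBinaryOperation (Plus, PBinaryOperation (Minus, PNumber "1", PNumber "2"), PNumber "3")
```

Because List.fold works from the left, the resulting tree is left-associative: "1 - 2 + 3" is grouped as "(1 - 2) + 3". With an empty list of pairs the function simply returns the left term.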

As an example we can define a parser for plus as follows:

> let identifyOperator operatorChar operatorResult =
-        concatParsers
-           whitespaceNoNl
-           ((readSpecificChar operatorChar) >> (preturn operatorResult))
- ;;

val identifyOperator :
  operatorChar:char ->
    operatorResult:'a -> (ReaderState -> ('a * ReaderState) option)
          
> let plusOperator = identifyOperator '+' Plus;;

val plusOperator : (ReaderState -> (Operator * ReaderState) option)

> let myPlus = pBinaryExpression plusOperator myPrimaryExpression;;

val myPlus : (ReaderState -> (PExpr * ReaderState) option)

> parse "1+2" myPlus;;
val it : (PExpr * ReaderState) option =
  Some (PBinaryOperation (Plus,PNumber "1",PNumber "2"), {Data = "1+2";
                                                          Position = 3;
                                                          Indentation = [0];})

Arithmetic operations

Now we're going to add support for basic arithmetic operations: +, -, /, *. By using the pBinaryExpression utility function defined in the previous section we can preserve operator precedence. To do this we define the operations as follows:

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression

What we say here is that an additive expression is composed of addition or subtraction (disjParser plusOperator minusOperator) of multiplicative expressions. Also we said that a multiplicative expression is composed of multiplication or division (disjParser pDivOperator pTimesOperator) of primary expressions.

An example of using these operators looks like this:

> parse "1 * 2 + 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Plus,PBinaryOperation (Times,PNumber "1",PNumber "2"),PNumber "3"),
     {Data = "1 * 2 + 3";
      Position = 9;
      Indentation = [0];})
> parse "1 + 2 * 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Plus,PNumber "1",PBinaryOperation (Times,PNumber "2",PNumber "3")),
     {Data = "1 + 2 * 3";
      Position = 9;
      Indentation = [0];})

Recursion

One thing that was tricky to implement at first was recursion. At some point in the grammar we need to refer back to the top parser. For example, a parenthesized expression can contain any other expression, and it's recognized as a primary expression. For example:

1 * (2 + 3)

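Needs to be parsed as a multiplication of 1 by the nested expression (2 + 3), so the addition inside the parentheses is grouped before the multiplication is applied.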

What we need to do is to define a top level pExpression parser used for recognizing all our expressions. This parser will be used to define the parenthesis or nested expression. At this point our grammar written using pseudo BNF looks like this:

pPrimaryExpression = pNumber | pSymbol | pString

pMultiplicativeExpression = pPrimaryExpression ( (divOperator |  timesOperator) pPrimaryExpression )*

pAdditiveExpression = pMultiplicativeExpression ( (plusOperator |  minusOperator) pMultiplicativeExpression )*

pTopExpression = pAdditiveExpression

In F#, using our set of combinators, it looks like this:

let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString]

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression

let pTopExpression = pAdditiveExpression

We can use pTopExpression to parse any expression:

> parse "3+10*x-1/32" pTopExpression;;
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Minus,
        PBinaryOperation
          (Plus,PNumber "3",PBinaryOperation (Times,PNumber "10",PSymbol "x")),
        PBinaryOperation (Div,PNumber "1",PNumber "32")),
     {Data = "3+10*x-1/32";
      Position = 11;
      Indentation = [0];})
> parse "x" pTopExpression;;          
val it : (PExpr * ReaderState) option =
  Some (PSymbol "x", {Data = "x";
                      Position = 1;
                      Indentation = [0];})

But now we want to use pTopExpression to define one of our primary expressions:

//WARNING THE FOLLOWING CODE WILL NOT COMPILE
let readLPar =
       concatParsers whitespace (readSpecificChar '(')
let readRPar = readSpecificChar ')'

let pNested = readLPar    >>
              pTopExpression >>= (fun expr ->
              readRPar    >>  (preturn (PNested expr)))

let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression

let pTopExpression = pAdditiveExpression

When we try to compile this code we get the following error:

                pTopExpression >>= (fun expr ->
  --------------^^^^^^^^^^^^^^

/dir/stdin(6,15): error FS0039: The value or constructor 'pTopExpression' is not defined

Which is totally correct: pTopExpression is defined after pNested. It has to be, because pTopExpression is defined using pAdditiveExpression, which (through pMultiplicativeExpression) depends on myPrimaryExpression, which in turn depends on pNested. Until now we only used function composition to create our parsers.

What we want to do is to define:

   number symbol string nested
     ^      ^      ^     ^  |
     |      |      |     |  |
     |      |      |     |  |
     +------+------+-----+  |
            |               |
            |               |
         primary            |
            |               |
            |               |
      multiplicative        |
            |               |
            |               |
        additive            |
            |               |
            |               |
           top  <-----------+

To solve this problem we're going to take advantage of reference variables in F# and use the following trick:

let pTopExpressions = ref []

let pTopExpression =
       fun state -> (List.reduce disjParser !pTopExpressions) state


let pNested = readLPar    >>
              pTopExpression >>= (fun expr ->
              readRPar    >>  (preturn (PNested expr)))

let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]

let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator)  myPrimaryExpression
         
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator)  pMultiplicativeExpression


pTopExpressions := [pAdditiveExpression]

Using a reference (ref) allows us to create the recursive parser that we need. Here's an example of using it:

> parse "1*(2+3)" pTopExpression;; 
val it : (PExpr * ReaderState) option =
  Some
    (PBinaryOperation
       (Times,PNumber "1",
        PNested (PBinaryOperation (Plus,PNumber "2",PNumber "3"))),
     {Data = "1*(2+3)";
      Position = 7;
      Indentation = [0];})
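The same forward-reference trick can be shown in isolation with a much smaller example. The sketch below is not part of the parser from this post: MiniParser, pTopRef, pTop and pDepth are hypothetical names, and the simplified parser type works directly on a string and a position. It computes the nesting depth of balanced parentheses, which needs the same kind of recursion as pNested.

```fsharp
// Simplified parser type for this sketch: text and position in,
// parsed value and new position out.
type MiniParser<'a> = string -> int -> ('a * int) option

// Placeholder that is filled in once the real parser exists.
let pTopRef : MiniParser<int> ref = ref (fun _ _ -> None)

// Forwarder: it reads the reference only when the parser actually runs.
let pTop : MiniParser<int> =
    fun (text : string) (pos : int) -> pTopRef.Value text pos

// Nesting depth of balanced parentheses: "" gives 0, "()" gives 1, "(())" gives 2.
let pDepth : MiniParser<int> =
    fun (text : string) (pos : int) ->
        if pos < text.Length && text.[pos] = '(' then
            // Recursive use through the forwarder defined above.
            match pTop text (pos + 1) with
            | Some (inner, next) when next < text.Length && text.[next] = ')' ->
                Some (inner + 1, next + 1)
            | _ -> None
        else
            Some (0, pos)

// Close the cycle: from now on pTop behaves like pDepth.
pTopRef.Value <- pDepth
```

The important detail is that pTop dereferences the cell inside the function body. If the reference were read when pTop is defined, it would capture the placeholder instead of the final parser.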

Next steps

Using the same techniques presented so far we can finish our basic grammar. This is part #2 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/a-simple-language-with-indentation.html.

A simple language with indentation based blocks. Part 1: Parsing combinators

2017-01-29T09:36:00.001-06:00

In this post, I'm going to start with the implementation of the parser using the F# programming language. There are several tools for creating parsers, for example FParsec http://www.quanttec.com/fparsec/ or FsYacc/FsLex https://en.wikibooks.org/wiki/F_Sharp_Programming/Lexing_and_Parsing. However, for this experiment I wanted to create a small set of simple parser combinators, just to learn more about them.

The content of this post is loosely based on the definition of the Parser Monad. There are many great articles on the net about it. One of them is the incredibly nice paper Monadic Parsing in Haskell http://www.cs.nott.ac.uk/~pszgmh/pearl.pdf.

Let's start by defining the type of the state of our parser:

   type ReaderState = { Data : string;
                        Position: int;
                        Indentation: int list}

This state contains the following:

Data : the input string containing the program to parse
Position : the current position of the parser inside the Data string
Indentation : an indentation stack (more on that in future posts)

And now we're going to define the type of a "parser":

ReaderState ->  (('a * ReaderState) option )

This means that a "parser" is a function that takes the reader state and returns a tuple of some parsed value ('a) and the new parsing state. Notice that parsers could fail, so the result is wrapped in an option type https://docs.microsoft.com/en-us/dotnet/articles/fsharp/language-reference/options.

A very simple example of a parser is the following:

   let readChar(state : ReaderState) : (string * ReaderState) option =
       if state.Data.Length > state.Position then
          Some (state.Data.[state.Position].ToString(),
                { state with Position = state.Position + 1 } )
       else
          None

We can test this parser using the following function:

   let parse (input : string) (parser : (ReaderState ->  (('a * ReaderState) option ))) =
       parser {  Data = input ; Position = 0; Indentation = [0] }

For example:

> parse "hello" readChar;;        
val it : (string * ReaderState) option = Some ("h", {Data = "hello";
                                                     Position = 1;
                                                     Indentation = [0];})

Here we can see that the result is a successful Some value containing a string with the single recognized character. The new reader state is also part of the result.

We can also specify operations that read several characters, for example:

   let readZeroOrMoreChars (pred:char -> bool) (state : ReaderState) : (string * ReaderState) option =
       let
         secondPosition = readingChar state.Position pred state.Data
         in
           Some ( state.Data.Substring(state.Position,
                                       secondPosition - state.Position),
                 { state with Position = secondPosition })

We can use this function as follows:

> parse "1231abc" (readZeroOrMoreChars System.Char.IsDigit);;
val it : (string * ReaderState) option = Some ("1231", {Data = "1231abc";
                                                        Position = 4;
                                                        Indentation = [0];})
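The readZeroOrMoreChars function relies on a readingChar helper that is not shown in this post. The following self-contained sketch fills it in under an assumption: readingChar is taken to return the index of the first character, at or after the given position, that does not satisfy the predicate (or the end of the text). The real helper may differ.

```fsharp
type ReaderState = { Data : string; Position : int; Indentation : int list }

// Assumed helper: advance while the predicate holds and return the final index.
let rec readingChar (position : int) (pred : char -> bool) (data : string) : int =
    if position < data.Length && pred (data.[position]) then
        readingChar (position + 1) pred data
    else
        position

// Same behavior as the function above, written with the assumed helper.
let readZeroOrMoreChars (pred : char -> bool) (state : ReaderState) : (string * ReaderState) option =
    let secondPosition = readingChar state.Position pred state.Data
    Some (state.Data.Substring(state.Position, secondPosition - state.Position),
          { state with Position = secondPosition })
```

Note that this parser never fails: when no character matches, it returns Some with an empty string and leaves the position unchanged.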

Using this function we can define a parser for whitespace:

   let whitespace = readZeroOrMoreChars System.Char.IsWhiteSpace

Connecting parsers

The parsers I've been showing so far look very simple, and they are! The goal is to have small parsers and combine them to create more interesting ones. The next operation is called concatParsers2 and its goal is to create a new parser that will execute two parsers in sequence. That is, apply the first one and, with the resulting state, execute the second one.

For example, the following parser for recognizing symbols uses the readWithConditionOnChar parser together with readZeroOrMoreChars in order to create a PSymbol with the recognized element:

   let symbol =
       concatParsers2
          (readWithConditionOnChar  (fun c -> System.Char.IsLetter(c, 0)))      
          (fun initialChar ->
               concatParsers2
                  (readZeroOrMoreChars (fun c -> System.Char.IsLetter(c) || System.Char.IsDigit(c)))
                  (fun suffixString -> (preturn (PSymbol (initialChar + suffixString))))
           )

In our little language a symbol consists of:

a letter
zero or more letters or numbers

We can use readWithConditionOnChar and readZeroOrMoreChars to achieve these goals, for example:

> parse "hello" (readWithConditionOnChar  (fun c -> System.Char.IsLetter(c, 0)));;
val it : (string * ReaderState) option = Some ("h", {Data = "hello";
                                                     Position = 1;
                                                     Indentation = [0];})
> parse "hi1and2and3" (readZeroOrMoreChars (fun c -> System.Char.IsLetter(c) || System.Char.IsDigit(c)));;
val it : (string * ReaderState) option =
  Some ("hi1and2and3", {Data = "hi1and2and3";
                        Position = 11;
                        Indentation = [0];})

Notice that we cannot use just readZeroOrMoreChars since it will recognize symbols starting with numbers (like '13hello').

We can create a small parser for recognizing a letter (readWithConditionOnChar) and we can use readZeroOrMoreChars to read the rest. We could combine these two blocks to get another parser. The key for composing our parsers is the concatParsers2 function. This function has the following signature:

> concatParsers2;;
val it :
  ((ReaderState -> ('a * ReaderState) option) ->
     ('a -> ReaderState -> ('b * ReaderState) option) -> ReaderState ->
     ('b * ReaderState) option) = <fun:clo@21-1>

This signature means that concatParsers2 receives a parser that produces a value of type 'a. The second argument is a function that receives a value of type 'a (the result of the first parser) and returns a parser of another type 'b. The result of concatParsers2 is itself another parser that produces a value of type 'b.

The implementation looks like this:

   let concatParsers2 (parser1 : (ReaderState ->  (('a * ReaderState) option )))
                      (parser2N : ('a ->  (ReaderState ->  (('b * ReaderState) option )))) =
       fun input -> match (parser1 input) with
                    | Some (matchedTxt, restState) -> (parser2N matchedTxt) restState
                    | _ -> None

This small function lets us create parsers using small blocks. But before we do that, we need to create another useful operation.

let preturn aValue (state : ReaderState ) = Some (aValue, state)

This operation is very simple, but really important! It lets us prepare a result. By taking a look at the signature, we can see that it matches our parser signature. We can now use it in conjunction with concatParsers2 to produce results:

>  parse "AXXXC" 
-        (concatParsers2 recognizeABC                     
-                      (fun firstChar ->
-                           (concatParsers2
-                                recognizeXs              
-                                (fun xs -> 
-                                     (concatParsers2
-                                         recognizeABC
-                                         (fun lastChar ->
-                                              preturn (firstChar, xs, lastChar)))))));;
val it : ((string * string * string) * ReaderState) option =
  Some (("A", "XXX", "C"), {Data = "AXXXC";
                            Position = 5;
                            Indentation = [0];})

Given that we define recognizeABC and recognizeXs as follows:

> let recognizeABC = readWithConditionOnChar (fun c -> c = "A" || c = "B" || c= "C");;

val recognizeABC : (ReaderState -> (string * ReaderState) option)

> let recognizeXs = readZeroOrMoreChars (fun c -> c = 'X' ) ;;    

val recognizeXs : (ReaderState -> (string * ReaderState) option)

Using the concatParsers2 function seems a little verbose. We can borrow inspiration from Haskell's Monad type class https://hackage.haskell.org/package/base-4.9.1.0/docs/Control-Monad.html#t:Monad by defining a version of the >>= operator as follows:

   let inline (>>=) (parser1 : (ReaderState ->  (('a * ReaderState) option )))
                    (parser2N : ('a ->  (ReaderState ->  (('b * ReaderState) option )))) = concatParsers2 parser1 parser2N

Now using this operator we can write:

parse "AXXXC" ( recognizeABC >>= (fun firstChar -> 
                recognizeXs  >>= (fun xs -> 
                recognizeABC >>= (fun lastChar ->
                preturn (firstChar, xs, lastChar)))))

Ignoring output from previous parsers

Another useful operation allows us to connect two parsers but discard the result of the first one:

   let concatParsers (parser1 : (ReaderState ->  (('a * ReaderState) option )))
                     (parser2 : (ReaderState ->  (('b * ReaderState) option ))) =
       fun input -> match (parser1 input) with
                    | Some (_, restState) -> parser2 restState
                    | _ -> None

Notice that it's similar to concatParsers2 but it ignores the result of parser1. This is useful for discarding whitespace:

let whitespaceNoNl =
    readZeroOrMoreChars
        (fun c -> System.Char.IsWhiteSpace(c) && c <> '\n')

let colon  =
    concatParsers whitespaceNoNl (readSpecificChar ':')

As with concatParsers2, we can use another well-known operator for this operation:

   let inline (>>) (parser1 : (ReaderState ->  (('a * ReaderState) option )))
                   (parser2 : (ReaderState ->  (('b * ReaderState) option ))) = concatParsers parser1 parser2

Useful parsing operations

We need to create more operations that help us create complex parsers. For example, we can create a small operation for optional elements:

   let optionalP (parser : (ReaderState ->  (('a * ReaderState) option ))) (defaultValue:'a) =
       fun input -> match (parser input) with
                    | result & Some _ -> result
                    | _ -> (Some (defaultValue, input))

We can use this new operator to parse numbers as follows:

   let number =
              ( (optionalP (readSpecificChar '-') "") >>= (fun neg -> 
                digitP  >>= (fun firstChar -> 
                (readZeroOrMoreChars (fun c ->  System.Char.IsDigit(c))) >>= (fun chars ->
                (optionalP decimalPartP "") >>= (fun dec ->                                                                               
                preturn (PNumber (neg + firstChar + chars + dec)))))))

Choosing from two options

The following operation lets us try one parser and fall back to another if the first one didn't work:

   let disjParser (parser1 : (ReaderState ->  (('a * ReaderState) option )))
                  (parser2 : (ReaderState ->  (('a * ReaderState) option ))) =
       fun input -> match (parser1 input) with
                    | result & Some _ -> result
                    | _ -> parser2 input

A simple scenario using this parser looks like this:

> parse "39" (disjParser symbol number);; 
val it : (PExpr * ReaderState) option =
  Some (PNumber "39", {Data = "39";
                       Position = 2;
                       Indentation = [0];})
> parse "hello" (disjParser symbol number);;
val it : (PExpr * ReaderState) option =
  Some (PSymbol "hello", {Data = "hello";
                          Position = 5;
                          Indentation = [0];})

Repetitions

Recognizing sequences of elements identified by a parser is very useful. We can define the following operations:

   let rec oneOrMore parser accumulated =
       parser >>=
           (fun lastResult ->
              let result = lastResult::accumulated
              in disjParser (oneOrMore parser result) (preturn (List.rev result)))

   let zeroOrMore parser accumulated =
       disjParser (oneOrMore parser accumulated) (preturn accumulated)

A simple example of using this operation looks like this:

> parse "1   2 3  4" (zeroOrMore (whitespace >> number) []);;
val it : (PExpr list * ReaderState) option =
  Some
    ([PNumber "1"; PNumber "2"; PNumber "3"; PNumber "4"],
     {Data = "1   2 3  4";
      Position = 10;
      Indentation = [0];})
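
For comparison, here is a rough Python sketch of the same two repetition operations. It assumes the simplified parser shape (text, position) -> (value, new position) or None, uses a loop instead of the recursive accumulator shown above, and spaced_number is a hypothetical element parser written only for this example.

```python
from typing import Callable, List, Optional, Tuple

Parser = Callable[[str, int], Optional[Tuple[object, int]]]


def one_or_more(parser: Parser) -> Parser:
    """Apply parser repeatedly; fail if it does not match at least once."""
    def repeated(text: str, pos: int):
        items: List[object] = []
        result = parser(text, pos)
        # Note: a parser that succeeds without consuming input would loop forever.
        while result is not None:
            value, pos = result
            items.append(value)
            result = parser(text, pos)
        return (items, pos) if items else None
    return repeated


def zero_or_more(parser: Parser) -> Parser:
    """Like one_or_more, but succeed with an empty list when nothing matches."""
    def repeated(text: str, pos: int):
        result = one_or_more(parser)(text, pos)
        return result if result is not None else ([], pos)
    return repeated


def spaced_number(text: str, pos: int):
    """Hypothetical element parser: skip spaces, then read a run of digits."""
    while pos < len(text) and text[pos] == " ":
        pos += 1
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    return (text[pos:end], end) if end > pos else None


print(zero_or_more(spaced_number)("1   2 3  4", 0))
# (['1', '2', '3', '4'], 10)
```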

The next post will show how to create more interesting things with the pieces we have so far.

A simple language with indentation based blocks (part 0)

2017-01-29T09:29:00.001-06:00

One of the first things I noticed when reading about Python is the use of indentation for defining code blocks. For example:

def foo(x):
   while x < 20:
      if x > 10:
         print "argument is greater than 10"
         print "value: " + x
      else:
         print "the argument is less than or equal to 10"
         print "its value is : " + x

If you're only familiar with C-based languages this seems a bit strange. Not using characters like '{}' or words like BEGIN/END seems fragile at first. For example, one could expect something like:

def foo(x) {
   while x < 20 {
      if x > 10 {
         print "argument is greater than 10"
         print "value: " + x
      } else {
         print "the argument is less than or equal to 10"
         print "its value is : " + x
      }
   }
}

I've always wondered how you create a parser for this kind of language. In the following series of posts I'm going to record my experiences writing a parser for a simple little language. This language will use a style similar to Python. I'm going to write the code using F#.

Posts

Solving a small primary school problem with Prolog

2016-08-13T10:37:00.000-06:00

A couple of days ago my small son came home with math homework from school. The problem: add parentheses to the following arithmetic expression so it makes sense.

14 * 3 - 8 / 2 = 17

When I saw that, I thought it was a nice little programming exercise. Also, Prolog seems like an appropriate language to write a solution for this problem.

To solve this problem we need, at least, to:

Choose a representation for the input formula and the results
Generate all possible groupings of the arithmetic expression
Evaluate each arithmetic expression so we can get its result
Let Prolog find the answer we need!

First, we need to generate all possible expressions from the given problem.

Input representation

We're going to represent the input formula as a list of the parts of the expression.

For example, given the following expression:

14 * 3 - 8 / 2

The input representation for this formula is the following:

[ 14, '*', 3, '-', 8, '/', 2 ]

To represent the output formula I'm going to use a term with the form op(operator, left, right).

For example, take the following possible grouping:

(9*(6+(6/(6-9))))

It will be represented as:

 op(*, 9, op(+, 6, op(/, 6, op(-, 6, 9))))

Generating expression groupings

Given the representation of the problem we can write a predicate to generate all possible groupings of these operations.

After some unsuccessful attempts I came up with the following predicate:

arith_op([X], X) :- number(X),!.
arith_op(Arr, op(Op, X, Y)) :-
    append(First, [Op | Second], Arr),
    arith_op(First, X),
    arith_op(Second, Y).

What I really like about Prolog is that with relatively few words we can find a solution for problems like this.

Now I can take advantage of Prolog's backtracking mechanism and find all possible solutions for the following input.

?- arith_op([ 1, '*', 2, '+', 3, '/', 4]  ,X).
X = op(*, 1, op(+, 2, op(/, 3, 4))) ;
X = op(*, 1, op(/, op(+, 2, 3), 4)) ;
X = op(+, op(*, 1, 2), op(/, 3, 4)) ;
X = op(/, op(*, 1, op(+, 2, 3)), 4) ;
X = op(/, op(+, op(*, 1, 2), 3), 4) ;
false.
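
To make the enumeration easier to follow, here is a small Python sketch of the same idea. It is not a translation of the Prolog predicate: instead of trying every split and relying on failure, it assumes operators always sit at the odd positions of the list, and it represents op(Op, X, Y) as a tuple ("op", symbol, left, right). The function name groupings is made up for this example.

```python
def groupings(tokens):
    """Yield every binary tree over tokens, where tokens alternates
    operands and operators, e.g. [1, '*', 2, '+', 3]."""
    if len(tokens) == 1:
        yield tokens[0]
        return
    # Operators sit at the odd indexes; split around each one in turn.
    for i in range(1, len(tokens), 2):
        for left in groupings(tokens[:i]):
            for right in groupings(tokens[i + 1:]):
                yield ("op", tokens[i], left, right)


for tree in groupings([1, "*", 2, "+", 3, "/", 4]):
    print(tree)
```

The generator plays the role of backtracking: each yield corresponds to one answer that Prolog reports before we ask for the next one with ";".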

Evaluating the arithmetic expressions

Having a way to evaluate the expression is useful so we can verify the result of the operation. A simple way to implement it looks like this:

eval(op(Op,X,Y),Result) :-
     eval(X,R1),eval(Y,R2),
     ( (Op = '+',  Result is (R1 + R2))
     ; (Op = '-', Result is (R1 - R2))
     ; (Op = '*', Result is (R1 * R2))
     ; (Op = '/', Result is (R1 / R2))), !.
eval(X, X).

With this predicate we can get the result of an operation. For example:

?- eval(op('+', op('*', 34, 23), 34), R).
R = 816.
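
The recursive structure of eval can also be sketched in Python. This is only an illustration: the tuple shape ("op", symbol, left, right) and the use of Fraction (to keep division exact) are choices made for this sketch, not something the Prolog code does.

```python
import operator
from fractions import Fraction

OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(tree):
    """Evaluate a nested ("op", symbol, left, right) tuple; leaves are numbers."""
    if isinstance(tree, tuple):
        _, symbol, left, right = tree
        return OPERATIONS[symbol](evaluate(left), evaluate(right))
    # Fractions keep divisions exact, so results can be compared with ==.
    return Fraction(tree)


print(evaluate(("op", "+", ("op", "*", 34, 23), 34)))  # 816
```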

Solving the problem

With these two predicates we can solve the problem like this:

?- arith_op([ 14, '*', 3,'-', 8, '/', 2 ]  ,Operation), eval(Operation, 17).
Operation = op(/, op(-, op(*, 14, 3), 8), 2) ;
false.

Now it is useful to present the results using infix notation with parentheses. To do this we can write the following predicate:

forprint(op(Op,X,Y)) :-
    writef("("),
    forprint(X),
    writef(Op),
    forprint(Y),
    writef(")"),!.
forprint(X) :-
    write(X),!.

Now we can write:

?- arith_op([ 14, '*', 3,'-', 8, '/', 2 ]  ,Operation), eval(Operation, 17), forprint(Operation).
(((14*3)-8)/2)
Operation = op(/, op(-, op(*, 14, 3), 8), 2) ;
false.

I can also use this predicate to generate samples of results for other groupings. For example:

?- arith_op([ 14, '*', 3,'-', 8, '/', 2 ]  ,Operation), eval(Operation, Result), Result > 0, forprint(Operation).
((14*3)-(8/2))
Operation = op(-, op(*, 14, 3), op(/, 8, 2)),
Result = 38 ;
(((14*3)-8)/2)
Operation = op(/, op(-, op(*, 14, 3), 8), 2),
Result = 17 ;
false.

Some things I learned while creating a small program in Mercury

2016-05-21T21:45:00.000-06:00

Some time ago I started creating a program using the Mercury programming language to create images using the Escape Time algorithm. The goal was to learn about the language by solving a small problem.

The current result of this experiment can be found here https://github.com/ldfallas/graphicswithmercury/. Here's a list of things I learned.

Terms for configuration files

For this program I wanted to have a configuration file to specify:

The resolution of the final image
The coordinates used to render the fractal
The formula to use with the escape time algorithm
The palette to be used to render the final image

To create this configuration file I could use XML, or create a special file format and parse it using Mercury's DCG. However, I chose a different alternative: using Mercury's term syntax.

Here's an example of the configuration file:

  fractal_config(
   image_resolution(320,200),
   top_left(-2.0, 1.5),
   bottom_right(1.0 , -1.5),
   formula(z*z + z + c),
   palette(
      single(10,20,30),
      range(from(10, 30, 40),  to(30, 50, 76),127),
      range(from(200, 100, 50),to(150, 0, 0),100),
      range(from(200, 100, 50),to(150, 10, 10),27),
      single(0,0,0)
   )
).

Here I'm saying that:

The image will have a 320px by 200px resolution
The real coordinates of this image are between -2.0 and 1.0 in the X axis and between 1.5 and -1.5 in the Y axis
The formula used in the escape time algorithm will be z*z + z + c
The palette will be constructed with the given ranges of colors

In order to read these terms I used the term and parser modules, which provide an easy interface for reading terms.

Here's a code snippet showing how the file is being loaded.

:- pred read_fractal_configuration_from_file(
            string::in,
            maybe_error(fractal_configuration)::out,
            io::di, io::uo) is det.

read_fractal_configuration_from_file(FileName, ConfigurationResult, !IO) :-
    parser.read_term_filename( FileName,  ReadTermResult, !IO),
    ((ReadTermResult = term(_, Term),
         term_to_fractal_configuration(Term, ConfigurationResult))
      ; (ReadTermResult = error(ErrMessage, _),
          ConfigurationResult = error(ErrMessage))
      ; (ReadTermResult = eof,
          ConfigurationResult = error("Empty file"))
     ).

The parser.read_term_filename predicate reads these terms into a term data structure. The term_to_fractal_configuration predicate creates a configuration structure from these terms. An error is returned if the file doesn't have the expected structure. This is achieved using the maybe_error data type.

Here's an example of how the first part of the configuration is loaded:

:- pred term_to_fractal_configuration(
            term(string)::in,
            maybe_error(fractal_configuration)::out) is det.

term_to_fractal_configuration(Term, Result) :-
    (if Term = functor(atom("fractal_config"),Args,_) then
        term_to_fractal_config_resolution(Args, Result)
     else
        error_message_with_location("Expecting 'fractal_config'",Term, Message),
        Result = error(Message)
    ).

One interesting feature of the term library is that it stores line number information. This makes it easy to report errors that occurred in a specific line of the input file:

:- pred error_message_with_location(
            string::in,
            term(string)::in,
            string::out) is det.

error_message_with_location(Msg, functor(_, _, context(_, Line)), ResultMsg) :-
     string.append(", at line:",string.int_to_string(Line),TmpString),
     string.append(Msg, TmpString, ResultMsg).
error_message_with_location(Msg, variable(_, context(_,Line)), ResultMsg) :-
     string.append(", at line:",string.int_to_string(Line),TmpString),
     string.append(Msg, TmpString, ResultMsg).

Now the main predicate for reading the configuration from terms is the following:

:- pred term_to_fractal_config_resolution(
            list(term(string))::in, 
            maybe_error(fractal_configuration)::out).

term_to_fractal_config_resolution(Terms, Result) :-
   (if Terms = [functor(atom("image_resolution"),
                     [ functor(integer(Width), _, _),
                       functor(integer(Height), _, _) ],
                     _)|Rest1] then
       (if  Rest1 = [functor(atom("top_left"),
                     [ functor(float(LeftX), _, _),
                       functor(float(TopY), _, _) ],
                     _)|Rest2] then
                (if  Rest2 = [functor(atom("bottom_right"),
                     [ functor(float(RightX), _, _),
                       functor(float(BottomY), _, _) ],
                     _)|Rest3] then
                    
                    (if Rest3 = [functor(atom("formula"), [Term], _)|Rest4], term_to_expression(Term, ok(Expr)) then
                        (if Rest4 = [PaletteConfig], term_to_palette_config(PaletteConfig, ok(Palette)) then 
                              Result  = ok(config( { Width, Height },
                                                   { LeftX, TopY },
                                                   { RightX, BottomY },
                                                   Expr,
                                                   Palette))
                           else
                              Result = error("Error reading palette")
                        )
                    else
                      Result = error("Error reading formula"))
                 else
                    Result = error("Error expecting: bottom_right(float,float)")
        )

        else
           Result = error("Error expecting: top_left(float,float)")
        )
    else
       Result = error("Error expecting: image_resolution(int,int)")
    ).

One improvement opportunity here is to separate this predicate into several parts to avoid this nesting structure.

As shown above our final goal is to create a result of the following type:

:- type fractal_configuration --->
    config( { int, int },              % image resolution
            { float, float },          % top left cartesian coordinates
            { float, float },          % bottom right cartesian coordinates
            expression,                % formula
            array({ int, int, int })). % palette

This structure provides the necessary information to render the fractal. One special data type here is expression, which is used to store the formula used with the escape time algorithm.

This data type looks like this:


:- type operator ---> times ; plus ; minus ; division.

:- type expression ---> 
     literal_num(float)
     ; var(string)
     ; imaginary
     ; bin_operation(expression, operator, expression).

Since the term library parser can parse arithmetic expressions, I can write simple code that translates terms to this abstract data type.

Here's the definition of the predicate that does the translation:

:- pred term_to_expression(term(string)::in, maybe_error(expression)::out) is det.

term_to_expression(functor(atom(AtomStr),Ops,_), Expr) :-
   (if (Ops = [Op1,Op2],
        op_to_string(Operator,AtomStr),
        term_to_expression(Op1, ok(Op1Expr)),
        term_to_expression(Op2, ok(Op2Expr))) then
      Expr = ok(bin_operation(Op1Expr, Operator, Op2Expr))
    else
      (if Ops = [] then
          (if AtomStr = "i" then
             Expr = ok(imaginary)
           else
             Expr = ok(var(AtomStr)))
       else
          Expr = error("Error"))
   ).
term_to_expression(functor(float(X),_,_), Expr) :-
   Expr = ok(literal_num(X)).
term_to_expression(functor(integer(X),_,_), Expr) :-
   Expr = ok(literal_num(float(X))).
term_to_expression(functor(big_integer(_,_),_,_), error("Error")).
term_to_expression(functor(string(_),_,_), error("Error")).
term_to_expression(functor(implementation_defined(_),_,_), error("Error")).
term_to_expression(variable(_,_), error("Error")).

Reading the palette

The palette used by the escape time algorithm is just an array of colors. The configuration file allows two kinds of elements for specifying the colors:

single(RED, GREEN, BLUE) a single color for the current entry
range(from(RED1, GREEN1, BLUE1), to(RED2, GREEN2, BLUE2), COUNT) to create a range of colors of COUNT steps between the two colors.

The following code reads the palette configuration:

:- pred terms_to_palette(
            list(term(string))::in, 
            list({int, int, int})::in,
            maybe_error(array({int,int,int}))::out) is det.

terms_to_palette([],TmpResult, ok(ResultArray)) :-
   list.reverse(TmpResult, ReversedList),
   array.from_list(ReversedList, ResultArray).

terms_to_palette([Term|Rest],TmpResult,Result) :-
   (if Term = functor(atom("single"),
                     [functor(integer(R),_,_),
                      functor(integer(G),_,_),
                      functor(integer(B),_,_)],
                     _) then
       terms_to_palette(Rest, [{R,G,B}|TmpResult], Result)
     else
      (if Term = functor(atom("range"),[
                            functor(atom("from"),
                                    [functor(integer(R1),_,_),
                                     functor(integer(G1),_,_),
                                     functor(integer(B1),_,_)],_),
                            functor(atom("to"),
                                    [functor(integer(R2),_,_),
                                     functor(integer(G2),_,_),
                                     functor(integer(B2),_,_)],_),
                            functor(integer(Count),_,_)
                         ], _) then
           int_interpolate_funcs(R1, R2, 1, Count, _, R2RFunc),
           int_interpolate_funcs(G1, G2, 1, Count, _, G2GFunc),
           int_interpolate_funcs(B1, B2, 1, Count, _, B2BFunc),
           gen_colors_for_range(1, Count, R2RFunc, G2GFunc, B2BFunc, [], RangeList),
           list.append(TmpResult, RangeList,  NewTmpResult),
           terms_to_palette(Rest, NewTmpResult, Result)
       else
           Result = error("Problem reading palette configuration"))
).

Given the following configuration:

...
   palette(
       range(from(20,244,100), to(200,0,56), 15),
       single(0,0,0)
...

We can generate the following palette:

1. {20, 244, 100}
2. {32, 226, 96} 
3. {45, 209, 93} 
4. {58, 191, 90} 
5. {71, 174, 87} 
6. {84, 156, 84} 
7. {97, 139, 81} 
8. {110, 122, 78} 
9. {122, 104, 74} 
10. {135, 87, 71} 
11. {148, 69, 68}
12. {161, 52, 65} 
13. {174, 34, 62} 
14. {187, 17, 59}
15. {200, 0, 56}
16. {0, 0, 0}
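
The int_interpolate_funcs and gen_colors_for_range predicates are not shown in this post, so the following Python sketch is only a guess at what they compute: a linear interpolation per color channel, rounded down to an integer. The names interpolate and color_range are made up for this example. Under that assumption it produces the same 16 entries listed above.

```python
def interpolate(start, end, count):
    """Return count integers from start to end, evenly spaced and rounded down."""
    if count == 1:
        return [start]
    return [start + (end - start) * i // (count - 1) for i in range(count)]


def color_range(first, last, count):
    """Interpolate each RGB channel independently and zip them back together."""
    channels = [interpolate(a, b, count) for a, b in zip(first, last)]
    return list(zip(*channels))


# range(from(20,244,100), to(200,0,56), 15) followed by single(0,0,0)
palette = color_range((20, 244, 100), (200, 0, 56), 15) + [(0, 0, 0)]
for index, color in enumerate(palette, start=1):
    print(index, color)
```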

Determinism categories in Mercury

2016-04-10T19:44:00.001-06:00

When you define a predicate in Mercury, you have to specify if it can fail or succeed more than once. This is called a determinism category.

The category is specified as part of a predicate (or function) declaration. For example:

:- pred get_partitions(
          list(int)::in,
          list(list(int))::out) is multi.

The "is multi" part says that this predicate belongs to the multi category.

The following tables describe the main determinism categories.

Maximum number of solutions:

Mode	1	> 1
det	x
multi		x
semidet	x
nondet		x

Failure:

Mode	Can fail?
det	no
multi	no
semidet	yes
nondet	yes

(Based on information from https://mercurylang.org/information/doc-latest/mercury_ref/Determinism-categories.html#Determinism-categories ).

Other categories exist: erroneous, failure, cc_multi and cc_nondet. These are not discussed in this post; see the link above for more info.

Now an example of each category:

det

These predicates must always succeed, and they produce exactly one solution. For example:

:- pred list_length(list(int)::in, int::out) is det.

list_length([_|R], Length) :-
   list_length(R,SubListLength),
   Length = SubListLength + 1.
list_length([], 0).

In this case this predicate is going to calculate the size of a list. It should not fail.

semidet

These predicates can either succeed or fail. For example the following code shows the definition of a first_even_number predicate.

:- pred first_even_number(list(int)::in, int::out) 
           is semidet.

first_even_number([X|R], N) :-
   (if (X mod  2) = 0 then
       N = X
    else
       first_even_number(R,N)).

Notice that this predicate can fail for a couple of reasons:

The input list may be empty
The input list may not contain an even number

I think it's pretty nice that these situations are handled without explicitly writing code for them.

A use of this predicate looks like this:

...
   (if first_even_number([3,41,5,32,342], EvenNumber) then
       io.write_string("First even number: ", !IO),
       io.write(EvenNumber, !IO)
    else
       io.write_string("No even number found", !IO)
    ).

Another thing that's pretty nice about Mercury is that the compiler detects inconsistent determinism annotations. For example, if I change the declaration of the predicate to:

:- pred first_even_number(list(int)::in, int::out) 
           is det.

The compiler is going to fail with the following error:

testsolutions.m:093: In `first_even_number'(in, out):
testsolutions.m:093:   error: determinism declaration not satisfied.
testsolutions.m:093:   Declared `det', inferred `semidet'.

multi

The multi category is used for predicates that succeed in multiple ways. At least one solution exists.

For example, the following predicate is used to get a pair of lists that, when concatenated together, result in the input list (e.g. [1,2,3] can result in { [1],[2,3] }).

:- pred partitions(list(int)::in, 
           {list(int), list(int)}::out) is multi.


partitions(InputList,Output) :-
(
   Output = {[] ,  InputList}
 ;
   InputList = [A|TMP],
   partitions(TMP,{R,B}),
   Output = { [A|R], B }
).

This predicate is multi because we can get several different pairs of lists from an input list. For example:

main(!IO) :-
   solutions(partitions([513,242,355,4]),Pairs1),
   io.write(Pairs1,!IO),
   io.nl(!IO).

Running this program results in the following output:

[{[], [513, 242, 355, 4]}, {[513], [242, 355, 4]}, {[513, 242], [355, 4]}, {[513, 242, 355], [4]}, {[513, 242, 355, 4], []}]

In this case we use the solutions/2 Mercury predicate to get a list from the generated solutions.

It's important to note that multi predicates must not fail.
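
As a rough analogy for readers who don't know Mercury, a multi predicate behaves like a generator that always yields at least once. The Python sketch below mirrors the two alternatives of partitions. The analogy is loose, since Python does not check anything like a determinism category, and the function is written only for this comparison.

```python
def partitions(items):
    """Yield every (prefix, suffix) pair whose concatenation is items."""
    # First alternative: empty prefix. This always succeeds, so at least
    # one solution exists (the property that makes the predicate multi).
    yield [], list(items)
    # Second alternative: only applies to a non-empty list.
    if items:
        head, *tail = items
        for prefix, suffix in partitions(tail):
            yield [head] + prefix, suffix


print(list(partitions([513, 242, 355, 4])))
```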

nondet

Finally, the nondet category is used for predicates that result in zero or more solutions.

For example, the following predicate results in an even number that is part of the input list.

:- pred even_member(list(int)::in, int::out) is nondet.

even_member([X|_], X) :-
   (X mod  2) = 0.
even_member([_|Rest], X) :-
  even_member(Rest,X).

This predicate is nondet because it can succeed more than once (the list may contain several even numbers) and it can also fail:

The list may be empty
It may not contain an even number

Here's an example of how to use it:

main(!IO) :-
   solutions(even_member([513,242,18,355,12,4]),Pairs1),
   io.write(Pairs1,!IO).

Running this program results in the following output:

[4, 12, 18, 242]
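
Note that the output is sorted rather than in list order. A loose Python analogy, assuming that solutions/2 returns its results sorted and without duplicates, is a generator that may yield nothing at all:

```python
def even_member(numbers):
    """Yield each even element; an empty or all-odd list yields nothing."""
    for number in numbers:
        if number % 2 == 0:
            yield number


# solutions/2 returns a sorted list without duplicates, hence sorted(set(...)).
print(sorted(set(even_member([513, 242, 18, 355, 12, 4]))))  # [4, 12, 18, 242]
print(sorted(set(even_member([1, 3, 5]))))                   # []
```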

A couple of quick notes on Mercury #1

2016-02-28T20:36:00.000-06:00

I'm trying to learn about the Mercury programming language. Here's a quick list of things I learned recently about it.

Getting a Windows version of the Mercury compiler

I was able to get a version of the Mercury compiler by downloading a "Release of the day" version from here: http://dl.mercurylang.org/index.html.

I just followed the instructions from the INSTALL file. The only requirement is to have a Cygwin version installed.

Operations are associated with a data type

For example, the predicates for performing an operation N times are defined in the int module (int.fold_up, int.fold_down).

Predicates can be partially applied

Mercury supports a mechanism similar to currying in other languages.

For example, in the following invocation of int.fold_up we're not specifying values or variables for the last three arguments of write_pixel_data (these are supplied by the call made inside int.fold_up):


:- pred write_pixel_data(
        io.binary_output_stream::in,
        array(int)::in,
        int::in,  % width
        int::in,  % height
        int::in,  % padding
        int::in,  % idx
        io::di, io::uo) is det.
.
.
.
int.fold_up(write_pixel_data(Stream,
            ImgData,
            Width * 3,
            Height,
            (RowWidth - (Width * 3))),0,(RowWidth*Height) - 1,!IO)
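
For readers more used to other languages, the same pattern can be approximated in Python with functools.partial. This is only a sketch: fold_up here is a hand-written stand-in for int.fold_up, write_pixel is a hypothetical simplification of write_pixel_data, and the accumulator list plays the role of the !IO state pair.

```python
from functools import partial


def fold_up(step, low, high, acc):
    """Call step(i, acc) for i = low..high (inclusive), threading acc through."""
    for i in range(low, high + 1):
        acc = step(i, acc)
    return acc


def write_pixel(data, width, idx, out):
    """Hypothetical stand-in for write_pixel_data: append one value to out."""
    return out + [data[idx % width]]


# Bind the leading arguments now; fold_up supplies idx and the accumulator.
step = partial(write_pixel, [10, 20, 30], 3)
print(fold_up(step, 0, 4, []))  # [10, 20, 30, 10, 20]
```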

Bitwise operators look nice!

For example, bitwise "and" looks like:

   io.write_byte(Stream, (IntValue >> 8) /\ 0xFF,!IO),

Code for this post can be found here

Using traits to reuse code in Pharo

2015-10-18T20:49:00.000-06:00

While working on a small Tetris-like clone in Pharo, I found an opportunity to reuse some code.

It's common for Tetris implementations to show a small preview of the tetrimino that comes next. Here's how it looks:

I wanted to create a separate Morphic component to show this preview. But I didn't want to duplicate the code required to paint the tetriminos. Sharing this code will allow me to have just one place to change the way tetriminos look. Also I didn't want to create a base class on top of both the game and the preview Morphs.

While reading about Pharo and Squeak I found that it supports Traits. The Pharo collaborActive book provides a nice explanation of how to use traits. Here's a quick definition from this document:

... traits are just groups of methods that can be reused in different classes.

This is exactly what I needed for reusing the tetrimino matrix drawing code. The following code shows the code defined in the game matrix Morph component.

drawOn: canvas
   "Draws the current game state"
   |rows columns currentValue rectangle currentColor cellWidth cellHeight|

   rows := gameState size x.
   columns := gameState size y.

   super drawOn: canvas.

   cellWidth :=   ((self width) / columns) asFloat truncated.
   cellHeight :=   ((self height) / rows) asFloat truncated.
   1 to: rows do: [ :row |
      1 to: columns do: [ :column|
         currentValue := gameState at: row at: column .
         currentValue ~= 0 ifTrue: [
                 currentColor := (colors at: currentValue).
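                 rectangle := Rectangle left: (self bounds left) + ((column - 1)*cellWidth)
                                         right: (self bounds left) + ((column - 1)*cellWidth) + cellWidth
                                         top: (self bounds top) + ((row - 1)*cellHeight )
                                         bottom: (self bounds top) + ((row - 1)*cellHeight ) + cellHeight.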
                 canvas frameAndFillRectangle: rectangle
                                 fillColor:  currentColor
                                 borderWidth:  1
                                 borderColor: (Color white).
                  ]
          ]
       ].

Moving this code to a trait implies that I have to pass all the instance state it uses as arguments of the draw method.

The definition of the trait looks like this:

Trait named: #TTetriminoDrawing
    uses: {}
    category: 'TryTrix'

And the definition of the method inside this trait looks like this:

drawGameStateOn: canvas 
      width: areaWidth height: areaHeight 
      columns: columns rows: rows 
      morphBounds: morphBounds
      matrix: contentMatrix
      colors:  colorPalette
   "Draw the contents of the specified matrix on the given canvas"
   |cellWidth cellHeight currentValue currentColor rectangle|
   cellWidth :=   (areaWidth / columns) asFloat truncated.
   cellHeight :=   (areaHeight / rows) asFloat truncated.
   1 to: rows do: [ :row |
      1 to: columns do: [ :column|
         currentValue := contentMatrix at: row at: column .
         currentValue ~= 0 ifTrue: [ 
            currentColor := (colorPalette at: currentValue).
            rectangle := Rectangle left: (morphBounds left) + ((column - 1)*cellWidth) 
                                    right: (morphBounds left) + ((column - 1)*cellWidth) + cellWidth
                                    top: (morphBounds top) + ((row - 1)*cellHeight )
                                    bottom: (morphBounds top) + ((row - 1)*cellHeight ) + cellHeight.
            canvas frameAndFillRectangle: rectangle
                  fillColor:  currentColor
                  borderWidth:  1
                  borderColor: (Color white).
             ]
          ]
       ].

Now I can use this trait inside each morph definition:

Morph subclass: #TryTrixMorph
    uses: TTetriminoDrawing
    instanceVariableNames: 'gameState colors eventHandlers'
    classVariableNames: ''
    category: 'TryTrix'


Morph subclass: #TetriminoPreview
    uses: TTetriminoDrawing
    instanceVariableNames: 'matrix'
    classVariableNames: ''
    category: 'TryTrix'

This code can be found here.
