Sunday, December 26, 2021

A small programming exercise in APL #2: Combinations

In this post I'm going to continue my APL exploration by working in a possible solution for the problem of generating all the combinations of 'n' elements of an array.

Generating the 'n' combinations of an array means generating a sequence of all possible 'n' unique elements from the original array . For example, given the array 12, 34, 35, 65 all possible '2' combinations of this array are:

• 12, 34
• 12, 35
• 12, 65
• 34, 35
• 34, 65
• 35, 65

Notice that order is not important. That is "12, 34" is considered to be the same as "34, 12". Generating all 'n' permutations of the elements of an array may be an interesting problem for a future post.

One of my goals was to be as idiomatic as possible (with my limited APL knowledge!). Because of this, I will try avoid using explicit loops or conditionals and instead use array operators.

Strategy

The general strategy for solving this problem was to calculate all possible boolean arrays having 'n' bits and use Compress to extract the elements.

For example, in the array 12, 34, 35, 65 all possible boolean vectors having two bits on are:

• 1 1 0 0
• 1 0 1 0
• 1 0 0 1
• 0 1 1 0
• 0 1 0 1
• 0 0 1 1
Using these vectors in APL we get the elements we need:

a
12 34 35 65
1 1 0 0 / a
12 34
1 0 1 0 / a
12 35
1 0 0 1 / a
12 65
0 1 1 0 / a
34 35
0 1 0 1 / a
34 65
0 0 1 1 / a
35 65

This strategy is not efficient in space or time but it allowed me to explore a solution without using imperative constructs or recursion.

Generating the boolean arrays

To generate the boolean arrays we start by generating all integer values required to encode n-bit values:

a ← 12 34 35 65
⍳ ( ¯1 + 2 * ⍴ a)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

When converting these numbers to binary we get all possible boolean vectors for 4-bit values. This is same as the same as the length of our sample array. We can see these arrays if we use Encode and Each:

{ ((⍴ a) ⍴ 2) ⊤ ⍵ } ¨ ⍳ ( ¯1 + 2 * ⍴ a)
0 0 0 1  0 0 1 0  0 0 1 1  0 1 0 0  0 1 0 1  0 1 1 0  0 1 1 1  1 0 0 0\
1 0 0 1  1 0 1 0  1 0 1 1  1 1 0 0  1 1 0 1  1 1 1 0  1 1 1 1

I can reshape this array to get a better representation:

15 1 ⍴ { ((⍴ a) ⍴ 2) ⊤ ⍵ } ¨ ⍳ ( ¯1 + 2 * ⍴ a)
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
1 1 0 0
1 1 0 1
1 1 1 0
1 1 1 1

Now we are only interested in boolean arrays with 'n' number of '1' bits. For n = 2 we can get only those elements using the Compress operator:

seq ← ⍳ ( ¯1 + 2 * ⍴ a)
n ← 2
({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
3 5 6 9 10 12

Again we can visualize these values using reshape and encode:

6 1 ⍴ { ((⍴ a) ⍴ 2) ⊤ ⍵ } ¨ ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0

Selecting desired elements

Now that we have our boolean arrays we can pick all the values using Compress:

{(((⍴ a) ⍴ 2) ⊤ ⍵)/a} ¨ ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
35 65  34 65  34 35  12 65  12 35  12 34

And again we can reshape this array to see it better:

6 1 ⍴ {(((⍴ a) ⍴ 2) ⊤ ⍵)/a} ¨ ({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq
35 65
34 65
34 35
12 65
12 35
12 34

Finally we can pack these expressions in a function:

∇r←a ncombinations1 n;seq
seq ← ⍳ ( ¯1 + 2 * ⍴ a)
r ← {(((⍴ a) ⍴ 2) ⊤ ⍵)/a} ¨ (({ n =  +/ ((⍴ a) ⍴ 2) ⊤ ⍵} ¨ seq) / seq\
)
∇

We can use it like this:

1 2 3 4 5 ncombinations1 3
3 4 5  2 4 5  2 3 5  2 3 4  1 4 5  1 3 5  1 3 4  1 2 5  1 2 4  1 2 3

APL has a build-in Binomal operator which allows us to calculate the number of combinations. For example:

a ← 1 2 3 4 5
(3 ! ⍴ a) = ⍴ a ncombinations3 3
1

Final words

It was very interesting (and difficult) to try to write a solution of this problem using only array operations. Reading the definion of the ncombinations1 I noticed that there are many expressions nested in parenthesis (maybe related to my limited APL knowledge). I know that APL developers value terseness, so I tried to reduce the size of the expressions. Here is a new version:

∇r←a ncombinations3 n;seq
s ← ⍳ ( ¯1 + 2 * ⍴ a)
b ← 2⍴⍨⍴a
r ← {a/⍨b⊤⍵}¨s/⍨{n=+/b⊤⍵}¨s
∇

I was able to remove a couple of parenthesis by taking advantage of the Commute operator.

GNU APL was used to create the examples in this post.

Monday, July 26, 2021

A small programming exercise in APL #1

This post shows a possible solution written in APL for the following programming problem:
Given an array, determine if a value 'number' is found in every consecutive segment of 'size' elements.

For example, given this array:

1,2,1,3,4,1

This predicate is true for number = 1 and size = 2. Because '1' is found in (1,2), (1,3) and (4,1).

A possible APL solution

This is a possible solution to this problem (using my limited APL knowledge).

∇ z←array arraysegments args
number ← args
size ← args
z ← ∧/ ({number∊⍵} ⍤1) (((⍴ array) ÷ size) , size) ⍴ array
∇

This function is called arraysegments and is a dyadic function. It receives the array to validate as the left argument and on the right argument a two-value array with the value ('number') to find inside the array segment and the length of the segments.

An example of using this function

1 2 1 3 4 1 arraysegments 1 2
1
1 2 1 3 4 1 arraysegments 2 2
0
1 2 1 3 4 1 arraysegments 1 3
1
300 200 300 300 400 500 arraysegments 300 3
1
300 200 300 300 400 500 arraysegments 400 3
0

To see how this function works I'm going to start by defining some sample data:

array ← 1 2 1 3 4 1
number ← 1
size ← 3

1. First, start by reshaping the input array into a matrix with each row being

(((⍴ array) ÷ size) , size)
2 3
(((⍴ array) ÷ size) , size) ⍴ array
1 2 1
3 4 1

2. Now that we have the 'groups' as the rows of the new matrix, we apply the membership operation to each element of the matrix. To do this, we use the rank operator in conjunction with an inline function to test the membership:

({number∊⍵} ⍤1) (((⍴ array) ÷ size) , size) ⍴ array
1 1

As a final step we apply the reduce operator '/' with the And operator to see if the input number exists in every group:

∧/ ({number∊⍵} ⍤1) (((⍴ array) ÷ size) , size) ⍴ array
1

There are some problem with this solution. For example if the array cannot be splitted in groups of the specified size you get a runtime error.

Sunday, June 13, 2021

Reading APL #1: Roman numerals to decimal

As part of a new recurring series I’m going to take a small APL function or program and try to break it down in to its parts. I hope this is going to help me get a better understanding of the language.

Order of evaluation: Expressions in APL are evaluated right-to-left, and there is no hierarchy of function precedence

Two additional concepts are useful to understand when trying to read code:

1. A monadic function : an operation applied to the right expression
2. A dyadic function : a function applied to two arguments ( left and right )

One example of these elements is the following:

2 2 ⍴⍳⍴ 8 8 8 8
1 2
3 4

Here is an example of evaluating this expression from right to left.

myarray←8 8 8 8
myarray
8 8 8 8
length_of_my_array← ⍴ myarray
length_of_my_array
4
new_array← ⍳ length_of_my_array
new_array
1 2 3 4
2 2 ⍴ new_array
1 2
3 4

Note that here we use both as a monadic function (Shape) and as a dyadic function (Reshape).

The example

For this post I’m going to use a small function I found in the APL Utils repository by Blake McBride . This function takes a string representing a roman number and converts it back to an integer:

∇z←Romanu a
z←+/(¯1+2×z≥1↓z,0)×z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]
∇

An example of executing this function:

Romanu 'XVII'
17
Romanu 'IX'
9

Let’s start by reading the function definition:

∇z←Romanu a
...
∇

This code represents the definition of the Romanu function with an a argument.

Now, the interesting part is reading the body of the function which performs the actual calculations.

z←+/(¯1+2×z≥1↓z,0)×z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]

As described in the previous section we need to read this code from right to left. To do this we need to ‘parse’ it and extract the largest complete expression from the right:

(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]

This expression is a particular example of an bracked indexing expression(https://aplwiki.com/wiki/Bracket_indexing). A simpler example of this is:

(10 20 30)
10
(10 20 30)
20
(10 20 30)
30
(10 20 30)[3 1 1 2]
30 10 10 20
a←10 20 30
a[3 1 1 2]
30 10 10 20

As shown above this expression is used to select elements of an array into a new array.

In the Romanu function this expression is used to assign a value to eacho of the roman numeral 'digits':

'IVXLCDM'⍳a

Here the dyadic form of or ‘Index of’(https://aplwiki.com/wiki/Index_Of) . This expression returns the indices of the left side array of the elements on the right. Here are some examples of this expression in action:

(10 20 30) ⍳ (20 20 10 20 30 10)
2 2 1 2 3 1

When using this expression with the roman digits string we get the equivalent positions:

'IVXLCDM' ⍳ 'XVII'
3 2 1 1

As shown above we get the positions in the roman digits string. Returning to our original expression we use the selection expression to get the value of each roman digit in decimal form:

(1 5 10 50 100 500 1000)['IVXLCDM' ⍳ 'XVII']
10 5 1 1

As you can see here we almost have the result for the conversion of XVII. We just have to sum these numbers up to get 17.

Going back to the example and continuing reading for right to left we find and assignment to the z variable.

z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]

The rest of the expression is going to take care of subtracting the value required for converting numbers like: ‘IX’.

Reading the expression from right to left we find:

(¯1+2×z≥1↓z,0)×z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]

This expression is a multiplication (×). For arrays it applies to operation element wise :

2 3 × 4 5
8 15

The interesting part here is the expression being used for the left operand of the multiplication:

(¯1+2×z≥1↓z,0)

Again we are going to analyze this expression from right to left.

z,0

This expression returns the an array with a zero at the end:

(10 20 30),0
10 20 30 0
z←(1 5 10 50 100 500 1000)['IVXLCDM' ⍳ 'IX']
z,0
1 10 0

Then the following operation drops the first element of the array:

1↓z,0
10 0

The drop expression returns an array without the first ‘n’ elements:

3 ↓ 20 30 40 30 20 10
30 20 10

The following section is very interesting . We use the greater equal expression to determine which elements of the array are greater or equal than the second array:

10 20 30 ≥ 5 21 30
1 0 1

Applying this to our example for ‘IX’:

z←(1 5 10 50 100 500 1000)['IVXLCDM' ⍳ 'IX']
z≥1↓z,0
0 1

This is interesting: we get an array that indicates which entries are lower than its successor. In the case of IX we are getting 0 1 saying that the first digit needs to be subtracted from the second. The final part performs this operation.

z←(1 5 10 50 100 500 1000)['IVXLCDM' ⍳ 'IX']
z≥1↓z,0
0 1
¯1+2×z≥1↓z,0
¯1 1
(¯1+2×z≥1↓z,0)×z
¯1 10

Now we can finally sum all the numbers on the resulting array and get the decimal number. This is performed with the reduce operation / (https://aplwiki.com/wiki/Reduce) with the + operator.

+/ 10 20 30
60
10 + 20 + 30
60

Applying this operation to our partial expression returns the expected value:

(¯1+2×z≥1↓z,0)×z
¯1 10

+/(¯1+2×z≥1↓z,0)×z
9

We can see this process in action by executing all the parts with an interesting number like MMCXXIX (2129).

z←(1 5 10 50 100 500 1000)['IVXLCDM'⍳a]
z
1000 1000 100 10 10 1 10
z,0
1000 1000 100 10 10 1 10 0
1↓z,0
1000 100 10 10 1 10 0
z≥1↓z,0
1 1 1 1 1 0 1
¯1+2×z≥1↓z,0
1 1 1 1 1 ¯1 1
+/(¯1+2×z≥1↓z,0)×z
2129

Saturday, June 5, 2021

A small experiment with fragment shaders

I wanted to work on an experiment that allowed me to learn a little bit about modern graphics programming with OpenGL (with shaders that is). A nice choice is the 'hello world; of graphics programming: rendering the Mandelbrot set.

In this post I'm going to record my experiences from zero to a basic rendering of the Mandelbrot set. Here is how the final product looks like:

1.1 The experiment

To get the "native" feel I decided to create the example as a C++ Windows program (but using mostly C). There are many options to do this, but the following combination worked for me:

I'm quite impressed with the VSCode C++ support .

1.2 The program

The structure of the program it's very simple since it is a hello world program. We start with some initialization boilerplate (quite short since GLFW abstracts away many windowing system details):

GLFWwindow *window;

glfwSetErrorCallback(error_callback);

if (!glfwInit())
{
return -1;
}

window = glfwCreateWindow(
800,
600,
NULL,
NULL);

if (!window)
{
glfwTerminate();
return -1;
}
glfwMakeContextCurrent(window);

This code creates a 800x600 window ready to use with OpenGL. To access OpenGL I'm using a loaded called GLAD.

1.3 Strategy for rendering the Mandelbrot set

In this experiment I am going to use the "escape time algorithm" to render the image of the Mandelbrot set. This algorithm is very simple and easy to implement using fragment shaders.

Before we start implementing this algorithm we are going to define geometry to and apply a fragment shader to it. Since we are going to render a 2D image, the easiest way to do this is to create two triangles that fill the viewport. Since OpenGL uses normalized coordinates we can define this triangles using values between -1.0 and 1.0 using the following code:

GLfloat vertices[] = {
-1.0f, 1.0f,
1.0f, 1.0f,
1.0f, -1.0f,

1.0f, -1.0f,
-1.0f, -1.0f,
-1.0f, 1.0f};

GLuint vertex_buffer;
glGenBuffers(1, &vertex_buffer);
glBindBuffer(GL_ARRAY_BUFFER, vertex_buffer);
glBufferData(
GL_ARRAY_BUFFER,
sizeof(vertices),
vertices,
GL_STATIC_DRAW);

This code is going to define two triangles that are highlighted here:

This code is also making these coordinates as the active ones (glBindBuffer).

Now we need both a vertex shader to apply transformations to the vertex data and a fragment shader to provide color to our pixels.

Our vertex shader is really simple since we are not performing transformations to the triangles:

#version 110
attribute vec2 pos;
void main() {
gl_Position = vec4(pos, 0.0, 1.0);
}

In this vertex shader we can manipulate the vertices of the triangles that we are going to draw. The X and Y values of the vertices are passed using the pos attribute. This is not automatic, we need to specify the data that is passed to the vertex shader in the C++ program for the pos attribute. This is accomplished by using the following code:

GLuint vpos_location;
vpos_location = glGetAttribLocation(program, "pos");
glEnableVertexAttribArray(vpos_location);
glVertexAttribPointer(
vpos_location,
2,
GL_FLOAT,
GL_FALSE,
sizeof(float) * 2, 0);

One of the most interesting aspects of this snippet is the [glVertexAttribPointer](https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glVertexAttribPointer.xhtml). This function specifies the way the values are going to be extracted from the array data. Here we specify:

1. vpos_location the attribute that we are configuring
2. 2 the number of components (remember that pos is vec2)
3. GL_FLOAT the data type of the data element
4. GL_FALSE the data is not normalized. This seems to be important for integer data (here we use floating point data, more info here: https://gamedev.stackexchange.com/questions/10859/glvertexattribpointer-normalization).
5. sizeof(float) * 2 The offset between consecutive elements. This is useful when the array has different kinds of data elements. For example vertices and normals mixed in the same array. This parameter must be used to skip undesired data elements. In our case 0 is also a valid value since we only have vertex data.

The boilerplate code to compile this shader is the following:

{
std::cout << "Vertex shader not compiled";
GLchar message;
GLsizei log_size;
std::cout << message;
}

When you are starting with shaders, the call to glGetShaderiv is useful . This code returns error information that occurred when compiling the shader.

There is an important part of the process that is always confusing to me. Many things in the OpenGL API change a global state aspect of the functionality. For example the glBindBuffer function used above is going to determine the set of vertices used by the glDrawArrays function.

Here's the code of the fragment shader:

#version 110
uniform vec4 boundaries;
void main()  {
float x0, y0,x ,y;
x0 = gl_FragCoord.x*((boundaries.z - boundaries.x)/800.0) + boundaries.x ;
y0 = gl_FragCoord.y*((boundaries.w - boundaries.y)/600.0) + boundaries.y ;
int maxIteration, iteration;
maxIteration = 256;
iteration = 0;
while((x*x + y*y <= 4.0) && (iteration < maxIteration)) {
float tmp;
tmp = x*x - y*y + x0;
y = 2.0*x*y + y0;
x = tmp;
iteration = iteration + 1;
}
gl_FragColor = vec4(vec3(0.0, float(iteration)/256.0, 0.0 ),1.0);
}

This code contains a simple implementation of the escape time algorithm described in Wikipedia here: https://en.wikipedia.org/wiki/Plotting_algorithms_for_the_Mandelbrot_set#Unoptimized_naïve_escape_time_algorithm .

For simplicity this shader chooses only between shades of green for the colors.

1.4 Zooming in

As part of this experiment I wanted to have the possibility to zoom a particular area of the Mandelbrot set. To add support for this we need to pass the top-left and bottom right coordinates of the area when want to render. This is accomplished by declaring a uniform to pass this value from the C++ program.

Here is the code that uses this parameter in the fragment shader:

uniform vec4 boundaries;
...
x0 = gl_FragCoord.x*((boundaries.z - boundaries.x)/800.0) + boundaries.x ;
y0 = gl_FragCoord.y*((boundaries.w - boundaries.y)/600.0) + boundaries.y ;

And here is the code that specify the value of the boundaries uniform in the C++ program:

int coordinatesUniformLocation = glGetUniformLocation(program, "boundaries");
glUniform4f(coordinatesUniformLocation, boundaries.x1, boundaries.y1, boundaries.x2, boundaries.y2);

We can manipulate the values of the boundaries array using a mouse click handler like this:

glfwSetMouseButtonCallback(window, mouseCallback);
...
void mouseCallback(GLFWwindow* window, int button, int action, int mods) {
if (action == GLFW_PRESS) {
double xpos, ypos;
glfwGetCursorPos(window, &xpos, &ypos);

ypos = 600 - ypos;
double xclick, yclick;
xclick = xpos*((boundaries.x2 - boundaries.x1)/800.0) + boundaries.x1;
yclick = ypos*((boundaries.y2 - boundaries.y1)/600.0) + boundaries.y1;

double currentWidth = (boundaries.x2 - boundaries.x1) - (boundaries.x2 - boundaries.x1)/10;
double currentHeight = (boundaries.y2 - boundaries.y1) - (boundaries.y2 - boundaries.y1)/10;
boundaries.x1 = (float)( xclick - currentWidth/2);
boundaries.x2 = (float)( xclick + currentWidth/2);

boundaries.y1 = (float)( yclick - currentHeight/2);
boundaries.y2 = (float)( yclick + currentHeight/2);
}
}

Finally we define the code draw loop like this:

while (!glfwWindowShouldClose(window))
{
int width, height;
glfwGetFramebufferSize(window, &width, &height);
glViewport(0, 0, width, height);
glClear(GL_COLOR_BUFFER_BIT);
glUseProgram(program);
int coordinatesUniformLocation = glGetUniformLocation(program, "boundaries");

glUniform4f(coordinatesUniformLocation, boundaries.x1, boundaries.y1, boundaries.x2, boundaries.y2);

glDrawArrays(GL_TRIANGLES, 0, 6);
glfwSwapBuffers(window);
glfwPollEvents();
}

1.5 Conclusion

This was a fun experiment!. I got a small glimpse on how to work with OpenGL. Future posts are going to explore even further and maybe use other programming languages to explore OpenGL.

Code for this experiment can be found here: https://github.com/ldfallas/openglmandelbrot .

Sunday, February 23, 2020

First programming language: GW-BASIC

Like many developers my age, the first programming language I ever used was BASIC. Specifically, GW-BASIC in a nice Epson "Abacus" XT machine with a green-on-black monitor back in primary school.

For me, the experience of writing my first program was magical. Having feedback from the computer with just few key strokes was an essential part of it.

10 PRINT "hola"
RUN
hola

Writing a small/toy implementation of GW-BASIC may be a good exercise and a homage to this programming language. In the following posts I'm going to write about the ongoing experience of doing this.

As with other posts in this blog I'm going to try to write this program using a language I'm learning. For this task I'll be using Rust. Rust is a very interesting language with many concepts to learn. For me the only way of learning a new programming language is trying to use it to implement something.

The repo for work in this small project it's located here: https://github.com/ldfallas/rgwbasic .

The current implementation still lacks of most of the features to make it a useful (or usable) implementation. But at least you can write:

\$ cargo run
...
Finished dev [unoptimized + debuginfo] target(s) in 0.05s
Running `target/debug/rgwbasic`
Ok
10 print "hello"
Ok
20 goto 10
Ok
run
"HELLO"
"HELLO"
"HELLO"
"HELLO"
"HELLO"
...

Sunday, March 10, 2019

A quick note about programming in Prolog

Prolog predicates work in different ways depending on how you use them. A simple example is the append predicate. One may say that append is used to get the result of appending two lists. For example:

?- append([1,2,3], [4,5,6], NewList).
NewList = [1, 2, 3, 4, 5, 6].

But we can also use append to get the prefix of a list given a suffix.

?- append(Prefix, [5,6], [1,2,3,4,5,6]).
Prefix = [1, 2, 3, 4] ;

Or we can get all the possible combinations of lists that can concatenated produce a specified list:

?- append(First, Second, [1,2,3,4,5,6]).
First = [],
Second = [1, 2, 3, 4, 5, 6] ;
First = ,
Second = [2, 3, 4, 5, 6] ;
First = [1, 2],
Second = [3, 4, 5, 6] ;
First = [1, 2, 3],
Second = [4, 5, 6] ;
First = [1, 2, 3, 4],
Second = [5, 6] ;
First = [1, 2, 3, 4, 5],
Second =  ;
First = [1, 2, 3, 4, 5, 6],
Second = [] ;

This is a powerful concept that is accomplished by Prolog's execution mecanism which involves unification, resolution and backtracking. This mechanism tries to find a way to bind free variables so the goal makes sense.

Recently, I found an example of this behavior while writing some quick experiments with a JavaScript parser I'm working on (which deserves a separate post).

It all started while trying to define a predicate to translate from a JavaScript switch statement to an equivalent sequence of if/else statements. An example of this conversion it's take the following statement:

switch(myVar) {
case 1:
print('one');
break;
case 2:
print('two');
break;
}

And convert it to something like:

if (myVar == 1) {
print('one');
} else if (myVar == 2) {
print('two');
}

A simple version of a predicate that performs this conversion is the following:

switch_if(js_switch(js_identifier(Variable, _),
Cases,
_),
IfStat) :-
switch_if_cases(Variable, Cases, IfStat).

switch_if_cases(Variable,
[js_case(Value, Body, _)|Rest],
js_if(Comparison,
js_block(NoBreakBody, _),
RestIf,
_)) :-
value_comparison(Comparison, Variable, Value),
body_with_no_break(Body, NoBreakBody),
switch_if_cases(Variable, Rest, RestIf).

switch_if_cases(Variable,
[js_default( Body, _)],
js_block(NoBreakBody, _)) :-
body_with_no_break(Body, NoBreakBody).

value_comparison(js_binary_operation(
equals_op,
js_identifier(Variable, _),
Value,
_),
Variable, Value).

body_with_no_break([js_break(_)], []).
body_with_no_break([Stat|Rest1], [Stat|Rest2]) :-
body_with_no_break(Rest1, Rest2)

Here's an example of using the switch_if predicate to convert a small code snippet. The input contains a switch statement. The result is unified in the IfString variable.

?- parse_js_stat_string("switch(w) {
case 1:
print('first');
break;
case 2:
print('second');
break;
default:
print('Nothing');
break;
}", SwitchAst), switch_if(SwitchAst, IfAst), print_js_ast_to_string(IfAst, IfString), writef(IfString).
if (w == 1) {
print('first', );
} else if (w == 2) {
print('second', );
} else {
print('Nothing', );
}
SwitchAst = js_switch(...),
IfString = "if (w == 1) {\n print('first', );\n} else if (w == 2) {\n print('second', );\n} else {\n print('Nothing', );\n}"

The parse_js_stat_string predicate parses a string containing a statement. On success this predicate generates a simple AST representation of the input element as a Prolog term. For example:

?- parse_js_stat_string("switch(w) {
case 1:
print('first');
break;
case 2:
print('second');
break;
default:
print('Nothing');
break;
}", SwitchAst).
SwitchAst = js_switch(js_identifier(, lex_info(1, [])), [js_case(js_literal(number, , lex_info(2, [ws(19, false)])), [js_expr_stat(js_call(js_identifier([112, 114|...], lex_info(3, [ws(..., ...)])), js_arguments([js_literal(string, [...|...], lex_info(..., ...))], lex_info(3, [], 3, [])), null)), js_break(lex_info(4, [ws(43, true)]))], lex_info(2, [ws(11, true)])), js_case(js_literal(number, , lex_info(5, [ws(63, false)])), [js_expr_stat(js_call(js_identifier([112|...], lex_info(6, [...])), js_arguments([js_literal(..., ..., ...)], lex_info(6, [], 6, [])), null)), js_break(lex_info(7, [ws(..., ...)]))], lex_info(5, [ws(55, true)])), js_default([js_expr_stat(js_call(js_identifier([...|...], lex_info(..., ...)), js_arguments([...], lex_info(..., ..., ..., ...)), null)), js_break(lex_info(10, [...]))], lex_info(8, [ws(100, true)]))], lex_info(1, []))

Going the other way

An awesome surprise it's that the switch_if predicate can be used in "the order direction" without any modification. That is, we can specify an if AST and get a switch version of it. Of course this can only be done if the conversion makes sense. For example:

?- parse_js_stat_string("
if (x == 10) {
print('10 val');
} else if (x == 20) {
print('20 val');
} else {
print('None');
}", IfAst), switch_if(SwitchAst, IfAst), print_js_ast_to_string(SwitchAst, SwitchString), writef(SwitchString).
switch(x) {
case 10:
print('10 val', );
break;
case 20:
print('20 val', );
break;
default:
print('None', );
break;
}
IfAst = js_if(js_binary_operation(equals_op,...),
SwitchAst = js_switch(...),
SwitchString = "switch(x) {\n case 10:\n  print('10 val', );\n  break;\n case 20:\n  print('20 val', );\n  break;\n default:\n   print('None', );\n  break;\n}\n"

Several alternatives for converting between switch and if

One of the most intriguing characteristics of Prolog is the posibility of giving different valid results for a given goal.

For example we can give another posibility to the value_comparison predicate:

value_comparison(js_binary_operation(
equals_op,
js_identifier(Variable, _),
Value,
_),
Variable, Value).
value_comparison(js_binary_operation(
equals_op,
Value,
js_identifier(Variable, _),
_),
Variable, Value).

By doing this what we are saying is that either x == 1 and 1 == x are valid possibilities for the condition of the if statement replacing a case section.

If we do this we can get more alternatives:

?- quick_switch_if_test("switch(w) {
case 1:
print('first');
break;
case 2:
print('second');
break;
default:
print('Nothing');
break;
}").
if (w == 1) {
print('first', );
} else if (w == 2) {
print('second', );
} else {
print('Nothing', );
}
true ;
if (w == 1) {
print('first', );
} else if (2 == w) {
print('second', );
} else {
print('Nothing', );
}
true ;
if (1 == w) {
print('first', );
} else if (w == 2) {
print('second', );
} else {
print('Nothing', );
}
true ;
if (1 == w) {
print('first', );
} else if (2 == w) {
print('second', );
} else {
print('Nothing', );
}
true ;
false.

Code for this post can be found here.

Saturday, December 29, 2018

Using Racklog for parsing

This post shows a small experiment of creating parsing predicates using the Racket's Racklog package . This package enables the use of logic programming features inside Racket.

Logic programming languages, like Prolog, use the Definite clause grammars syntax for this purpose. In this post this technique is not directly used, the goal is to express the grammar in the same way the DCG syntax is expanded. A nice experiment for a future post is to create macros for hiding state change variables.

As always the goal is to experiment with a concept without worrying too much about performance!

1.1 How it looks

Here's an example of a predicate for parsing an if statement for a fictional language.

(define %if-stat
(%rel (ctxt new-ctxt)
[(ctxt new-ctxt)
(%let (ifkw-ctxt lpar-ctxt rpar-ctxt
cond-ctxt then-ctxt
stats-ctxt else-ctxt end-ctxt)
(%and
((%literal-id "if")   ctxt ifkw-ctxt)
(%lpar                ifkw-ctxt lpar-ctxt)
(%expression          lpar-ctxt cond-ctxt)
(%rpar                cond-ctxt rpar-ctxt)
((%literal-id "then") rpar-ctxt then-ctxt)
(%stat                then-ctxt stats-ctxt)
((%opt %else-block)   stats-ctxt else-ctxt)
(%is new-ctxt (with-result
`(-if ,(parse-context-result cond-ctxt)
,(parse-context-result stats-ctxt)
,(parse-context-result else-ctxt))
else-ctxt))))]))

Running this predicate from the REPL using a string with sample code produces the following output:

racklogexperiments.rkt> (parse-context-result
(cdar
(%which (result-ctxt)
(%if-stat (string-context "if (c) then
foo(c);
else
moo(d);")
result-ctxt))))
'(-if
(-id "c")
(-call-stat (-call "foo" ((-id "c"))))
(-else (-call-stat (-call "moo" ((-id "d"))))))
racklogexperiments.rkt>

1.2 Creating parsing predicates

An important piece that we need to define our parsers it's the "parsing context". This element is represented as as simple structure which keeps track of :

1. The result or output of the last parser.
2. The text used as input for the parser.
3. The current position inside the string as a zero-based index.
(struct
parse-context
(result text position))

(define (string-context str)
(parse-context '() str 0))

(define (with-result value ctxt)
(struct-copy parse-context ctxt [result value]))

The way we define parsers using Racklog predicates is that we transform an input parsing context to new context. The new context is going to have a the result of the last parsing operation and it will move the position as many characters as consumed by the last parsing operation.

We use a couple of parsers as the foundation for all must all other parsers. The first one, %a-char has the next available character as the result.

(define %a-char
(%rel (a-char ctxt new-ctxt)
[(a-char ctxt new-ctxt)
(%is #t (has-more-chars? ctxt))
(%is new-ctxt (get-char-from-context ctxt))
(%is a-char (parse-context-result new-ctxt))
]))

The code for the utility functions used in these parsers is the following:

(define (get-char-from-context ctxt)
(struct-copy
parse-context
ctxt
[result (string-ref (parse-context-text ctxt) (parse-context-position ctxt))]
[position (+ 1 (parse-context-position ctxt))]))

(define (has-more-chars? ctx)
(<
(parse-context-position ctx)
(string-length (parse-context-text ctx))))

The get-char-from-context function creates a new context with the current character as result and advances the context to the next position. We use Racklog unification and %and to chain together the parsing contexts. The following interaction illustrates this:

racklogexperiments.rkt> (define result (%which (c1 result-ctxt)
(%and (%a-char #\a (string-context "abc") c1)
(%a-char #\b c1 result-ctxt))))
racklogexperiments.rkt> result
'((c1 . #<parse-context>) (result-ctxt . #<parse-context>))
racklogexperiments.rkt> (assoc 'result-ctxt result)
'(result-ctxt . #<parse-context>)
racklogexperiments.rkt> (parse-context-result (cdr (assoc 'result-ctxt result)))
#\b
racklogexperiments.rkt> (parse-context-position (cdr (assoc 'result-ctxt result)))
2
racklogexperiments.rkt>

Here we create a very simple parser that recognizes the sequence: "ab" . As presented above, the position of the resulting parsing context is 2 which is the zero based position inside the string after 'b'.

1.3 Sequences

A very useful parser is one that let's you apply another parser zero or more times. The parser that archives this is the following:

(define %p-simple-sequence
(λ (pred)
(%rel (ctxt new-ctxt)
[(ctxt new-ctxt)
(%let (tmp-ctxt tmp-result)
(%or
(%and
(pred ctxt tmp-ctxt)
(%is tmp-result (with-result
(cons (parse-context-result tmp-ctxt)
(parse-context-result ctxt))
tmp-ctxt))
((%p-simple-sequence pred) tmp-result new-ctxt))
(%is new-ctxt ctxt)))])))

(define %p-sequence
(λ (pred)
(%rel ( ctxt new-ctxt)
[(ctxt new-ctxt)
(%let (tmp-ctxt)
(%and
(%is tmp-ctxt (with-result '() ctxt))
((%p-simple-sequence pred)
tmp-ctxt new-ctxt)
))]
)))

We can apply this parser as follows:

racklogexperiments.rkt> (parse-context-result
(cdar (%which (result-ctxt)
((%p-sequence
(%rel (c r) [(c r) (%a-char #\a c r)]))
(string-context "aaaa") result-ctxt)))  )
'(#\a #\a #\a #\a)
racklogexperiments.rkt>

1.4 Optionals

Another useful element we need is a way to parse optional elements. We used this in our if example above for the else section.

To implement this we use %or to try to parse the optional parser first or succeed with an empty result. Using this technique will enable multiple solutions (see an example of this below).

(define %opt
(λ (parser)
(%rel (ctxt new-ctxt)
[(ctxt new-ctxt)
(%let (tmp-ctxt)
(%and
(%or (parser ctxt new-ctxt)
(%is new-ctxt (with-result '() ctxt)))))])))

1.5 Multiple possible ASTs

One interesting possibility of using a Racklog (or Prolog's DCGs) is that you can get many possible interpretations of the grammar. Although it may not be of practical use it looks rather interesting.

An example of these shows up when parsing an if statement with a https://en.wikipedia.org/wiki/Dangling_else .

(define sample-code
"if (x) then if (y) then foo(); else goo();")

Here there are two possible valid interpretations of this if statement. The default one:

racklogexperiments.rkt> (parse-context-result
(cdar
(%which (result-ctxt)
(%if-stat (string-context sample-code)
result-ctxt))))

'(-if
(-id "x")
(-if
(-id "y")
(-call-stat (-call "foo" ()))
(-else (-call-stat (-call "goo" ()))))
())
racklogexperiments.rkt>

Here's an alternative visualization of this tree:

We can now ask Racklog for another solution using the %more function. See the result here:

racklogexperiments.rkt> (parse-context-result (cdar (%more)))
'(-if
(-id "x")
(-if (-id "y") (-call-stat (-call "foo" ())) ())
(-else (-call-stat (-call "goo" ()))))
racklogexperiments.rkt>

Here's the other alternative visualization:

1.6 Code

The code for this post can be found here.