Wednesday, March 26, 2008

Pattern matching with Tom

In this post I'm going to show a small overview to the Tom pattern matching compiler for Java.

Tom

Tom is an extension to the Java language which adds a constructs for describing data structures and powerful pattern matching features for manipulating those data structures.

Tom distribution includes a command line compiler and a Eclipse plugin. Also nice documentation is found and examples are found in its website.

The documentation available from the website includes a nice step by step introduction to the features available in the product. Here I'm going to show a little overview of some of them.

All the snippets for this post were created using the Eclipse plugin.

Data structure definition

Tom provides a way for data structures that could be easily instantiated and manipulated with pattern matching. These data types are also called algebraic data types which are very similar to those available in languages such as Haskell.

These data types could be defined inside a Java class by using the %gom { } block or defining the types in a file with .gom extension. Instructions on how to do this are available in the Separating Gom from Tom section of the documentation.

For the rest of this post we're going to use the following definitions to show examples with Tom.


module Company
imports int String
abstract syntax

Person = Person(companyId:int,name:String,age:int)

Persons = Persons(Person*)

Worker = Employee(personId:int)
| Contractor(name:String)

Workers = Workers(Worker*)

Group = DepartmentGroup(department:Department)
| Committee(name:String, members:Workers)

Department = SimpleDepartment(name:String, leader:Worker, members:Workers)
| MultiDepartment(name:String, leader:Worker, subDepartments:Departments)

Departments = Departments(Department*)

Groups = Groups(Group*)

Company = Company(name:String,members:Persons,groups:Groups)


Reading for the bottom to the top, these definitions describe a Company data structure which has name, a collection of persons and a collection of groups inside the company. Groups could be Departments or a Committees. Departments could be simple (SimpleDepartment) or have sub departments(MultipleDepartment). Also the members of the Committees or Departments could be employees or contractors.

In order to use these definitions we need to create a file with .t extension. This file will have both Java and Tom elements mixed together. For example here's the creation of an instance of the data structures defined above in Main.t :

   
import langexplr.tomtests.company.types.*;

public class Main{
%include{ int.tom }
%include{ company/Company.tom }

static Company createCompanyData() {
return `Company("MyCompany",
Persons(Person(1,"John",43),
Person(2,"Kim",33),
Person(3,"Maria",27),
Person(4,"Luis",34),
Person(5,"Mary",34),
Person(6,"Bill",24)),
Groups(
DepartmentGroup(
SimpleDepartment(
"HR",
Employee(2),
Workers(Employee(1),
Employee(6),
Contractor("Lisa")))),
DepartmentGroup(
SimpleDepartment(
"Finance",
Employee(5),
Workers(Contractor("William")))),
DepartmentGroup(
MultiDepartment(
"Systems",
Employee(5),
Departments(
SimpleDepartment(
"Development",
Employee(3),
Workers(Contractor("Charlie"))),
SimpleDepartment(
"QA",
Employee(4),
Workers(Contractor("Lily")))))),
Committee("Party",Workers(Employee(2),Employee(5)))
));
}
...


Notice the backquote character used at the beginning of the call to the Company constructor. This element is used to differentiate the Java and Tom syntax elements.

Matching

Tom provides a powerful pattern matching mechanism that eases the manipulation of complex tree structures. In order to use this feature, a %match statement is provided. This statement is barely similar to a switch statement in Java or C# but using patterns instead of literals. A simple example of this feature is the following:


static void nameTest(Company c) {
String name = "";
%match (c){
Company(theName,_,_) -> { name = `theName; }
}
System.out.println(name);
}


In this example the name of the company is extracted and assigned to the name variable. The %match statement receives an element, in this case the instance of the Company datatype. This element is tested against one or more patterns, in this case only one pattern is provided. If the pattern matches, then the code on the left of the arrow ( -> ) is executed.

The pattern Company(theName,_,_) says, that we're expecting an instance of the Company datatype and that the name( the first argument of the constructor) will be assigned to the theName variable. It also says that we will ignore the other two elements of the Company datatype. The variable theName could be used at the left side of this entry by using a backquote character at the beginning of the name.

List matching

One of the most interesting pattern matching elements provided by Tom is list matching which allows the creation of patterns on sequences of elements.

For example, say that we want to create a function that retrieves the name of an Person given its id and an instance of the Company. As a first version of this function, we can write:


static String getPersonNameByIdV1(int id,Company c) {
String name = null;
%match (c){
Company(_,Persons(_*,Person(empId,empName,_),_*),_) -> {
if (id == `empId) {
name = `empName;
}
}
}
return name;
}


It is important to remember that in the definitions of the data structures, the Persons element is defined as Persons = Persons(Person*) which means that its constructor receives a any number of Person instances.

Now the Company(_,Persons(_*,Person(empId,empName,_),_*),_) pattern says: match a person inside a company and assign empId and empName to its id and name .

Given that "_*" is a wildcard, notice that there are many ways to match this pattern. As explained in the List matching section of the documentation, since there's several ways to match this pattern, the block will be executed with every possible match. This means that empId and empName will be bound to the id and the name of every person. Given this we can ask if (id == `empId) { ... } and assign the name that we found.

Non-linear patterns

Another powerful pattern matching element that Tom provides is the ability to use one variable several times in the same pattern.The first use will bind the variable to a value, the other uses will compare the current element with the bound value. This feature, called Non-linear patterns, introduced here. It allows the creation of more expressive patterns, for example we can rewrite the previous method the following way:


static String getPersonNameByIdV2(int id,Company c) {
String name = null;
%match (int id,c){
empId,Company(_,Persons(_*,Person(empId,empName,_),_*),_) -> {
name = `empName;
}
}
return name;
}


Here we specify two arguments to the %match construct. The first is the requested id and the second is the Company instance. Notice that in the pattern we match the id with the empId free variable the first time, and then we use it to match the requested person. Now we're putting a restriction on the Person we're looking for, so we can remove the if statement from the body.


A final example

For the final example I'll write a function that prints the names of all the members of a given department . Here's the code:


static void printNameOfPeopleWorkingInDepartment(String departmentName,Company c) {
System.out.println("People working in "+departmentName);
%match(String departmentName,c) {
deptName,
Company(_,
persons,
Groups(_*,
DepartmentGroup(
SimpleDepartment(
deptName,
_,
Workers(_*,worker,_*))),
_*)) -> {
%match(worker) {
Employee(id) -> { System.out.println(getEmployeeNameById(`id,c));}
Contractor(name) -> { System.out.println(`name);}
}
}
}
}


This pattern will first look for a SimpleDepartment with the specified name. Then for each worker it will extract the name given if it is an Employee or a Contractor.

MultiDepartment instances are not considered. In future posts another Tom feature will be used to deal with these elements.

Code for this post can be found here.

Tom provides lots of features not covered here. Its distribution includes documentation and a lot of examples.