Tuesday, January 31, 2012

A quick look at Haskell records

While reading the Parsec documentation I learned about a nice technique used to create language definitions. For example:
idSymbol = oneOf ":!$%&*+/<=>?@\\^|-~"

schemeLanguageDef :: LanguageDef st
schemeLanguageDef = emptyDef { 
                   commentLine = ";;"
                   , identStart = letter <|> idSymbol
                   , identLetter = alphaNum <|> idSymbol
                   , opStart = parserZero
                   , opLetter = parserZero
                 }

In this case we are creating a small language definition based on emptyDef. We're saying that we want all what emptyDef has but changing the values of commentLine, identStart, identLetter, opStart and opLetter.

The Text.Parsec.Language package contains several predefined language definitions which you can use to create yours. Some of these definitions are based on each other, for example the haskell98Def and haskellStyle. This is accomplished using Haskell record syntax for defining data types.

Records in Haskell allow the creation of abstract data type definitions that contain named fields . For example, here's a small definition of a data type to store the information of an editor color theme:
data Theme = ColorTheme {
                     keywordsColor :: Color,
                     backgroundColor :: Color,
                     fontSize :: Int,
                     operatorsColor :: Color,
                     literalsColor :: Color
                }
        deriving Show
An instance of this record can be created as follows:
   baseTheme = ColorTheme { keywordsColor = Black,
                            backgroundColor = White,
                            fontSize = 10,
                            operatorsColor = Black,
                            literalsColor = Black }

One nice feature of Haskell records is that it allows the creation of a new record instance based on a existing one. Back to our artificial example, say that we want to create a dark theme which is the same as the base but with colors inverted.
   darkTheme = baseTheme {
                           keywordsColor = White,
                           backgroundColor = Black,
                           operatorsColor = White,
                           literalsColor = White
                         } 
Here we're saying that we want a copy of baseTheme with keywordsColor, backgroundColor, operatorsColor and literalsColor changed to another value.

Pattern matching can be used to extract parts of the record. For example, say that we want to define a function to increase the current font size of a theme:
  increaseFontSize :: Int -> Theme -> Theme
  increaseFontSize amount theme@ColorTheme { fontSize = currentFontSize} =
        theme { fontSize = currentFontSize + amount } 

We can use this definition :

*Tests> increaseFontSize 5 baseTheme
ColorTheme {keywordsColor = Black, backgroundColor = White, fontSize = 15, operatorsColor = Black, literalsColor = Black}
Also accesor functions are defined for each part of the record. For example:
*Tests> backgroundColor baseTheme
White

For future posts I'm going to explore similar features that exist in other programming languages.

Monday, January 23, 2012

Creating a small parser using Parsec

In this post I'm going to show a small parser for a Scheme-like language written in Haskell using the Parsec parsing combinator library.

We're going to start by defining an AST for Scheme-like expressions:

data Expr = ScSymbol String
            | ScString String
            | ScNumber Integer
            | ScDouble Double 
            | ScCons Expr Expr
            | ScNil
            | ScBool Bool
            | ScQuote Expr
     deriving Show

For this experiment I'm going to use the Text.Parsec.Token and Text.Parsec.Language packages which have useful functions for creating language parsers.

idSymbol = oneOf ":!$%&*+/<=>?@\\^|-~"

schemeLanguageDef :: LanguageDef st
schemeLanguageDef = emptyDef { 
                   commentLine = ";;"
                   , identStart = letter <|> idSymbol
                   , identLetter = alphaNum <|> idSymbol
                   , opStart = parserZero
                   , opLetter = parserZero
                 }

schemeTokenParser = makeTokenParser schemeLanguageDef

TokenParser {
   identifier = idParser,
   reservedOp = opParser,
   stringLiteral = stringLiteralParser,
   parens = parParser,
   lexeme = lexParser,
   naturalOrFloat = naturalOrFloatParser
} = schemeTokenParser


Given these definitions we get parsers for identifiers, numbers, string literals "for free".

Boolean literals are parsed using the following parser:

boolLiteral = lexParser (
                 do 
                   char '#'
                   val <- (char 't') <|> (char 'f')
                   return $ ScBool $ val == 't'
             )


Basic quotations are parsed as follows:

quoteParser = lexParser (
              do 
                char '\''
                val <- expressionParser
                return $ ScQuote val
          )


Atomic expressions are parsed as follows:

atomParser =
   (do 
      id <- idParser
      return $ ScSymbol id)
   <|>
    (do 
       fnumber <- naturalOrFloatParser
       return $ case fnumber of
                Left num -> ScNumber num
                Right num -> ScDouble num)
   <|>        
    (do str <- stringLiteralParser
        return $ ScString str)
   <|> boolLiteral
   <|> quoteParser



Next we need to consider S-Expressions. This kind of expressions are combinations of the above elements surrounded by parentheses.

For example:
(+ 1 2 3)

dottedSuffixParser =     
   do 
      dotParser
      finalExpr <- expressionParser
      return finalExpr

parExpressionParser = 
   do (exprs, last) <- parParser 
                (do 
                   seq <- many expressionParser
                   dottedSuffix <- optionMaybe dottedSuffixParser 
                   return (case dottedSuffix of
                            Just lastExpr -> (seq, lastExpr)
                            Nothing -> (seq, ScNil)))
      return $ foldr ScCons last exprs

As shown above, the parser need to verify if there was a dot ('.') before the last element for example:
(1 . 2)
More details on the dot syntax can be found here.
Finally we define expressions:

       expressionParser =
            atomParser <|> parExpressionParser


       parseIt input = parse expressionParser "" input


We can use this function to parse this expressions as follows:

*SCParser> parseIt "(- x 1)"
Right (ScCons (ScSymbol "-") (ScCons (ScSymbol "x") (ScCons (ScNumber 1) ScNil)))


There's a great a series of articles on implementing a Scheme interpreter in Haskell called "Write Yourself a Scheme in 48 Hours/Parsing" which is a great source of information on this topic.

Monday, January 2, 2012

A call-with-current-continuation example

The call-with-current-continuation (or call/cc for short) function in Scheme allows the creation of powerful control structures. It basically allows you to capture the current state of a computation as a first class value. This value could be assigned to a variable and invoked later.

For a great introduction to this function see the call/cc entry in the Community-Scheme-Wiki .

The following toy example shows how this function is used to do an incremental search inside a tree. Say that you have the following tree structure definition:


(define tree-data
'(html
(head (title "test"))
(body
(p "foo" (i "goo"))
(ol
(li "xoo" (i "moo"))
(li (i "loo")))
(p))))


We want to define a function to search for elements identified by a predicate. For example, if looking for a "i" element we can write:


(define (is-italic? element)
(and (list? element)
(not (null? element))
(eq? 'i (car element))))


We want to create a function to perform an incremental search inside a tree structure. That is, we want to find the first element and then request the next element if needed.

Here's a demo of the function we want to create:

> (define hit (search is-italic? tree-data))

> hit
((i "goo") #<continuation>)
> (set! hit (get-next-result hit))
> hit
((i "moo") #<continuation>)
> (set! hit (get-next-result hit))
> hit
((i "loo") #<continuation>)
> (set! hit (get-next-result hit))
> hit
(() #<continuation>)


This function is defined using call-with-current-continuation as follows:

(define (search pred? tree)
(call-with-current-continuation
(lambda (c)
(searching pred? tree c))))

(define (searching pred? tree return)
(set! return
(call-with-current-continuation
(lambda (continuation)
(if (pred? tree)
(return (list tree continuation))
return))))
(if (list? tree)
(for-each
(lambda (item)
(set! return (cadr (searching pred? item return))))
tree))
(list '() return))



The entry point is the search function which captures the current continuation in the return variable. This continuation value allows us to jump to the point where call-with-current-continuation was called. We pass this value to the searching function which contains the logic to search inside the tree.

The first expression of the searching function seems confusing since it contains another call to call-with-current-continuation and an assignment of return. This code is used to do the following things:
  1. Return a search result (remember that return contains a continuation)
  2. Capture the current state of the tree traversal (this is done by the call-with-current-continuation call)



...
(set! return
(call-with-current-continuation
(lambda (continuation)
(if (pred? tree)
(return (list tree continuation))
return))))
...


Notice that a search result contains two things:

  1. The tree fragment that makes the predicate true
  2. The continuation of the point where the element was found (which can be used to restore the traversal)



One key thing to understand is that we change the value of return to the result of the call/cc call. This is done this way because we will change the return continuation to retrieve new results. This is done using the get-next-result function as follows:


(define (get-next-result result)
(if (not (null? (car result)))
(call-with-current-continuation (cadr result))
result))


Calling call-with-current-continuation with the last captured continuation will result on setting it to the return variable in searching.