This post shows a small experiment of creating parsing predicates using the Racket's Racklog package . This package enables the use of logic programming features inside Racket.
Logic programming languages, like Prolog, use the Definite clause grammars syntax for this purpose. In this post this technique is not directly used, the goal is to express the grammar in the same way the DCG syntax is expanded. A nice experiment for a future post is to create macros for hiding state change variables.
As always the goal is to experiment with a concept without worrying too much about performance!
1.1 How it looks
Here's an example of a predicate for parsing an if
statement for a fictional language.
(define %if-stat (%rel (ctxt new-ctxt) [(ctxt new-ctxt) (%let (ifkw-ctxt lpar-ctxt rpar-ctxt cond-ctxt then-ctxt stats-ctxt else-ctxt end-ctxt) (%and ((%literal-id "if") ctxt ifkw-ctxt) (%lpar ifkw-ctxt lpar-ctxt) (%expression lpar-ctxt cond-ctxt) (%rpar cond-ctxt rpar-ctxt) ((%literal-id "then") rpar-ctxt then-ctxt) (%stat then-ctxt stats-ctxt) ((%opt %else-block) stats-ctxt else-ctxt) (%is new-ctxt (with-result `(-if ,(parse-context-result cond-ctxt) ,(parse-context-result stats-ctxt) ,(parse-context-result else-ctxt)) else-ctxt))))]))
Running this predicate from the REPL
using a string with sample code produces the following output:
racklogexperiments.rkt> (parse-context-result (cdar (%which (result-ctxt) (%if-stat (string-context "if (c) then foo(c); else moo(d);") result-ctxt)))) '(-if (-id "c") (-call-stat (-call "foo" ((-id "c")))) (-else (-call-stat (-call "moo" ((-id "d")))))) racklogexperiments.rkt>
1.2 Creating parsing predicates
An important piece that we need to define our parsers it's the "parsing context". This element is represented as as simple structure which keeps track of :
- The result or
output
of the last parser. - The text used as input for the parser.
- The current position inside the string as a zero-based index.
(struct parse-context (result text position)) (define (string-context str) (parse-context '() str 0)) (define (with-result value ctxt) (struct-copy parse-context ctxt [result value]))
The way we define parsers using Racklog
predicates is that we transform an input parsing context to new context. The new context is going to have a the result of the last parsing operation and it will move the position as many characters as consumed by the last parsing operation.
We use a couple of parsers as the foundation for all must all other parsers. The first one, %a-char
has the next available character as the result.
(define %a-char (%rel (a-char ctxt new-ctxt) [(a-char ctxt new-ctxt) (%is #t (has-more-chars? ctxt)) (%is new-ctxt (get-char-from-context ctxt)) (%is a-char (parse-context-result new-ctxt)) ]))
The code for the utility functions used in these parsers is the following:
(define (get-char-from-context ctxt) (struct-copy parse-context ctxt [result (string-ref (parse-context-text ctxt) (parse-context-position ctxt))] [position (+ 1 (parse-context-position ctxt))])) (define (has-more-chars? ctx) (< (parse-context-position ctx) (string-length (parse-context-text ctx))))
The get-char-from-context
function creates a new context with the current character as result and advances the context to the next position. We use Racklog unification and %and
to chain together the parsing contexts. The following interaction illustrates this:
racklogexperiments.rkt> (define result (%which (c1 result-ctxt) (%and (%a-char #\a (string-context "abc") c1) (%a-char #\b c1 result-ctxt)))) racklogexperiments.rkt> result '((c1 . #<parse-context>) (result-ctxt . #<parse-context>)) racklogexperiments.rkt> (assoc 'result-ctxt result) '(result-ctxt . #<parse-context>) racklogexperiments.rkt> (parse-context-result (cdr (assoc 'result-ctxt result))) #\b racklogexperiments.rkt> (parse-context-position (cdr (assoc 'result-ctxt result))) 2 racklogexperiments.rkt>
Here we create a very simple parser that recognizes the sequence: "ab" . As presented above, the position of the resulting parsing context is 2
which is the zero based position inside the string after 'b'.
1.3 Sequences
A very useful parser is one that let's you apply another parser zero or more times. The parser that archives this is the following:
(define %p-simple-sequence (λ (pred) (%rel (ctxt new-ctxt) [(ctxt new-ctxt) (%let (tmp-ctxt tmp-result) (%or (%and (pred ctxt tmp-ctxt) (%is tmp-result (with-result (cons (parse-context-result tmp-ctxt) (parse-context-result ctxt)) tmp-ctxt)) ((%p-simple-sequence pred) tmp-result new-ctxt)) (%is new-ctxt ctxt)))]))) (define %p-sequence (λ (pred) (%rel ( ctxt new-ctxt) [(ctxt new-ctxt) (%let (tmp-ctxt) (%and (%is tmp-ctxt (with-result '() ctxt)) ((%p-simple-sequence pred) tmp-ctxt new-ctxt) ))] )))
We can apply this parser as follows:
racklogexperiments.rkt> (parse-context-result (cdar (%which (result-ctxt) ((%p-sequence (%rel (c r) [(c r) (%a-char #\a c r)])) (string-context "aaaa") result-ctxt))) ) '(#\a #\a #\a #\a) racklogexperiments.rkt>
1.4 Optionals
Another useful element we need is a way to parse optional elements. We used this in our if
example above for the else
section.
To implement this we use %or
to try to parse the optional parser first or succeed with an empty result. Using this technique will enable multiple solutions (see an example of this below).
(define %opt (λ (parser) (%rel (ctxt new-ctxt) [(ctxt new-ctxt) (%let (tmp-ctxt) (%and (%or (parser ctxt new-ctxt) (%is new-ctxt (with-result '() ctxt)))))])))
1.5 Multiple possible ASTs
One interesting possibility of using a Racklog
(or Prolog's DCGs) is that you can get many possible interpretations of the grammar. Although it may not be of practical use it looks rather interesting.
An example of these shows up when parsing an if
statement with a https://en.wikipedia.org/wiki/Dangling_else .
(define sample-code "if (x) then if (y) then foo(); else goo();")
Here there are two possible valid interpretations of this if
statement. The default one:
racklogexperiments.rkt> (parse-context-result (cdar (%which (result-ctxt) (%if-stat (string-context sample-code) result-ctxt)))) '(-if (-id "x") (-if (-id "y") (-call-stat (-call "foo" ())) (-else (-call-stat (-call "goo" ())))) ()) racklogexperiments.rkt>
Here's an alternative visualization of this tree:
We can now ask Racklog
for another solution using the %more
function. See the result here:
racklogexperiments.rkt> (parse-context-result (cdar (%more))) '(-if (-id "x") (-if (-id "y") (-call-stat (-call "foo" ())) ()) (-else (-call-stat (-call "goo" ())))) racklogexperiments.rkt>
Here's the other alternative visualization:
1.6 Code
The code for this post can be found here.