Monday, November 23, 2009

Adding automatic semicolon insertion to a Javascript parser

A couple of weeks ago I wrote a blog post about a Javascript parser written using the Newspeak parsing combinators. As mentioned in that post, automatic semicolon insertion was not supported. This post shows how the feature was added.

Automatic semicolon insertion



As detailed in section 7.9 of the ECMA-262 document[PDF], Javascript lets you use a line terminator as a statement separator in some scenarios. For example, a semicolon is "implicitly inserted" when expression statements are separated by line terminators:


if (condition) {
    print("A")
    print("B")
}


This code snippet is equivalent to:



if (condition) {
    print("A");
    print("B");
}
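The equivalence can be checked with a small script. This is only a sketch: `run`, `withoutSemicolons` and `withSemicolons` are names made up for the illustration, and `print` is replaced by a stand-in that records its argument instead of writing it anywhere.

```javascript
// Compile a snippet with `condition` and `print` as parameters, run it,
// and return everything the snippet "printed".
function run(source) {
  const output = [];
  const print = (s) => { output.push(s); };   // stand-in for print
  const fn = new Function("condition", "print", source);
  fn(true, print);
  return output;
}

const withoutSemicolons = `if (condition) {
  print("A")
  print("B")
}`;

const withSemicolons = `if (condition) {
  print("A");
  print("B");
}`;

const resultWithout = run(withoutSemicolons);
const resultWith = run(withSemicolons);

// Both snippets record the same sequence of calls.
console.log(resultWithout, resultWith);
```

Any standard Javascript engine applies the insertion rules while compiling the first string, so both runs should record the same calls.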


Solution



In the original post about the parser, espin pointed me to a paper[PDF] by A. Warth that describes how the semicolon insertion problem was solved in a Javascript parser written in OMeta. The solution presented in this post is based on the one from the paper.


I wanted to isolate the code that performs this function, so I created a subclass that overrides the productions involved in the process. This way we can have parsers both with and without the feature. Here's the code:


class JSGrammarWithSemicolonInsertion = JSGrammar (
"Parser features that add automatic semicolon insertion"
|

    specialStatementTermination = ((( cr | lf ) not & whitespace ) star,
        (semicolon | comment | lf | cr | (peek: $})) )
        wrapper: [ :ws :terminator | | t | t:: Token new. t token: $;. t].

    returnStatement = return, (specialStatementTermination |
        (expression , specialStatementTermination)).

    breakStatement = break, (specialStatementTermination |
        (identifier , specialStatementTermination)).

    continueStatement = continue, (specialStatementTermination |
        (identifier , specialStatementTermination)).

    whitespaceNoEOL = (( cr | lf ) not & whitespace ) star,
        (((peek: (Character cr)) | (peek: (Character lf))) not) .

    throwStatement = throw, whitespaceNoEOL , expression , specialStatementTermination.

    expressionStatement = (((function | leftbrace) not) & expression), specialStatementTermination.

    variableStatement = var, variableDeclarationList, specialStatementTermination.
|
)
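To make the key production easier to follow, here is a rough hand-written equivalent of specialStatementTermination in Javascript. It is a sketch under several assumptions, not a translation of the combinator code: the function works on a plain string and a position, "whitespace" is taken to mean spaces and tabs, and "comment" is reduced to `//` line comments. The return convention (next position, or -1 on failure) is also made up for the example.

```javascript
// Hypothetical helper: returns the position after the statement terminator
// that starts at `pos`, or -1 if no terminator is found there.
function specialStatementTermination(input, pos) {
  // ((cr | lf) not & whitespace) star:
  // skip whitespace that is not a line terminator (assumed: space and tab).
  while (pos < input.length && (input[pos] === " " || input[pos] === "\t")) {
    pos++;
  }
  const ch = input[pos];

  // semicolon | lf | cr: an explicit semicolon or a line terminator ends
  // the statement and is consumed.
  if (ch === ";" || ch === "\n" || ch === "\r") {
    return pos + 1;
  }

  // comment: assumption, only `//` comments, consumed up to the end of line.
  if (input.startsWith("//", pos)) {
    const eol = input.indexOf("\n", pos);
    return eol === -1 ? input.length : eol + 1;
  }

  // peek: $} : a closing brace also ends the statement, but it is only
  // looked at, not consumed, so the enclosing block can still match it.
  if (ch === "}") {
    return pos;
  }

  return -1;
}

// Position 10 is the line feed right after print("A").
console.log(specialStatementTermination('print("A")\nprint("B")', 10));
```

In the grammar, the wrapper block then replaces whatever was matched with a `;` token, so the rest of the parser sees the same token stream whether the semicolon was written or implied. Like the production above, this sketch does not treat the end of input as a terminator.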


The result of parsing the following code:


var x = 0
while (true) {
    x++
    document.write(x)
    if ( x > 10)
        break
    else continue
}


... is presented using the utility created for the previous post:



Code for this post is available here.