The grammar looks like this:
class JSGrammar = ExecutableGrammar (
"Experiment for JS grammar based on the description from http://www.ecma-international.org/publications/standards/Ecma-262.htm"
|
doubleQuote = (char: $").
backslash = (char: $\).
str = doubleQuote,((backslash, ( char: $" )) |
(backslash, ( char: $/ )) |
(backslash, backslash) |
(backslash, ( char: $r )) |
(backslash, ( char: $n )) |
(backslash, ( char: $t )) |
(charExceptFor: $")) star, doubleQuote.
string = tokenFor: str.
tilde = char: $~.
exclamation = char: $!.
starChar = char: $*.
slash = char: $/.
modulo = char: $%.
pipe = char: $|.
amp = char: $&.
cir = char: $^.
question = char: $?.
colon = char: $:.
semicolon = char: $;.
negSign = (char: $-).
plusSign = (char: $+).
digit = (charBetween: $0 and: $9).
dot = (char: $. ) .
lt = char: $<.
gt = char: $>.
eq = char: $=.
num = negSign opt, digit, digit star, dot opt,digit star, ((char: $e) | (char: $E)) opt, (plusSign | negSign) opt,digit star.
number = tokenFor: num.
tQuestion = tokenFor: question.
tColon = tokenFor: colon.
tplusSign = tokenFor: plusSign.
tnegSign = tokenFor: negSign.
tmodulo = tokenFor: modulo.
tslash = tokenFor: slash.
tstarChar = tokenFor: starChar.
texclamation = tokenFor:exclamation.
tdot = tokenFor:dot.
tLt = tokenFor: lt.
tGt = tokenFor: gt.
tEq = tokenFor: eq.
tAmp = tokenFor: amp.
tPipe = tokenFor: pipe.
tCir = tokenFor: cir.
tSlash = tokenFor: slash.
tSemicolon = tokenFor: semicolon.
tStarEq = tstarChar,eq.
tModEq = tmodulo,eq.
tSlashEq = tSlash,eq.
tPlusEq = tplusSign,eq.
tMinusEq = tnegSign,eq.
tAmpAmp = tAmp,amp.
tPipePipe = tPipe,pipe.
tLtEq = tLt,eq.
tGtEq = tGt,eq.
tleftShift = tLt,lt.
trightShift = tGt,gt.
tsRightShift = tGt,gt,gt.
tEqEq = tEq,eq.
tEqEqEq = tEq,eq,eq.
tNotEq = texclamation,eq.
tNotEqEq = texclamation,eq,eq.
tleftShiftEq = tleftShift,eq.
trightShiftEq = trightShift,eq.
tsRightShiftEq = tsRightShift,eq.
tAmpEq = tAmp,eq.
tPipeEq = tPipe,eq.
tCirEq = tCir,eq.
lineTerminator = (char: (Character lf)) | (char: (Character cr)).
regularExpressionLiteral =
tslash,
( ((backslash, ( charExceptForCharIn: { (Character lf). (Character cr). })) |
(charExceptForCharIn: { (Character lf). (Character cr). $/.})) plus),
slash, (identifierStart star).
leftparen = tokenFromChar: $(.
rightparen =tokenFromChar: $).
leftbrace = tokenFromChar: ${.
rightbrace =tokenFromChar: $}.
comma = tokenFromChar: $,.
propertyName = string | identifier | number.
propertyNameAndValue = propertyName,tColon,value.
obj = leftbrace, (propertyNameAndValue starSeparatedBy: comma),rightbrace.
object = obj.
leftbracket = tokenFromChar: $[.
rightbracket = tokenFromChar: $].
arr = leftbracket, (value starSeparatedBy: comma), rightbracket.
array = tokenFor: arr.
comment = (slash,starChar,blockCommentBody,starChar,slash) | (slash,slash, lineCommentBody).
ttrue = tokenFromSymbol: #true.
tfalse = tokenFromSymbol: #false.
null = tokenFromSymbol: #null.
function = tokenFromSymbol: #function.
tnew = tokenFromSymbol: #new.
break = tokenFromSymbol: #break.
case = tokenFromSymbol: #case.
catch = tokenFromSymbol: #catch.
continue = tokenFromSymbol: #continue.
default = tokenFromSymbol: #default.
delete = tokenFromSymbol: #delete.
do = tokenFromSymbol: #do.
else = tokenFromSymbol: #else.
finally = tokenFromSymbol: #finally.
for = tokenFromSymbol: #for.
if = tokenFromSymbol: #if.
in = tokenFromSymbol: #in.
instanceof = tokenFromSymbol: #instanceof.
return = tokenFromSymbol: #return.
switch = tokenFromSymbol: #switch.
this = tokenFromSymbol: #this.
throw = tokenFromSymbol: #throw.
try = tokenFromSymbol: #try.
typeof = tokenFromSymbol: #typeof.
var = tokenFromSymbol: #var.
void = tokenFromSymbol: #void.
while = tokenFromSymbol: #while.
with = tokenFromSymbol: #with.
letter = (charBetween: $a and: $z) | (charBetween: $A and: $Z).
identifierStart = letter | (char: $$) | (char: $_).
identifier = accept: (tokenFor: (identifierStart), (identifierStart | digit) star) ifNotIn: keywords .
value = assignmentExpression .
literal = null | ttrue | tfalse | number | string | regularExpressionLiteral.
primaryexpression = this | literal | identifier | array | object | parenthesized.
parenthesized = leftparen,expression,rightparen.
functionexpression = function , identifier opt,
leftparen,formalParameterList , rightparen ,
leftbrace,sourceElements,rightbrace.
formalParameterList = identifier starSeparatedBy: comma.
memberexpression = (simplememberexpression ),
(( leftbracket, expression, rightbracket) | ( tdot, identifier)) star.
simplememberexpression = primaryexpression |
functionexpression |
simpleNewExpression.
simpleNewExpression = tnew,memberexpression, arguments.
callExpression = (simplememberexpression ),
( arguments |
( leftbracket, expression, rightbracket) |
( tdot, identifier) ) star.
simpleCallExpression = memberexpression , arguments.
arguments = leftparen ,
(assignmentExpression, (comma, assignmentExpression) star) opt,
rightparen.
newExpression = memberexpression | simpleNewMemberExpression.
simpleNewMemberExpression = tnew, memberexpression.
plusPlus = plusSign,plusSign.
minusMinus = negSign,negSign.
leftHandSideExpression = callExpression | newExpression .
postfixExpression = leftHandSideExpression ,
((plusPlus | minusMinus) star).
unaryExpression = postfixExpression | complexUnaryExpression.
complexUnaryExpression =
(typeof, unaryExpression) |
(delete, unaryExpression) |
(void, unaryExpression) |
(plusPlus, unaryExpression) |
(minusMinus, unaryExpression) |
((tokenFor: ( plusSign | negSign | tilde | exclamation )), unaryExpression).
multiplicativeExpression =
unaryExpression, ((tstarChar | tslash | tmodulo), unaryExpression) star.
additiveExpression =
multiplicativeExpression, ((tplusSign | tnegSign), multiplicativeExpression) star.
shiftExpression =
additiveExpression, ((tsRightShift | tleftShift | trightShift), additiveExpression) star.
relationalExpression =
shiftExpression, (( tLtEq | tGtEq | tLt | tGt | instanceof | in) , shiftExpression) star.
relationalExpressionNoIn =
shiftExpression, (( tLtEq | tGtEq | tLt | tGt | instanceof ) , shiftExpression) star.
equalityExpression =
relationalExpression, ((tEqEqEq | tEqEq | tNotEqEq | tNotEq ), relationalExpression) star.
equalityExpressionNoIn =
relationalExpressionNoIn, ((tEqEqEq | tEqEq | tNotEqEq | tNotEq ), relationalExpressionNoIn) star.
bitwiseANDExpression =
equalityExpression,(tAmp, equalityExpression) star.
bitwiseANDExpressionNoIn =
equalityExpressionNoIn,(tAmp, equalityExpressionNoIn) star.
bitwiseXORExpression =
bitwiseANDExpression,(tCir, bitwiseANDExpression) star.
bitwiseXORExpressionNoIn =
bitwiseANDExpressionNoIn,(tCir, bitwiseANDExpressionNoIn) star.
bitwiseORExpression =
bitwiseXORExpression,(tPipe, bitwiseXORExpression) star.
bitwiseORExpressionNoIn =
bitwiseXORExpressionNoIn,(tPipe, bitwiseXORExpressionNoIn) star.
logicalAndExpression =
bitwiseORExpression, (tAmpAmp,bitwiseORExpression) star.
logicalAndExpressionNoIn =
bitwiseORExpressionNoIn, (tAmpAmp,bitwiseORExpressionNoIn) star.
logicalOrExpression =
logicalAndExpression, (tPipePipe,logicalAndExpression) star.
logicalOrExpressionNoIn =
logicalAndExpressionNoIn, (tPipePipe,logicalAndExpressionNoIn) star.
assignmentOperator =
tEq | tStarEq | tSlashEq | tModEq | tPlusEq | tMinusEq |
tleftShiftEq | tsRightShiftEq | trightShiftEq | tAmpEq | tPipeEq |
tCirEq.
conditionalExpression =
logicalOrExpression, (tQuestion, assignmentExpression,tColon,assignmentExpression) opt.
assignmentExpression =
conditionalExpression, (assignmentOperator,conditionalExpression) star.
conditionalExpressionNoIn =
logicalOrExpressionNoIn, (tQuestion, assignmentExpressionNoIn,tColon,assignmentExpressionNoIn) opt.
assignmentExpressionNoIn =
conditionalExpressionNoIn, (assignmentOperator,conditionalExpressionNoIn) star.
expression = assignmentExpression, (comma , assignmentExpression) star.
expressionNoIn = assignmentExpressionNoIn, (comma , assignmentExpressionNoIn) star.
statement = block | variableStatement | emptyStatement | expressionStatement |
ifStatement | iterationStatement | withStatement | switchStatement |
labelledStatement | tryStatement | throwStatement |
breakStatement | returnStatement.
block = leftbrace, statementList , rightbrace.
statementList = statement star.
variableStatement = var, variableDeclarationList, tSemicolon.
variableDeclarationList = (variableDeclaration plusSeparatedBy: comma).
variableDeclaration = identifier, (tEq,assignmentExpression) opt.
variableDeclarationListNoIn = (variableDeclarationNoIn plusSeparatedBy: comma).
variableDeclarationNoIn = identifier, (tEq,assignmentExpressionNoIn) opt.
emptyStatement = tSemicolon.
expressionStatement = (((function | leftbrace) not) & expression), tSemicolon.
ifStatement = if, leftparen,expression,rightparen,statement,(else,statement) opt.
iterationStatement = doStatement | forStatement | forStatementNoVar |
whileStatement | forInStatement | forInStatementNoVar.
doStatement = do, statement, while, leftparen,expression,rightparen,tSemicolon.
forStatement = for, leftparen,
var, variableDeclarationListNoIn ,
tSemicolon,
(expression opt),
tSemicolon,
(expression opt),
rightparen, statement.
forStatementNoVar = for, leftparen,
(expression opt),
tSemicolon,
(expression opt),
tSemicolon,
(expression opt),
rightparen, statement.
forInStatement = for, leftparen,
var, variableDeclarationListNoIn ,
in,
expression ,
rightparen, statement.
forInStatementNoVar = for, leftparen,
leftHandSideExpression ,
in,
expression ,
rightparen, statement.
whileStatement = while,leftparen,expression,rightparen,statement.
continueStatement = continue, (identifier opt), tSemicolon.
breakStatement = break, (identifier opt), tSemicolon.
returnStatement = return, (expression opt), tSemicolon.
withStatement = with, leftparen, expression ,rightparen, statement.
switchStatement = switch ,leftparen,expression,rightparen, clauseBlock.
clauseBlock = leftbrace,(clause star),(defaultClause opt),rightbrace.
clause = case, expression, tColon,statementList.
defaultClause = default,tColon, statementList.
labelledStatement = identifier,tColon,statement.
throwStatement = throw,expression,tSemicolon.
tryStatement = try, block, (catchBlock opt), (finallyBlock opt).
catchBlock = catch, leftparen, identifier,rightparen, block.
finallyBlock = finally, block.
functionDeclaration = function,identifier,
leftparen,formalParameterList,rightparen,
leftbrace,sourceElements,rightbrace.
sourceElements = (statement | functionDeclaration ) star.
program = sourceElements.
|
)
...
I really like the way you can separate the grammar from the AST creation (as described in the Executable Grammars[PDF] paper). As you can see there's no code specified for this purpose. Along with the source code there's is a 'testing AST' and a parser that inherits from this grammar which is used by the unit tests .
Almost all the grammar was written using the parser combinators from the library. Only
charExceptFor:
and accept: ifNotIn:
were created. There are still a lot work to do with this parser:
- Clean up the code
- Work on performance issues
- Find a solution for the "Automatic Semicolon Insertion" feature (see section 7.9) of the Ecma document)
- Get rid of some repetition (for example the 'NoVar' productions which are also present in the document)
- Better AST creation
- See if unicode support is possible
- More tests!
In order to see the result of using this parser I created a little program to display the 'testing AST'. For example:
The parser with tests and the other code mentioned can be found here .