Thursday, April 2, 2009

Parsing JSON with Newspeak

In this post I'm going to show a JSON parser written using the Newspeak parser combinator library.

Newspeak



Newspeak is a new programming language. From its webpage http://newspeaklanguage.org/:


Newspeak is a new programming language in the tradition of Self and Smalltalk. Newspeak is highly dynamic and reflective - but designed to support modularity and security. It supports both object-oriented and functional programming.


In The Newspeak Programming Platform the authors give a nice introduction to the language and platform.

I first heard about Newspeak while watching Gilad Bracha's presentation at the Lang.NET 2008 symposium (video available here). The video presents a nice parser combinator library, which is described in the paper Executable Grammars in Newspeak by Gilad Bracha.

Code in this post was written using the Newspeak prototype released February 27 2009.

JSON



In order to learn about the language and platform I decided to create a little parser for JSON (JavaScript Object Notation).

JSON is a simple data-interchange format defined at http://www.json.org/. Here is an example:


[{ "name": "Wiston Smith",
"description" :"Protagonist"},
{ "name": "Julia",
"description" :"Lover"},
{ "name": "O Brien",
"description" :"Goverment agent"}]


Parser structure



The parser is defined as a single class with a few nested classes. The following image shows the definition of JSONParser in the Newspeak environment.

[Image: definition of the JSONParser class]

As described in the "Modularity" section of The Newspeak Programming Platform paper, top-level classes (in this case JSONParser) don't have access to their surrounding scope; they only have access to their own or inherited definitions. This is why JSONParser has the following definitions:


class JSONParser withParserLib: parserLibrary usingLib: platform = (
"Experiment for JSON parser based on the description from http://www.json.org/fatfree.html "
|
ExecutableGrammar = parserLibrary ExecutableGrammar.
CharParser = parserLibrary CharParser.
PredicateTokenParser = parserLibrary PredicateTokenParser.
Dictionary = platform Dictionary.
OrderedCollection = platform OrderedCollection.
Number = platform Number.
|
)
...


The "withParserLib: parserLibrary usingLib: platform" part defines parameters for the construction of JSONParser. These parameters are used to 'import' classes defined elsewhere.

The following code shows a way to create an instance of the JSONParser class:

|platform parser|
platform:: Platform new.
parser:: (JSONParser withParserLib: (BlocklessCombinatorialParsing usingLib: platform) usingLib: platform).
...
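
For readers more familiar with mainstream languages, the idea of a class that receives its dependencies as construction parameters, instead of reaching into a global scope, can be sketched in Python. This is only an analogy under stated assumptions: JSONParserModule, parser_lib and the stand-in platform object are hypothetical names, not part of Newspeak or of the code for this post.

```python
from types import SimpleNamespace


class JSONParserModule:
    """Hypothetical analogue of a Newspeak top-level class.

    It does not look anything up globally: every external class
    it needs arrives through the constructor arguments.
    """

    def __init__(self, parser_lib, platform):
        # 'Import' the needed classes from the arguments,
        # mirroring the slot definitions in the Newspeak class header.
        self.ExecutableGrammar = parser_lib.ExecutableGrammar
        self.Dictionary = platform.Dictionary
        self.OrderedCollection = platform.OrderedCollection


# Stand-ins for the platform and the parser library (assumptions for this sketch).
platform = SimpleNamespace(Dictionary=dict, OrderedCollection=list)
parser_lib = SimpleNamespace(ExecutableGrammar=object)

module = JSONParserModule(parser_lib, platform)
```

Because the dependencies are plain arguments, a different platform or parser library could be passed in without changing the class itself.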


The JSONParser nested classes are the following:

  1. CharExceptForParser which is a parser that accepts any character except for the one specified (this is for internal use)
  2. JSONGrammar which contains the definition of the JSON grammar
  3. JSONGrammarWithAST which defines the way the AST is created
  4. JSONObject which is used in the representation of the JSON AST
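
The source of CharExceptForParser is not shown in this post. As a rough illustration of what "accept any character except one" means, here is a minimal Python sketch. The function name and the (value, new position) result convention are assumptions made for this example, not the Newspeak implementation.

```python
def char_except_for(excluded):
    """Return a parser that accepts any single character other than `excluded`.

    A parser here is a function (text, pos) -> (value, new_pos), or None on failure.
    """
    def parse(text, pos):
        if pos < len(text) and text[pos] != excluded:
            return text[pos], pos + 1
        return None
    return parse


not_quote = char_except_for('"')
print(not_quote('a"', 0))  # ('a', 1)
print(not_quote('a"', 1))  # None
```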



Grammar



The following code shows the JSON grammar defined using the parsing combinators:


class JSONGrammar = ExecutableGrammar (
"Experiment for JSON grammar based on the description from http://www.json.org/fatfree.html "
|
doubleQuote = (char: $").
backslash = (char: $\).
str = doubleQuote,((backslash, ( char: $" )) |
(backslash, ( char: $/ )) |
(backslash, backslash) |
(backslash, ( char: $r )) |
(backslash, ( char: $n )) |
(backslash, ( char: $t )) |
(charExceptFor: $")) star, doubleQuote.
string = tokenFor: str.

negSign = (char: $-).
plusSign = (char: $+).
digit = (charBetween: $0 and: $9).
dot = (char: $. ) .
num = negSign opt, digit, digit star, dot opt,digit star, ((char: $e) | (char: $E)) opt, (plusSign | negSign) opt,digit star.
number = tokenFor: num.

leftbrace = tokenFromChar: ${.
rightbrace =tokenFromChar: $}.
colon = tokenFromChar: $:.
comma = tokenFromChar: $,.
definition = string,colon,value.
obj = leftbrace, (definition starSeparatedBy: comma),rightbrace.
object = tokenFor: obj.

leftbracket = tokenFromChar: $[.
rightbracket = tokenFromChar: $].
arr = leftbracket, (value starSeparatedBy: comma), rightbracket.
array = tokenFor: arr.

ttrue = tokenFromSymbol: #true.
tfalse = tokenFromSymbol: #false.
null = tokenFromSymbol: #null.

value = string | number | object | array | ttrue | tfalse | null.

|
)
...




For more information on how this library works, check the Executable Grammars in Newspeak paper.
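
To give an intuition for the operators used in the grammar (',' for sequencing, '|' for alternatives, 'star' and 'opt'), here is a minimal parser combinator sketch in Python. It is not how the Newspeak library is implemented. All function names are hypothetical, and the choice to try alternatives in order is an assumption of this sketch.

```python
def char(c):
    """Parser that accepts exactly the character c."""
    def parse(text, pos):
        if pos < len(text) and text[pos] == c:
            return c, pos + 1
        return None
    return parse


def char_between(lo, hi):
    """Parser that accepts one character in the range lo..hi."""
    def parse(text, pos):
        if pos < len(text) and lo <= text[pos] <= hi:
            return text[pos], pos + 1
        return None
    return parse


def seq(*parsers):
    """Sequencing: every parser must succeed, one after the other."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def alt(*parsers):
    """Alternatives: the first parser that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def opt(p):
    """Optional: succeed with None, consuming nothing, if p fails."""
    def parse(text, pos):
        result = p(text, pos)
        return result if result is not None else (None, pos)
    return parse


def star(p):
    """Zero or more repetitions of p."""
    def parse(text, pos):
        values = []
        while True:
            result = p(text, pos)
            # Stop on failure, or if p made no progress (avoids looping forever).
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


# A small integer rule built the same way as the 'num' rule in the grammar.
digit = char_between("0", "9")
integer = seq(opt(char("-")), digit, star(digit))

print(integer("-42,", 0))  # (['-', '4', ['2']], 3)
print(integer("x", 0))     # None
```

Each combinator returns a new parser built from smaller ones, which is why a grammar rule can be written as a single expression.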


AST construction



We need to define a way to represent the tree structure (AST) parsed by JSONGrammar. As described in the "Executable Grammars in Newspeak" paper, one of the nice things about Newspeak is that we don't have to modify the grammar definition to add AST construction code. We can do that by inheriting from the original grammar:


class JSONGrammarWithAST = JSONGrammar(
"Parses a JSON file and generates an AST"
|

|
)
('as yet unclassified'
array = (
^super array wrapper: [:a | (a token at: 2) ].
)


null = (
^ super null wrapper: [:o | nil].
)

number = (
^super number wrapper: [:o | Number readFrom: (flattenCharCollectionToString: (o token)) ].
)

object = (
^super object wrapper:
[:obj | JSONObject withContent:
(Dictionary newFrom: ((obj token at: 2) collect: [:e | (e at: 1) -> (e at: 3)]))].
)

parse: input = (
^super value parse: input.
)

string = (
^super string wrapper:
[:t | flattenCollectedString: (t token at: 2)].
)

tfalse = (
^super tfalse wrapper: [:o | false].
)

ttrue = (
^super ttrue wrapper: [:o | true].
)

...
)


As shown here (omitting some method definitions), arrays are converted to ordered collections; numbers, strings and booleans to their Newspeak equivalents; and JSON objects to instances of JSONObject (described below).
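
The pattern of keeping the grammar pure and adding the AST construction in a subclass can also be sketched in Python. This is an analogy only: token, wrap, Grammar and GrammarWithAST are hypothetical names, and the regular expressions are simplified stand-ins for the productions above.

```python
import re


def token(pattern):
    """Parser for a regex token; returns (matched_text, new_pos) or None."""
    regex = re.compile(pattern)

    def parse(text, pos):
        m = regex.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse


def wrap(parser, fn):
    """Transform the value of a successful parse with fn (similar in spirit to 'wrapper:')."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos
    return parse


class Grammar:
    """Pure grammar: productions only recognize text."""

    def number(self):
        return token(r"-?\d+(\.\d+)?")

    def true(self):
        return token(r"true")

    def null(self):
        return token(r"null")


class GrammarWithAST(Grammar):
    """Subclass that adds semantic actions without touching the grammar."""

    def number(self):
        return wrap(super().number(), float)

    def true(self):
        return wrap(super().true(), lambda _: True)

    def null(self):
        return wrap(super().null(), lambda _: None)


print(Grammar().number()("3.5", 0))         # ('3.5', 3)
print(GrammarWithAST().number()("3.5", 0))  # (3.5, 3)
```

The base class can still be used on its own as a recognizer, while the subclass decides what values a successful parse produces.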

JSONObject



In order to make it easy to use a JSON object in Newspeak, the JSONObject class was defined:


class JSONObject withContent: dContent = (
"Instances of this class represent JSON objects."
|
content = dContent.
|
)
('as yet unclassified'
doesNotUnderstand: message = (
| fieldName |
fieldName:: message selector string.
(fieldName beginsWith: 'json_')
ifTrue: [fieldName:: fieldName allButFirst: 5].
^content at: fieldName ifAbsent: [nil].
)

)


This class receives a Dictionary as a parameter. The dictionary contains all the name/value pairs of the JSON object definition. We define the doesNotUnderstand: method which, as in Smalltalk, is called when an object receives a message that it has no explicit way to respond to. We take the name of the message that was sent and look it up in the dictionary.

If the message name is prefixed by 'json_', the string after the prefix is used as the key in the dictionary. The prefix exists because JSONObject inherits definitions from Object (such as 'name'); a message like that is answered by the inherited method and never reaches doesNotUnderstand:.


...
| parsed |
parsed:: parserWithAST
parse: (streamFromString: '[{ "name": "Wiston Smith",
"description" :"Protagonist"},
{ "name": "Julia",
"description" :"Lover"},
{ "name": "O Brien",
"description" :"Goverment agent"}]'
).
assert:[((parsed at: 2) description) = 'Lover'].
assert:[((parsed at: 3) json_name) = 'O Brien'].
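
For comparison, the same dynamic lookup idea can be sketched in Python, where __getattr__ plays a role similar to doesNotUnderstand: because it is called only when normal attribute lookup fails. This is a hypothetical analogue of the class above, not a translation of the actual Newspeak code.

```python
class JSONObject:
    """Hypothetical Python analogue of the JSONObject class in this post."""

    def __init__(self, content):
        self._content = content

    def __getattr__(self, name):
        # Only reached when regular lookup fails, much like doesNotUnderstand:.
        if name.startswith("_"):
            # Leave private and special names alone.
            raise AttributeError(name)
        if name.startswith("json_"):
            # Strip the prefix and use the rest as the dictionary key.
            name = name[len("json_"):]
        # Missing keys answer None, like 'ifAbsent: [nil]'.
        return self._content.get(name)


julia = JSONObject({"name": "Julia", "description": "Lover"})
print(julia.description)  # Lover
print(julia.json_name)    # Julia
print(julia.age)          # None
```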




In the next post I'm going to use this parser to explore the GUI library provided with Newspeak.

Code for this post can be found here.

2 comments:

Gilad Bracha said...

Hi Luis,

A really nice post. I was curious about two minor points.

1. Your F# postings lead me to believe you use a PC. But your screen shot is taken from the non-native GUI. How come?

2. You tokenize everything when processing JSON - even non-lexical elements like objects and arrays. This surprised me, as it isn't necessary (nor the intent of the tokenizing support). But perhaps you saw some advantage to this?

Luis Diego Fallas said...

Thanks for your comments!

1. I'm running the Newspeak prototype on openSUSE Linux, that's why I'm using the non-native GUI. I have a dual boot machine so I can experiment with both Linux and Windows.

2. Oops! Thanks for pointing that out. I added the "tokenFor:" call to both "array" and "object" while trying to find a problem with the code and forgot to remove them later!

Thanks again,
Luis