Showing posts with label interpreter. Show all posts
Showing posts with label interpreter. Show all posts

Sunday, July 19, 2009

JASI v1 is done

JASI - Just Another Scheme Interpreter

code is here.
other posts related to jasi are here.

I'm done writing java implementation of the scheme interpreter that started just to better my understanding of interpreters in general and to learn what it takes to write one.

This implementation of mine doesn't solve any practical purpose(as of now atleast) and not an optimized one, I gave more importance to modularity and easy readability of the code.

It complies to a subset of R4RS and passes a subset of R4RS tests written by Aubrey Jaffrey, that subset is available with the code in the file tests.scm in top level directory.

Code Organization:
It consists of 4 main packages..

1. jasi.parser
This package contains the classes responsible reading the scheme expressions and provide the functionality of "read". Tokenizer.java does the job of lexical analysis and Reader.java gets tokens from it, reads S-expressions.

2. jasi.datatypes
The classes here represent various scheme datatypes.

3. jasi.symantics
This is the engine of the interpreter. Scheme.java has the "eval" to evaluate s-exps and this is where the language behavior is realized. Environment.java contains the code for maintaining environment and also initial global enironment that becomes available to the user in the interactive mode. If one wants to add another primitive procedure(that can't be implemented in scheme using existing primitives), then he needs to add binding for same in the global environment in Environment.initGlobalEnv() method.

4. jasi.symantics.procedure
CompoundProcedure.java contains the behavior of how user defined procedures are evaluated. PrimitiveProcedure.java contains implementation of (almost)all the primitive procedures.

I've tried to implement only the essential functionality in java, other primitives are written in scheme itself in the file src/main/resources/preloadedscm.scm that is loaded at interpreter startup. If one needs to add another primitive that can be written using the existing primitives, this is the place.

For more info, Please read the "readme" included in the code.

Saturday, June 27, 2009

scheme parser = "read"

This is another article for my ongoing learnings with writing a scheme interpreter(hosted here). As I described in my previous post that I started with writing a formal EBNF grammar and a recursive descent parser for it.
When I came to implementing "read" function of scheme, then I realised that it essentially is the parser of scheme and list data structure is AST. Moreover when I write the formal grammar then the syntactical rules of the language are fixed at the time of parsing itself and its very difficult now to provide macros. Instead the parser or the "read" should just read *any* valid s-expression and "eval" only should derive any meaning from it.
So I've now changed my implementation and written a "read" that reads any valid s-expression and "eval" provides meaning to it.

introducing jasi and my first parsing strategy

JASI - Just Another Scheme Interpreter.
I'm doing an experiment of writing a scheme interpreter(hosted here, code discussed in this post is available only upto revision#10).
I've read sometimes about basics of compilers, grammars, ast etc and followed that path to start with. So I started with creating a EBNF grammar(not strictly correct) for a subset of scheme. Then a parser. Basically there are two kind of parsers, Bottom-up(e.g. shift-reduce) and Top-down(e.g. recursive descent). Since this is an experiment, So I wanted to write the parser by hand. Its easy to write a recursive descent parser(algo is simple, each non-terminal symbol has a method to create its part of ast, we start with calling that method of the start non-terminal of the grammar and that in turn recursively calls same method for other non-terminals) by hand, So I tried to write the grammar in a way so that its easy to write a top-down recursive descent parser for it.I removed left recursion, tried to left factorize it so that starter symbol of any non-terminal for each production rule is distinct(from all other production rules for that non-terminal) and single look-ahead is enough to parse. Here is that grammar I wrote to start with...
expr :=
|CHAR ;charExpr
|STRING ;stringExpr
|NUMBER ;numberExpr
|VARIABLE ; variableExpr
|LPAREN DEFINE VARIABLE expr RPAREN ;definitionExpr
|LPAREN IF expr expr expr? RPAREN ;ifExpr
|LPAREN LAMBDA LPAREN variable* RPAREN expr+ RPAREN ;lambdaExpr
|LPAREN variable expr* RPAREN ;applicationExpr

CHAR := #\any-non-white-space-char
NUMBER := DIGIT(DIGIT)*
DIGIT := 0|1|2|3|4|5|6|7|8|9
STRING := "any-characters"
LPAREN := '('
RPAREN := )
VARIABLE := [^#0-9()] any-non-white-space-char

Notice that, I gave labels to each production rule and each label is mapped to a AST in the code. So here is the (classes of)tree structure that I created for ast...
AST
CharAST
StringAST
NumberAST
VariableAST
DefinitionAST
IfAST
LambdaAST
ApplicationAST

I wrote a class called Tokenizer to make tokens from stream of characters. Its implemented in terms of a finite state machine. And wrote another Parser class(a recursive descent top down parser) that gets tokens from Tokenizer and builds the AST. The most important method of Parser is here and the beauty is its simplicity(it derives straightforward from the grammar and indeed this is the reason parsers can be generated. :)
public AST parseExpr() {
Token currentToken = Tokenizer.getNextToken();
switch(currentToken.getType()) {
case Constants.TOKEN_TYPE_CHAR:
return parseCharExpr(currentToken);
case Constants.TOKEN_TYPE_NUMBER:
return parseNumberExpr(currentToken);
case Constants.TOKEN_TYPE_STRING:
return parseStringExpr(currentToken);
case Constants.TOKEN_TYPE_VARIABLE:
return parseVariableExpr(currentToken);
case Constants.TOKEN_TYPE_LPAREN:
currentToken = Tokenizer.getNextToken();
switch(currentToken.getType()) {
case Constants.TOKEN_TYPE_DEFINE:
return parseDefinitionExpr(currentToken);
case Constants.TOKEN_TYPE_IF:
return parseIfExpr(currentToken);
case Constants.TOKEN_TYPE_LAMBDA:
return parseLambdaExpr(currentToken);
case Constants.TOKEN_TYPE_VARIABLE:
return parseApplicationExpr(currentToken);
default:
throw new RuntimeException("Could not" +
" parse:" + currentToken.getType() +
":" + currentToken.getSpelling());
}
default:
throw new RuntimeException("Could not " +
"parse at:" + currentToken.getType() +
":" + currentToken.getSpelling());
}
}