The Elements of Computing  Systems / Nisan & Schocken / www.idc.ac.il/tecs  

Project 10:
Compiler I (Syntax Analysis)

Objective: Build a compiler for Jack -- a modern, object-based, Java-like language.  The compiler construction spans two projects: 10 (Syntax Analysis) and 11 (Code Generation).  In this project we build a syntax analyzer that parses programs according to the Jack grammar, producing an XML file that reflects the program's structure.  In the next project, the XML code will be replaced with executable machine code.

Resources: The main tool in this project is the programming language in which you will implement the syntax analyzer. You will also need the TextComparer utility (available in your tools directory, assuming that you've installed the TECS software suite), which allows comparing the output files generated by your analyzer to the compare files supplied by us. You may also want to inspect the generated and supplied output files using an  XML viewer (any standard web browser should do the job). All the test files necessary for this project are available in project 10.zip.  Start by creating a directory named projects/10 on your computer, then extract this file into it (preserving the directory structure embedded in the zip file).

Contract: Write the syntax analyzer program in two stages: tokenizing and parsing. Use it to parse all the .jack files supplied below. For each source .jack file, your analyzer should generate an .xml output file. The generated files should be identical to the .xml compare-files supplied by us.

Test Programs

A natural way to test your analyzer it is to have it parse some representative Jack programs. We supply two such test programs: Square Dance and Array Test. The former includes all the features of the Jack language except for array processing, which appears in the latter. We also provide a simpler version of the Square Dance program, as explained below.

Square Dance:  A simple interactive game, described in Project 9. The game implementation is organized in three classes:

Source class files

Description

Tokenizer output

Parser output

Main.jack

Initializes and starts a new "square moving session."

MainT.xml

Main.xml

Square.jack

Implements an animated square. A square object has a screen location and size properties, and methods for drawing, erasing, moving, and size changing.

SquareT.xml

Square.xml

SquareGame.jack

Runs the program according to the game rules.

SquareGameT.xml

SquareGame.xml

Note: The three source Jack files comprising the Square Dance game are identical to those stored in the projects/09/Square directory.  For completeness, an identical copy of these files is also available in the projects/10/Square directory.

Expressionless Square Dance: This is a version of Square Dance in which each expression in the original source code was replaced with a single identifier (some variable name in scope). This version of the program was designed for one purpose only: unit-testing the analyzer’s ability to parse everything except for expressions. Note that the replacement of expressions with variables has resulted in a nonsensical program that cannot be compiled by the supplied Jack Compiler. Still, it follows all the Jack grammar rules. The expressionless files have the same names as those of the original files, but they are located in a separate directory (projects/10/ExpressionlessSquare):

Source class files

Tokenizer output

Parser output

Main.jack 

MainT.xml

Main.xml

Square.jack

SquareT.xml

Square.xml

SquareGame.jack

SquareGameT.xml

SquareGame.xml

Array test: A single-class Jack program that computes the average of a user-supplied sequence of integers using array notation and array manipulation:

Source class files

Tokenizer output

Parser output

Main.jack 

MainT.xml

Main.xml

Experimenting with the test programs: If you want, you can compile the Square Dance and Test Array programs using the supplied Jack compiler, then use the supplied VM Emulator to run the compiled code. These activities are completely irrelevant to this project, but they serve to highlight the fact that the test programs are not just plain text (although this is perhaps the best way to think about them in the context of this project).

 

Stage I: Tokenizer

 

First, implement the JackTokenizer module specified in the chapter 10. When applied to a text file containing Jack code, the tokenizer should produce a list of tokens, each printed in a separate line along with its classification: symbol, keyword, identifier, integer constant, or string constant. The classification should be recorded using XML tags.  Here is an example:

 

Source Code

Tokenizer output

if (x < 153) {let city = ”Paris”;}

<tokens>

  <keyword> if </keyword>

  <symbol> ( </symbol>

  <identifier> x </identifier>

  <symbol> &lt; </symbol>

  <integerConstant> 153 </integerConstant>

  <symbol> ) </symbol>

  <symbol> { </symbol>

  <keyword> let </keyword>

  <identifier> city </identifier>

  <symbol> = </symbol>

  <stringConstant> Paris </stringConstant>

  <symbol> ; </symbol>

  <symbol> } </symbol>

</tokens>

Note that in the case of string constants, the tokenizer throws away the double quote characters. That’s OK.

The tokenizer’s output has two “features” dictated by XML conventions. First, an XML file must be enclosed in some begin and end tags, and that’s why the  <tokens> and </tokens> tags were added to the output. Second, four of the symbols used in the Jack language (<, >, ", &) are also used for XML markup, and thus they cannot appear as data in XML files. To solve the problem, we require the tokenizer to output these tokens as &lt;, &gt;, &quot;, and &amp;, respectively. For example, in order for the text “<symbol> < </symbol>” to be displayed properly in a web browser, the source XML should be written as “<symbol> &lt; </symbol>”.

Testing Your Tokenizer:

 

Stage II: Parser

Next, implement the CompilationEngine module specified in chapter 10. Write each method of the engine, as specified in the API, and make sure that it emits the correct XML output. We recommend to start by writing a compilation engine that handles everything except expressions, and test it on the expressionless Square Dance program only. Next, extend the parser to handle expressions as well, and proceed to test it on the Square Dance and Array Test programs.

Testing Your Parser:

  1. Apply your analyzer to the supplied test programs, then use the supplied TextComparer utility to compare the generated output to the supplied .xml compare files.

  2. Since the output files generated by your analyzer will have the same names and extensions as those of the supplied compare files, we suggest putting them in separate directories.

  3. Note that the indentation of the XML output is only for readability. Web browsers and the supplied TextComparer utility ignore white space.