Assignment 7 File: asst7.txt Author: CS 51 (Bob Walton) Version: 1 This is the first of three assignments in which you program the guts of an interpreter called lisp51 for a subset of the COMMONLISP language. In this assignment you will implement the lexical analyzer, the function to make symbol objects, the printer for cons-cells, and the final parts of the s-expression parser. In the next assignment you will implement the evaluator and a few COMMONLISP functions. In the last assignment you will implement the garbage collector and a hash table optimization for symbol lookup. All three assignments will be done in a single directory. To start this assignment: Make an asst7-9 directory. cd cp -p ~lib51/asst7-9/* . You will find that the following are defined (though some will not work until you have done the previous assignment): make Makes the lisp51 program. make asst7 Makes asst7 outputs. make print7 Prints asst7 files. make submit7 Submits for assignment 7. make sprint7 Prints asst7 for submission. make asst8 Ditto but for assignment 8. make print8 make submit8 make sprint8 make asst9 Ditto but for assignment 9. make print9 make submit9 make sprint9 Also, whenever you submit, the RCS ci program is automatically called to record your source files for each submission. It is a good idea to start by doing make ci Make backups of source files. to get used to this before you have to submit. When backing up a file the ci program often asks for a message to store in its log, consisting of several lines ending in a line containing only a period `.'. Just typing a period and a carriage return usually suffices as these messages are not typically worth the trouble of writing. The lisp51 interpreter is describe in class handouts that are separate from the assignments. You are given design documentation files, all the .h files arranged alphabetically, and all the .cc files arranged alphabe- tically. Some files important for this assignment are the .cc files: scanner.cc symbol.cc cons.cc parser.cc and the design documentation file: scanner.sp This assignment also includes a printout of the test input for `make asst7', namely the file test-scanner.in. In this assignment you are given a prototype for a scanner, and asked to write the full scanner. A prototype is a program similar to the final program but only intended to test basic concepts used in implementing the final program. In this case a method for coding DFA's (Deterministic Finite Automata) has been prototyped. Following the method of this prototype, you are to code the final lisp51 scanner. You have also been given two unimplemented functions to write: make_symbol, and cons::print. Lastly you have been given a parser whose grammar is incomplete and needs to be extended to handle quote ('), function quote (#'), and the square right bracket (]). The specific things you MUST do for assignment 7 are: (1) Read scanner.cc. Test it if you feel that will help you understand it (see use of gdb in (8) below). If you do make at this point, lisp51 will be made, and you can execute it by typing lisp51 It does not work very well for several reasons. First, the scanner thinks numbers are symbols, and thinks that every non-atom character is an error token. It also does not implement symbol escapes with |'s. Second the make_symbol routine just returns a dummy symbol with a print name that is a NULL char*, which always prints as (null). Third, though one will never see it given the other limitations, cons::print just prints: PRINTING CONS CELLS NOT IMPLEMENTED And lastly, the parser does not yet understand ' or #'. Read scanner.sp, the specification for the final scanner. Your task is to extend the scanner to fit this specification. (2) Revise the grammar at the beginning of scanner.cc to meet the specification. Revise the DFA after the grammar in scanner.cc to meet the grammar (i.e. the specification). This is most of the design part of this assignment. (3) Write the code for scanner.cc. The programmer who wrote the prototype left TBW signs, denoting `To Be Written', in the places you are likely to have to add or change code. After the code is done, you should be able to type numbers into lisp51 and get the numbers back. As part of this you must finish the scanner_state class definition. This has been implemented except for the fact that it does not save the accepted characters and return them in a string when output() is called. If more than MAXSYMSIZE (defined in const.h) characters are in a symbol, you should call error ( "Symbol too long", ), but remember to terminate the buffer with a `\0' first. If you type symbols in, you will get (null) back as make_symbol has not been implemented. If you type lists in you will get: PRINTING CONS CELLS NOT IMPLEMENTED back, as cons::print has not been implemented. (4) Write make_symbol in symbol.cc. Use make_cons in cons.cc as a guide. The components of the symbol class instance are to be set as follows: name to a copy of the name newly allocated with the `new' operator value to NULL symbol_function to NULL plist to NIL You cannot just store the name argument to make_cons in the name component of the new symbol as the argument may occupy memory in the stack that will disappear when various functions return. Another difference between make_cons and make_symbol is that symbols are supposed to be unique: there is only one in all of memory with a given name. Each object is kept on a used list such as SYMBOL_TYPE->used_list listing all objects of a given type. So when make_symbol is called, it should first search the symbol used list to see if the symbol already exists, and if so, return the existing symbol, instead of making a new symbol. It would be incorrect for make_cons to do this. The system has been programmed so that make_fixnum may or may not do this as it chooses (it currently does not). After you finish make_symbol, typing a symbol at lisp51 will type the symbol back at you. This is because the eval function in lisp51 currently just returns the argument it was passed, so (eval x) ===> x and no quoting is necessary (until the evaluator is properly written in assignment 8). At this point you should start running make asst7 and looking at test-scanner.out for bugs in the scanner. Bugs involving the escape character are common: check for correct answers by comparing with scanner.sp or running ~lib51/bin/lisp. Tests inputing lists and ', #', and ] will not work yet. (5) Next write cons::print so that if you type in lists, the interpreter will echo them properly. Be sure to use proper spacing and syntax. (A B) is correct (A B ) is NOT correct (A B . NIL) is NOT correct Note that cons::print is NOT supposed to treat (QUOTE FOO) or (FUNCTION FOO) in any special way: they should print as (QUOTE FOO) and (FUNCTION FOO) and not as 'FOO and #'FOO. (6) Extend the grammar at the beginning of parser.cc to include processing of 'x == (QUOTE x) #'x == (FUNCTION x) ] == )...) where x is an s-expression, and ] is equivalent to just enough )'s to close all the open parentheses input beforehand. Here == means `are two ways of inputing the same thing'. Its not possible to write ] sensibly into the grammar equations (as far as we know). For it you will just have to add a note to the end of the grammar. This is the other design part of this assignment. The final result should treat 'foo as if (QUOTE FOO) had been input, #'foo as if (FUNCTION FOO) had been input, and #'(lambda () 3) as if (FUNCTION (LAMBDA NIL 3)) had been input. You SHOULD NOT modify the printer to handle printing quote constructs in a special way: it should print (QUOTE FOO). (7) Revise the code in parser.cc to include the grammar extensions above. You may NOT add any static variables to the code in doing this. You must keep track of the number of open parentheses by passing an extra argument to the various functions. You should also fix the following bug in the code given you. If you start lisp51 and type ( 8 control-D control-D lisp51 may crash, or at least get its memory into an unusual state with NULL object * pointers (control-D is an `end-of-file' in UNIX when you are typing at a terminal; but you must type control-D twice to end a file when ein is being used for input). Figure out why the NULL object * pointers appear. Given that you now have an open parenthesis count, it is easy to call error at the right moment instead of creating undesired NULL pointers. You should make the call: error ( "Premature end of file" ); at the right point. (8) Test the code by running: make asst7 and examining test-scanner.out. Note that test-scanner.in only works when eval is just returning its argument without trying to evaluate it. Use gdb as appropriate. Try gdb lisp51 br get_token run n .. more n's until first s.get() is executed ... foo n ... more n's until return statement is about to be executed ... p tok p {symbol}tok.value Here when the first s.get() is executed, the program waits for you to type a line, in this case foo At the end tok is to be returned. IF it is a SYMBOL_TOKEN, then p {symbol}tok.value prints the `symbol' class object at the address given by tok.value. The design part of this assignment is worth 10 points, and the code part is worth 65 points (this assignment is worth 75 points instead of the usual 60 points). There is NO separate theory part to this assignment: you need only hand in the make sprint output. The following are the SLOC counts for a well written solution. In C++ a SLOC is defined as a non-comment line containing a semi-colon. scanner.cc 69 make_symbol 15 cons::print 9 parser.cc 20 Half these SLOC counts will be the number of points that will be deducted if you do NOT do a function (they do NOT sum to 65). Hints: (1) You may find it appropriate to write your scanner a piece at a time, as the full thing is a bit complex. In general writing complex programs in stages is a good idea. Just get all-digit numbers and simple symbols working at first, and then write make_symbol so you can see your symbols. Then implement single character tokens and write cons::print. Then start figuring out the subtle stuff. What if + or - begins a token. +9 is a number, but + and +x are symbols. What about the symbol 9x. Then add escapes. What about the . token, and symbols such as `.xyz'. What if a token begins with #. What if the string |xyz is followed immediately by the end of file. Good Luck!