LISP51 INTERPRETER PRELIMINARY DESIGN File: lisp51.pd Author: course (walton@deas.harvard.edu) Version: 2 1. Goal The LISP51 interpreter implements a subset of COMMONLISP in a clean direct manner that makes writing pieces of this interpreter a good exercise for beginning computer science students. 2. Components 2.1 Basic Support Basic support provides basic I/O facilities and interfaces to error routines that are not specific to LISP. The ein input stream is defined that echos its input when reading from a file. Files: basic.h estream.h basic.cc estream.cc 2.2 Object Manager The object manager implements a LISP memory conforming to the LISP memory model and containing symbols, cons-cells, fixnums, and primitive functions. A fixnum is an integer that will fit into a C++ long value, which is usually a 32-bit signed integer. A primitive function is a LISP object that points at a C++ function that implements a LISP function. Primitive functions are provided for only the most basic operations of LISP, such as CONS, CAR, CDR, SYMBOL-FUNCTION, and COND. Files: object.h obtype.h cons.h fixnum.h function.h symbol.h obtype.cc cons.cc fixnum.cc function.cc symbol.cc 2.3 Scanner/Parser The lexical scanner reads tokens from input streams. Numbers and symbols produce tokens containing LISP number (fixnum) and symbol objects, while other tokens (e.g. '(') just contain a token type identifying what was read. Comments are skipped over. The parser reads s-expressions from input streams. It uses the lexical scanner to get tokens, and interprets these as representing s-expressions. Files: scanner.h scanner.cc parser.cc 2.4 Evaluator The evaluator handles the evaluation of s-expressions. The evaluator maintains a current environment that contains a list of the names and values of all the local variables (function formal arguments) that are currently defined. Files: eval.h eval.cc 2.5 Primitives The primitives implement basic LISP functions in C++. Thus there is a primitive named prim_car that implements CAR, and one named prim_cons that implements CONS. Some utilities that support these are also provided, such as a length function to compute the length of argument lists. Files: prims.h prims.cc length.cc 2.6 Catch/Throw Catch and throw implement the basic functionality of LISP CATCH and THROW (similar to C longjmp). This is used by the error routines to return to the top level. Files: catch.h catch.cc 2.7 Top Level The top level component provides the toplevel function to perform the read/eval/print loop at the lisp51 program's top level or while loading or running files. The top level component also provides the main function for the lisp51 program, and error functions for printing error messages. The error functions throw to the symbol TOP-LEVEL in order to get back to the top level after an error. The top level component also provides system wide constants, such as the maximum symbol length. Files: lisp.h const.h lisp.cc error.cc toplevel.cc 2.8 The Garbage Collector The garbage collector identifies LISP objects that cannot be reached by the program, and returns them to free lists. There is also a preserve system that keeps track of which objects are reachable from currently executing functions. Files: gc.h preserve.h gc.cc preserve.cc 2.9 LISP Definitions Only the most primitive functions, e.g. CAR and CONS, are defined by LISP51 in C++. Other COMMONLISP functions, e.g. LIST and APPEND, are implemented using these primitive functions. The file lisp51.lsp that is read in when the lisp51 program starts contains definitions of many COMMONLISP functions written in LISP. Files: lisp51.lsp 3. Object Manager The Object Manager implements a LISP memory consisting of LISP objects that conform to the LISP memory model. The following types of object are always supported: symbols conses (cons cells) fixnums (32-bit signed integers) functions (C++ functions that implement COMMONLISP functions) A major feature of the implementation is that new types of object can be added without changing any old code, except perhaps the parser if the new object types have their own special input formats. All object types inherit from the class `object'. 3.1 Data Grammar for Object Type and Its Descendents object ::= ( type, marked, link ) type ::= SYMBOL_TYPE | CONS_TYPE | FIXNUM_TYPE | FUNCTION_TYPE | ... marked ::= true | false link ::= pointer to object object virtual functions ::= ( print, scavenge ) symbol ::> object symbol ::= ( name, value, function, plist ) name ::= character* symbol::value ::= UNBOUND | pointer to object function ::= UNBOUND | pointer to object plist ::= pointer to object UNBOUND ::= NULL cons ::> object cons ::= ( car, cdr ) car, cdr ::= pointer to object fixnum ::> object fixnum ::= ( value ) fixnum::value ::= 32-bit signed integer [C++ long] function ::> object function ::= ( cfun, special ) cfun ::= object * (* ...) (object * args) pointer to function special ::= true | false 3.2 C++ Summary for Object Type and Its Descendents class object { object_type * type; bool marked; object * link; virtual ostream & print (ostream & s); virtual void scavenge (); } class cons : object { object * car; object * cdr; } class symbol : object { char * name; object * value; object * function; object * plist; } class fixnum : object { long value; } class function : object { c_function cfun; bool special; } typedef object * (* c_function) (object * arglist); 3.3 Summary of Object Making Functions: cons * make_cons (object * car, object * cdr); symbol * make_symbol (char * name); fixnum * make_fixnum (long value); function * make_function (c_function cfun, bool special); 3.4.1 Notes on Object Type: type is actually a pointer to an instance of the object_type class that contains data and virtual functions specific to the type. marked is a flag used by the garbage collector. link is used to build lists of used and free objects. print is called by <<. scavenge is called by the garbage collector function to mark the object. 3.4.2 Notes on Symbol Type: name is a character string separately allocated by new/delete. value can be NULL if value is unbound. Initialized to NULL == UNBOUND. function can be NULL if function is unbound. Initialized to NULL == UNBOUND. plist initialized to NIL. 3.4.3 Notes on Function Type: cfun is a pointer to a C++ function that implements a LISP function by taking an object * C++ argument that points at a LISP list of the LISP arguments, and by returning a LISP value. 3.5 Inline Helper Functions for the Object Type and Its Descendents object_type * type_of (object * ob) Returns type. ostream & operator << (ostream & s, object * ob) Calls print. void object::object_init (object_type * ty) Sets type = ty, marked = false, link = NULL. Functions to get object components. The names are mostly like those in COMMONLISP. object * car (cons * c) c->car object * cdr (cons * c) c->cdr char * symbol_name (symbol * s) s->name object * symbol_value (symbol * s) s->value object * symbol_function (symbol * s) s->function object * symbol_plist (symbol * s) s->plist long value (fixnum * i) i->value Versions of above that do unchecked conversion of object * to appropriate pointer type. object * unchecked_car (object * ob) object * unchecked_cdr (object * ob) long unchecked_value (object * ob) Mutators for above: object * set_car (cons * c, object * val) object * set_cdr (cons * c, object * val) object * set_symbol_value (symbol * s, object * val) object * set_symbol_function (symbol * s, object * val) object * set_symbol_plist (symbol * s, object * val) Functions that check the type an object. bool consp (object * ob) bool symbolp (object * ob) bool fixnump (object * ob) bool functionp (object * ob) Functions to convert object * to xxx * which return NULL if object is not of type xxx: cons * may_be_cons (object * ob) symbol * may_be_symbol (object * ob) fixnum * may_be_fixnum (object * ob) function * may_be_function (object * ob) Functions to convert object * to xxx * which call error if object is not of type xxx: cons * must_be_cons (object * ob) symbol * must_be_symbol (object * ob) fixnum * must_be_fixnum (object * ob) function * must_be_function (object * ob) Functions to convert object * to xxx * WITHOUT any type checking. cons * unchecked_must_be_cons (object * ob) symbol * unchecked_must_be_symbol (object * ob) fixnum * unchecked_must_be_fixnum (object * ob) function * unchecked_must_be_function (object * ob) Other: object ** cdr_pointer (cons * c) ===> & c->cdr object ** symbol_value_pointer (symbol * s) ===> & s->value bool null (object * ob) ===> ob == NIL bool unboundp (object * ob) ===> ob == NULL 3.6 Data Grammar for Object_Type Type object_type ::= ( name, free_list, used_list, link ) name ::= character* free_list ::= pointer to object used_list ::= pointer to object link ::= pointer to object_type object_type virtual functions ::= ( initialize, get_free, put_used, clear_marked, mark, sweep, gc_print ) object_type::list ::= pointer to object_type 3.7 C++ Summary for Object_Type Type class object_type { char * name; object * free_list; object * used_list; object_type * link; virtual initialize(); virtual object * get_free (); virtual void put_used (object * ob); virtual void clear_marked(); virtual void sweep(); virtual void gc_print(); } object_type * object_type::list; 3.8 Notes on Object_Type Type: name is "SYMBOL", "FIXNUM", etc. free_list heads a NULL terminated list of free objects of given type that are linked via object link's. used_list heads a NULL terminated list of used objects of given type that are linked via object link's. link is used for a NULL terminated list of all object_type instances headed by the static global object_type::list. initialize is called by main() during system startup. It may make symbols needed to implement the type, for example. get_free is called by the object make_... function to try to get an object from the free list for the type of the object. put_used is called when an object is initialized by make_... to put the object on the used list for the type of the object. clear_marked See garbage collector. mark sweep gc_print Each type of object, xxx, has a corresponding type, xxx_type, that is a descendent of object_type, and has a single instance of that type created by the statement: object_type * XXX_TYPE = new xxx_type; The class xxx_type declaration followed by the above statement is in the file xxx.cc. The class xxx_type supplies virtual functions as needed to override the definitions given for object_type. For example, the used list could be restructured as a hash table and a type specific put_used() function provided, along with all the many other virtual object_type functions that access the used list. All the instances of object_type are created when the .cc files are initialized. A list of all these is created at this time, headed by object_type::list. This list is used by the function initialize_all_object_types(), which creates NIL and calls all the object_type initialize virtual functions. 3.9 Data Subgraphs 3.9.1 Virtual Function Subgraph object vft ---------- object +---------------> | print | ------- | | scavenge | |vft * |-----+ ---------- |type |---+ |marked | | object_type object_type vft |link | | ----------- --------------- |... | +-->| vft * |---->| initialize | ------- | name | | get_free | | free_list | | put_used | | used_list | | clear_marked | | link | | sweep | | ... | | gc_print | ----------- --------------- Given an object, one can use its print and scavenge virtual functions directly, and get to other virtual functions, such as put_used, through the object's type. 3.9.2 Object Type List Subgraph SYMBOL_TYPE ----------- object_type::list -->| vft * | | name |--> "SYMBOL" | free_list | | used_list | +-----| link | | ----------- | | FIXNUM_TYPE | ----------- +---->| vft * | | name |--> "FIXNUM" | free_list | | used_list | +-----| link | | ----------- | | CONS_TYPE | ----------- +---->| vft * | | name |--> "CONS" | free_list | | used_list | +-----| link | | ----------- | | FUNCTION_TYPE | ----------- +---->| vft * | | name |--> "FUNCTION" | free_list | | used_list | | link | == NULL ----------- All the different object_type values are on a list, and can be accessed by going through the list. The list head is object_type::list, the link pointer is object_type::link, and link value for the last object_type on the list is NULL. For example, the initialization routines go through the list calling the initialization virtual functions for each object_type. 3.9.3 Used List (and Free List) Subgraph +------------------------------+ object | | ------- | object_type | |vft * |<--+ ----------- | |type |-------->+----->| vft * | | |marked | | | name | | |link |----+ | | free_list | | |... | | | | used_list |-----+ ------- | | | link | | | | ... | object | | ----------- ------- | | |vft * |<---+ | |type |-------->+ |marked | | |link |----+ | |... | | | ------- | | | | object | | ------- | | |vft * |<---+ | |type |-------->+ |marked | |link | == NULL |... | ------- All the objects with the same object_type are linked together on the used list of the object type. The head of the list is object_type:: used_list; the link is object::link; the last object on the list has a NULL link. Free objects of a given type are similarly linked on the free list of the object type. But free objects have their type component set to NULL to detect errors. The structure of the used list may be changed for selected object types in order to facilitate lookup of objects using the used list: e.g. for symbols the used list may be restructured as a hash table: i.e. a vector of lists in place of a single list. 3.10 Object Manager General Notes 3.10.1 The lists linked by the various `link' members are terminated by a NULL. 3.10.2 An object * cannot be NULL except in a very few places. The value and function of a symbol are such places. A return value of >> is another (indicates EOF). The `link' member at the end of a list is such a place, as are the heads of these lists, the free_list and the used_list. It is imperative to keep NULL object * pointers from escaping from these places to other places where they are not allowed. 4. Scanner/Parser 4.1 Scanner The scanner is a strict Deterministic Finite Automaton scanner that returns tokens as defined by the data grammar: token ::= ( token_type, value ) token_type ::= LPAREN_TOKEN | RPAREN_TOKEN | RBRACKET_TOKEN | DOT_TOKEN | QUOTE_TOKEN | FNQUOTE_TOKEN | NUMBER_TOKEN | SYMBOL_TOKEN | EOF_TOKEN | ERROR_TOKEN value ::= pointer to object The value is set only for SYMBOL_TOKEN and NUMBER_TOKEN types. The interface functions are: token get_token (istream & s); void backup_token (); where backup_token can backup at most one token in the input token stream. A specification for these functions and for the token strings accepted is given in scanner.sp. 4.2 Parser The parser is a recursive descent parser implementing a grammar that is an extension of: sexpr ::= atom | () | ( sexpr rest rest ::= ) | . sexpr ) | sexpr rest This grammar is designed so that the function to get a `rest' is called exactly when a cdr is needed for a make_cons. 5. Evaluator A precise prototype of the evaluator is given in meta-lisp.lsp. 6. Primitives Prototypes for the primitives are given in meta- lisp.lsp. These are precise except for error checking and primitives that require input/output or catch/throw. The helper function: int length (object * args) takes a prospective argument list, checks that it is an undotted list and calls error if it is dotted, and returns the length of the list otherwise. After applying this function, the functions: object * unchecked_car (object * args) object * unchecked_cadr (object * args) object * unchecked_caddr (object * args) object * unchecked_cadddr (object * args) object * unchecked_cdr (object * args) object * unchecked_cddr (object * args) object * unchecked_cdddr (object * args) object * unchecked_cddddr (object * args) may be used to get arguments or parts of the argument list. The function void initialize_primitive_functions(); installs all primitive functions in the symbol table. 7. Catch/Throw The catch/throw mechanism implements a facility similar to setjmp/longjmp, but adapted to LISP51. The code catch_area ca; object * result = lisp_catch (ca, tag); if (! result) { // Code that might `lisp_throw (tag, value)' . . . . result = ...; } // Use result implements the LISP catch and throw. Lisp_catch saves the environment stack pointer, current catch stack pointer, the tag, and a jmp_buf as per setjmp in the catch_area, and returns NULL. If a lisp_throw (tag, value) occurs, the values saved in the catch area are restored, and lisp_catch returns a second time with the value thrown. The catch area should be created in the standard C++ stack. Its destructor will pop it off the catch stack if it is at the top of that stack. 8. Top Level 8.1 Main The main function in lisp.cc: prints the welcome message calls enable_out_of_memory_error calls preserve_stack::initialize calls initialize_all_object_types calls preserve_all_existing_symbols calls initialize_primitive_functions sets environment to NIL calls catch_area::initialize calls toplevel to process the `lisp51.lsp' file if that file exists calls toplevel to process the standard input prints the goodbye message The current input and output streams are pointed at by the global variables: istream * in; ostream * out; 8.2 Toplevel The `toplevel' function performs the read-eval-print loop until the current input stream returns an end-of-file. It also sets up a catch on the symbol TOP-LEVEL to catch any throws from error routines. Toplevel saves the input and output stream pointers (`in' and `out'), environment stack pointer, preserve stack pointer, and value of the *TRACE* symbol, as they were when toplevel is called. These are restored on a catch of a throw. Also, if the input stream on a catch is different from that saved, it is deleted, and similarly for the output stream. The input and output streams are restored from the saved versions immediately after evaluating an expression that has been read from the input stream, if there is NO catch. Similarly after such an evaluation a check is made to be sure the environment stack and preserve stack pointers are equal to their settings before the expression was evaluated. 9. Garbage Collector The garbage collector is a standard mark and sweep collector. The main algorithm is: print the line: "<<<>>>" print before-collection statistics clear all marked flags mark all reachable objects sweep all unmarked objects to the free lists print after-collection statistics print the line: "<<<>>>" The beginning and ending printed lines must be EXACTLY what is given so test programs can filter out the GC printing. This permits test runs without frequent GC's to be compared to test runs with frequent GC's. A garbage collection is triggered automatically after each `gc_limit' calls to the make_... functions (make_cons, make_symbol, make_fixnum, etc.). Note however that a call to make_symbol that finds a pre-existing symbol does not count. Gc_limit is set and read by the functions: void set_gc_limit (int value); int gc_limit(); Garbage collection can also be triggered manually by calling void gc(); This is only done by the LISP51 GC function via prim_gc. The garbage collector can be tested by setting gc_limit to a small value and running tests of the rest of lisp51 to see if they give the same results after the GC printout has been filtered out. 9.1 Marking The marking algorithm uses the marked flags in each object (see description of the `object' type). 9.1.1 Clearing Marked Flags To clear the marked flags of all used objects, the garbage collector goes through the object_type::list of all object types and calls all the object_type::clear_ marked functions. These locate all objects of the type that are in use, and clear their marked flags. The default implementation of object_type::clear_marked (which is used by all object types in the basic implementation of lisp51) just goes through the object_type::used_list clearing all marked flags. 9.1.2 The Root Set The root set consists of the environment, objects reachable from C++ code, and objects identified by object_type::mark() functions. 9.1.2.1 The Environment The environment is reachable. 9.1.2.2 Objects Reachable from C++. Pointers to the objects currently reachable from C++ code are placed on a special stack called the preserve stack. The functions: object * preserve (object * ob); object * release (void); push and pop an object (and return the object pushed or popped in case that is convenient). The function void preserve_stack::initialize(); initializes the stack to empty. The pointer to the top of the preserve stack is object ** preserve_stack::current and can be saved, restored, or inspected, but only by friends, as it is private. 9.1.2.3 Other Reachable Objects Other reachable objects are identified and marked by the object_type::mark functions for each object_type. The default for this function is to do nothing. Currently the only non-default for this function is symbol_type::mark, which marks all symbols that are NOT identical to what they would be if they were freshly re-read (recreated by the scanner). Specifically a symbol is marked UNLESS it meets all these criteria: value is NULL function is NULL plist is NIL 9.1.2.4 Marking ALL Reachable Objects To mark ALL reachable objects, the garbage collector uses a function void gc_mark (object * ob) that marks a single object (see below). The collector calls gc_mark for the environment. The collector then calls gc_mark for all objects pointed at by the preserve stack. The collector lastly calls all object_type::mark functions. The symbol_type::mark function calls gc_mark on all the symbols it wishes to mark. 9.1.3 Marking and Scavenging Objects The function void gc_mark (object * ob) marks an object. If first checks whether the object is already marked, and does nothing in that case. If the object is NOT already marked, the object's marked flag is set first and then object::scavenge is called to scavenge the object. The marked flag is set first to avoid scavenging the object more than once in case there is a loop in the object graph. Scavenging an object is defined as finding all the object * point components in the object which are pointers in the object graph, and calling gc_mark for each of these. This is done by a type specific virtual function, object::scavenge, for each type of object. Note that the link pointer is not part of the object graph and is not marked. For example, the cons::scavenge function calls gc_mark for the car and cdr components, and the fixnum::scavenge function does nothing. The symbol::scavenge function must be careful not to mark an unbound (NULL) value component or function component. 9.2 Sweeping The purpose of sweeping is to move all the unmarked objects from used lists to free lists. Since the used and free lists may have object type specific implementations, sweeping is done completely by the object_type::sweep functions. The collector just calls all these functions. The default object_type::sweep function uses the default implementation of the used and free lists. It just goes through the object_type::used_list removing all unmarked objects and putting them on the object_type::free_list. This default implementation is used by all types in the basic lisp51 system. When the sweep puts an object on the free list, it sets the type of that object to NULL to catch garbage collector bugs in case a reachable object is freed and subsequently used. 9.3 Printing Statistics Printing memory statistics can be done by calling all the object_type::gc_print functions. The default object_type::gc_print function prints the line: " TTTT: free FFFF, used UUUU" with 4 initial spaces for indentation, where TTTT is the object_type::name, FFFF is the count of all objects on the object_type::free_list, and UUUU the count of all objects on the object_type::used_list. If the structure of these lists is changed for a particular type, the gc_print function for that type should be changed. 10. LISP Definitions When the lisp51 program starts up, the file `lisp51.lsp' is loaded (silently) if it exists. This file contains definitions for all the COMMONLISP functions implemented by lisp51 that are NOT primitives. The file is self-explanatory. 11. Overall Notes 11.1 The compiler in lisp51.lsp compiles away macro calls, but this does not help efficiency very much, and the compiler takes a very long time to compile everything in lisp51.lsp. 11.2 The principal inefficiency is probably garbage generated by the implementation of the environment. This could be fixed by replacing the environment with a stack like the preserve stack. The current environment operates like a vector whose elements are variable/value pairs. This vector has a current stack top and a current stack bottom: saving the environment and resetting it to NIL in the non-stack implementation is like saving the current stack bottom and resetting it to the current stack top; with restoration like resetting the current stack top to the current stack bottom and restoring the old value of the stack bottom. Closures cause complications to this scheme. When a closure binds the current environment, it is necessary to move the part of the current environment between the current stack bottom and the current stack top, the part bound, from the environment vector to the heap, so it will not be lost. There are several clever ways of managing this, which we will leave to the reader's imagination.