;;;; Assignment 6 ;;;; ;;;; File: asst6.txt ;;;; Author: CS 51 (Bob Walton) ;;;; Version: 2 This is the second of two assignments in which you program the heart and soul of the UNIX egrep(1) program: an algorithm for matching character strings (e.g. lines of a file) to regular expressions (REs). In this assignment you will implement an RE parser that will produce an NDFA, and you will implement the regular expression parser top level program, named `rep', that inputs RE's and invokes the parser to convert RE's to NDFA's, and that then inputs strings and invokes the NDFA run program of assignment 5 to match the strings against the current RE. To start this assignment: Make an asst6 directory. cd cp -p /* . cp -p ~lib51/asst6/* . Thus you will start with your asst5 files. Asst6 adds the following files which you must finish: parser.h Interface to the RE parser. parser.cc The RE parser. rep.cc The main program. In addition, asst6 contains the files: rep.sp Specification for the rep program. rep.in Input to test rep. Makefile The asst6 version replaces the asst5 version. The specific things you MUST do are: (0) Read the handouts, including the Parsing and Semantic Algorithms viewgraphs, the Parsing Exercises handout, and the rep.sp file containing the rep program specification. Also reread Section 1.8, Grammars, in the Introduction to Symbolic Computing book. You may optionally read Project Book Chapter 8, particularly Sections 8.4, 8.5, and 8.6.1. This is more complete in some ways than the other readings, but is not quite up-to-date with our code and terminology, or with the formal definition of recursive descent parsing. Also read the code files: parser.h, parser.cc, and rep.cc. (1) For the theory part of the assignment, do the following exercises: From the Introduction to Symbolic Programming book: 1.8.1(b) (mislabeled 1.8.0.5(b) on page 99) 1.8.2(b) (mislabeled 1.8.0.6(b) on page 101) From the Parsing Exercises handout: 1(d) 3(Af4) 3(Bf4) Remember to staple this part separately as it will be graded separately from the code. (2) Decide on the interface to your parser. Declare and document the interface in parser.h. This interface INCLUDES the interface to the your lexical analyzer, even though you may chose to implement the lexical analyzer in rep.cc. Document the interface to all functions and data clearly with comments. (3) Compute FIRST (RF) and type it into the beginning of parser.cc. (4) Using the specification in rep.sp, figure out what helper functions the main function in rep.cc will require. Declare and document these at the beginning of rep.cc, but do not write the code. Declare all data shared by rep.cc functions at the beginning of rep.cc. Document the interface to all functions and data clearly with comments. (5) Write the pseudo-code for main in rep.cc into that file, just before the main function. If you have a helper function that does a lot of work (e.g. one to do all processing on one or more strings), write the pseudo-code for that function in front of that function's actual code. You need not do this for functions of 5 SLOCS or less. In writing pseudo-code, you must clearly indicate all calls to functions other than small helper functions inside rep.cc. E.g. you should indicate calls to ein.getline. (6) Steps (2), (3), (4), and (5) above are the design part of this assignment. You must follow the specifications in rep.sp very carefully while doing this design. (7) Write the code in parser.cc and rep.cc. You must indent code properly. You MUST use ein.getline to get one line of input at a time. You MUST use ein.eof to check for an end of file on every yes-no input. (8) Test the code by running: make and closely studying the results in rep.out. You may also run make trace and examine rep-trace.out. Use gdb as appropriate. The theory part is worth 15 points, the design part is worth 5 points, and the code is worth 40 points. The following are the SLOC counts for a well written solution. In C++ a SLOC is defined as a non-comment line containing a semi-colon. parser.h 5 parser.cc 27 rep.cc 36 0.6 times these SLOC counts will be approximately the number of points that will be deducted if you do NOT do a function. Hints: (1) There is one place in writing the parser where you need to declare an ndfa variable but do not have a value to give the variable. The declaration ndfa foo; does NOT work, as there is no constructor for ndfa's that takes zero arguments. The declaration ndfa foo = ndfa (NULL, NULL); will work. (2) The lexical analyzer should be VERY simple. A lexeme is a character. The current line is the vector of all lexemes for the current RE. The `\0' character is the end-of-input lexeme. You will have a chance to write a complex lexical analyzer in another assignment. (3) Use ein.getline for all input. Use ein.eof to check whether the last ein.getline encountered an end of file. (4) To test whether the first UNIX argument to the rep program is "trace", you must test that argc >= 2 and strcmp ( argv[1], "trace" ) == 0 The first UNIX argument is the second argv string; the first argv string (argv[0]) is name of the program being executed. (5) To use the long jump feature of C/C++ and error.cc, write a piece of code of the form if ( ! setjmp ( error_jmp ) ) { // Do SOMETHING } else { // Come here if error is called while // doing SOMETHING. } When setjmp is called, it returns false (0). If longjmp is then called, setjmp returns a second time, returning the value true (1) this second time. All executing functions between the call to longjmp and the function that called setjmp are terminated. Write a small piece of the code that uses setjmp and tests its functionality before writing the rest of the code. This will get you familiar with setjmp in a program small enough that nothing else can be going wrong. Good Luck!