;;;; Assignment 6
;;;;
;;;; File:         asst6.txt
;;;; Author:       CS 51 (Bob Walton)
;;;; Version:      2

This is the second of two assignments in which you
program the heart and soul of the UNIX egrep(1) program:
an algorithm for matching character strings (e.g. lines
of a file) to regular expressions (REs).

In this assignment you will implement an RE parser that
will produce an NDFA, and you will implement the regular
expression parser top level program, named `rep', that
inputs RE's and invokes the parser to convert RE's to
NDFA's, and that then inputs strings and invokes the
NDFA run program of assignment 5 to match the strings
against the current RE.

To start this assignment:

	Make an asst6 directory.

	cd <your asst6 directory>

	cp -p <your asst5 directory>/* .

	cp -p ~lib51/asst6/* .

Thus you will start with your asst5 files.  Asst6 adds
the following files which you must finish:

    parser.h	Interface to the RE parser.

    parser.cc	The RE parser.

    rep.cc	The main program.

In addition, asst6 contains the files:

    rep.sp	Specification for the rep program.

    rep.in	Input to test rep.

    Makefile	The asst6 version replaces the
		asst5 version.


The specific things you MUST do are:

(0) Read the handouts, including the Parsing and
    Semantic Algorithms viewgraphs, the Parsing
    Exercises handout, and the rep.sp file containing
    the rep program specification.  Also reread
    Section 1.8, Grammars, in the Introduction to
    Symbolic Computing book.

    You may optionally read Project Book Chapter 8,
    particularly Sections 8.4, 8.5, and 8.6.1.  This
    is more complete in some ways than the other
    readings, but is not quite up-to-date with our code
    and terminology, or with the formal definition of
    recursive descent parsing.

    Also read the code files: parser.h, parser.cc, and
    rep.cc.

(1) For the theory part of the assignment, do the
    following exercises:

    From the Introduction to Symbolic Programming book:

	1.8.1(b)  (mislabeled 1.8.0.5(b) on page 99)
	1.8.2(b)  (mislabeled 1.8.0.6(b) on page 101)

    From the Parsing Exercises handout:

	1(d)
	3(Af4)
	3(Bf4)

    Remember to staple this part separately as it will
    be graded separately from the code.

(2) Decide on the interface to your parser.  Declare and
    document the interface in parser.h.

    This interface INCLUDES the interface to the your
    lexical analyzer, even though you may chose to
    implement the lexical analyzer in rep.cc.

    Document the interface to all functions and data
    clearly with comments.

(3) Compute FIRST (RF) and type it into the beginning of
    parser.cc.

(4) Using the specification in rep.sp, figure out what
    helper functions the main function in rep.cc will
    require. Declare and document these at the beginning
    of rep.cc, but do not write the code.

    Declare all data shared by rep.cc functions at the
    beginning of rep.cc.

    Document the interface to all functions and data
    clearly with comments.

(5) Write the pseudo-code for main in rep.cc into that
    file, just before the main function.

    If you have a helper function that does a lot of
    work (e.g. one to do all processing on one or more
    strings), write the pseudo-code for that function in
    front of that function's actual code.  You need not
    do this for functions of 5 SLOCS or less.

    In writing pseudo-code, you must clearly indicate
    all calls to functions other than small helper
    functions inside rep.cc.  E.g. you should indicate
    calls to ein.getline.

(6) Steps (2), (3), (4), and (5) above are the design
    part of this assignment.

    You must follow the specifications in rep.sp very
    carefully while doing this design.


(7) Write the code in parser.cc and rep.cc.  You must
    indent code properly.  You MUST use ein.getline to
    get one line of input at a time.  You MUST use
    ein.eof to check for an end of file on every yes-no
    input.

(8) Test the code by running:

	make

    and closely studying the results in rep.out.

    You may also run

	make trace

    and examine rep-trace.out.

    Use gdb as appropriate.


The theory part is worth 15 points, the design part is
worth 5 points, and the code is worth 40 points.

The following are the SLOC counts for a well written
solution.  In C++ a SLOC is defined as a non-comment
line containing a semi-colon.

	parser.h		5

	parser.cc		27

	rep.cc			36

0.6 times these SLOC counts will be approximately the
number of points that will be deducted if you do NOT
do a function.


Hints:

(1) There is one place in writing the parser where you
    need to declare an ndfa variable but do not have a
    value to give the variable.  The declaration

	ndfa foo;

    does NOT work, as there is no constructor for
    ndfa's that takes zero arguments.  The declaration

	ndfa foo = ndfa (NULL, NULL);

    will work.

(2) The lexical analyzer should be VERY simple.  A
    lexeme is a character.  The current line is the
    vector of all lexemes for the current RE.  The
    `\0' character is the end-of-input lexeme.

    You will have a chance to write a complex lexical
    analyzer in another assignment.

(3) Use ein.getline for all input.  Use ein.eof to
    check whether the last ein.getline encountered an
    end of file.

(4) To test whether the first UNIX argument to the rep
    program is "trace", you must test that

		argc >= 2
    and
    		strcmp ( argv[1], "trace" ) == 0

    The first UNIX argument is the second argv string;
    the first argv string (argv[0]) is name of the
    program being executed.

(5) To use the long jump feature of C/C++ and error.cc,
    write a piece of code of the form

	if ( ! setjmp ( error_jmp ) )
	{
	    // Do SOMETHING
	}
	else
	{
	    // Come here if error is called while
	    // doing SOMETHING.
	}

    When setjmp is called, it returns false (0).
    If longjmp is then called, setjmp returns a
    second time, returning the value true (1) this
    second time.  All executing functions between the
    call to longjmp and the function that called
    setjmp are terminated.

    Write a small piece of the code that uses setjmp and
    tests its functionality before writing the rest of
    the code.  This will get you familiar with setjmp in
    a program small enough that nothing else can be
    going wrong.

Good Luck!