;;;; Assignment 5
;;;;
;;;; File:         asst5.txt
;;;; Author:       CS 51 (Bob Walton)
;;;; Version:      2

This is the first of two assignments in which you will
program the heart and soul of the UNIX egrep(1) program:
an algorithm for matching character strings (e.g. lines
of a file) to regular expressions (REs).

The egrep algorithm translates REs into NDFAs, which are
like little computers whose goal in life is to match
character strings.  The NDFAs then run on input strings
to determine whether or not there is a match.

In this assignment you implement NDFAs.  In the next you
will implement an RE to NDFA translator and a top level
program.

For this assignment you are given:

    ndfa.h	Data and function declarations and
   		some inline code for implementing
		NDFAs.

    ndfa_test.cc
		An extensive test program for your
		NDFA implementation.

You are asked to complete (write most of):

    ndfa.cc	Implements the functions declared
		in ndfa.h.


The specific things you MUST do are:

(0) Read the handouts, including the Regular Expressions
    and Finite Automata viewgraphs, and the ndfa.pd
    file with the NDFA preliminary design.  Read
    the Regular Expression and Non-Deterministic Finite
    Automaton Exercises handout.

    You may optionally read Project Book Section 8.6.1,
    Regular Expressions and Pattern-Matching.  This is
    a more complete description, but is not quite
    up-to-date with our code and terminology.

(1) For the theory part of the assignment, do the
    following exercises from the Regular Expression and
    Non-Deterministic Finite Automaton Exercises
    handout:

	2(f)
	3(Af4)
	3(Bf4)
	3(Ad4)

    Remember to staple this part separately as it will
    be graded separately from the code.

(2) You are asked to define your helper functions
    clearly with comments, and write pseudo-code for the
    `run' function in ndfa.cc.  Do this next.  Your
    pseudo-code should reference your helper functions
    where they are to be called, and should also
    reference any functions declared in ndfa.h where
    they are called.

    You are required to implement the trace option in
    the `run' function.  Include this in your
    pseudo-code.  (See below for definition of trace.)

    There is pseudo-code in the Regular Expression and
    Non-Deterministic Finite Automaton Exercises handout
    that is good starting point.  In effect all you are
    being asked to do is transcribe this to use your
    helper functions and the ndfa.h declarations
    terminology and to include tracing.

    This is the design part of this assignment, and only
    requires editing ndfa.cc.

(3) The trace should do the following:

	At the beginning of a run, print a start message
	and the indices of the initial and final state.

	At the beginning of each step (loop iteration),
	print the step number (1, 2, ...), the current
	input character or `end of string' if the
	program has run out of input (input character is
	'\0'), and the list of indices of the currently
	active states (those in current_state_set).

	Whenever a new state is added to the next state
	set using a rule that transitions from state s1
	to state s2, print the line:

		s1 ---> s2

	Whenever a new state is added to the current
	state set using a rule that transitions from
	state s1 to state s2, print the line:

		s1 === s2

	At the end of the run print the returned value.

	Print blank lines where appropriate to make a
	more readable trace.

(4) Write your C++ code in ndfa.cc.

    You MUST indent your code in a proper readable
    fashion.  We recommend the vi settings

	:set autoindent showmatch shiftwidth=4

    which can be obtained by the .login file line

	setenv EXINIT ':set ai sm sw=4'

    which must replace any similar line you used for
    LISP.  This will give indentations that are
    multiples of 4 spaces.  Learn the vi commands
    << and >> that shift blocks of code left or
    right by the shiftwidth.

    Use of vi in this fashion leads to the style
    of indentation followed in ndfa.h.  If you use
    EMACS you will get a different but OK style of
    indentation that uses both 2 and 4 space indents.

(5) The state table output function should:

	Have a header separated by blank lines.

	List each state in the state table on a separate
	line along with information on the state.

	For each state give the character and
	character_transition state index if the
	character_transition is used.

	For each state give the empty_transition state
	index if the empty_transition is used.

	Format the above information in an easy to read
	way: e.g. `a ---> 9', `# ---> 7'.

(6) Test the code by running:

	make

    and closely studying the results in ndfa_test.out.

    There is a powerful C++ debugger, gdb, that will
    work on this code.  Try

	gdb ndfa_test
	b ndfa::run
	run
	list
	p s
	p trace
	n
	n
	n
	n
	n
	p current_state_set

    where the last may be different if you have used a
    different variable name than current_state_set.  See
    gdb.help.



The theory part of the problem set is worth 15 points,
the design part is worth 5 points, and the code is worth
40 points.

The following are the SLOC counts for a well written
solution.  In C++ a SLOC is defined as a non-comment
line containing a semi-colon.

	single_character	4
	concatenate		2
	alternate		7
	star			4
	plus			4
	run helpers		7 (3 in functions)
	run			38
	operator <<		5

These will also be the number of points that will be
deducted if you do NOT do a function (they do NOT sum to
40).



Hints:

(1) Maintain the following invariants:

	For any ndfa x that has not been used to build a
	larger ndfa, x.final.empty_transition is unused.

	After an ndfa x is used to build a larger ndfa,
	x is never used again.

    Then when connecting an ndfa x to another ndfa you
    can always use x.on_empty_goto (...).

(2) Carefully single step through the first executions
    of run using gdb, to see what is really going on
    with your code.  Have a printout of your code handy.

(3) Spend time looking at a printout of your code and
    comparing it with the Regular Expression and
    Non-Deterministic Finite Automaton Exercises handout
    pseudo-code to see if there is any discrepancy you
    do not want.

Good Luck!