;;;; Assignment 5 ;;;; ;;;; File: asst5.txt ;;;; Author: CS 51 (Bob Walton) ;;;; Version: 2 This is the first of two assignments in which you will program the heart and soul of the UNIX egrep(1) program: an algorithm for matching character strings (e.g. lines of a file) to regular expressions (REs). The egrep algorithm translates REs into NDFAs, which are like little computers whose goal in life is to match character strings. The NDFAs then run on input strings to determine whether or not there is a match. In this assignment you implement NDFAs. In the next you will implement an RE to NDFA translator and a top level program. For this assignment you are given: ndfa.h Data and function declarations and some inline code for implementing NDFAs. ndfa_test.cc An extensive test program for your NDFA implementation. You are asked to complete (write most of): ndfa.cc Implements the functions declared in ndfa.h. The specific things you MUST do are: (0) Read the handouts, including the Regular Expressions and Finite Automata viewgraphs, and the ndfa.pd file with the NDFA preliminary design. Read the Regular Expression and Non-Deterministic Finite Automaton Exercises handout. You may optionally read Project Book Section 8.6.1, Regular Expressions and Pattern-Matching. This is a more complete description, but is not quite up-to-date with our code and terminology. (1) For the theory part of the assignment, do the following exercises from the Regular Expression and Non-Deterministic Finite Automaton Exercises handout: 2(f) 3(Af4) 3(Bf4) 3(Ad4) Remember to staple this part separately as it will be graded separately from the code. (2) You are asked to define your helper functions clearly with comments, and write pseudo-code for the `run' function in ndfa.cc. Do this next. Your pseudo-code should reference your helper functions where they are to be called, and should also reference any functions declared in ndfa.h where they are called. You are required to implement the trace option in the `run' function. Include this in your pseudo-code. (See below for definition of trace.) There is pseudo-code in the Regular Expression and Non-Deterministic Finite Automaton Exercises handout that is good starting point. In effect all you are being asked to do is transcribe this to use your helper functions and the ndfa.h declarations terminology and to include tracing. This is the design part of this assignment, and only requires editing ndfa.cc. (3) The trace should do the following: At the beginning of a run, print a start message and the indices of the initial and final state. At the beginning of each step (loop iteration), print the step number (1, 2, ...), the current input character or `end of string' if the program has run out of input (input character is '\0'), and the list of indices of the currently active states (those in current_state_set). Whenever a new state is added to the next state set using a rule that transitions from state s1 to state s2, print the line: s1 ---> s2 Whenever a new state is added to the current state set using a rule that transitions from state s1 to state s2, print the line: s1 === s2 At the end of the run print the returned value. Print blank lines where appropriate to make a more readable trace. (4) Write your C++ code in ndfa.cc. You MUST indent your code in a proper readable fashion. We recommend the vi settings :set autoindent showmatch shiftwidth=4 which can be obtained by the .login file line setenv EXINIT ':set ai sm sw=4' which must replace any similar line you used for LISP. This will give indentations that are multiples of 4 spaces. Learn the vi commands << and >> that shift blocks of code left or right by the shiftwidth. Use of vi in this fashion leads to the style of indentation followed in ndfa.h. If you use EMACS you will get a different but OK style of indentation that uses both 2 and 4 space indents. (5) The state table output function should: Have a header separated by blank lines. List each state in the state table on a separate line along with information on the state. For each state give the character and character_transition state index if the character_transition is used. For each state give the empty_transition state index if the empty_transition is used. Format the above information in an easy to read way: e.g. `a ---> 9', `# ---> 7'. (6) Test the code by running: make and closely studying the results in ndfa_test.out. There is a powerful C++ debugger, gdb, that will work on this code. Try gdb ndfa_test b ndfa::run run list p s p trace n n n n n p current_state_set where the last may be different if you have used a different variable name than current_state_set. See gdb.help. The theory part of the problem set is worth 15 points, the design part is worth 5 points, and the code is worth 40 points. The following are the SLOC counts for a well written solution. In C++ a SLOC is defined as a non-comment line containing a semi-colon. single_character 4 concatenate 2 alternate 7 star 4 plus 4 run helpers 7 (3 in functions) run 38 operator << 5 These will also be the number of points that will be deducted if you do NOT do a function (they do NOT sum to 40). Hints: (1) Maintain the following invariants: For any ndfa x that has not been used to build a larger ndfa, x.final.empty_transition is unused. After an ndfa x is used to build a larger ndfa, x is never used again. Then when connecting an ndfa x to another ndfa you can always use x.on_empty_goto (...). (2) Carefully single step through the first executions of run using gdb, to see what is really going on with your code. Have a printout of your code handy. (3) Spend time looking at a printout of your code and comparing it with the Regular Expression and Non-Deterministic Finite Automaton Exercises handout pseudo-code to see if there is any discrepancy you do not want. Good Luck!