//// Specifications for get_token and backup_token
//// in scanner.cc
////
//// File:              scanner.sp
//// Author:            course
//// Version:           3
////


///////////////////////////////////////////////////////
//
// Function:            get_token
// Argument type:       istream &
// Return value type:   token
// Assumes:             information in scanner.h
//

Get_token() will read characters from its argument input
stream, and output an instance of the token class.  By
"the token string" we mean the string of characters that
represents the token in the input stream, excluding any
preceding white space or comments, and by "the token
instance" we mean the instance of the token class
returned by get_token.

A space character is defined as a character for which
the C function isspace() returns true (see <ctype.h>).

A symbolic character is defined as any character that
can occur in a Common LISP symbol name without being
escaped (see below).  For convenience, this is the same
as any character for which the C++ function symbolic()
returns true (see scanner.cc), except that symbolic()
also returns true for `|'.

Characters are read either in normal mode or in escaped
mode.  The vertical bar character `|' toggles the mode;
e.g. in the symbol name x|;|yabc|d|ef the characters
`;' and `d' are read in escaped mode.  A character read
in escaped mode is said to be an escaped character,
e.g. an escaped `;' or escaped `d'.  Note that all
lowercase letters are converted to uppercase by the
scanner unless they are escaped.

An atom representative is defined as a sequence of
consecutive atom characters, where an atom character is
a symbolic character or an escaped character or the
character `|'.  Thus the atom representative x|;|d has
five atom characters, the middle one being the escaped
character `;'.  When such an atom representative is
converted to a symbol, the `|' characters are deleted
and unescaped lower case letters are converted to upper
case.
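
The conversion just described can be sketched as follows.
This is only an illustration: symbol_name() is a
hypothetical helper, not a function known to exist in
scanner.cc, and it assumes its argument is a complete atom
representative (it does not detect an escaped end of file).

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (an assumption, not part of scanner.cc):
// converts an atom representative to the symbol name it denotes.
std::string symbol_name(const std::string& rep) {
    std::string name;
    bool escaped = false;                  // reading starts in normal mode
    for (unsigned char c : rep) {
        if (c == '|') {
            escaped = !escaped;            // `|' toggles the mode and is deleted
        } else if (escaped) {
            name += static_cast<char>(c);  // escaped characters are kept as-is
        } else {
            // unescaped lower case letters become upper case
            name += static_cast<char>(std::toupper(c));
        }
    }
    return name;
}
```

For instance, symbol_name("x|;|d") yields the three
character name X;D, and symbol_name("||") yields the empty
name.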


(1) Characters will be read from the input stream s by
    c = s.get() or c = s.peek(), and at most one
    character obtained by s.get() may be put back by
    s.putback(c).  End of file will be detected by
    testing c == EOF (where EOF is a standard defined
    constant, typically equal to -1).  Note that EOF is
    not an ASCII character and cannot be stored in a
    character string.  It also cannot be put back using
    s.putback(c), but it is desirable to simulate
    putting back an EOF by other means.

    Thus an input stream is a string of ASCII characters
    followed by an infinite sequence of EOF's.

(2) A token string will not contain unescaped spaces.
    Any unescaped spaces encountered in the input stream
    will be ignored.  Note, however, that unescaped
    spaces delimit a token string.

(3) An unescaped semicolon `;' and all characters
    following it through the next line feed `\n' or end
    of file (EOF) form a comment, which will be treated
    as a single unescaped space character.  Note that
    since comments are treated as a single unescaped
    space, they may also delimit token strings.

(4) The following single unescaped characters will be
    treated as complete one character token strings and
    the token instance will have the indicated
    token_type:

        (       LPAREN_TOKEN
        )       RPAREN_TOKEN
        ]       RBRACKET_TOKEN
        '       QUOTE_TOKEN

    For example, upon reading the (unescaped) character
    `(', get_token should return a token instance whose
    token_type is LPAREN_TOKEN.

(5) An EOF will be treated like a complete one character
    token string and the token instance will have the
    EOF_TOKEN token_type.

(6) The following pair of consecutive unescaped
    characters will be treated as a complete two
    character token string and the token instance will
    have the indicated token_type:

        #'      FNQUOTE_TOKEN

(7) A token string representing an atom begins with a
    symbolic character other than `#' or with the
    vertical bar character `|'.  The next non-escaped,
    non-symbolic, non-`|' character will end the token
    string and not be part of the token string.  The
    resulting symbol name may be empty, i.e. have no
    characters, e.g. on the input `||' or `||||'.

(8) If the token string is the atom `.', the token
    instance will have the token_type DOT_TOKEN.  Note
    that `.' is a dot token, but `||.' and `.||' are
    symbols.
    
(9) If the token string is an atom consisting of an
    optional `+' or `-' sign followed by digits, with
    nothing else, and with at least one digit, then the
    token instance will have the token_type
    NUMBER_TOKEN.  Note that `||9' and `+998||' are
    symbols and not numbers.

(10) If the token string is an atom and the token
     instance does not have a token_type defined by rule
     (8) or (9), then the token instance will have the
     token_type SYMBOL_TOKEN.

(11) The # character cannot begin any token except #'.
     Thus #|| is NOT a legal symbol token.  However, ||#
     is a legal symbol token, and # can appear in an
     atom token string anywhere EXCEPT at the beginning.

(12) If any character is encountered in the input stream
     that cannot be processed according to rules
     (1)-(11), the character will be treated as a one
     character token string and the token instance will
     have the type ERROR_TOKEN.

     Thus in #x the # will be a one character token of
     type ERROR_TOKEN.  The x will be part of the next
     token.

     Similarly any unescaped " or ` will be a one
     character token of type ERROR_TOKEN.  (Our system
     does not handle these.)

(13) If an ESCAPED end of file is encountered while
     scanning an atom, the entire atom will be treated
     as a token instance of type ERROR_TOKEN.

(14) If the token instance has the token_type
     NUMBER_TOKEN, then the token instance will have a
     value component computed by first applying the C
     function atol() to the token string (see
     <stdlib.h>), then calling make_fixnum() on the
     result.

(15) If the token instance has the token_type
     SYMBOL_TOKEN, then the token instance will have a
     value component computed by applying make_symbol()
     to the token string after all `|' characters have
     been removed and after unescaped lower case letters
     have been converted to upper case (use the C
     function toupper() in <ctype.h>).

(16) If the value component of the token instance cannot
     be determined according to rule (14) or (15), it
     will be undefined.

(17) If get_token() finds a token string (which it will
     always do, possibly finding an EOF), then it may
     read past the end of the token string by at most
     one character, and put back that one character into
     the input stream.  Putting back EOF's needs to be
     simulated somehow: since every EOF is followed by
     an EOF, all that is necessary is to avoid calling
     s.putback(c) if c == EOF.

(18) Get_token() will always save the token instance it
     is about to return in a static storage location for
     use by backup_token below.
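
Two pieces of the rules above can be sketched as small
helpers: skipping unescaped spaces and comments (rules
(2)-(3)), and choosing between DOT, NUMBER and SYMBOL for an
atom token string (rules (8)-(10)).  This is a sketch under
assumptions, not the code in scanner.cc: the names
skip_blanks, classify_atom and atom_kind are hypothetical,
and the real token_type values are those in scanner.h.
Using s.peek() before s.get() is one way to avoid ever
needing to put back an EOF (rule (17)).

```cpp
#include <cctype>
#include <cstdio>    // EOF
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the relevant token_type values.
enum class atom_kind { DOT, NUMBER, SYMBOL };

// Rules (2)-(3): consume unescaped spaces and comments.  On return the
// next character in s starts a token string, or s is at end of file.
void skip_blanks(std::istream& s) {
    for (;;) {
        int c = s.peek();
        if (c == EOF) return;
        if (std::isspace(c)) {
            s.get();                        // ignore an unescaped space
        } else if (c == ';') {
            // a comment runs through the next '\n' or end of file
            while ((c = s.get()) != EOF && c != '\n') {}
            if (c == EOF) return;
        } else {
            return;                         // first character of a token string
        }
    }
}

// Rules (8)-(10): tok is the raw atom token string, `|' characters
// included, so `||9' and `.||' fall through to SYMBOL.
atom_kind classify_atom(const std::string& tok) {
    if (tok == ".") return atom_kind::DOT;
    std::size_t i = 0;
    if (!tok.empty() && (tok[0] == '+' || tok[0] == '-')) i = 1;
    bool digits = i < tok.size();           // at least one digit is required
    for (std::size_t j = i; j < tok.size(); ++j) {
        if (!std::isdigit(static_cast<unsigned char>(tok[j]))) digits = false;
    }
    return digits ? atom_kind::NUMBER : atom_kind::SYMBOL;
}
```

Classifying the raw token string rather than the converted
symbol name is what keeps `+998||' a symbol, since the `|'
characters are still present when the digits are checked.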


///////////////////////////////////////////////////////
//
// Function:            backup_token
// Argument type:       none
// Return value type:   none
// Assumes:             information in scanner.h
//

When backup_token() is called, the next call to
get_token() will not read any characters from the input
stream.  Instead, get_token() will return the token
instance it returned the last time it was called (see
rule (18) above).

Only the immediately next call to get_token() will be
affected by a call to backup_token().  The effect of
calling backup_token() more than once between calls to
get_token() is undefined.

Neither backup_token() nor the subsequent call to
get_token() will operate on the input stream.
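
One possible way to obtain this behaviour is a saved token
plus a flag, sketched below.  The token struct and scan()
function here are simplified stand-ins (assumptions made so
the example is self-contained); the real token class is the
one declared in scanner.h, and the real scanning logic
follows rules (1)-(17).

```cpp
#include <istream>
#include <sstream>
#include <string>

// Simplified stand-in for the token class in scanner.h (assumption).
struct token { std::string text; };

static token last_token;         // rule (18): token most recently returned
static bool  backed_up = false;  // set by backup_token(), cleared by get_token()

// Stand-in for the real scanner: reads one whitespace-delimited word.
static token scan(std::istream& s) {
    token t;
    s >> t.text;
    return t;
}

token get_token(std::istream& s) {
    if (backed_up) {
        backed_up = false;       // only the immediately next call is affected
        return last_token;       // the input stream is not touched
    }
    last_token = scan(s);        // save before returning, per rule (18)
    return last_token;
}

void backup_token() {
    backed_up = true;            // does not operate on the input stream
}
```

With this sketch a second consecutive call to
backup_token() happens to have no further effect, but the
specification leaves that case undefined, so callers should
not rely on it.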