CPSC 460/560
Project Part 1: scan.c, minlex.l
Writing a Lexical Analyzer with Lex
Lexical Analysis
This is the first part of the compiler development project that we will
be doing this semester. Your job is to build a compiler for the
minl programming language, starting with the lexical
analyzer. Minl is a minimal
programming language containing just enough high-level features to
write useful code. (Remember that this is an approach to building a
compiler for a full version of a language.)
You are to write a Lex specification for minl using the following
set of rules for the tokens in this language.
- All keywords are reserved words. Minl keywords
consist of the following: program, begin, end, if, else,
while, int, out, then, endif. The corresponding symbolic tokens
are: PROGSYM, BEGINSYM, ENDSYM, IFSYM, ELSESYM, WHILESYM,
INTSYM, OUTSYM, THENSYM, ENDIFSYM.
- The following delimiters are used in minl (along with
their symbolic tokens):
| ;  | SEMISYM   |
| ,  | COMMANSYM |
| (  | LPARSYM   |
| )  | RPARSYM   |
| == | EQSYM     |
| <  | LTSYM     |
| +  | PLUSSYM   |
| *  | MULTSYM   |
| =  | ASGNSYM   |
| -  | MINUSSYM  |
- Comments start with // and end with the newline
character.
- An identifier is a sequence of uppercase or lowercase letters,
digits, and underscores that begins with a letter. The
minl language is case-sensitive. The symbolic token
used for an identifier is IDSYM.
- An integer constant is an unsigned sequence of digits
representing a base 10 number. The symbolic token for an integer
constant is ICONSTSYM.
- All tokens are separated by blanks, tabs, newlines, comments, or
delimiters.
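Every keyword in the rules above also matches the identifier pattern, so the scanner has to decide between the two. In a Lex specification, one common approach is to list each keyword as its own pattern ahead of the identifier rule, because Lex prefers the longest match and breaks ties in favor of the earlier rule. An alternative, sketched below in C, is to match every word with the identifier pattern and then look the lexeme up in a keyword table. The helper name classify_word and the numeric token values are assumptions made for this illustration; they are not part of the assignment.

```c
#include <string.h>

/* Token names come from the assignment; the numeric values are assumed. */
enum token {
    PROGSYM = 1, BEGINSYM, ENDSYM, IFSYM, ELSESYM, WHILESYM,
    INTSYM, OUTSYM, THENSYM, ENDIFSYM, IDSYM
};

/* Reserved words paired with their symbolic tokens. */
static const struct { const char *lexeme; enum token tok; } keywords[] = {
    {"program", PROGSYM}, {"begin", BEGINSYM}, {"end", ENDSYM},
    {"if", IFSYM}, {"else", ELSESYM}, {"while", WHILESYM},
    {"int", INTSYM}, {"out", OUTSYM}, {"then", THENSYM},
    {"endif", ENDIFSYM}
};

/* Hypothetical helper: classify a lexeme that has already matched the
   identifier pattern. strcmp compares whole strings and is
   case-sensitive, so "if1" and "If" are identifiers, not keywords. */
enum token classify_word(const char *lexeme)
{
    size_t n = sizeof keywords / sizeof keywords[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(lexeme, keywords[i].lexeme) == 0)
            return keywords[i].tok;
    return IDSYM;
}
```

With this table, classify_word("if") yields IFSYM while classify_word("if1") yields IDSYM, which agrees with the sample output later in this handout.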
Scanner - Parser Interface
Recall that the scanner and parser communicate via the returned token,
a global variable, and the string table. Name the global variable
yylval and declare it to be an integer. For integer
constants, the value of yylval is the value of the integer; for
identifiers, it is a unique number identifying the actual lexeme.
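One possible shape for the string table is sketched below in C: each distinct lexeme is stored once, and its index serves as the unique number placed in yylval. The names intern, lexeme_of, and MAX_IDS are hypothetical, and the linear search is chosen only for brevity; a hash table would be a more typical choice for large inputs.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_IDS 1024            /* assumed capacity for this sketch */

static char *string_table[MAX_IDS];
static int string_count = 0;

/* Return the number for a lexeme, adding it to the table on first sight.
   Returns -1 if the table is full or memory runs out. */
int intern(const char *lexeme)
{
    for (int i = 0; i < string_count; i++)
        if (strcmp(string_table[i], lexeme) == 0)
            return i;           /* seen before: reuse its number */
    if (string_count == MAX_IDS)
        return -1;
    size_t len = strlen(lexeme) + 1;
    char *copy = malloc(len);   /* keep a private copy of the text */
    if (copy == NULL)
        return -1;
    memcpy(copy, lexeme, len);
    string_table[string_count] = copy;
    return string_count++;
}

/* Map a number back to its lexeme, or NULL if the number is unknown. */
const char *lexeme_of(int id)
{
    return (id >= 0 && id < string_count) ? string_table[id] : NULL;
}
```

Under these assumptions, the identifier action in minlex.l could set yylval to intern(yytext) before returning IDSYM, and the integer-constant action could set yylval to atoi(yytext). The private copy matters because the scanner reuses yytext for the next token.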
The scanner is typically implemented as a function that is called by
the parser. The parser is part 2 of the assignment, however, and does
not exist yet. You will therefore need to write a temporary driver
function that calls the lexical analyzer and prints each token
(symbolic name) along with its value as the input is scanned. For
identifiers, the printed value is the actual lexeme, not the
identifying number stored in yylval.
The token and the value are to be separated by a single tab character.
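The printing half of such a driver might look like the following C sketch. The token names are taken from this handout, but their numeric values, the helper names token_name and format_token, and the choice to start the codes at 1 are all assumptions; in the real project the codes would live in a header shared by scan.c and minlex.l.

```c
#include <stdio.h>

/* Assumed token codes. 0 is left unused because yylex() returns 0
   at end of input. */
enum {
    PROGSYM = 1, BEGINSYM, ENDSYM, IFSYM, ELSESYM, WHILESYM, INTSYM,
    OUTSYM, THENSYM, ENDIFSYM, SEMISYM, COMMANSYM, LPARSYM, RPARSYM,
    EQSYM, LTSYM, PLUSSYM, MULTSYM, ASGNSYM, MINUSSYM, IDSYM, ICONSTSYM
};

/* Symbolic names indexed by token code (C99 designated initializers). */
static const char *const token_names[] = {
    [PROGSYM] = "PROGSYM",   [BEGINSYM] = "BEGINSYM", [ENDSYM] = "ENDSYM",
    [IFSYM] = "IFSYM",       [ELSESYM] = "ELSESYM",   [WHILESYM] = "WHILESYM",
    [INTSYM] = "INTSYM",     [OUTSYM] = "OUTSYM",     [THENSYM] = "THENSYM",
    [ENDIFSYM] = "ENDIFSYM", [SEMISYM] = "SEMISYM",   [COMMANSYM] = "COMMANSYM",
    [LPARSYM] = "LPARSYM",   [RPARSYM] = "RPARSYM",   [EQSYM] = "EQSYM",
    [LTSYM] = "LTSYM",       [PLUSSYM] = "PLUSSYM",   [MULTSYM] = "MULTSYM",
    [ASGNSYM] = "ASGNSYM",   [MINUSSYM] = "MINUSSYM", [IDSYM] = "IDSYM",
    [ICONSTSYM] = "ICONSTSYM"
};

/* Hypothetical helper: symbolic name for a token code. */
const char *token_name(int tok)
{
    int n = (int)(sizeof token_names / sizeof token_names[0]);
    if (tok > 0 && tok < n && token_names[tok] != NULL)
        return token_names[tok];
    return "UNKNOWN";
}

/* Hypothetical helper: write "NAME<tab>lexeme<newline>" into buf. */
int format_token(char *buf, size_t size, int tok, const char *lexeme)
{
    return snprintf(buf, size, "%s\t%s\n", token_name(tok), lexeme);
}
```

In scan.c, the main function could then repeatedly call yylex() until it returns 0, passing each token code and the current yytext to format_token and writing the resulting line to standard output. Printing yytext for every token is an assumption that happens to agree with the sample output.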
Sample Output
If the input file looks like:
if(if1<99)if1=0;
then the output should look like:
IFSYM if
LPARSYM (
IDSYM if1
LTSYM <
ICONSTSYM 99
RPARSYM )
IDSYM if1
ASGNSYM =
ICONSTSYM 0
SEMISYM ;
To hand in
You will need to hand in two files: the program that calls your lexical
analyzer (name that file scan.c) and your Lex file
(name that file minlex.l).