A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition) (798439), страница 7
Текст из файла (страница 7)
We wish to simulate the straight-lineprogramming language's assignment statements without doing any side effects in theinterpreter itself. (The print statements will be accomplished by interpreter side effects,however.) The solution is to declare interpExp asclass IntAndTable {int i; Table t;IntAndTable(int ii, Table tt) {i=ii; t=tt;}}IntAndTable interpExp(Exp e, Table t) …The result of interpreting an expression e1 with table t1 is an integer value i and a new table t2.When interpreting an expression with two subexpressions (such as an OpExp), the table t2resulting from the first subexpression can be used in processing the second subexpression.PROGRAM STRAIGHT-LINE PROGRAM INTERPRETERImplement a simple program analyzer and interpreter for the straight-line programminglanguage. This exercise serves as an introduction to environments (symbol tables mappingvariable names to information about the variables); to abstract syntax (data structuresrepresenting the phrase structure of programs); to recursion over tree data structures, usefulin many parts of a compiler; and to a functional style of programming without assignmentstatements.It also serves as a "warm-up" exercise in Java programming.
Programmers experienced inother languages but new to Java should be able to do this exercise, but will needsupplementary material (such as textbooks) on Java.Programs to be interpreted are already parsed into abstract syntax, as described by the datatypes in Program 1.5.However, we do not wish to worry about parsing the language, so we write this program byapplying data constructors:Stm prog =new CompoundStm(new AssignStm("a",new OpExp(new NumExp(5),OpExp.Plus, new NumExp(3))),new CompoundStm(new AssignStm("b",new EseqExp(new PrintStm(new PairExpList(new IdExp("a"),new LastExpList(new OpExp(new IdExp("a"),21OpExp.Minus,new NumExp(1))))),new OpExp(new NumExp(10), OpExp.Times,new IdExp("a")))),new PrintStm(new LastExpList(new IdExp("b")))));Files with the data type declarations for the trees, and this sample program, are available inthe directory $MINIJAVA/chap1.Writing interpreters without side effects (that is, assignment statements that update variablesand data structures) is a good introduction to denotational semantics and attribute grammars,which are methods for describing what programming languages do.
It's often a usefultechnique in writing compilers, too; compilers are also in the business of saying whatprogramming languages do.Therefore, in implementing these programs, never assign a new value to any variable orobject field except when it is initialized. For local variables, use the initializing form ofdeclaration (for example, int i=j+3;)andfor each class, make a constructor function (like theCompoundStm constructor in Program 1.5).1.
Write a Java function int maxargs(Stm s) that tells the maximum number ofarguments of any print statement within any subexpression of a given statement. Forexample, maxargs(prog) is 2.2. Write a Java function void interp(Stm s) that "interprets" a program in thislanguage. To write in a "functional programming" style - in which you never use anassignment statement - initialize each local variable as you declare it.Your functions that examine each Exp will have to use instanceof to determine whichsubclass the expression belongs to and then cast to the proper subclass.
Or you can addmethods to the Exp and Stm classes to avoid the use of instanceof.For part 1, remember that print statements can contain expressions that contain other printstatements.For part 2, make two mutually recursive functions interpStm and interpExp. Represent a"table", mapping identifiers to the integer values assigned to them, as a list of id × int pairs.class Table {String id; int value; Table tail;Table(String i, int v, Table t) {id=i; value=v; tail=t;}}Then interpStm is declared asTable interpStm(Stm s, Table t)taking a table t1 as argument and producing the new table t2 that's just like t1 except that someidentifiers map to different integers as a result of the statement.For example, the table t1 that maps a to 3 and maps c to 4, which we write {a ↦ 3, c ↦ 4} inmathematical notation, could be represented as the linked list.22Now, let the table t2 be just like t1, except that it maps c to 7 instead of 4. Mathematically, wecould write,t2 = update (t1, c, 7),where the update function returns a new table {a ↦ 3, c ↦ 7}.On the computer, we could implement t2 by putting a new cell at the head of the linked list:, as long as we assume that the first occurrence of c in the listtakes precedence over any later occurrence.Therefore, the update function is easy to implement; and the corresponding lookup functionint lookup(Table t, String key)just searches down the linked list.
Of course, in an object-oriented style, int lookup(Stringkey) should be a method of the Table class.Interpreting expressions is more complicated than interpreting statements, becauseexpressions return integer values and have side effects. We wish to simulate the straight-lineprogramming language's assignment statements without doing any side effects in theinterpreter itself. (The print statements will be accomplished by interpreter side effects,however.) The solution is to declare interpExp asclass IntAndTable {int i; Table t;IntAndTable(int ii, Table tt) {i=ii; t=tt;}}IntAndTable interpExp(Exp e, Table t) …The result of interpreting an expression e1 with table t1 is an integer value i and a new table t2.When interpreting an expression with two subexpressions (such as an OpExp), the table t2resulting from the first subexpression can be used in processing the second subexpression.23Chapter 2: Lexical Analysislex-i-cal: of or relating to words or the vocabulary of a language as distinguished from itsgrammar and constructionWebster's DictionaryOVERVIEWTo translate a program from one language into another, a compiler must first pull it apart andunderstand its structure and meaning, then put it together in a different way.
The front end ofthe compiler performs analysis; the back end does synthesis.The analysis is usually broken up intoLexical analysis: breaking the input into individual words or "tokens";Syntax analysis: parsing the phrase structure of the program; andSemantic analysis: calculating the program's meaning.The lexical analyzer takes a stream of characters and produces a stream of names, keywords,and punctuation marks; it discards white space and comments between the tokens. It wouldunduly complicate the parser to have to account for possible white space and comments atevery possible point; this is the main reason for separating lexical analysis from parsing.Lexical analysis is not very complicated, but we will attack it with highpowered formalismsand tools, because similar formalisms will be useful in the study of parsing and similar toolshave many applications in areas other than compilation.2.1 LEXICAL TOKENSA lexical token is a sequence of characters that can be treated as a unit in the grammar of aprogramming language.
A programming language classifies lexical tokens into a finite set oftoken types. For example, some of the token types of a typical programming language areTypeExamplesfoon14 lastID73 0 00 515 082NUM66.1 .5 10. 1e67 5.5e-10REALifIFCOMMA ,NOTEQ !=LPAREN (RPAREN )Punctuation tokens such as IF, VOID, RETURN constructed from alphabetic characters arecalled reserved words and, in most languages, cannot be used as identifiers.24Examples of nontokens are/* try again */comment#include<stdio.h>preprocessor directive#define NUMS 5, 6preprocessor directiveNUMSmacroblanks, tabs, and newlinesIn languages weak enough to require a macro preprocessor, the preprocessor operates on thesource character stream, producing another character stream that is then fed to the lexicalanalyzer. It is also possible to integrate macro processing with lexical analysis.Given a program such asfloat match0(char *s) /* find a zero */{if (!strncmp(s, "0.0", 3))return 0.;}the lexical analyzer will return the streamFLOATLBRACECOMMARETURNID(match0)LPARENCHARSTARID(s)RPARENIFLPARENBANGID(strncmp)LPARENID(s)STRING(0.0)COMMANUM(3)RPARENRPARENREAL(0.0)SEMIRBRACEEOFwhere the token-type of each token is reported; some of the tokens, such as identifiers andliterals, have semantic values attached to them, giving auxiliary information in addition to thetoken-type.How should the lexical rules of a programming language be described? In what languageshould a lexical analyzer be written?We can describe the lexical tokens of a language in English; here is a description of identifiersin C or Java:An identifier is a sequence of letters and digits; the first character must be a letter.
Theunderscore _ counts as a letter. Upper- and lowercase letters are different. If the input streamhas been parsed into tokens up to a given character, the next token is taken to include thelongest string of characters that could possibly constitute a token. Blanks, tabs, newlines, andcomments are ignored except as they serve to separate tokens. Some white space is requiredto separate otherwise adjacent identifiers, keywords, and constants.And any reasonable programming language serves to implement an ad hoc lexer. But we willspecify lexical tokens using the formal language of regular expressions, implement lexersusing deterministic finite automata, and use mathematics to connect the two.
This will lead tosimpler and more readable lexical analyzers.2.2 REGULAR EXPRESSIONSLet us say that a language is a set of strings; a string is a finite sequence of symbols. Thesymbols themselves are taken from a finite alphabet.25The Pascal language is the set of all strings that constitute legal Pascal programs; the languageof primes is the set of all decimal-digit strings that represent prime numbers; and the languageof C reserved words is the set of all alphabetic strings that cannot be used as identifiers in theC programming language. The first two of these languages are infinite sets; the last is a finiteset.