K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), страница 54
Текст из файла (страница 54)
The ad hoc framework provides no real advantagefor this problem.ProductionExprTermSyntax-Directed Actions→ Expr − Term{ $$ ← F+ ($1,$3) }|Expr − Term{ $$ ← F− ($1,$3) }|Term{ $$ ← $1 }→ Term × Factor{ $$ ← F× ($1,$3) }|Term ÷ Factor{ $$ ← F÷ ($1,$3) }|Factor{ $$ ← $1 }Factor → ( Expr ){ $$ ← $2 }|num{ $$ ← type of the num }|name{ $$ ← type of the name }n FIGURE 4.13 Ad Hoc Framework for Inferring Expression Types.4.4 Ad Hoc Syntax-Directed Translation 205ProductionExprTermSyntax-Directed Actions→ Expr + Term{ $$ ← MakeNode2 (plus, $1, $3);$$.type ← F+ ($1.type, $3.type) }|Expr − Term{ $$ ← MakeNode2 (minus, $1, $3);$$.type ← F− ($1.type,$3.type) }|Term{ $$ ← $1 }→ Term × Factor{ $$ ← MakeNode2 (times, $1, $3);$$.type ← F× ($1.type, $3.type) }|Term ÷ Factor{ $$ ← MakeNode2 (divide, $1, $3);$$.type ← F÷ ($1.type, $3.type) }|Factor{ $$ ← $1 }Factor → ( Expr ){ $$ ← $2 }|num{ $$ ← MakeNode0 (number);$$.text ← scanned text;$$.type ← type of the number }|name{ $$ ← MakeNode0 (identifier);$$.text ← scanned text;$$.type ← type of the identifier }n FIGURE 4.14 Building an Abstract Syntax Tree and Inferring Expression Types.Building an Abstract Syntax TreeCompiler front ends must build an intermediate representation of the program for use in the compiler’s middle part and its back end.
Abstract syntaxtrees are a common form of tree-structured ir. The task of building an astfits neatly into an ad hoc syntax-directed translation scheme.Assume that the compiler has a series of routines named MakeNodei , for0 ≤ i ≤ 3. The routine takes, as its first argument, a constant that uniquelyidentifies the grammar symbol that the new node will represent. The remaining i arguments are the nodes that head each of the i subtrees. Thus,MakeNode0 (number) constructs a leaf node and marks it as representinga num.
Similarly,MakeNode2 (Plus,MakeNode0 (number,) MakeNode0 (number))builds an ast rooted in a node for plus with two children, each of which isa leaf node for num.The MakeNode routines can implement thetree in any appropriate way. For example, theymight map the structure onto a binary tree, asdiscussed in Section B.3.1.206 CHAPTER 4 Context-Sensitive AnalysisTo build an abstract syntax tree, the ad hoc syntax-directed translationscheme follows two general principles:1. For an operator, it creates a node with a child for each operand. Thus,2 + 3 creates a binary node for + with the nodes for 2 and 3 as children.2.
For a useless production, such as Term → Factor, it reuses the resultfrom the Factor action as its own result.In this manner, it avoids building tree nodes that represent syntactic variables, such as Factor, Term, and Expr. Figure 4.14 shows a syntax-directedtranslation scheme that incorporates these ideas.Generating ILOC for ExpressionsAs a final example of manipulating expressions, consider an ad hocframework that generates iloc rather than an ast.
We will make severalsimplifying assumptions. The example limits its attention to integers;handling other types adds complexity, but little insight. The example alsoassumes that all values can be held in registers—both that the values fit inregisters and that the iloc implementation provides more registers than thecomputation will use.Code generation requires the compiler to track many small details.
Toabstract away most of these bookkeeping details (and to defer some deeperissues to following chapters), the example framework uses four supportingroutines.1. Address takes a variable name as its argument. It returns the number ofa register that contains the value specified by name. If necessary, itgenerates code to load that value.2. Emit handles the details of creating a concrete representation for thevarious iloc operations. It might format and print them to a file.Alternatively, it might build an internal representation for later use.3.
NextRegister returns a new register number. A simple implementationcould increment a global counter.4. Value takes a number as its argument and returns a register number. Itensures that the register contains the number passed as its argument. Ifnecessary, it generates code to move that number into the register.Figure 4.15 shows the syntax-directed framework for this problem. Theactions communicate by passing register names in the parsing stack. Theactions pass these names to Emit as needed, to create the operations thatimplement the input expression.4.4 Ad Hoc Syntax-Directed Translation 207ProductionExprSyntax-Directed Actions→ Expr + Term{ $$ ← NextRegister;Emit(add, $1, $3, $$) }| Expr − Term{ $$ ← NextRegister;Emit(sub, $1, $3, $$) }| Term{ $$ ← $1 }Term → Term × Factor{ $$ ← NextRegister;Emit(mult, $1, $3, $$) }| Term ÷ Factor{ $$ ← NextRegister;Emit(div, $1, $3,$$) }| Factor{ $$ ← $1 }Factor → ( Expr ){ $$ ← $2 }| num{ $$ ← Value(scanned text); }| name{ $$ ← Address(scanned text); }n FIGURE 4.15 Emitting ILOC for Expressions.Processing DeclarationsOf course, the compiler writer can use syntax-directed actions to fill in muchof the information that resides in the symbol table.
For example, the grammar fragment shown in Figure 4.16 describes a limited subset of the syntaxfor declaring variables in c. (It omits typedefs, structs, unions, the typequalifiers const, restrict, and volatile, as well as the details of theinitialization syntax. It also leaves several nonterminals unelaborated.) Consider the actions required to build symbol-table entries for each declaredvariable.
Each Declaration begins with a set of one or more qualifiers thatspecify the variable’s type and storage class. These qualifiers are followedby a list of one or more variable names; each variable name can includespecifications about indirection (one or more occurrences of *), about arraydimensions, and about initial values for the variable.For example, the StorageClass production allows the programmer to specifyinformation about the lifetime of a variable’s value; an auto variable has alifetime that matches the lifetime of the block that declares it, while staticvariables have lifetimes that span the program’s entire execution.
The register specifier suggests to the compiler that the value should be kept in alocation that can be accessed quickly—historically, a hardware register. Theextern specifier tells the compiler that declarations of the same name indifferent compilation units are to be linked as a single object.208 CHAPTER 4 Context-Sensitive AnalysisDeclarationList→|DeclarationSpecifierList→→Specifier→StorageClass→|||||TypeSpecifier→||||||||InitDeclaratorList→|InitDeclarator→Declarator→Pointer→|||DirectDeclarator→||||||DeclarationList DeclarationDeclarationSpecifierList InitDeclaratorList ;Specifier SpecifierListSpecifierStorageClassTypeSpecifierautostaticexternregistervoidcharshortintlongsignedunsignedfloatdoubleInitDeclaratorList , InitDeclaratorInitDeclaratorDeclarator = InitializerDeclaratorPointer DirectDeclaratorDirectDeclarator** Pointerident( Declarator )DirectDeclarator ( )DirectDeclarator ( ParameterTypeList )DirectDeclarator ( IdentifierList )DirectDeclarator [ ]DirectDeclarator [ ConstantExpr ]n FIGURE 4.16 A Subset of C’s Declaration Syntax.While such restrictions can be encoded in thegrammar, the standard writers chose to leave itfor semantic elaboration to check, rather thancomplicate an already large grammar.The compiler must ensure that each declared name has at most one storageclass attribute.
The grammar places the specifiers before a list of one or morenames. The compiler can record the specifiers as it processes them and applythem to the names when it later encounters them. The grammar admits anarbitrary number of StorageClass and TypeSpecifier keywords; the standardlimits the ways that the actual keywords can be combined. For example,it allows only one StorageClass per declaration. The compiler must enforce4.4 Ad Hoc Syntax-Directed Translation 209WHAT ABOUT CONTEXT-SENSITIVE GRAMMARS?Given the progression of ideas from the previous chapters, it might seemnatural to consider the use of context-sensitive languages to performcontext-sensitive checks, such as type inference.
After all, we used regular languages to perform lexical analysis and context-free languages toperform syntax analysis. A natural progression might suggest the study ofcontext-sensitive languages and their grammars. Context-sensitive grammars can express a larger family of languages than can context-freegrammars.However, context-sensitive grammars are not the right answer for two distinct reasons. First, the problem of parsing a context-sensitive grammaris P-Space complete. Thus, a compiler that used such a technique couldrun very slowly. Second, many of the important questions are difficult,if not impossible, to encode in a context-sensitive grammar. For example, consider the issue of declaration before use. To write this rule intoa context-sensitive grammar would require the grammar to encode eachdistinct combination of declared variables.
With a sufficiently small namespace (for example, Dartmouth BASIC limited the programmer to singleletter names, with an optional single digit), this might be manageable; in amodern language with a large name space, the set of names is too large toencode in a context-sensitive grammar.this restriction through context-sensitive checking. Similar restrictions applyto TypeSpecifiers. For example, short is legal with int but not with float.To process declarations, the compiler must collect the attributes from thequalifiers, add any indirection, dimension, or initialization attributes, andenter the variable in the table. The compiler writer might set up a propertiesstructure whose fields correspond to the properties of a symbol-table entry.At the end of a Declaration, it can initialize the values of each field in thestructure.