Cooper_Engineering_a_Compiler(Second Edition) (Rice), страница 93

Rice 1869

Описание файла

Файл "Cooper_Engineering_a_Compiler(Second Edition)" внутри архива находится в следующих папках: Rice, Купер и Торчсон - перевод. PDF-файл из архива "Rice", который расположен в категории "разное". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .

Просмотр PDF-файла онлайн

Текст 93 страницы из PDF

Walking that chain, it finds the entry for node.next and itsoffset, 4.In laying out a structure and assigning offsets to its elements, the compilermust obey the alignment rules of the target architecture. This may force itto leave unused space in the structure. The compiler confronts this problemwhen it lays out the structure declared on the left:struct example {int fee;04feedouble fum;};1216fie20foe2428fum···Elements in Declaration Orderdouble fie;int foe;8···04fie812fum1616feefoeElements Ordered by AlignmentThe top-right drawing shows the structure layout if the compiler is constrained to place the elements in declaration order. Because fie and fummust be doubleword aligned, the compiler must insert padding after fee andfoe.

If the compiler could order the elements in memory arbitrarily, it coulduse the layout shown on the bottom left, which needs no padding. This is alanguage-design issue: the language definition specifies whether or not thelayout of a structure is exposed to the user.7.7.2 Arrays of StructuresMany programming languages allow the user to declare an array of structures. If the user is allowed to take the address of a structure-valued elementof an array, then the compiler must lay out the data in memory as multiplecopies of the structure layout. If the programmer cannot take the addressof a structure-valued element of an array, the compiler might lay out thestructure as if it were a structure composed of elements that are, themselves,arrays.

Depending on how the surrounding code accesses the data, these twostrategies may have strikingly different performance on a system with cachememory.To address an array of structures laid out as multiple copies of thestructure, the compiler uses the array-address polynomials described inSection 7.5. The overall length of the structure, including any neededpadding, becomes the element size w in the address polynomial. The polynomial generates the address of the start of the structure instance. To obtainthe value of a specific element, the element’s offset is added to the instance’saddress.7.7 Structure References 377If the compiler has laid out the structure with elements that are arrays, itmust compute the starting location of the element array using the offset-tableinformation and the array dimension.

This address can then be used as thestarting point for an address calculation using the appropriate array-addresspolynomial.7.7.3 Unions and Runtime TagsMany languages allow the programmer to create a structure with multiple, data-dependent interpretations. In c, the union construct has this effect.Pascal achieved the same effect with its variant records.Unions and variants present one additional complication. To emit code for areference to an element of a union, the compiler must resolve the reference toa specific offset.

Because a union is built from multiple structure definitions,the possibility exists that element names are not unique. The compiler mustresolve each reference to a unique offset and type in the runtime object.This problem has a linguistic solution. The programming language can forcethe programmer to make the reference unambiguous.

Consider the c declarations shown in Figure 7.13. Panel a shows declarations for two kinds ofnode, one that holds an integer value and another that holds a floating-pointvalue.The code in panel b declares a union named one that is either an n1 or an n2. Toreference an integer value, the programmer specifies u1.inode.value. Toreference a floating-point value, the programmer specifies u1.fnode.value.The fully qualified name resolves any ambiguity.struct n1 {union one {int kind;struct n1 inode;int value;struct n2 fnode;};} u1;union two {struct {int kind;int value;} inode;struct n2 {struct {int kind;int kind;float value;float value;};} fnode;} u2;(a) Basic Structures(b) Union of Structuresn FIGURE 7.13 Union Declarations in C.(c) Union of Implicit Structures378 CHAPTER 7 Code ShapeThe code in panel c declares a union named two that has the same propertiesas one.

The declaration of two explicitly declares its internal structure. Thelinguistic mechanism for disambiguating a reference to value, however, isthe same—the programmer specifies a fully qualified name.As an alternative, some systems have relied on runtime discrimination. Here,each variant in the union has a field that distinguishes it from all othervariants—a “tag.” (For example, the declaration of two, might initializekind to one for inode and to two for fnode.) The compiler can then emitcode to check the value of the tag field and ensure that each object is handled correctly. In essence, it emits a case statement based on the tag’s value.The language may require that the programmer define the tag field and itsvalues; alternatively, the compiler could generate and insert tags automatically.

In this latter case, the compiler has a strong motivation to perform typechecking and remove as many checks as possible.7.7.4 Pointers and Anonymous ValuesA c program creates an instance of a structure in one of two ways. It candeclare a structure instance, as with NilNode in the earlier example. Alternatively, the code can explicitly allocate a structure instance. For a variablefee declared as a pointer to node, the allocation would look like:fee = (struct node *) malloc(sizeof(node));The only access to this new node is through the pointer fee. Thus, we thinkof it as an anonymous value, since it has no permanent name.Because the only name for an anonymous value is a pointer, the compilercannot easily determine if two pointer references specify the same memorylocation. Consider the code fragment12345678p1 = (node *) malloc(sizeof(node));p2 = (node *) malloc(sizeof(node));if (...)then p3 = p1;else p3 = p2;p1->value = ...;p3->value = ...;...= p1->value;7.7 Structure References 379The first two lines create anonymous nodes.

Line 6 writes through p1 whileline 7 writes through p3. Because of the if-then-else, p3 can refer toeither the node allocated in line 1 or in line 2. Finally, line 8 referencesp1->value.The use of pointers limits the compiler’s ability to keep values in registers.Consider the sequence of assignments in lines 6 through 8. Line 8 reuseseither the value assigned in line 6 or the value assigned in line 7. As amatter of efficiency, the compiler should avoid storing that value to memory and reloading it. However, the compiler cannot easily determine whichvalue line 8 uses. The answer to that question depends on the value of theconditional expression in line 3.While it may be possible to know the value of the conditional expression incertain specific instances (for example, 1 > 2), it is undecidable in the generalcase.

Unless the compiler knows the value of the conditional expression,it must emit conservative code for the three assignments. It must load thevalue used in line 8 from memory, even though it recently had the value in aregister.The uncertainty introduced by pointers prevents the compiler from keeping values used in pointer-based references in registers.

Anonymous objectsfurther complicate the problem because they introduce an unbounded set ofobjects to track. As a result, statements that involve pointer-based referencesare often less efficient than the corresponding computations on unambiguouslocal values.A similar effect occurs for code that makes intensive use of arrays. Unlessthe compiler performs an in-depth analysis of the array subscripts, it maynot be able to determine whether two array references overlap.

When thecompiler cannot distinguish between two references, such as a[i,j,k] anda[i,j,l], it must treat both references conservatively. The problem of disambiguating array references, while challenging, is easier than the problemof disambiguating pointer references.Analysis to disambiguate pointer references and array references is amajor source of potential improvement in program performance. Forpointer-intensive programs, the compiler may perform an interprocedural data-flow analysis aimed at discovering, for each pointer, the set ofobjects to which it can point. For array-intensive programs, the compiler may use data-dependence analysis to understand the patterns of arrayreferences.Data-dependence analysis is beyond the scope ofthis book.

See [352, 20, 270].380 CHAPTER 7 Code ShapeSECTION REVIEWTo implement structures and arrays of structures, the compiler mustestablish a layout for each structure and must have a formula to calculatethe offset of any structure element. In a language where the declarationsdictate the relative position of data elements, structure layout simplyrequires the compiler to calculate offsets. If the language allows thecompiler to determine the relative position of the data elements, thenthe layout problem is similar to data-area layout (see Section 7.2.2). Theaddress computation for a structure element is a simple application ofthe schemes used for scalar variables (e.g.

base + offset) and for arrayelements.Two features related to structures introduce complications. If thelanguage permits unions or variant structures, then input code mustspecify the desired element in an unambiguous way. The typical solutionto this problem is the use of fully qualified names for structure elementsin a union. The second issue arises from runtime allocation of structures.The use of pointers to hold addresses of dynamically allocated objectsintroduces ambiguities that complicate the issue of which values can bekept in registers.Review Questions1. When the compiler lays out a structure, it must ensure that each element of the structure is aligned on the appropriate boundary.

Thecompiler may need to insert padding (blank space) between elementsto meet alignment restrictions. Write a set of "rules of thumb" that aprogrammer could use to reduce the likelihood of compiler-insertedpadding.2. If the compiler has the freedom to rearrange structures and arrays, itcan sometimes improve performance. What programming languagefeatures inhibit the compiler’s ability to perform such rearrangement?7.8 CONTROL-FLOW CONSTRUCTSA basic block is just a maximal-length sequence of straight-line, unpredicated code. Any statement that does not affect control flow can appearinside a block. Any control-flow transfer ends the block, as does a labelledstatement since it can be the target of a branch. As the compiler generatescode, it can build up basic blocks by simply aggregating consecutive, unlabeled, non-control-flow operations. (We assume that a labelled statement isnot labelled gratuitously, that is, every labelled statement is the target of7.8 Control-Flow Constructs 381some branch.) The representation of a basic block need not be complex.

Forexample, if the compiler has an assembly-like representation held in a simplelinear array, then a block can be described by a pair, h first,lasti, that holdsthe indices of the instruction that begins the block and the instruction thatends the block. (If the block indices are stored in ascending numerical order,an array of firsts will suffice.)To tie a set of blocks together so that they form a procedure, the compilermust insert code that implements the control-flow operations of the sourceprogram. To capture the relationships among blocks, many compilers builda control-flow graph (cfg, see Sections 5.2.2 and 8.6.1) and use it for analysis, optimization, and code generation. In the cfg, nodes represent basicblocks and edges represent possible transfers of control between blocks.Typically, the cfg is a derivative representation that contains references to amore detailed representation of each block.The code to implement control-flow constructs resides in the basic blocks—at or near the end of each block.

(In iloc, there is no fall-through caseon a branch, so every block ends with a branch or a jump. If the ir models delay slots, then the control-flow operation may not be the last operationin the block.) While many different syntactic conventions have been usedto express control flow, the number of underlying concepts is small. Thissection examines many of the control-flow constructs found in modernprogramming languages.7.8.1 Conditional ExecutionMost programming languages provide some version of an if-then-elseconstruct.

Свежие статьи
Популярно сейчас