Since a compiler spends most of its time manipulating the ir form of the code, these costs deserve some attention. While this discussion focuses on three-address codes, most of the points apply equally to stack-machine code (or any other linear form).

    t1 ← 2
    t2 ← b
    t3 ← t1 × t2
    t4 ← a
    t5 ← t4 - t3

Three-Address Code

Three-address codes are often implemented as a set of quadruples. Each quadruple is represented with four fields: an operator, two operands (or sources), and a destination.
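A minimal C rendering of such a quadruple might look like the sketch below. The type and field names are invented for illustration, and each operand field simply holds a small integer: a virtual-register number, or a literal in the case of a load immediate.

    /* One quadruple: an operator plus a destination and two sources. */
    /* Names are illustrative, not taken from any particular compiler. */
    typedef enum { OP_LOADI, OP_LOAD, OP_MULT, OP_SUB } Opcode;

    typedef struct {
        Opcode op;    /* the operator                            */
        int    dest;  /* destination, e.g. 3 for t3              */
        int    src1;  /* first operand: register no. or literal  */
        int    src2;  /* second operand (-1 when unused)         */
    } Quad;

Under this layout, t3 ← t1 × t2 from the margin code becomes the record { OP_MULT, 3, 1, 2 }.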
To form blocks, the compiler needs a mechanism to connect individual quadruples. Compilers implement quadruples in a variety of ways. Figure 5.5 shows three different schemes for implementing the three-address code for a - 2 × b, repeated in the margin.

FIGURE 5.5 Implementations of Three-Address Code for a - 2 × b:

    (a) Simple Array
        Target  Op  Arg1  Arg2
        t1      ←   2
        t2      ←   b
        t3      ×   t1    t2
        t4      ←   a
        t5      -   t4    t3
    (b) Array of Pointers: the same quadruples, reached through an array of pointers
    (c) Linked List: the same quadruples, chained into a list

The simplest scheme, in Figure 5.5a, uses a short array to represent each basic block.
Often, the compiler writer places the array inside a node in the cfg. (This may be the most common form of hybrid ir.) The scheme in Figure 5.5b uses an array of pointers to group quadruples into a block; the pointer array can be contained in a cfg node. The final scheme, in Figure 5.5c, links the quadruples together to form a list. It requires less storage in the cfg node, at the cost of restricting accesses to sequential traversals.
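The three schemes of Figure 5.5 could be declared roughly as follows; again a sketch, with the fixed capacity MAX_QUADS an assumption made for illustration:

    #define MAX_QUADS 16            /* guessed per-block capacity    */

    /* (a) Simple array: quadruples stored inline in the block.      */
    typedef struct {
        Quad ops[MAX_QUADS];
        int  nops;
    } BlockA;

    /* (b) Array of pointers: the block stores only pointers.        */
    typedef struct {
        Quad *ops[MAX_QUADS];
        int   nops;
    } BlockB;

    /* (c) Linked list: each quadruple carries its own link.         */
    typedef struct QuadNode {
        Quad             quad;
        struct QuadNode *next;
    } QuadNode;

    typedef struct {
        QuadNode *head;             /* one pointer in the cfg node   */
    } BlockC;

Scheme (c) keeps just one pointer in the cfg node, which is the storage saving the text notes, in exchange for purely sequential access.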
Consider the costs incurred in rearranging the code in this block. The first operation loads a constant into a register; on most machines this translates directly into an immediate load operation. The second and fourth operations load values from memory, which on most machines might incur a multicycle delay unless the values are already in the primary cache. To hide some of the delay, the instruction scheduler might move the loads of b and a in front of the immediate load of 2.

In the simple array scheme, moving the load of b ahead of the immediate load requires saving the four fields of the first operation, copying the corresponding fields from the second slot into the first slot, and overwriting the fields in the second slot with the saved values for the immediate load. The array of pointers requires the same three-step approach, except that only the pointer values must be changed.
Thus, the compiler saves the pointer to the immediate load, copies the pointer to the load of b into the first slot in the array, and overwrites the second slot in the array with the saved pointer to the immediate load. For the linked list, the operations are similar, except that the compiler must save enough state to let it traverse the list.
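In C terms, the difference between the two array schemes is the amount of data each swap moves; a sketch, reusing the types from the earlier sketches:

    /* (a) Simple array: exchanging slots copies entire Quads.       */
    void swap_quads(BlockA *b) {
        Quad saved = b->ops[0];      /* save the immediate load      */
        b->ops[0] = b->ops[1];       /* copy load of b into slot 0   */
        b->ops[1] = saved;           /* restore the saved quadruple  */
    }

    /* (b) Array of pointers: the same reordering moves only pointers. */
    void swap_pointers(BlockB *b) {
        Quad *saved = b->ops[0];
        b->ops[0] = b->ops[1];
        b->ops[1] = saved;
    }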
Now, consider what happens in the front end when it generates the initial round of ir. With the simple array form and the array of pointers, the compiler must select a size for the array—in effect, the number of quadruples that it expects in a block. As it generates the quadruples, it fills in the array. If the array is too large, it wastes space. If it is too small, the compiler must reallocate it to obtain a larger array, copy the contents of the “too small” array into the new, larger array, and deallocate the small array. The linked list, however, avoids these problems. Expanding the list just requires allocating a new quadruple and setting the appropriate pointer in the list.

INTERMEDIATE REPRESENTATIONS IN ACTUAL USE

In practice, compilers use a variety of IRs. Legendary FORTRAN compilers of yore, such as IBM’s FORTRAN H compilers, used a combination of quadruples and control-flow graphs to represent the code for optimization. Since FORTRAN H was written in FORTRAN, it held the IR in an array.

For a long time, GCC relied on a very low-level IR, called register transfer language (RTL). In recent years, GCC has moved to a series of IRs. The parsers initially produce a near-source tree; these trees can be language specific but are required to implement parts of a common interface. That interface includes a facility for lowering the trees to the second IR, GIMPLE. Conceptually, GIMPLE consists of a language-independent, tree-like structure for control-flow constructs, annotated with three-address code for expressions and assignments. It is designed, in part, to simplify analysis. Much of GCC’s new optimizer uses GIMPLE; for example, GCC builds static single-assignment form on top of GIMPLE. Ultimately, GCC translates GIMPLE into RTL for final optimization and code generation.

The LLVM compiler uses a single low-level IR; in fact, the name LLVM stands for "low-level virtual machine." LLVM’s IR is a linear three-address code. The IR is fully typed and has explicit support for array and structure addresses. It provides support for vector or SIMD data and operations. Scalar values are maintained in SSA form throughout the compiler. The LLVM environment uses GCC front ends, so LLVM IR is produced by a pass that performs GIMPLE-to-LLVM translation.

The Open64 compiler, an open-source compiler for the IA-64 architecture, uses a family of five related IRs, called WHIRL. The initial translation in the parser produces a near-source-level WHIRL. Subsequent phases of the compiler introduce more detail to the WHIRL program, lowering the level of abstraction toward the actual machine code. This lets the compiler use a source-level AST for dependence-based transformations on the source text and a low-level IR for the late stages of optimization and code generation.

A multipass compiler may use different implementations to represent the ir at different points in the compilation process. In the front end, where the focus is on generating the ir, a linked list might both simplify the implementation and reduce the overall cost. In an instruction scheduler, with its focus on rearranging the operations, either of the array implementations might make more sense.
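The growth asymmetry described before the aside is easy to see in C. The sketch below reuses Quad and QuadNode from the earlier sketches; the doubling policy is an arbitrary choice, not a prescription from the text, and the structures are assumed to start zero-initialized.

    #include <stdlib.h>

    /* A growable pointer array: running out of room forces a        */
    /* reallocate-and-copy step (realloc does the copy and free).    */
    typedef struct {
        Quad **ops;
        int    nops, cap;
    } DynBlock;

    void append_array(DynBlock *b, Quad *q) {
        if (b->nops == b->cap) {                    /* array too small */
            b->cap = b->cap ? 2 * b->cap : 8;
            b->ops = realloc(b->ops, b->cap * sizeof *b->ops);
        }
        b->ops[b->nops++] = q;
    }

    /* Linked list: expansion is one allocation plus one store.      */
    /* 'tail' points at the link field to fill, initially &head.     */
    QuadNode **append_list(QuadNode **tail, Quad q) {
        QuadNode *n = malloc(sizeof *n);
        n->quad = q;
        n->next = NULL;
        *tail = n;
        return &n->next;         /* the next link field to fill */
    }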
Notice that some information is missing from Figure 5.5. For example, no labels are shown because labels are a property of the block rather than any individual quadruple. Storing a list of labels with the block saves space in each quadruple; it also makes explicit the property that labels occur only on the first operation in a basic block. With labels attached to a block, the compiler can ignore them when reordering operations inside the block, avoiding one more complication.
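A block that owns its labels might be declared along these lines; an illustrative sketch, reusing QuadNode from the earlier sketches:

    /* Labels belong to the block, not to individual quadruples.     */
    typedef struct {
        char     **labels;    /* labels that can target this block   */
        int        nlabels;
        QuadNode  *head;      /* the block's operations              */
    } LabelledBlock;

Reordering the list at head never touches labels, which is exactly the simplification described above.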
5.3.4 Building a Control-Flow Graph from a Linear Code

Compilers often must convert between different irs, often different styles of irs. One routine conversion is to build a cfg from a linear ir such as iloc. The essential features of a cfg are that it identifies the beginning and end of each basic block and connects the resulting blocks with edges that describe the possible transfers of control among blocks. Often, the compiler must build a cfg from a simple, linear ir that represents a procedure.

As a first step, the compiler must find the beginning and the end of each basic block in the linear ir. We will call the initial operation of a block a leader. An operation is a leader if it is the first operation in the procedure, or if it has a label that is, potentially, the target of some branch. The compiler can identify leaders in a single pass over the ir, shown in Figure 5.6a.
It iterates over the operations in the program, in order, finds the labelled statements, and records them as leaders.

If the linear ir contains labels that are not used as branch targets, then treating labels as leaders may unnecessarily split blocks. The algorithm could track which labels are actually used as branch targets; in the presence of an ambiguous jump, however, it must treat every labelled statement as a potential leader.

Ambiguous jump: a branch or jump whose target cannot be determined at compile time; typically, a jump to an address in a register.

    next ← 1
    Leader[next++] ← 1
    for i ← 1 to n
        if opi has a label li then
            Leader[next++] ← i
            create a CFG node for li

    (a) Finding Leaders

    for i ← 1 to next - 1
        j ← Leader[i] + 1
        while (j ≤ n and opj ∉ Leader)
            j ← j + 1
        j ← j - 1
        Last[i] ← j
        if opj is "cbr rk → l1, l2" then
            add edge from j to node for l1
            add edge from j to node for l2
        else if opj is "jumpI → l1" then
            add edge from j to node for l1
        else if opj is "jump → r1" then
            add edges from j to all labelled statements

    (b) Finding Last and Adding Edges

FIGURE 5.6 Building a Control-Flow Graph.
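Rendered in C, the two passes of Figure 5.6 might look like the sketch below. Operation indices run from 1 to n, and has_label, label_of, make_cfg_node, is_leader, is_cbr, is_jumpI, is_jump, branch_target, add_edge, and add_edges_to_all_labelled are hypothetical helpers over an imagined operation table.

    /* Figure 5.6a: record the leader of each basic block.           */
    int find_leaders(int n, int leader[]) {
        int next = 0;
        leader[next++] = 1;            /* op 1 starts the first block */
        for (int i = 2; i <= n; i++)   /* op 1 is already recorded    */
            if (has_label(i)) {
                leader[next++] = i;    /* a labelled op starts a block */
                make_cfg_node(label_of(i));
            }
        return next;                   /* number of blocks found      */
    }

    /* Figure 5.6b: find each block's last op and add CFG edges.     */
    void find_last(int n, int nblocks, const int leader[], int last[]) {
        for (int i = 0; i < nblocks; i++) {
            int j = leader[i] + 1;
            while (j <= n && !is_leader(j))   /* scan to next leader  */
                j++;
            j = j - 1;                        /* last op of block i   */
            last[i] = j;
            if (is_cbr(j)) {                  /* cbr rk -> l1, l2     */
                add_edge(j, branch_target(j, 0));
                add_edge(j, branch_target(j, 1));
            } else if (is_jumpI(j)) {         /* jumpI -> l1          */
                add_edge(j, branch_target(j, 0));
            } else if (is_jump(j)) {          /* ambiguous jump       */
                add_edges_to_all_labelled(j);
            }
        }
    }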
COMPLICATIONS IN CFG CONSTRUCTION

Features of the IR, the target machine, and the source language can complicate CFG construction.

Ambiguous jumps may force the compiler to introduce edges that are never feasible at runtime. The compiler writer can improve this situation by including features in the IR that record potential jump targets. ILOC includes the tbl pseudo-operation to let the compiler record the potential targets of an ambiguous jump. Anytime the compiler generates a jump, it should follow the jump with a set of tbl operations that record the possible branch targets. CFG construction can use these hints to avoid spurious edges.
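For instance, an ambiguous jump through a register might be followed by hints that name its only possible targets; a sketch of the idea, with the register and label names invented:

    jump → r5
    tbl  r5, L01      // r5 may hold the address of L01
    tbl  r5, L02      // ... or the address of L02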
If the compiler builds a CFG from target-machine code, features of the target architecture can complicate the process. The algorithm in Figure 5.6 assumes that all leaders, except the first, are labelled. If the target machine has fall-through branches, the algorithm must be extended to recognize unlabeled statements that receive control on a fall-through path. PC-relative branches cause a similar set of problems.

Branch delay slots introduce several problems. A labelled statement that sits in a branch delay slot is a member of two distinct blocks.