Cooper & Torczon, Engineering a Compiler (Second Edition), Chapter 5: Intermediate Representations (book pages 242-246; archive page 62)


It iterates over the operations in the program, in order, finds the labelled statements, and records them as leaders. If the linear IR contains labels that are not used as branch targets, then treating labels as leaders may unnecessarily split blocks. The algorithm could track which labels are jump targets. However, if the code contains any ambiguous jumps, then it must treat all labelled statements as leaders anyway.

Ambiguous jump: a branch or jump whose target cannot be determined at compile time; typically, a jump to an address in a register.

FIGURE 5.6 Building a Control-Flow Graph

(a) Finding Leaders

    next ← 1
    Leader[next++] ← 1
    for i ← 1 to n
        if op_i has a label l_i then
            Leader[next++] ← i
            create a CFG node for l_i

(b) Finding Last and Adding Edges

    for i ← 1 to next - 1
        j ← Leader[i] + 1
        while (j ≤ n and op_j ∉ Leader)
            j ← j + 1
        j ← j - 1
        Last[i] ← j
        if op_j is "cbr r_k → l1, l2" then
            add edge from j to node for l1
            add edge from j to node for l2
        else if op_j is "jumpI → l1" then
            add edge from j to node for l1
        else if op_j is "jump → r1" then
            add edges from j to all labelled statements

COMPLICATIONS IN CFG CONSTRUCTION

Features of the IR, the target machine, and the source language can complicate CFG construction.

Ambiguous jumps may force the compiler to introduce edges that are never feasible at runtime. The compiler writer can improve this situation by including features in the IR that record potential jump targets. ILOC includes the tbl pseudo-operation to let the compiler record the potential targets of an ambiguous jump. Anytime the compiler generates a jump, it should follow the jump with a set of tbl operations that record the possible branch targets. CFG construction can use these hints to avoid spurious edges.

If the compiler builds a CFG from target-machine code, features of the target architecture can complicate the process. The algorithm in Figure 5.6 assumes that all leaders, except the first, are labelled. If the target machine has fall-through branches, the algorithm must be extended to recognize unlabeled statements that receive control on a fall-through path. PC-relative branches cause a similar set of problems.

Branch delay slots introduce several problems. A labelled statement that sits in a branch delay slot is a member of two distinct blocks. The compiler can cure this problem by replication, creating new (unlabeled) copies of the operations in the delay slots. Delay slots also complicate finding the end of a block. The compiler must place operations located in delay slots into the block that precedes the branch or jump. If a branch or jump can occur in a branch delay slot, the CFG builder must walk forward from the leader to find the block-ending branch, that is, the first branch it encounters. Branches in the delay slot of a block-ending branch can, themselves, be pending on entry to the target block. They can split the target block and force creation of new blocks and new edges. This kind of behavior seriously complicates CFG construction.

Some languages allow jumps to labels outside the current procedure. In the procedure containing the branch, the branch target can be modelled with a new CFG node created for that purpose. The complication arises on the other end of the branch. The compiler must know that the target label is the target of a nonlocal branch, or else subsequent analysis may produce misleading results. For this reason, languages such as Pascal or Algol restricted nonlocal gotos to labels in visible outer lexical scopes. C requires the use of the functions setjmp and longjmp to expose these transfers.

The second pass, shown in Figure 5.6b, finds every block-ending operation. It assumes that every block ends with a branch or a jump and that branches specify labels for both outcomes, a "branch taken" label and a "branch not taken" label. This simplifies the handling of blocks and allows the compiler's back end to choose which path will be the "fall through" case of a branch. (For the moment, assume branches have no delay slots.)

To find the end of each block, the algorithm iterates through the blocks, in order of their appearance in the Leader array.
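The two passes of Figure 5.6 can be rendered in executable form. The sketch below is a minimal Python approximation, not ILOC: the encoding of an operation as a dict with an optional "label", an "op" mnemonic, and a "targets" list is an assumption made for illustration.

```python
# Sketch of Figure 5.6. Each operation is a dict with an optional "label",
# an "op" mnemonic, and, for branches, a list of target labels.

def find_leaders(ops):
    """Pass (a): operation 0 and every labelled operation start a block."""
    leaders = [0]
    nodes = {}                       # label -> index of the leader it names
    for i, op in enumerate(ops):
        if op.get("label") is not None:
            if i not in leaders:
                leaders.append(i)
            nodes[op["label"]] = i
    return leaders, nodes

def find_last_and_edges(ops, leaders, nodes):
    """Pass (b): the op just before the next leader ends the block;
    add CFG edges according to that op's branch targets."""
    last, edges = {}, []
    bounds = sorted(leaders) + [len(ops)]
    for b, lead in enumerate(sorted(leaders)):
        j = bounds[b + 1] - 1        # index of the block-ending operation
        last[lead] = j
        op = ops[j]
        if op["op"] in ("cbr", "jumpI"):
            for t in op["targets"]:  # explicit targets: one edge each
                edges.append((j, nodes[t]))
        elif op["op"] == "jump":     # ambiguous register jump: edge to
            edges.extend((j, n) for n in sorted(nodes.values()))  # every label
    return last, edges
```

Fall-through blocks (a block whose last operation is not a branch) simply get no outgoing edges here; handling them, and the delay-slot complications discussed above, would require the extensions the text describes.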

It walks forward through the IR until it finds the leader of the next block. The operation immediately before that leader ends the current block. The algorithm records that operation's index in Last[i], so that the pair ⟨Leader[i], Last[i]⟩ describes block i. It adds edges to the CFG as needed.

For a variety of reasons, the CFG should have a unique entry node n0 and a unique exit node nf. The underlying code should have this shape. If it does not, a simple postpass over the graph can create n0 and nf.

SECTION REVIEW

Linear IRs represent the code being compiled as an ordered sequence of operations. Linear IRs can vary in their level of abstraction; the source code for a program in a plain text file is a linear form, as is the assembly code for that same program.

Linear IRs lend themselves to compact, human-readable representations.

Two widely used linear IRs are bytecodes, generally implemented as a one-address code with implicit names on many operations, and three-address code, generally implemented as a set of binary operations that have distinct name fields for two operands and one result.

Review Questions

1. Consider the expression a × 2 + a × 2 × b. Translate it into stack machine code and into three-address code. Compare and contrast the number of operations and the number of operands in each form. How do they compare against the trees in Figure 5.1?

2. Sketch an algorithm to build control-flow graphs from ILOC for programs that include spurious labels and ambiguous jumps.

5.4 MAPPING VALUES TO NAMES

The choice of a specific IR and a level of abstraction helps determine what operations the compiler can manipulate and optimize.

For example, a source-level AST makes it easy to find all the references to an array x. At the same time, it hides the details of the address calculations required to access an element of x. In contrast, a low-level, linear IR such as ILOC exposes the details of the address calculation, at the cost of obscuring the fact that a specific reference relates to x.

Similarly, the discipline that the compiler uses to assign internal names to the various values computed during execution has an effect on the code that it can generate. A naming scheme can expose opportunities for optimization or it can obscure them. The compiler must invent names for many, if not all, of the intermediate results that the program produces when it executes. The choices that it makes with regard to names determine, to a large extent, which computations can be analyzed and optimized.

5.4.1 Naming Temporary Values

The IR form of a program usually contains more detail than does the source version.

Some of those details are implicit in the source code; others result from deliberate choices in the translation. To see this, consider the four-line block of source code shown in Figure 5.7a. Assume that the names refer to distinct values.

The block deals with just four names, { a, b, c, d }. It refers to more than four values. Each of b, c, and d has a value before the first statement executes. The first statement computes a new value, b + c, as does the second, which computes a - d. The expression b + c in the third statement computes a different value than the earlier b + c, unless c = d initially.

FIGURE 5.7 Naming Leads to Different Translations

(a) Source Code

    a ← b + c
    b ← a - d
    c ← b + c
    d ← a - d

(b) Source Names

    t1 ← b
    t2 ← c
    t3 ← t1 + t2
    a  ← t3
    t4 ← d
    t1 ← t3 - t4
    b  ← t1
    t2 ← t1 + t2
    c  ← t2
    t4 ← t3 - t4
    d  ← t4

(c) Value Names

    t1 ← b
    t2 ← c
    t3 ← t1 + t2
    a  ← t3
    t4 ← d
    t5 ← t3 - t4
    b  ← t5
    t6 ← t5 + t2
    c  ← t6
    t5 ← t3 - t4
    d  ← t5

Finally, the last statement computes a - d; its result is always identical to that produced by the second statement.

The source code names tell the compiler little about the values that they hold. For example, the uses of b in the first and third statements refer to distinct values (unless c = d). The reuse of the name b conveys no information; in fact, it might mislead a casual reader into thinking that the code sets a and c to the same value.

When the compiler names each of these expressions, it can choose names in ways that specifically encode useful information about their values.
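A value-encoding discipline like that of Figure 5.7c can be approximated with a small hash-based renamer, the idea behind local value numbering. The sketch below is illustrative: the tuple encoding of operations is an assumption, and unlike the figure it reuses a cached name rather than re-emitting a repeated computation.

```python
# A sketch of a value-naming discipline: an expression over the same value
# names always receives the same temporary, so "a - d" computed twice from
# unchanged operands yields one name, while a redefined operand forces a
# fresh name.

def value_names(block):
    """block: list of (dest, op, arg1, arg2) tuples for binary operations."""
    name_of = {}          # source name -> current value name
    known = {}            # (op, value name, value name) -> value name
    temps, out = 0, []

    def fresh():
        nonlocal temps
        temps += 1
        return f"t{temps}"

    def lookup(x):        # first read of a source value gets it a name
        if x not in name_of:
            name_of[x] = fresh()
            out.append((name_of[x], None, x, None))
        return name_of[x]

    for dest, op, a1, a2 in block:
        key = (op, lookup(a1), lookup(a2))
        if key not in known:              # new value: invent a name for it
            known[key] = fresh()
            out.append((known[key], op, key[1], key[2]))
        name_of[dest] = known[key]
    return out, name_of
```

Run on the block of Figure 5.7a, the renamer assigns b and d the same final value name while a and c get distinct ones, exactly the property the text ascribes to Figure 5.7c. Commutativity (treating b + c and c + b as one value) is not handled in this sketch.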

Consider, for example, the translations shown in Figures 5.7b and 5.7c. These two variants were generated with different naming disciplines.

The code in Figure 5.7b uses fewer names than the code in 5.7c. It follows the source code names, so that a reader can easily relate the code back to the code in Figure 5.7a. The code in Figure 5.7c uses more names than the code in 5.7b. Its naming discipline reflects the computed values and ensures that textually identical expressions produce the same result. This scheme makes it obvious that a and c may receive different values, while b and d must receive the same value.

As another example of the impact of names, consider again the representation of an array reference, A[i,j]. Figure 5.8 shows two IR fragments that represent the same computation at very different levels of abstraction.

The high-level IR, in Figure 5.8a, contains all the essential information and is easy to identify as a subscript reference. The low-level IR, in Figure 5.8b, exposes many details to the compiler that are implicit in the high-level AST fragment. All of the details in the low-level IR can be inferred from the source-level AST.

FIGURE 5.8 Different Levels of Abstraction for an Array Subscript Reference

(a) Source-Level Abstract Syntax Tree

    subscript
    (A, i, j)

(b) Low-Level Linear Code (ILOC)

    loadI 1        ⇒ r1
    sub   rj, r1   ⇒ r2
    loadI 10       ⇒ r3
    mult  r2, r3   ⇒ r4
    sub   ri, r1   ⇒ r5
    add   r4, r5   ⇒ r6
    loadI @A       ⇒ r7
    add   r7, r6   ⇒ r8
    load  r8       ⇒ rAij

In the low-level IR, each intermediate result has its own name.
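The arithmetic that the ILOC of Figure 5.8b performs can be checked in a few lines of Python. The function below mirrors the register-by-register computation; the inner dimension of 10, the 1-based index origin, and an element size of one address unit are assumptions read off the figure's constants, not stated parameters.

```python
# The address computation of Figure 5.8b, register by register.
# Assumes 1-based indices, an inner dimension of 10, and unit element size.

def element_address(base, i, j, dim=10):
    """Compute @A + (j - 1) * dim + (i - 1), as the ILOC sequence does."""
    r1 = 1               # loadI 1       => r1
    r2 = j - r1          # sub   rj, r1  => r2
    r4 = r2 * dim        # mult  r2, r3  => r4   (r3 holds dim)
    r5 = i - r1          # sub   ri, r1  => r5
    r6 = r4 + r5         # add   r4, r5  => r6
    return base + r6     # add   r7, r6  => r8   (r7 holds @A)
```

Making this calculation explicit is exactly what gives the low-level IR its advantage for optimization: each intermediate result (r2, r4, r5, r6) now has a name that later passes can analyze and reuse.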
