K. Cooper, L. Torczon - Engineering a Compiler (2011, 2nd edition)
It is used to disambiguate references to array elements. Chapter 9 presents a detailed look at data-flow analysis and its application, along with the construction of static single-assignment form, an IR that encodes information about the flow of both values and control directly in the IR.

Transformation

To improve the code, the compiler must go beyond analyzing it. The compiler must use the results of analysis to rewrite the code into a more efficient form. Myriad transformations have been invented to improve the time or space requirements of executable code.
Some, such as discovering loop-invariant computations and moving them to less frequently executed locations, improve the running time of the program. Others make the code more compact. Transformations vary in their effect, the scope over which they operate, and the analysis required to support them. The literature on transformations is rich; the subject is large enough and deep enough to merit one or more separate books. Chapter 10 covers the subject of scalar transformations, that is, transformations intended to improve the performance of code on a single processor. It presents a taxonomy for organizing the subject and populates that taxonomy with examples.

1.3.3 The Back End

The compiler's back end traverses the IR form of the code and emits code for the target machine.
It selects target-machine operations to implement each IR operation. It chooses an order in which the operations will execute efficiently. It decides which values will reside in registers and which values will reside in memory, and inserts code to enforce those decisions.

ABOUT ILOC

Throughout the book, low-level examples are written in a notation that we call ILOC, an acronym derived from "intermediate language for an optimizing compiler." Over the years, this notation has undergone many changes. The version used in this book is described in detail in Appendix A.

Think of ILOC as the assembly language for a simple RISC machine. It has a standard set of operations. Most operations take arguments that are registers. The memory operations, loads and stores, transfer values between memory and the registers.
To simplify the exposition in the text, most examples assume that all data consists of integers.

Each operation has a set of operands and a target. The operation is written in five parts: an operation name, a list of operands, a separator, a list of targets, and an optional comment. Thus, to add registers 1 and 2, leaving the result in register 3, the programmer would write

    add r1, r2 ⇒ r3    // example instruction

The separator, ⇒, precedes the target list. It is a visual reminder that information flows from left to right. In particular, it disambiguates cases where a person reading the assembly-level text can easily confuse operands and targets.
(See loadAI and storeAI in the following table.) The example in Figure 1.3 only uses four ILOC operations:

    ILOC Operation               Meaning
    loadAI   r1, c2  ⇒ r3        Memory(r1 + c2) → r3
    loadI    c1      ⇒ r2        c1 → r2
    mult     r1, r2  ⇒ r3        r1 × r2 → r3
    storeAI  r1      ⇒ r2, c3    r1 → Memory(r2 + c3)

Appendix A contains a more detailed description of ILOC. The examples consistently use rarp as a register that contains the start of data storage for the current procedure, also known as the activation record pointer.

Instruction Selection

The first stage of code generation rewrites the IR operations into target-machine operations, a process called instruction selection.
Instruction selection maps each IR operation, in its context, into one or more target-machine operations. To illustrate the process, consider rewriting our example expression, a ← a × 2 × b × c × d, into code for the ILOC virtual machine. (We will use ILOC throughout the book.) The IR form of the expression is repeated here:

    t0 ← a × 2
    t1 ← t0 × b
    t2 ← t1 × c
    t3 ← t2 × d
    a  ← t3

The compiler might choose the operations shown in Figure 1.3.
    loadAI   rarp, @a  ⇒ ra    // load 'a'
    loadI    2         ⇒ r2    // constant 2 into r2
    loadAI   rarp, @b  ⇒ rb    // load 'b'
    loadAI   rarp, @c  ⇒ rc    // load 'c'
    loadAI   rarp, @d  ⇒ rd    // load 'd'
    mult     ra, r2    ⇒ ra    // ra ← a × 2
    mult     ra, rb    ⇒ ra    // ra ← (a × 2) × b
    mult     ra, rc    ⇒ ra    // ra ← (a × 2 × b) × c
    mult     ra, rd    ⇒ ra    // ra ← (a × 2 × b × c) × d
    storeAI  ra        ⇒ rarp, @a    // write ra back to 'a'

FIGURE 1.3 ILOC Code for a ← a × 2 × b × c × d

This code assumes that a, b, c, and d are located at offsets @a, @b, @c, and @d from an address contained in the register rarp.

The compiler has chosen a straightforward sequence of operations. It loads all of the relevant values into registers, performs the multiplications in order, and stores the result to the memory location for a. It assumes an unlimited supply of registers and names them with symbolic names such as ra to hold a and rarp to hold the address where the data storage for our named values begins.
Implicitly, the instruction selector relies on the register allocator to map these symbolic register names, or virtual registers, to the actual registers of the target machine.

The instruction selector can take advantage of special operations on the target machine. For example, if an immediate-multiply operation (multI) is available, it might replace the operation mult ra, r2 ⇒ ra with multI ra, 2 ⇒ ra, eliminating the need for the operation loadI 2 ⇒ r2 and reducing the demand for registers.
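To make the substitution concrete, here is a sketch of how the code in Figure 1.3 might look after this rewrite, assuming only that multI behaves as just described:

    loadAI   rarp, @a  ⇒ ra    // load 'a'
    loadAI   rarp, @b  ⇒ rb    // load 'b'
    loadAI   rarp, @c  ⇒ rc    // load 'c'
    loadAI   rarp, @d  ⇒ rd    // load 'd'
    multI    ra, 2     ⇒ ra    // ra ← a × 2; immediate form, so no loadI and no r2
    mult     ra, rb    ⇒ ra    // ra ← (a × 2) × b
    mult     ra, rc    ⇒ ra    // ra ← (a × 2 × b) × c
    mult     ra, rd    ⇒ ra    // ra ← (a × 2 × b × c) × d
    storeAI  ra        ⇒ rarp, @a    // write ra back to 'a'

The sequence now has eight operations instead of nine and uses one less virtual register.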
Alternatively, if addition is faster than multiplication, the selector might replace mult ra, r2 ⇒ ra with add ra, ra ⇒ ra, avoiding both the loadI and its use of r2, as well as replacing the mult with a faster add. Chapter 11 presents two different techniques for instruction selection that use pattern matching to choose efficient implementations for IR operations.

Register Allocation

During instruction selection, the compiler deliberately ignored the fact that the target machine has a limited set of registers. Instead, it used virtual registers and assumed that "enough" registers existed.
In practice, the earlier stages of compilation may create more demand for registers than the hardware can support. The register allocator must map those virtual registers onto actual target-machine registers.

Virtual register: a symbolic register name that the compiler uses to indicate that a value can be stored in a register.
Thus, the register allocator decides, at each point in the code, which values should reside in the target-machine registers. It then rewrites the code to reflect its decisions. For example, a register allocator might minimize register use by rewriting the code from Figure 1.3 as follows:

    loadAI   rarp, @a  ⇒ r1    // load 'a'
    add      r1, r1    ⇒ r1    // r1 ← a × 2
    loadAI   rarp, @b  ⇒ r2    // load 'b'
    mult     r1, r2    ⇒ r1    // r1 ← (a × 2) × b
    loadAI   rarp, @c  ⇒ r2    // load 'c'
    mult     r1, r2    ⇒ r1    // r1 ← (a × 2 × b) × c
    loadAI   rarp, @d  ⇒ r2    // load 'd'
    mult     r1, r2    ⇒ r1    // r1 ← (a × 2 × b × c) × d
    storeAI  r1        ⇒ rarp, @a    // write r1 back to 'a'

This sequence uses three registers instead of six.

Minimizing register use may be counterproductive. If, for example, any of the named values, a, b, c, or d, are already in registers, the code should reference those registers directly.
If all are in registers, the sequence could be implemented so that it required no additional registers. Alternatively, if some nearby expression also computed a × 2, it might be better to preserve that value in a register than to recompute it later, as the sketch below illustrates. This optimization would increase demand for registers but eliminate a later instruction. Chapter 13 explores the problems that arise in register allocation and the techniques that compiler writers use to solve them.
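As a minimal sketch of that tradeoff, suppose some later statement (hypothetical, not part of the running example) also multiplies a value e by a × 2. The allocator could dedicate a register, here r3, to the shared value; the register names and the second use are illustrative assumptions:

    loadAI   rarp, @a  ⇒ r1    // load 'a'
    add      r1, r1    ⇒ r3    // r3 ← a × 2, kept live for later reuse
    loadAI   rarp, @b  ⇒ r2    // load 'b'
    mult     r3, r2    ⇒ r1    // r1 ← (a × 2) × b
    ...                        // rest of the expression as before
    loadAI   rarp, @e  ⇒ r2    // load hypothetical 'e'
    mult     r3, r2    ⇒ r2    // reuse a × 2 from r3; no recomputation

Keeping r3 live across the whole sequence raises the register demand but eliminates the later recomputation of a × 2.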
Instruction Scheduling

To produce code that executes quickly, the code generator may need to reorder operations to reflect the target machine's specific performance constraints. The execution time of the different operations can vary. Memory access operations can take tens or hundreds of cycles, while some arithmetic operations, particularly division, take several cycles. The impact of these longer-latency operations on the performance of compiled code can be dramatic.

Assume, for the moment, that a loadAI or storeAI operation requires three cycles, a mult requires two cycles, and all other operations require one cycle. The following table shows how the previous code fragment performs under these assumptions. The Start column shows the cycle in which each operation begins execution and the End column shows the cycle in which it completes.

    Start  End
    1      3    loadAI   rarp, @a  ⇒ r1    // load 'a'
    4      4    add      r1, r1    ⇒ r1    // r1 ← a × 2
    5      7    loadAI   rarp, @b  ⇒ r2    // load 'b'
    8      9    mult     r1, r2    ⇒ r1    // r1 ← (a × 2) × b
    10     12   loadAI   rarp, @c  ⇒ r2    // load 'c'
    13     14   mult     r1, r2    ⇒ r1    // r1 ← (a × 2 × b) × c
    15     17   loadAI   rarp, @d  ⇒ r2    // load 'd'
    18     19   mult     r1, r2    ⇒ r1    // r1 ← (a × 2 × b × c) × d
    20     22   storeAI  r1        ⇒ rarp, @a    // write r1 back to 'a'

This nine-operation sequence takes 22 cycles to execute. Minimizing register use did not lead to rapid execution.

Many processors have a property by which they can initiate new operations while a long-latency operation executes.
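To see how a scheduler can exploit that property, here is one legal reordering of the same nine operations. It is a sketch under the same latency assumptions, plus the added assumption (mine, for illustration) that the processor can issue one new operation each cycle while earlier operations are still completing. The loads are hoisted so their latencies overlap with other work; note that overlapping them requires a third register, r3:

    Start  End
    1      3    loadAI   rarp, @a  ⇒ r1    // load 'a'
    2      4    loadAI   rarp, @b  ⇒ r2    // load 'b' while 'a' is in flight
    3      5    loadAI   rarp, @c  ⇒ r3    // load 'c'; needs the extra register r3
    4      4    add      r1, r1    ⇒ r1    // r1 ← a × 2, as soon as 'a' arrives
    5      6    mult     r1, r2    ⇒ r1    // r1 ← (a × 2) × b
    6      8    loadAI   rarp, @d  ⇒ r2    // load 'd'; r2 is free after its last use
    7      8    mult     r1, r3    ⇒ r1    // r1 ← (a × 2 × b) × c
    9      10   mult     r1, r2    ⇒ r1    // r1 ← (a × 2 × b × c) × d
    11     13   storeAI  r1        ⇒ rarp, @a    // write r1 back to 'a'

Under these assumptions, the reordered sequence finishes in 13 cycles rather than 22; the price of that speed is one additional register.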