(We will use iloc throughout the book.) The ir form of the expression is repeated in the margin. The compiler might choose the operations shown in Figure 1.3. This code assumes that a, b, c, and d are located at offsets @a, @b, @c, and @d from an address contained in the register rarp.

    loadAI   rarp, @a  ⇒ ra          // load 'a'
    loadI    2         ⇒ r2          // constant 2 into r2
    loadAI   rarp, @b  ⇒ rb          // load 'b'
    loadAI   rarp, @c  ⇒ rc          // load 'c'
    loadAI   rarp, @d  ⇒ rd          // load 'd'
    mult     ra, r2    ⇒ ra          // ra ← a × 2
    mult     ra, rb    ⇒ ra          // ra ← (a × 2) × b
    mult     ra, rc    ⇒ ra          // ra ← (a × 2 × b) × c
    mult     ra, rd    ⇒ ra          // ra ← (a × 2 × b × c) × d
    storeAI  ra        ⇒ rarp, @a    // write ra back to 'a'

FIGURE 1.3 ILOC Code for a ← a × 2 × b × c × d

The compiler has chosen a straightforward sequence of operations. It loads all of the relevant values into registers, performs the multiplications in order, and stores the result to the memory location for a.
It assumes an unlimited supply of registers and names them with symbolic names such as ra to hold a and rarp to hold the address where the data storage for our named values begins. Implicitly, the instruction selector relies on the register allocator to map these symbolic register names, or virtual registers, to the actual registers of the target machine.

The instruction selector can take advantage of special operations on the target machine. For example, if an immediate-multiply operation (multI) is available, it might replace the operation mult ra, r2 ⇒ ra with multI ra, 2 ⇒ ra, eliminating the need for the operation loadI 2 ⇒ r2 and reducing the demand for registers. If addition is faster than multiplication, it might replace mult ra, r2 ⇒ ra with add ra, ra ⇒ ra, avoiding both the loadI and its use of r2, as well as replacing the mult with a faster add.
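These rewrites are easy to express as simple pattern replacement over the linear code. The sketch below, in Python, shows the idea on a small fragment; the tuple encoding of ILOC operations, the function name select, and its flags are illustrative assumptions, not part of ILOC or of this book's algorithms.

    # A minimal sketch of the two rewrites described above. Operations are
    # encoded as (opcode, sources, destination) tuples; this encoding, the
    # function name, and the flags are assumptions made for illustration.
    def select(ops, has_multI=True, add_cheaper_than_mult=True):
        constants = {}      # virtual register -> known constant (straight-line code only)
        out = []
        for opcode, srcs, dst in ops:
            if opcode == "loadI":
                constants[dst] = srcs[0]           # remember the constant
                out.append((opcode, srcs, dst))    # removed below if unused
            elif opcode == "mult" and srcs[1] in constants:
                c = constants[srcs[1]]
                if c == 2 and add_cheaper_than_mult:
                    # x * 2  ->  x + x, avoiding both the loadI and its register
                    out.append(("add", [srcs[0], srcs[0]], dst))
                elif has_multI:
                    # fold the constant into an immediate multiply
                    out.append(("multI", [srcs[0], c], dst))
                else:
                    out.append((opcode, srcs, dst))
            else:
                out.append((opcode, srcs, dst))
        used = {s for _, srcs, _ in out for s in srcs}
        # drop any loadI whose result is no longer referenced
        return [(op, srcs, dst) for op, srcs, dst in out
                if not (op == "loadI" and dst not in used)]

    code = [("loadAI",  ["rarp", "@a"], "ra"),
            ("loadI",   [2],            "r2"),
            ("mult",    ["ra", "r2"],   "ra"),
            ("storeAI", ["ra", "rarp"], "@a")]
    for op in select(code):
        print(op)    # the mult becomes an add; the loadI disappears

A real selector works from a richer description of the target machine's operations, but the flavor is the same: find a fragment of the ir and replace it with a cheaper target-machine idiom.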
Chapter 11 presents two different techniques for instruction selection that use pattern matching to choose efficient implementations for ir operations.

Register Allocation

During instruction selection, the compiler deliberately ignored the fact that the target machine has a limited set of registers. Instead, it used virtual registers and assumed that "enough" registers existed. In practice, the earlier stages of compilation may create more demand for registers than the hardware can support.
Virtual register: a symbolic register name that the compiler uses to indicate that a value can be stored in a register.

The register allocator must map those virtual registers onto actual target-machine registers. Thus, the register allocator decides, at each point in the code, which values should reside in the target-machine registers. It then rewrites the code to reflect its decisions. For example, a register allocator might minimize register use by rewriting the code from Figure 1.3 as follows:

    loadAI   rarp, @a  ⇒ r1          // load 'a'
    add      r1, r1    ⇒ r1          // r1 ← a × 2
    loadAI   rarp, @b  ⇒ r2          // load 'b'
    mult     r1, r2    ⇒ r1          // r1 ← (a × 2) × b
    loadAI   rarp, @c  ⇒ r2          // load 'c'
    mult     r1, r2    ⇒ r1          // r1 ← (a × 2 × b) × c
    loadAI   rarp, @d  ⇒ r2          // load 'd'
    mult     r1, r2    ⇒ r1          // r1 ← (a × 2 × b × c) × d
    storeAI  r1        ⇒ rarp, @a    // write ra back to 'a'

This sequence uses three registers instead of six.
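One way to quantify a block's register demand is to compute the largest number of values that are simultaneously live at any point, sometimes called MAXLIVE. The Python sketch below does this with a single backward pass over straight-line code; the tuple encoding of operations and the name max_live are illustrative assumptions, not the book's notation.

    # A minimal sketch: compute the maximum number of simultaneously live
    # registers in a straight-line block by walking it backward.
    def max_live(ops):
        live = set()          # registers live after the current operation
        peak = 0
        for opcode, srcs, dst in reversed(ops):
            live.discard(dst)                                   # the definition kills the value
            live.update(s for s in srcs if s.startswith("r"))   # uses make values live
            peak = max(peak, len(live))
        return peak

    figure_1_3 = [
        ("loadAI",  ["rarp", "@a"], "ra"),
        ("loadI",   ["2"],          "r2"),
        ("loadAI",  ["rarp", "@b"], "rb"),
        ("loadAI",  ["rarp", "@c"], "rc"),
        ("loadAI",  ["rarp", "@d"], "rd"),
        ("mult",    ["ra", "r2"],   "ra"),
        ("mult",    ["ra", "rb"],   "ra"),
        ("mult",    ["ra", "rc"],   "ra"),
        ("mult",    ["ra", "rd"],   "ra"),
        ("storeAI", ["ra", "rarp"], "@a"),
    ]
    print(max_live(figure_1_3))   # 6: ra, r2, rb, rc, rd, and rarp live before the first mult

Applying the same computation to the rewritten sequence gives 3 (r1, r2, and rarp), so on a machine with very few registers the allocator's rewrite avoids spilling values to memory.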
Minimizing register use may be counterproductive. If, for example, any of the named values, a, b, c, or d, are already in registers, the code should reference those registers directly. If all are in registers, the sequence could be implemented so that it required no additional registers. Alternatively, if some nearby expression also computed a × 2, it might be better to preserve that value in a register than to recompute it later. This optimization would increase demand for registers but eliminate a later instruction.
Chapter 13 explores the problems that arise in register allocation and the techniques that compiler writers use to solve them.

Instruction Scheduling

To produce code that executes quickly, the code generator may need to reorder operations to reflect the target machine's specific performance constraints. The execution time of the different operations can vary. Memory access operations can take tens or hundreds of cycles, while some arithmetic operations, particularly division, take several cycles.
The impact of these longer-latency operations on the performance of compiled code can be dramatic.

Assume, for the moment, that a loadAI or storeAI operation requires three cycles, a mult requires two cycles, and all other operations require one cycle. The following table shows how the previous code fragment performs under these assumptions. The Start column shows the cycle in which each operation begins execution and the End column shows the cycle in which it completes.

    Start  End
      1     3    loadAI   rarp, @a  ⇒ r1          // load 'a'
      4     4    add      r1, r1    ⇒ r1          // r1 ← a × 2
      5     7    loadAI   rarp, @b  ⇒ r2          // load 'b'
      8     9    mult     r1, r2    ⇒ r1          // r1 ← (a × 2) × b
     10    12    loadAI   rarp, @c  ⇒ r2          // load 'c'
     13    14    mult     r1, r2    ⇒ r1          // r1 ← (a × 2 × b) × c
     15    17    loadAI   rarp, @d  ⇒ r2          // load 'd'
     18    19    mult     r1, r2    ⇒ r1          // r1 ← (a × 2 × b × c) × d
     20    22    storeAI  r1        ⇒ rarp, @a    // write ra back to 'a'

This nine-operation sequence takes 22 cycles to execute. Minimizing register use did not lead to rapid execution.

Many processors have a property by which they can initiate new operations while a long-latency operation executes.
As long as the results of a long-latency operation are not referenced until the operation completes, execution proceeds normally. If, however, some intervening operation tries to read the result of the long-latency operation prematurely, the processor delays the operation that needs the value until the long-latency operation completes. An operation cannot begin to execute until its operands are ready, and its results are not ready until the operation terminates.

The instruction scheduler reorders the operations in the code. It attempts to minimize the number of cycles wasted waiting for operands. Of course, the scheduler must ensure that the new sequence produces the same result as the original.
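One widely used approach is list scheduling: keep track of which operations are ready (all of their operands available) and, each cycle, issue the ready operation with the highest priority, typically the length of the longest latency path from it to the end of the block. The Python sketch below applies that idea to our example, assuming a single-issue machine, the latencies stated above, and only true (value) dependences, as would hold if each value kept its own virtual register; the encoding of the dependence graph and all names here are illustrative assumptions, not the book's algorithms.

    # A minimal list-scheduling sketch under the latency assumptions above.
    LATENCY = {"loadAI": 3, "storeAI": 3, "mult": 2, "add": 1, "loadI": 1}

    def list_schedule(ops, deps):
        """ops: name -> opcode; deps: name -> set of prerequisite names.
        Returns name -> (start, end) cycles for a greedy schedule."""
        prio = {}
        def priority(n):          # longest latency path from n to the end of the block
            if n not in prio:
                succs = [m for m in ops if n in deps[m]]
                prio[n] = LATENCY[ops[n]] + max((priority(m) for m in succs), default=0)
            return prio[n]

        issued = {}               # name -> (start, end) once scheduled
        cycle = 1
        while len(issued) < len(ops):
            ready = [n for n in ops if n not in issued
                     and all(p in issued and issued[p][1] < cycle for p in deps[n])]
            if ready:             # issue the highest-priority ready operation
                n = max(ready, key=priority)
                issued[n] = (cycle, cycle + LATENCY[ops[n]] - 1)
            cycle += 1            # at most one operation issues per cycle
        return issued

    ops  = {"lda": "loadAI", "ldb": "loadAI", "ldc": "loadAI", "ldd": "loadAI",
            "dbl": "add", "m1": "mult", "m2": "mult", "m3": "mult", "st": "storeAI"}
    deps = {"lda": set(), "ldb": set(), "ldc": set(), "ldd": set(),
            "dbl": {"lda"}, "m1": {"dbl", "ldb"}, "m2": {"m1", "ldc"},
            "m3": {"m2", "ldd"}, "st": {"m3"}}
    for name, (start, end) in sorted(list_schedule(ops, deps).items(),
                                     key=lambda kv: kv[1]):
        print(name, start, end)   # the store completes in cycle 13

On this dependence graph the greedy choice yields a 13-cycle schedule that matches the one shown next.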
In many cases, the scheduler can drastically improve the performance of “naive” code. For our example, a good scheduler might produce the following sequence:

    Start  End
      1     3    loadAI   rarp, @a  ⇒ r1          // load 'a'
      2     4    loadAI   rarp, @b  ⇒ r2          // load 'b'
      3     5    loadAI   rarp, @c  ⇒ r3          // load 'c'
      4     4    add      r1, r1    ⇒ r1          // r1 ← a × 2
      5     6    mult     r1, r2    ⇒ r1          // r1 ← (a × 2) × b
      6     8    loadAI   rarp, @d  ⇒ r2          // load 'd'
      7     8    mult     r1, r3    ⇒ r1          // r1 ← (a × 2 × b) × c
      9    10    mult     r1, r2    ⇒ r1          // r1 ← (a × 2 × b × c) × d
     11    13    storeAI  r1        ⇒ rarp, @a    // write ra back to 'a'

This version of the code requires just 13 cycles to execute. The code uses one more register than the minimal number. It starts an operation in every cycle except 8, 10, and 12. Other equivalent schedules are possible, as are equal-length schedules that use more registers. Chapter 12 presents several scheduling techniques that are in widespread use.

COMPILER CONSTRUCTION IS ENGINEERING

A typical compiler has a series of passes that, together, translate code from some source language into some target language. Along the way, the compiler uses dozens of algorithms and data structures.
The compiler writer must select, for each step in the process, an appropriate solution.

A successful compiler executes an unimaginable number of times. Consider the total number of times that the GCC compiler has run. Over GCC's lifetime, even small inefficiencies add up to a significant amount of time. The savings from good design and implementation accumulate over time. Thus, the compiler writer must pay attention to compile-time costs, such as the asymptotic complexity of algorithms, the actual running time of the implementation, and the space used by data structures.
The compiler writer should have in mind a budget for how much time the compiler will spend on its various tasks.

For example, scanning and parsing are two problems for which efficient algorithms abound. Scanners recognize and classify words in time proportional to the number of characters in the input program. For a typical programming language, a parser can build derivations in time proportional to the length of the derivation. (The restricted structure of programming languages makes efficient parsing possible.) Because efficient and effective techniques exist for scanning and parsing, the compiler writer should expect to spend just a small fraction of compile time on these tasks.

By contrast, optimization and code generation contain several problems that require more time.
Many of the algorithms that we will examine for program analysis and optimization have complexities greater than O(n). Thus, algorithm choice in the optimizer and code generator has a larger impact on compile time than it does in the compiler's front end.
The compiler writer may need to trade precision of analysis and effectiveness of optimization against increases in compile time. He or she should make such decisions consciously and carefully.
Interactions Among Code-Generation Components

Most of the truly hard problems that occur in compilation arise during code generation. To make matters more complex, these problems interact.