Few software systems bring together as many complex and diverse components. Working inside a compiler provides practical experience in software engineering that is hard to obtain with smaller, less intricate systems.

Compilers play a fundamental role in the central activity of computer science: preparing problems for solution by computer.
Most software is compiled, and the correctness of that process and the efficiency of the resulting code have a direct impact on our ability to build large systems. Most students are not satisfied with reading about these ideas; many of the ideas must be implemented to be appreciated. Thus, the study of compiler construction is an important component of a computer science education.

Compilers demonstrate the successful application of theory to practical problems. The tools that automate the production of scanners and parsers apply results from formal language theory. These same tools are used for text searching, website filtering, word processing, and command-language interpreters.
Type checking and static analysis apply results from lattice theory, number theory, and other branches of mathematics to understand and improve programs. Code generators use algorithms for tree-pattern matching, parsing, dynamic programming, and text matching to automate the selection of instructions.

Still, some problems that arise in compiler construction are open problems; that is, the current best solutions have room for improvement. Attempts to design high-level, universal, intermediate representations have foundered on complexity.
The dominant method for scheduling instructions is a greedy algorithm with several layers of tie-breaking heuristics. While it is obvious that compilers should use commutativity and associativity to improve the code, most compilers that try to do so simply rearrange the expression into some canonical order.
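To make that last idea concrete, here is a minimal sketch of canonical ordering, written in Python with hypothetical names and a toy tuple representation of expressions; the book does not prescribe any particular scheme. It uses associativity to flatten nested sums or products and commutativity to sort the operands into a fixed order, so that equivalent expressions take the same shape:

```python
# A minimal sketch of canonical ordering for commutative operators.
# Expressions are nested tuples such as ("+", "x", ("+", 2, "y")),
# integer constants, or variable names. All names are illustrative.

COMMUTATIVE = {"+", "*"}

def flatten(op, expr):
    """Use associativity to collect the operands of nested `op` nodes."""
    if isinstance(expr, tuple) and expr[0] == op:
        return flatten(op, expr[1]) + flatten(op, expr[2])
    return [expr]

def canonicalize(expr):
    """Rewrite expr so commutative operands appear in a fixed order."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = canonicalize(lhs), canonicalize(rhs)
    if op in COMMUTATIVE:
        # Sort operands with constants first, then by name, so that
        # "x + 2" and "2 + x" produce the identical tree.
        operands = sorted(flatten(op, (op, lhs, rhs)),
                          key=lambda e: (not isinstance(e, int), str(e)))
        result = operands[0]
        for operand in operands[1:]:
            result = (op, result, operand)
        return result
    return (op, lhs, rhs)

# Two different orderings of the same sum canonicalize identically.
assert canonicalize(("+", "x", ("+", 2, "y"))) == \
       canonicalize(("+", ("+", "y", "x"), 2))
```

A canonical shape does not by itself produce better code, but it makes textually different, semantically equal expressions easier to recognize in later phases.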
Building a successful compiler requires expertise in algorithms, engineering, and planning. Good compilers approximate the solutions to hard problems. They emphasize efficiency, in their own implementations and in the code they generate. They have internal data structures and knowledge representations that expose the right level of detail: enough to allow strong optimization, but not enough to force the compiler to wallow in detail. Compiler construction brings together ideas and techniques from across the breadth of computer science and applies them in a constrained setting to solve some truly hard problems.

The Fundamental Principles of Compilation

Compilers are large, complex, carefully engineered objects. While many issues in compiler design are amenable to multiple solutions and interpretations, there are two fundamental principles that a compiler writer must keep in mind at all times. The first principle is inviolable:

The compiler must preserve the meaning of the program being compiled.

Correctness is a fundamental issue in programming.
The compiler must preserve correctness by faithfully implementing the “meaning” of its input program. This principle lies at the heart of the social contract between the compiler writer and compiler user. If the compiler can take liberties with meaning, then why not simply generate a nop or a return? If an incorrect translation is acceptable, why expend the effort to get it right?

The second principle that a compiler must observe is practical:

The compiler must improve the input program in some discernible way.

A traditional compiler improves the input program by making it directly executable on some target machine. Other “compilers” improve their input in different ways. For example, tpic is a program that takes the specification for a drawing written in the graphics language pic and converts it into LaTeX; the “improvement” lies in LaTeX’s greater availability and generality. A source-to-source translator for C must produce code that is, in some measure, better than the input program; if it is not, why would anyone invoke it?

1.2 COMPILER STRUCTURE

A compiler is a large, complex software system.
The community has been building compilers since 1955, and over the years, we have learned many lessons about how to structure a compiler. Earlier, we depicted a compiler as a simple box that translates a source program into a target program. Reality, of course, is more complex than that simple picture.

As the single-box model suggests, a compiler must both understand the source program that it takes as input and map its functionality to the target machine. The distinct nature of these two tasks suggests a division of labor and leads to a design that decomposes compilation into two major pieces: a front end and a back end.

[Figure: Source Program → Front End → IR → Back End → Target Program; the front end, IR, and back end together form the compiler]

The front end focuses on understanding the source-language program.
The back end focuses on mapping programs to the target machine. This separation of concerns has several important implications for the design and implementation of compilers.

IR: A compiler uses some set of data structures to represent the code that it processes. That form is called an intermediate representation, or IR.

The front end must encode its knowledge of the source program in some structure for later use by the back end. This intermediate representation (ir) becomes the compiler’s definitive representation for the code it is translating. At each point in compilation, the compiler will have a definitive representation. It may, in fact, use several different irs as compilation progresses, but at each point, one representation will be the definitive ir.
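As a concrete illustration (our own, not the book’s), one common style of ir is three-address code, which a compiler might store as a simple list of tuples. For instance, the assignment a = b + c * 2 could be represented as:

```python
# One hypothetical ir for "a = b + c * 2": a list of three-address
# tuples, each holding an operation, two operands, and a destination.
ir = [
    ("mul", "c", 2,    "t1"),   # t1 := c * 2
    ("add", "b", "t1", "a"),    # a  := b + t1
]
```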
We think of the definitive ir as the version of the program passed between independent phases of the compiler, like the ir passed from the front end to the back end in the preceding drawing.

In a two-phase compiler, the front end must ensure that the source program is well formed, and it must map that code into the ir. The back end must map the ir program into the instruction set and the finite resources of the target machine. Because the back end only processes ir created by the front end, it can assume that the ir contains no syntactic or semantic errors.

MAY YOU STUDY IN INTERESTING TIMES

This is an exciting era in the design and implementation of compilers. In the 1980s, almost all compilers were large, monolithic systems. They took as input one of a handful of languages and produced assembly code for some particular computer. The assembly code was pasted together with the code produced by other compilations, including system libraries and application libraries, to form an executable. The executable was stored on a disk, and at the appropriate time, the final code was moved from the disk to main memory and executed.

Today, compiler technology is being applied in many different settings. As computers find applications in diverse places, compilers must cope with new and different constraints. Speed is no longer the sole criterion for judging the compiled code. Today, code might be judged on how small it is, on how much energy it consumes, on how well it compresses, or on how many page faults it generates when it runs.

At the same time, compilation techniques have escaped from the monolithic systems of the 1980s. They are appearing in many new places. Java compilers take partially compiled programs (in Java "bytecode" format) and translate them into native code for the target machine. In this environment, success requires that the sum of compile time plus runtime be less than the cost of interpretation. Techniques to analyze whole programs are moving from compile time to link time, where the linker can analyze the assembly code for the entire application and use that knowledge to improve the program. Finally, compilers are being invoked at runtime to generate customized code that capitalizes on facts that cannot be known any earlier. If the compilation time can be kept small and the benefits are large, this strategy can produce noticeable improvements.

The compiler can make multiple passes over the ir form of the code before emitting the target program. This should lead to better code, as the compiler can, in effect, study the code in one phase and record relevant details. Then, in later phases, it can use these recorded facts to improve the quality of translation.
This strategy requires that knowledge derived in the first pass be recorded in the ir, where later passes can find and use it.

Retargeting: The task of changing the compiler to generate code for a new processor is often called retargeting the compiler.

Finally, the two-phase structure may simplify the process of retargeting the compiler. We can easily envision constructing multiple back ends for a single front end to produce compilers that accept the same language but target different machines. Similarly, we can envision front ends for different languages producing the same ir and using a common back end. Both scenarios assume that one ir can serve for several combinations of source and target; in practice, both language-specific and machine-specific details usually find their way into the ir.

Optimizer: The middle section of a compiler, called an optimizer, analyzes and transforms the IR to improve it.

Introducing an ir makes it possible to add more phases to compilation.
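To make the two-phase structure and retargeting concrete, here is a minimal sketch of our own devising. It assumes a toy source language of integer additions, reuses the tuple-style ir from the earlier illustration, and invents two back ends for hypothetical targets:

```python
# A minimal sketch of a two-phase compiler for a toy language of
# integer additions such as "1 + 2 + x". All names are hypothetical.

def front_end(source):
    """Check that the source is well formed and emit a tuple-style ir."""
    operands = [tok.strip() for tok in source.split("+")]
    assert all(operands), "malformed source"  # the front end rejects errors
    ir, acc = [], operands[0]
    for n, operand in enumerate(operands[1:]):
        temp = f"t{n}"
        ir.append(("add", acc, operand, temp))  # temp := acc + operand
        acc = temp
    return ir

def back_end_stack(ir):
    """One back end: code for a hypothetical stack machine.
    It trusts the ir, so it performs no error checking of its own."""
    code = []
    for op, a, b, dst in ir:
        code += [f"push {a}", f"push {b}", "add", f"pop {dst}"]
    return code

def back_end_reg(ir):
    """A second back end, for a hypothetical register machine;
    retargeting reuses the front end and the ir unchanged."""
    return [f"add {dst}, {a}, {b}" for op, a, b, dst in ir]

ir = front_end("1 + 2 + x")   # the definitive ir for this program
print(back_end_stack(ir))     # target one machine ...
print(back_end_reg(ir))       # ... or another, with the same front end
```

Because both back ends consume the same ir, adding a third target means writing one new back end; the front end and the ir are untouched. That is the essence of the retargeting argument, though, as the text notes, real irs rarely stay this neutral.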