For example, instruction scheduling moves load operations away from the arithmetic operations that depend on them. This can increase the period over which the values are needed and, correspondingly, increase the number of registers needed during that period. Similarly, the assignment of particular values to specific registers can constrain instruction scheduling by creating a "false" dependence between two operations. (The second operation cannot be scheduled until the first completes, even though the values in the common register are independent. Renaming the values can eliminate this false dependence, at the cost of using more registers.)
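To make the false dependence concrete, here is a minimal source-level sketch (not taken from the book; the variable r1 merely stands in for a single machine register that the allocator has assigned to two unrelated values):

    #include <stdio.h>

    /* Sketch of a false (anti) dependence created by register reuse.
       The local variable r1 plays the role of one machine register that
       holds two unrelated values in turn; all names are illustrative. */
    int main(void) {
        int a = 1, b = 2, sum = 0, prod = 1;
        int r1;

        r1 = a;            /* load the first value into "r1"                */
        sum = sum + r1;    /* use of the first value                        */
        r1 = b;            /* reuse of "r1": this assignment cannot move    */
                           /* above the preceding use, even though a and b  */
                           /* are independent values                        */
        prod = prod * r1;  /* use of the second value                       */

        printf("%d %d\n", sum, prod);
        return 0;
    }

If the second value were renamed into a separate register (say, r2), the scheduler could move its load above the earlier use; as the text notes, that renaming removes the false dependence at the cost of one more register.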
1.4 SUMMARY AND PERSPECTIVE

Compiler construction is a complex task. A good compiler combines ideas from formal language theory, from the study of algorithms, from artificial intelligence, from systems design, from computer architecture, and from the theory of programming languages, and applies them to the problem of translating a program. A compiler brings together greedy algorithms, heuristic techniques, graph algorithms, dynamic programming, DFAs and NFAs, fixed-point algorithms, synchronization and locality, allocation and naming, and pipeline management.
Many of the problems that confront the compiler are too hard for it to solve optimally; thus, compilers use approximations, heuristics, and rules of thumb. This produces complex interactions that can lead to surprising results—both good and bad.

To place this activity in an orderly framework, most compilers are organized into three major phases: a front end, an optimizer, and a back end. Each phase has a different set of problems to tackle, and the approaches used to solve those problems differ, too. The front end focuses on translating source code into some IR.
Front ends rely on results from formal language theory and type theory, with a healthy dose of algorithms and data structures. The middle section, or optimizer, translates one IR program into another, with the goal of producing an IR program that executes efficiently. Optimizers analyze programs to derive knowledge about their runtime behavior and then use that knowledge to transform the code and improve its behavior. The back end maps an IR program to the instruction set of a specific processor. A back end approximates the answers to hard problems in allocation and scheduling, and the quality of its approximation has a direct impact on the speed and size of the compiled code.

This book explores each of these phases. Chapters 2 through 4 deal with the algorithms used in a compiler's front end.
Chapters 5 through 7 describe background material for the discussion of optimization and code generation. Chapter 8 provides an introduction to code optimization. Chapters 9 and 10 provide more detailed treatment of analysis and optimization for the interested reader. Finally, Chapters 11 through 13 cover the techniques used by back ends for instruction selection, scheduling, and register allocation.

CHAPTER NOTES

The first compilers appeared in the 1950s. These early systems showed surprising sophistication.
The original FORTRAN compiler was a multipass system that included a distinct scanner, parser, and register allocator, along with some optimizations [26, 27]. The Alpha system, built by Ershov and his colleagues, performed local optimization [139] and used graph coloring to reduce the amount of memory needed for data items [140, 141].

Knuth provides some interesting recollections of compiler construction in the early 1960s [227]. Randell and Russell describe early implementation efforts for Algol 60 [293]. Allen describes the history of compiler development inside IBM with an emphasis on the interplay of theory and practice [14].

Many influential compilers were built in the 1960s and 1970s.
These include the classic optimizing compiler FORTRAN H [252, 307], the Bliss-11 and Bliss-32 compilers [72, 356], and the portable BCPL compiler [300]. These compilers produced high-quality code for a variety of CISC machines. Compilers for students, on the other hand, focused on rapid compilation, good diagnostic messages, and error correction [97, 146].

The advent of RISC architecture in the 1980s led to another generation of compilers; these focused on strong optimization and code generation [24, 81, 89, 204]. These compilers featured full-blown optimizers structured as shown in Figure 1.1.
Modern RISC compilers still follow this model.

During the 1990s, compiler-construction research focused on reacting to the rapid changes taking place in microprocessor architecture. The decade began with Intel's i860 processor challenging compiler writers to manage pipelines and memory latencies directly. At its end, compilers confronted challenges that ranged from multiple functional units to long memory latencies to parallel code generation. The structure and organization of 1980s RISC compilers proved flexible enough for these new challenges, so researchers built new passes to insert into the optimizers and code generators of their compilers.

While Java systems use a mix of compilation and interpretation [63, 279], Java is not the first language to employ such a mix.
Lisp systems have long included both native-code compilers and virtual-machine implementation schemes [266, 324]. The Smalltalk-80 system used a bytecode distribution and a virtual machine [233]; several implementations added just-in-time compilers [126].

EXERCISES

1. Consider a simple Web browser that takes as input a textual string in HTML format and displays the specified graphics on the screen.
Is the display process one of compilation or interpretation?

2. In designing a compiler, you will face many tradeoffs. What are the five qualities that you, as a user, consider most important in a compiler that you purchase? Does that list change when you are the compiler writer? What does your list tell you about a compiler that you would implement?

3. Compilers are used in many different circumstances. What differences might you expect in compilers designed for the following applications?
   a. A just-in-time compiler used to translate user interface code downloaded over a network
   b. A compiler that targets the embedded processor used in a cellular telephone
   c. A compiler used in an introductory programming course at a high school
   d. A compiler used to build wind-tunnel simulations that run on a massively parallel processor (where all processors are identical)
   e. A compiler that targets numerically intensive programs to a large number of diverse machines

Chapter 2: Scanners

CHAPTER OVERVIEW

The scanner's task is to transform a stream of characters into a stream of words in the input language.
Each word must be classified into a syntactic category, or "part of speech." The scanner is the only pass in the compiler to touch every character in the input program. Compiler writers place a premium on speed in scanning, in part because the scanner's input is larger, in some measure, than that of any other pass, and, in part, because highly efficient techniques are easy to understand and to implement.

This chapter introduces regular expressions, a notation used to describe the valid words in a programming language. It develops the formal mechanisms to generate scanners from regular expressions, either manually or automatically.

Keywords: Scanner, Finite Automaton, Regular Expression, Fixed Point

2.1 INTRODUCTION

Scanning is the first stage of a three-part process that the compiler uses to understand the input program. The scanner, or lexical analyzer, reads a stream of characters and produces a stream of words.
It aggregates characters to form words and applies a set of rules to determine whether or not each word is legal in the source language. If the word is valid, the scanner assigns it a syntactic category, or part of speech.

The scanner is the only pass in the compiler that manipulates every character of the input program. Because scanners perform a relatively simple task, grouping characters together to form words and punctuation in the source language, they lend themselves to fast implementations.
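To make this concrete, the following sketch groups characters into words and labels each one with a syntactic category. It is a minimal hand-written illustration, not the book's code; the category names, the operator set, and the sample input are assumptions made for the example.

    #include <ctype.h>
    #include <stdio.h>

    /* A minimal scanner sketch: it aggregates characters into words and
       assigns each word a syntactic category.  The categories and the
       sample input are illustrative, not taken from the book. */

    enum category { NAME, NUMBER, OPERATOR, INVALID };

    static const char *category_name(enum category c) {
        switch (c) {
        case NAME:     return "name";
        case NUMBER:   return "number";
        case OPERATOR: return "operator";
        default:       return "invalid";
        }
    }

    int main(void) {
        const char *input = "new = old + 12";
        const char *p = input;

        while (*p != '\0') {
            if (isspace((unsigned char)*p)) {        /* skip blanks between words */
                p++;
                continue;
            }

            const char *start = p;
            enum category cat;

            if (isalpha((unsigned char)*p)) {        /* a word: letter (letter | digit)* */
                while (isalnum((unsigned char)*p))
                    p++;
                cat = NAME;
            } else if (isdigit((unsigned char)*p)) { /* an integer: digit digit* */
                while (isdigit((unsigned char)*p))
                    p++;
                cat = NUMBER;
            } else if (*p == '=' || *p == '+') {     /* single-character operators */
                p++;
                cat = OPERATOR;
            } else {                                 /* anything else is not a legal word */
                p++;
                cat = INVALID;
            }

            printf("<%s, \"%.*s\">\n", category_name(cat), (int)(p - start), start);
        }
        return 0;
    }

On the sample string it emits one pair per word, for example <name, "new">, <operator, "=">, <name, "old">, <operator, "+">, <number, "12">; this classified stream of words is exactly what the rest of the chapter shows how to produce efficiently.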
Automatic tools for scanner generation are common. These tools process a mathematical description of the language's lexical syntax and produce a fast recognizer. Alternatively, many compilers use hand-crafted scanners; because the task is simple, such scanners can be fast and robust.

Conceptual Roadmap

Recognizer: a program that identifies specific words in a stream of characters

This chapter describes the mathematical tools and programming techniques that are commonly used to construct scanners—both generated scanners and hand-crafted scanners.
The chapter begins, in Section 2.2, by introducing a model for recognizers, programs that identify words in a stream of characters. Section 2.3 describes regular expressions, a formal notation for specifying syntax. In Section 2.4, we show a set of constructions to convert a regular expression into a recognizer. Finally, in Section 2.5 we present three different ways to implement a scanner: a table-driven scanner, a direct-coded scanner, and a hand-coded approach.

Both generated and hand-crafted scanners rely on the same underlying techniques. While most textbooks and courses advocate the use of generated scanners, most commercial compilers and open-source compilers use hand-crafted scanners.
A hand-crafted scanner can be faster than a generated scanner because the implementation can optimize away a portion of the overhead that cannot be avoided in a generated scanner. Because scanners are simple and they change infrequently, many compiler writers deem that the performance gain from a hand-crafted scanner outweighs the convenience of automated scanner generation. We will explore both alternatives.

Overview

Syntactic category: a classification of words according to their grammatical usage
Microsyntax: the lexical structure of a language

A compiler's scanner reads an input stream that consists of characters and produces an output stream that contains words, each labelled with its syntactic category—equivalent to a word's part of speech in English.