Cooper, Engineering a Compiler (Second Edition)
The compiler writer can insert a third phase between the front end and the back end. This middle section, or optimizer, takes an IR program as its input and produces a semantically equivalent IR program as its output. By using the IR as an interface, the compiler writer can insert this third phase with minimal disruption to the front end and back end. This leads to the following compiler structure, termed a three-phase compiler.
[Figure: the three-phase compiler. Source Program → Front End → IR → Optimizer → IR → Back End → Target Program, with all three phases inside the Compiler.]
The optimizer is an IR-to-IR transformer that tries to improve the IR program in some way. (Notice that these transformers are, themselves, compilers according to our definition in Section 1.1.) The optimizer can make one or more passes over the IR, analyze the IR, and rewrite the IR. It may rewrite the IR in a way that is likely to produce a faster or a smaller target program from the back end. It may have other objectives, such as a program that produces fewer page faults or uses less energy.
Conceptually, the three-phase structure represents the classic optimizing compiler.
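A minimal Python sketch of the three-phase structure follows. The toy source language (one `name = operand * operand` assignment per line), the tuple IR, and the target instruction strings are all hypothetical, chosen only to make the interfaces concrete. Each phase is a function, and because the optimizer reads and writes the same IR, the compiler still works if the optimizer is left out.

```python
from typing import List, Optional, Tuple, Union

Operand = Union[int, str]
# Hypothetical IR: (op, dest, left, right); "const" instructions leave right as None.
Instr = Tuple[str, str, Operand, Optional[Operand]]


def _operand(token: str) -> Operand:
    """Decimal literals become ints; anything else is treated as a variable name."""
    return int(token) if token.isdecimal() else token


def front_end(source: str) -> List[Instr]:
    """Check each 'name = x * y' line and translate it to IR, or report an error."""
    ir: List[Instr] = []
    for line in source.strip().splitlines():
        parts = line.split()
        if len(parts) != 5 or parts[1] != "=" or parts[3] != "*":
            raise SyntaxError(f"malformed line: {line!r}")
        ir.append(("mul", parts[0], _operand(parts[2]), _operand(parts[4])))
    return ir


def optimizer(ir: List[Instr]) -> List[Instr]:
    """IR-to-IR pass: fold a multiplication whose operands are both literals."""
    out: List[Instr] = []
    for op, dest, left, right in ir:
        if op == "mul" and isinstance(left, int) and isinstance(right, int):
            out.append(("const", dest, left * right, None))
        else:
            out.append((op, dest, left, right))
    return out


def back_end(ir: List[Instr]) -> List[str]:
    """Map each IR instruction onto a made-up target instruction."""
    code: List[str] = []
    for op, dest, left, right in ir:
        if op == "const":
            code.append(f"loadI {left} => {dest}")
        else:
            code.append(f"mult {left}, {right} => {dest}")
    return code


def compile_three_phase(source: str) -> List[str]:
    """Front end, then optimizer, then back end, connected only through the IR."""
    return back_end(optimizer(front_end(source)))


print(compile_three_phase("x = 2 * 3\ny = x * 4"))
# ['loadI 6 => x', 'mult x, 4 => y']
```

Calling `back_end(front_end(source))` directly skips the optimizer and still yields valid, unoptimized code, which is the minimal-disruption property the IR interface provides.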
In practice, each phase is divided internally into a series of passes. The front end consists of two or three passes that handle the details of recognizing valid source-language programs and producing the initial IR form of the program. The middle section contains passes that perform different optimizations. The number and purpose of these passes vary from compiler to compiler.
The back end consists of a series of passes, each of which takes the IR program one step closer to the target machine's instruction set. The three phases and their individual passes share a common infrastructure. This structure is shown in Figure 1.1.
In practice, the conceptual division of a compiler into three phases (a front end, a middle section or optimizer, and a back end) is useful. The problems addressed by these phases are different. The front end is concerned with understanding the source program and recording the results of its analysis into IR form.
The optimizer section focuses on improving the IR form.
[Figure 1.1 Structure of a Typical Compiler. Front End: Scanner, Parser, Elaboration. Optimizer: Optimization 1, Optimization 2, ..., Optimization n. Back End: Inst Selection, Inst Scheduling, Reg Allocation, .... All three phases rest on a shared Infrastructure.]
The back end must map the transformed IR program onto the bounded resources of the target machine in a way that leads to efficient use of those resources.
Of these three phases, the optimizer has the murkiest description. The term optimization implies that the compiler discovers an optimal solution to some problem. The issues and problems that arise in optimization are so complex and so interrelated that they cannot, in practice, be solved optimally. Furthermore, the actual behavior of the compiled code depends on interactions among all of the techniques applied in the optimizer and the back end. Thus, even if a single technique can be proved optimal, its interactions with other techniques may produce less than optimal results.
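The multipass organization, and the kind of interaction described above, can be sketched in Python. Everything here is hypothetical: a toy tuple IR, straight-line code only, and deliberately simple versions of two classic passes, constant propagation and constant folding. The driver `run_passes` applies any list of IR-to-IR passes in order and records each one in a shared log. Running the same two passes in different orders leaves different code, a small instance of how one pass's result depends on what the others have already done.

```python
from typing import Callable, Dict, List, Tuple, Union

Operand = Union[int, str]
# Hypothetical IR: ("const", dest, value) or ("mul", dest, left, right).
Instr = Tuple[Union[str, int], ...]
IR = List[Instr]
Pass = Callable[[IR], IR]


def fold(ir: IR) -> IR:
    """Constant folding: replace a mul of two literals with its value."""
    out: IR = []
    for ins in ir:
        if ins[0] == "mul" and isinstance(ins[2], int) and isinstance(ins[3], int):
            out.append(("const", ins[1], ins[2] * ins[3]))
        else:
            out.append(ins)
    return out


def propagate(ir: IR) -> IR:
    """Constant propagation over straight-line code: replace names known to hold constants."""
    known: Dict[str, int] = {}

    def value(x: Operand) -> Operand:
        return known.get(x, x) if isinstance(x, str) else x

    out: IR = []
    for ins in ir:
        if ins[0] == "const":
            known[ins[1]] = ins[2]
            out.append(ins)
        else:
            op, dest, left, right = ins
            out.append((op, dest, value(left), value(right)))
            known.pop(dest, None)  # dest now holds a value this pass does not track
    return out


def run_passes(ir: IR, passes: List[Pass], log: List[str]) -> IR:
    """Multipass driver: every pass reads IR and writes IR; all passes share the log."""
    for p in passes:
        ir = p(ir)
        log.append(p.__name__)
    return ir


program: IR = [("const", "x", 2), ("mul", "y", "x", 3)]
log: List[str] = []
print(run_passes(program, [fold, propagate], log))
# [('const', 'x', 2), ('mul', 'y', 2, 3)]
print(run_passes(program, [propagate, fold], log))
# [('const', 'x', 2), ('const', 'y', 6)]
```

Repeating the passes until the IR stops changing would finish this toy program, but a real compiler still has to choose which passes to run and in what order.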
As a result, a good optimizing compiler can improve the quality of the code, relative to an unoptimized version. However, an optimizing compiler will almost always fail to produce optimal code.
The middle section can be a single monolithic pass that applies one or more optimizations to improve the code, or it can be structured as a series of smaller passes with each pass reading and writing IR. The monolithic structure may be more efficient. The multipass structure may lend itself to a less complex implementation and a simpler approach to debugging the compiler. It also creates the flexibility to employ different sets of optimizations in different situations. The choice between these two approaches depends on the constraints under which the compiler is built and operates.
1.3 OVERVIEW OF TRANSLATION
To translate code written in a programming language into code suitable for execution on some target machine, a compiler runs through many steps.
NOTATION
Compiler books are, in essence, about notation.
After all, a compiler translates a program written in one notation into an equivalent program written in another notation. A number of notational issues will arise in your reading of this book. In some cases, these issues will directly affect your understanding of the material.
Expressing Algorithms: We have tried to keep the algorithms concise. Algorithms are written at a relatively high level, assuming that the reader can supply implementation details. They are written in a slanted, sans-serif font. Indentation is both deliberate and significant; it matters most in an if-then-else construct. Indented code after a then or an else forms a block. In the following code fragment
    if Action[s, word] = “shift s_i” then
        push word
        push s_i
        word ← NextWord()
    else if · · ·
all the statements between the then and the else are part of the then clause of the if-then-else construct.
When a clause in an if-then-else construct contains just one statement, we write the keyword then or else on the same line as the statement.
Writing Code: In some examples, we show actual program text written in some language chosen to demonstrate a particular point.
Actual program text is written in a monospace font.
Arithmetic Operators: Finally, we have forsaken the traditional use of * for × and of / for ÷, except in actual program text. The meaning should be clear to the reader.
To make this abstract process more concrete, consider the steps needed to generate executable code for the following expression:
    a ← a × 2 × b × c × d
where a, b, c, and d are variables, ← indicates an assignment, and × is the operator for multiplication. In the following subsections, we will trace the path that a compiler takes to turn this simple expression into executable code.
1.3.1 The Front End
Before the compiler can translate an expression into executable target-machine code, it must understand both its form, or syntax, and its meaning, or semantics.
The front end determines if the input code is well formed, in terms of both syntax and semantics. If it finds that the code is valid, it creates a representation of the code in the compiler's intermediate representation; if not, it reports back to the user with diagnostic error messages to identify the problems with the code.
Checking Syntax
To check the syntax of the input program, the compiler must compare the program's structure against a definition for the language.
This requires an appropriate formal definition, an efficient mechanism for testing whether or not the input meets that definition, and a plan for how to proceed on an illegal input.
Mathematically, the source language is a set, usually infinite, of strings defined by some finite set of rules, called a grammar. Two separate passes in the front end, called the scanner and the parser, determine whether or not the input code is, in fact, a member of the set of valid programs defined by the grammar.
Programming language grammars usually refer to words based on their parts of speech, sometimes called syntactic categories.
Basing the grammar rules on parts of speech lets a single rule describe many sentences. For example, in English, many sentences have the form
    Sentence → Subject verb Object endmark
where verb and endmark are parts of speech, and Sentence, Subject, and Object are syntactic variables. Sentence represents any string with the form described by this rule.
The symbol “→” reads “derives” and means that an instance of the right-hand side can be abstracted to the syntactic variable on the left-hand side.
Consider a sentence like “Compilers are engineered objects.” The first step in understanding the syntax of this sentence is to identify distinct words in the input program and to classify each word with a part of speech. In a compiler, this task falls to a pass called the scanner. The scanner takes a stream of characters and converts it to a stream of classified words, that is, pairs of the form (p, s), where p is the word's part of speech and s is its spelling.
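The scanner's job on the example sentence can be sketched in Python. The `LEXICON` dictionary is a hypothetical stand-in for real classification logic, and the word pattern is deliberately crude. The helper `intern_spellings`, also an illustrative assumption, shows one way to keep each spelling once in a table (a Python `dict`, which is a hash table) and pair each part of speech with an integer index instead of the spelling itself.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon standing in for real classification logic:
# it maps each known word (lowercased) to its part of speech.
LEXICON: Dict[str, str] = {
    "compilers": "noun",
    "objects": "noun",
    "are": "verb",
    "engineered": "adjective",
}


def scan(text: str) -> List[Tuple[str, str]]:
    """Convert a string of characters into (part of speech, spelling) pairs."""
    words: List[Tuple[str, str]] = []
    # Words are runs of letters and "." is the end mark; this toy pattern
    # silently skips every other character, including blanks.
    for spelling in re.findall(r"[A-Za-z]+|\.", text):
        if spelling == ".":
            words.append(("endmark", spelling))
        elif spelling.lower() in LEXICON:
            words.append((LEXICON[spelling.lower()], spelling))
        else:
            raise ValueError(f"unrecognized word: {spelling!r}")
    return words


def intern_spellings(words: List[Tuple[str, str]]) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Store each distinct spelling once; pair each category with that spelling's index."""
    table: Dict[str, int] = {}
    pairs: List[Tuple[str, int]] = []
    for part, spelling in words:
        index = table.setdefault(spelling, len(table))  # next free index for a new spelling
        pairs.append((part, index))
    return pairs, table


print(scan("Compilers are engineered objects."))
# [('noun', 'Compilers'), ('verb', 'are'), ('adjective', 'engineered'), ('noun', 'objects'), ('endmark', '.')]
```

With interned spellings, checking whether two words are the same becomes a comparison of two integers rather than two strings.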
A scanner would convert the example sentence into the following stream of classified words:
    (noun, “Compilers”), (verb, “are”), (adjective, “engineered”), (noun, “objects”), (endmark, “.”)
(Scanner: the compiler pass that converts a string of characters into a stream of words.)
In practice, the actual spelling of the words might be stored in a hash table and represented in the pairs with an integer index to simplify equality tests. Chapter 2 explores the theory and practice of scanner construction.
In the next step, the compiler tries to match the stream of categorized words against the rules that specify syntax for the input language.
For example, a working knowledge of English might include the following grammatical rules:
    1  Sentence → Subject verb Object endmark
    2  Subject  → noun
    3  Subject  → Modifier noun
    4  Object   → noun
    5  Object   → Modifier noun
    6  Modifier → adjective
       ...
By inspection, we can discover the following derivation for our example sentence:
    Rule  Prototype Sentence
    -     Sentence
    1     Subject verb Object endmark
    2     noun verb Object endmark
    5     noun verb Modifier noun endmark
    6     noun verb adjective noun endmark
The derivation starts with the syntactic variable Sentence. At each step, it rewrites one term in the prototype sentence, replacing the term with a right-hand side that can be derived from that rule.
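The rules and the derivation above can be written down as data and replayed mechanically. In this Python sketch, `RULES` transcribes Rules 1 through 6 (the elided rules are omitted), and each step rewrites the leftmost occurrence of the chosen rule's left-hand side, which is the only occurrence at every step of this particular derivation. The names `apply_rule` and `derive` are illustrative.

```python
from typing import Dict, List, Tuple

# Rules 1-6 from the text: rule number -> (left-hand side, right-hand side).
RULES: Dict[int, Tuple[str, List[str]]] = {
    1: ("Sentence", ["Subject", "verb", "Object", "endmark"]),
    2: ("Subject", ["noun"]),
    3: ("Subject", ["Modifier", "noun"]),
    4: ("Object", ["noun"]),
    5: ("Object", ["Modifier", "noun"]),
    6: ("Modifier", ["adjective"]),
}


def apply_rule(form: List[str], number: int) -> List[str]:
    """Rewrite the leftmost occurrence of the rule's left-hand side in the prototype sentence."""
    lhs, rhs = RULES[number]
    i = form.index(lhs)  # raises ValueError if the rule does not apply
    return form[:i] + rhs + form[i + 1:]


def derive(rule_numbers: List[int]) -> List[List[str]]:
    """Start from Sentence and apply the rules in order, recording every step."""
    steps = [["Sentence"]]
    for number in rule_numbers:
        steps.append(apply_rule(steps[-1], number))
    return steps


scanned = ["noun", "verb", "adjective", "noun", "endmark"]  # categories from the scanner
for step in derive([1, 2, 5, 6]):
    print(" ".join(step))
print(derive([1, 2, 5, 6])[-1] == scanned)  # True
```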
The first step uses Rule 1 to replace Sentence. The second uses Rule 2 to replace Subject. The third replaces Object using Rule 5, while the final step rewrites Modifier with adjective according to Rule 6. At this point, the prototype sentence generated by the derivation matches the stream of categorized words produced by the scanner.
(Parser: the compiler pass that determines if the input stream is a sentence in the source language.)
The derivation proves that the sentence “Compilers are engineered objects.” belongs to the language described by Rules 1 through 6. The sentence is grammatically correct.
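Finding a derivation by inspection works for one sentence; a program can also search for one. The Python sketch below is a brute-force, top-down search with backtracking over Rules 1 through 6, written for clarity rather than speed; it is an illustration, not one of the parsing techniques a production compiler would use. It relies on two properties of these particular rules: no right-hand side is empty, so a prototype sentence never shrinks and the search can stop once it outgrows the input, and no rule rewrites a variable into a lone variable, so the search always terminates.

```python
from typing import Dict, List, Optional, Tuple

# Rules 1-6 from the text: rule number -> (left-hand side, right-hand side).
RULES: Dict[int, Tuple[str, List[str]]] = {
    1: ("Sentence", ["Subject", "verb", "Object", "endmark"]),
    2: ("Subject", ["noun"]),
    3: ("Subject", ["Modifier", "noun"]),
    4: ("Object", ["noun"]),
    5: ("Object", ["Modifier", "noun"]),
    6: ("Modifier", ["adjective"]),
}
VARIABLES = {lhs for lhs, _ in RULES.values()}


def find_derivation(form: List[str], words: List[str], used: List[int]) -> Optional[List[int]]:
    """Return the rule numbers of a leftmost derivation from form to words, or None."""
    if len(form) > len(words):
        return None  # no right-hand side is empty, so forms never shrink
    for i, symbol in enumerate(form):
        if symbol in VARIABLES:
            break
    else:
        return used if form == words else None  # only parts of speech remain
    if form[:i] != words[:i]:
        return None  # the parts of speech left of the variable already disagree
    for number, (lhs, rhs) in RULES.items():
        if lhs == symbol:
            result = find_derivation(form[:i] + rhs + form[i + 1:], words, used + [number])
            if result is not None:
                return result
    return None  # every alternative failed; the caller backtracks


def parse(words: List[str]) -> Optional[List[int]]:
    return find_derivation(["Sentence"], words, [])


print(parse(["noun", "verb", "adjective", "noun", "endmark"]))  # [1, 2, 5, 6]
print(parse(["noun", "verb", "endmark"]))                       # None
```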
The process of automatically finding derivations is called parsing. Chapter 3 presents the techniques that compilers use to parse the input program.
A grammatically correct sentence can be meaningless. For example, the sentence “Rocks are green vegetables” has the same parts of speech in the same order as “Compilers are engineered objects,” but has no rational meaning. Understanding the difference between these two sentences requires contextual knowledge about software systems, rocks, and vegetables.
The semantic models that compilers use to reason about programming languages are simpler than the models needed to understand natural language. A compiler builds mathematical models that detect specific kinds of inconsistency in a program.
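As a toy example of such a model, the Python sketch below checks the running example a ← a × 2 × b × c × d against a table of declared types. The declarations, the type names, the `NUMERIC` set, and the rule that × needs numeric operands are all assumptions made for illustration; the expression is grammatically fine either way, and only the type information reveals an inconsistency.

```python
from typing import Dict, List, Optional

NUMERIC = {"int", "float"}  # hypothetical set of types that × accepts


def type_of(name: str, types: Dict[str, str]) -> Optional[str]:
    """Integer literals are ints; names take their declared type (None if undeclared)."""
    return "int" if name.isdecimal() else types.get(name)


def check_product(target: str, operands: List[str], types: Dict[str, str]) -> List[str]:
    """Check target ← operand × operand × ...; return diagnostics (empty if consistent)."""
    errors: List[str] = []
    for name in operands:
        t = type_of(name, types)
        if t is None:
            errors.append(f"{name} is not declared")
        elif t not in NUMERIC:
            errors.append(f"{name} has type {t}, but × expects a number")
    t = types.get(target)
    if t is None:
        errors.append(f"{target} is not declared")
    elif t not in NUMERIC:
        errors.append(f"cannot assign a number to {target}, which has type {t}")
    return errors


operands = ["a", "2", "b", "c", "d"]  # a ← a × 2 × b × c × d
print(check_product("a", operands, {"a": "int", "b": "int", "c": "float", "d": "int"}))
# []
print(check_product("a", operands, {"a": "int", "b": "string", "c": "int"}))
# ['b has type string, but × expects a number', 'd is not declared']
```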