K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), page 11
The symbol “→” reads “derives” and means that an instance of the right-hand side can be abstracted to the syntactic variable on the left-hand side.

Consider a sentence like “Compilers are engineered objects.” The first step in understanding the syntax of this sentence is to identify distinct words in the input program and to classify each word with a part of speech. In a compiler, this task falls to a pass called the scanner.
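The two tasks named above, identifying words and classifying each one, can be sketched in a few lines of Python. This is only an illustration for the example sentence: the `LEXICON` table and the `scan` function are hypothetical names, and the word-list lookup is a stand-in for whatever classification method a real scanner uses.

```python
import re

# Hypothetical toy lexicon covering only the example sentence.
LEXICON = {
    "compilers": "noun",
    "objects": "noun",
    "are": "verb",
    "engineered": "adjective",
    ".": "endmark",
}

def scan(text):
    """Split text into words and pair each with its part of speech.

    Returns (part_of_speech, spelling) pairs. A word missing from
    LEXICON raises KeyError; a real scanner would report an error.
    """
    words = re.findall(r"[A-Za-z]+|\.", text)
    return [(LEXICON[w.lower()], w) for w in words]

for pair in scan("Compilers are engineered objects."):
    print(pair)
```

Each pair keeps the original spelling next to its classification, so later passes can work with the part of speech alone and still recover the word itself.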
The scanner takes a stream of characters and converts it to a stream of classified words, that is, pairs of the form (p,s), where p is the word’s part of speech and s is its spelling. A scanner would convert the example sentence into the following stream of classified words:

    (noun,“Compilers”), (verb,“are”), (adjective,“engineered”),
    (noun,“objects”), (endmark,“.”)

Scanner
the compiler pass that converts a string of characters into a stream of words

12 CHAPTER 1 Overview of Compilation

In practice, the actual spelling of the words might be stored in a hash table and represented in the pairs with an integer index to simplify equality tests. Chapter 2 explores the theory and practice of scanner construction.

In the next step, the compiler tries to match the stream of categorized words against the rules that specify syntax for the input language.
For example, a working knowledge of English might include the following grammatical rules:

    1  Sentence → Subject verb Object endmark
    2  Subject  → noun
    3  Subject  → Modifier noun
    4  Object   → noun
    5  Object   → Modifier noun
    6  Modifier → adjective
       ...

By inspection, we can discover the following derivation for our example sentence:

    Rule  Prototype Sentence
    -     Sentence
    1     Subject verb Object endmark
    2     noun verb Object endmark
    5     noun verb Modifier noun endmark
    6     noun verb adjective noun endmark

The derivation starts with the syntactic variable Sentence. At each step, it rewrites one term in the prototype sentence, replacing the term with a right-hand side that can be derived from that rule. The first step uses Rule 1 to replace Sentence.
The second uses Rule 2 to replace Subject. The third replaces Object using Rule 5, while the final step rewrites Modifier with adjective according to Rule 6. At this point, the prototype sentence generated by the derivation matches the stream of categorized words produced by the scanner.

Parser
the compiler pass that determines if the input stream is a sentence in the source language

The derivation proves that the sentence “Compilers are engineered objects.” belongs to the language described by Rules 1 through 6. The sentence is grammatically correct. The process of automatically finding derivations is called parsing. Chapter 3 presents the techniques that compilers use to parse the input program.

1.3 Overview of Translation 13

A grammatically correct sentence can be meaningless.
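The four-step derivation traced above can be replayed mechanically. The sketch below is a minimal illustration, not a parser: it applies a rule sequence that is already known (1, 2, 5, 6) instead of finding one automatically. The names `RULES` and `derive` are hypothetical.

```python
# Rules 1 through 6 from the text: number -> (left-hand side, right-hand side).
RULES = {
    1: ("Sentence", ["Subject", "verb", "Object", "endmark"]),
    2: ("Subject", ["noun"]),
    3: ("Subject", ["Modifier", "noun"]),
    4: ("Object", ["noun"]),
    5: ("Object", ["Modifier", "noun"]),
    6: ("Modifier", ["adjective"]),
}

def derive(rule_numbers, start="Sentence"):
    """Apply each rule to the leftmost occurrence of its left-hand side.

    Returns every prototype sentence produced, starting with [start].
    """
    form = [start]
    steps = [list(form)]
    for n in rule_numbers:
        lhs, rhs = RULES[n]
        i = form.index(lhs)      # raises ValueError if the rule does not apply
        form[i:i + 1] = rhs      # replace the one term with the right-hand side
        steps.append(list(form))
    return steps

steps = derive([1, 2, 5, 6])
for step in steps:
    print(" ".join(step))

# Parts of speech from the scanner's output for the example sentence.
scanned = ["noun", "verb", "adjective", "noun", "endmark"]
print(steps[-1] == scanned)
```

The comparison on the last line looks only at parts of speech. Any word stream with these categories in this order passes, whatever the words themselves mean.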
For example, the sentence “Rocks are green vegetables” has the same parts of speech in the same order as “Compilers are engineered objects,” but has no rational meaning. To understand the difference between these two sentences requires contextual knowledge about software systems, rocks, and vegetables.

The semantic models that compilers use to reason about programming languages are simpler than the models needed to understand natural language. A compiler builds mathematical models that detect specific kinds of inconsistency in a program.
Compilers check for consistency of type; for example, the expression

    a ← a × 2 × b × c × d

might be syntactically well-formed, but if b and d are character strings, the sentence might still be invalid. Compilers also check for consistency of number in specific situations; for example, an array reference should have the same number of dimensions as the array’s declared rank and a procedure call should specify the same number of arguments as the procedure’s definition.

Type checking
the compiler pass that checks for type-consistent uses of names in the input program
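Both kinds of check can be sketched briefly. The sketch assumes a hypothetical source language in which × is defined only on numbers; the helper names (`type_of`, `check_product`, `check_count`), the type names, and the array `m` and procedure `f` are all invented for illustration.

```python
NUMERIC = {"int", "float"}

def type_of(operand, types):
    """Look up a name's declared type; integer literals count as 'int'."""
    return "int" if operand.isdigit() else types[operand]

def check_product(operands, types):
    """Consistency of type: report each operand of × that is not a number."""
    return [
        f"'{x}' has type {type_of(x, types)}; × expects a number"
        for x in operands
        if type_of(x, types) not in NUMERIC
    ]

def check_count(what, name, declared, used):
    """Consistency of number: array rank or argument count must match."""
    if declared == used:
        return []
    return [f"{what} '{name}': declared {declared}, used {used}"]

# The case from the text: b and d are character strings.
types = {"a": "int", "b": "str", "c": "int", "d": "str"}
errors = check_product(["a", "2", "b", "c", "d"], types)
errors += check_count("array", "m", declared=2, used=3)      # rank mismatch
errors += check_count("procedure", "f", declared=2, used=2)  # consistent
for e in errors:
    print(e)
```

The expression is rejected because of b and d, even though it is syntactically well-formed, and the three-subscript reference to the two-dimensional array is flagged as well.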
Chapter 4 explores some of the issues that arise in compiler-based type checking and semantic elaboration.

Intermediate Representations

The final issue handled in the front end of a compiler is the generation of an IR form of the code. Compilers use a variety of different kinds of IR, depending on the source language, the target language, and the specific transformations that the compiler applies. Some IRs represent the program as a graph.
Others resemble a sequential assembly code program. The code in the margin shows how our example expression might look in a low-level, sequential IR. Chapter 5 presents an overview of the variety of kinds of IRs that compilers use.

For every source-language construct, the compiler needs a strategy for how it will implement that construct in the IR form of the code. Specific choices affect the compiler’s ability to transform and improve the code. Thus, we spend two chapters on the issues that arise in generation of IR for source-code constructs.
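One such strategy, for a left-to-right chain of multiplications like the example expression a ← a × 2 × b × c × d, can be sketched as follows. The function name `lower_product` and the temporary names t0, t1, ... are illustrative assumptions; a real front end handles many more constructs.

```python
def lower_product(target, operands):
    """Lower target ← op0 × op1 × ... into one operation per line.

    Each multiplication writes a fresh temporary; a final copy
    moves the last temporary into the target.
    """
    code = []
    acc = operands[0]
    for k, operand in enumerate(operands[1:]):
        temp = f"t{k}"
        code.append(f"{temp} ← {acc} × {operand}")
        acc = temp
    code.append(f"{target} ← {acc}")
    return code

for line in lower_product("a", ["a", "2", "b", "c", "d"]):
    print(line)
```

Giving every intermediate value its own name makes each operation explicit, which is one reason a low-level, sequential form is convenient for later passes to inspect and rewrite.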
Procedure linkages are, at once, a source of inefficiency in the final code and the fundamental glue that pieces together different source files into an application. Thus, we devote Chapter 6 to the issues that surround procedure calls. Chapter 7 presents implementation strategies for most other programming language constructs.

    t0 ← a × 2
    t1 ← t0 × b
    t2 ← t1 × c
    t3 ← t2 × d
    a  ← t3

TERMINOLOGY

A careful reader will notice that we use the word code in many places where either program or procedure might naturally fit. Compilers can be invoked to translate fragments of code that range from a single reference through an entire system of programs. Rather than specify some scope of compilation, we will continue to use the ambiguous, but more general, term, code.

1.3.2 The Optimizer

When the front end emits IR for the input program, it handles the statements one at a time, in the order that they are encountered.
Thus, the initial IR program contains general implementation strategies that will work in any surrounding context that the compiler might generate. At runtime, the code will execute in a more constrained and predictable context. The optimizer analyzes the IR form of the code to discover facts about that context and uses that contextual knowledge to rewrite the code so that it computes the same answer in a more efficient way.

Efficiency can have many meanings.
The classic notion of optimization is to reduce the application’s running time. In other contexts, the optimizer might try to reduce the size of the compiled code, or other properties such as the energy that the processor consumes evaluating the code. All of these strategies target efficiency.

Returning to our example, consider it in the context shown in Figure 1.2a. The statement occurs inside a loop. Of the values that it uses, only a and d change inside the loop. The values of 2, b, and c are invariant in the loop.
If the optimizer discovers this fact, it can rewrite the code as shown in Figure 1.2b. In this version, the number of multiplications has been reduced from 4·n to 2·n + 2. For n > 1, the rewritten loop should execute faster. This kind of optimization is discussed in Chapters 8, 9, and 10.

Analysis

Data-flow analysis
a form of compile-time reasoning about the runtime flow of values

Most optimizations consist of an analysis and a transformation.
The analysis determines where the compiler can safely and profitably apply the technique. Compilers use several kinds of analysis to support transformations. Data-flow analysis reasons, at compile time, about the flow of values at runtime. Data-flow analyzers typically solve a system of simultaneous set equations that are derived from the structure of the code being translated. Dependence analysis uses number-theoretic tests to reason about the values that can be assumed by subscript expressions.

    (a) Original Code in Context      (b) Improved Code

    b ← ···                           b ← ···
    c ← ···                           c ← ···
    a ← 1                             a ← 1
    for i = 1 to n                    t ← 2 × b × c
      read d                          for i = 1 to n
      a ← a × 2 × b × c × d             read d
    end                                 a ← a × d × t
                                      end

    FIGURE 1.2 Context Makes a Difference.
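The rewrite in Figure 1.2 can be sketched as a small analysis followed by a transformation. The sketch assumes that the factors of the product may be reordered freely, which holds for exact arithmetic but not always for floating-point values, and that literals never change. The function names and the temporary name `t` are hypothetical, and the code does not check whether hoisting is profitable.

```python
def invariant_operands(operands, assigned_in_loop):
    """Analysis: an operand is loop-invariant if the loop never assigns it."""
    return [x for x in operands if x not in assigned_in_loop]

def hoist(target, operands, assigned_in_loop, temp="t"):
    """Transformation: compute the invariant part of the product once,
    before the loop, and use the temporary inside the loop body."""
    invariant = invariant_operands(operands, assigned_in_loop)
    variant = [x for x in operands if x in assigned_in_loop]
    preheader = f"{temp} ← " + " × ".join(invariant)
    body = f"{target} ← " + " × ".join(variant + [temp])
    return preheader, body

def multiplications(statement):
    """Count the × operators in one statement."""
    return statement.count("×")

operands = ["a", "2", "b", "c", "d"]
assigned = {"a", "d"}  # a is assigned by the statement, d by "read d"

original_body = "a ← " + " × ".join(operands)
preheader, body = hoist("a", operands, assigned)
print(preheader)
print(body)

n = 10  # any trip count; chosen only for the comparison
before = multiplications(original_body) * n
after = multiplications(preheader) + multiplications(body) * n
print(before, after)
```

For any trip count n, the two totals are 4·n and 2·n + 2, matching the counts given for Figure 1.2.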