Cooper_Engineering_a_Compiler(Second Edition) (Rice), page 57
DOI: 10.1016/B978-0-12-088478-0.00005-0
Copyright © 2012, Elsevier Inc. All rights reserved.

CHAPTER 5  Intermediate Representations

Conceptual Roadmap

This chapter focuses on the issues that surround the design and use of an IR in compilation. Section 5.1.1 provides a taxonomic overview of IRs and their properties.
Many compiler writers consider trees and graphs as the natural representation for programs; for example, parse trees easily capture the derivations built by a parser. Section 5.2 describes several IRs based on trees and graphs. Of course, most processors that compilers target have linear assembly languages as their native language. Accordingly, some compilers use linear IRs with the rationale that those IRs expose properties of the target machine's code that the compiler should explicitly see.
Section 5.3 examines linear IRs.

The final sections of this chapter deal with issues that relate to IRs but are not, strictly speaking, IR design issues. Section 5.4 explores issues that relate to naming: the choice of specific names for specific values. Naming can have a strong impact on the kind of code generated by a compiler. That discussion includes a detailed look at a specific, widely used IR called static single-assignment form, or SSA. Section 5.5 provides a high-level overview of how the compiler builds, uses, and maintains symbol tables. Most compilers build one or more symbol tables to hold information about names and values and to provide efficient access to that information. (Appendix B.4 provides more material on symbol-table implementation.)

Overview

To convey information between its passes, a compiler needs a representation for all of the knowledge that it derives about the program being compiled. Thus, almost all compilers use some form of intermediate representation to model the code being analyzed, translated, and optimized. Most passes in the compiler consume IR; the scanner is an exception. Most passes in the compiler produce IR; passes in the code generator can be exceptions.
Many modern compilers use multiple IRs during the course of a single compilation. In a pass-structured compiler, the IR serves as the primary and definitive representation of the code.

A compiler's IR must be expressive enough to record all of the useful facts that the compiler might need to transmit between passes. Source code is insufficient for this purpose; the compiler derives many facts that have no representation in source code, such as the addresses of variables and constants or the register in which a given parameter is passed. To record all of the detail that the compiler must encode, most compiler writers augment the IR with tables and sets that record additional information.
We consider these tables part of the IR.

Selecting an appropriate IR for a compiler project requires an understanding of the source language, the target machine, and the properties of the applications that the compiler will translate. For example, a source-to-source translator might use an IR that closely resembles the source code, while a compiler that produces assembly code for a microcontroller might obtain better results with an assembly-code-like IR. Similarly, a compiler for C might need annotations about pointer values that are irrelevant in a compiler for Perl, and a Java compiler keeps records about the class hierarchy that have no counterpart in a C compiler.

Implementing an IR forces the compiler writer to focus on practical issues. The compiler needs inexpensive ways to perform the operations that it does frequently.
It needs concise ways to express the full range of constructs that might arise during compilation. The compiler writer also needs mechanisms that let humans examine the IR program easily and directly. Self-interest should ensure that compiler writers pay heed to this last point. Finally, compilers that use an IR almost always make multiple passes over the IR for a program. The ability to gather information in one pass and use it in another improves the quality of code that a compiler can generate.

5.1.1 A Taxonomy of Intermediate Representations

Compilers have used many kinds of IR. We will organize our discussion of IRs along three axes: structural organization, level of abstraction, and naming discipline. In general, these three attributes are independent; most combinations of organization, abstraction, and naming have been used in some compiler.

Broadly speaking, IRs fall into three structural categories:

- Graphical IRs encode the compiler's knowledge in a graph. The algorithms are expressed in terms of graphical objects: nodes, edges, lists, or trees. The parse trees used to depict derivations in Chapter 3 are a graphical IR.

- Linear IRs resemble pseudo-code for some abstract machine. The algorithms iterate over simple, linear sequences of operations. The ILOC code used in this book is a form of linear IR. (The ⇒ symbol in ILOC serves no purpose except to improve readability.)

- Hybrid IRs combine elements of both graphical and linear IRs, in an attempt to capture their strengths and avoid their weaknesses. A common hybrid representation uses a low-level linear IR to represent blocks of straight-line code and a graph to represent the flow of control among those blocks.

The structural organization of an IR has a strong impact on how the compiler writer thinks about analysis, optimization, and code generation.
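As a concrete illustration of the hybrid category, the following C sketch pairs straight-line blocks of a low-level linear IR with explicit control-flow edges. All type names, field names, and size limits here are invented for illustration; they are not taken from any particular compiler.

```c
#include <assert.h>

/* A tiny, hypothetical hybrid IR: each basic block holds a linear
   sequence of low-level operations; the blocks themselves are linked
   into a control-flow graph by successor edges. */

enum Opcode { OP_LOADI, OP_ADD, OP_MULT, OP_CBR };

typedef struct {
    enum Opcode op;
    int src1, src2, dst;            /* virtual register numbers */
} Op;

#define MAX_OPS   16
#define MAX_SUCCS 2                 /* fall-through + branch target */

typedef struct BasicBlock {
    Op  ops[MAX_OPS];               /* straight-line linear IR */
    int nops;
    struct BasicBlock *succ[MAX_SUCCS];  /* CFG edges */
    int nsuccs;
} BasicBlock;

/* Append an operation to a block's linear code; returns 1 on success. */
static int block_add_op(BasicBlock *b, Op o) {
    if (b->nops >= MAX_OPS) return 0;
    b->ops[b->nops++] = o;
    return 1;
}

/* Add a control-flow edge from b to s; returns 1 on success. */
static int block_add_succ(BasicBlock *b, BasicBlock *s) {
    if (b->nsuccs >= MAX_SUCCS) return 0;
    b->succ[b->nsuccs++] = s;
    return 1;
}
```

A pass that optimizes straight-line code iterates over `ops` within one block, while a global analysis walks the `succ` edges; the hybrid form supports both views at once.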
For example, treelike IRs lead naturally to passes structured as some form of treewalk. Similarly, linear IRs lead naturally to passes that iterate over the operations in order.

The second axis of our IR taxonomy is the level of abstraction at which the IR represents operations. The IR can range from a near-source representation, in which a single node might represent an array access or a procedure call, to a low-level representation, in which several IR operations must be combined to form a single target-machine operation.

To illustrate the possibilities, assume that A[1...10, 1...10] is an array of four-byte elements stored in row-major order, and consider how the compiler might represent the array reference A[i,j] in a source-level tree and in ILOC.

Source-Level Tree:

          subscript
          /   |   \
         A    i    j

ILOC Code:

    subI  ri, 1    ⇒ r1
    multI r1, 10   ⇒ r2
    subI  rj, 1    ⇒ r3
    add   r2, r3   ⇒ r4
    multI r4, 4    ⇒ r5
    loadI @A       ⇒ r6
    add   r5, r6   ⇒ r7
    load  r7       ⇒ rAij

In the source-level tree, the compiler can easily recognize the computation as an array reference; the ILOC code obscures that fact fairly well.
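The ILOC sequence for A[i,j] computes the byte offset ((i - 1) × 10 + (j - 1)) × 4 and then adds the base address @A. As a sanity check, the same arithmetic can be written as a small C function; the name aij_offset is invented here for illustration.

```c
#include <assert.h>

/* Row-major byte offset of A[i,j] for A[1..10, 1..10] with four-byte
   elements. Each local mirrors one register in the ILOC sequence. */
static int aij_offset(int i, int j) {
    int r1 = i - 1;      /* subI  ri, 1   => r1 */
    int r2 = r1 * 10;    /* multI r1, 10  => r2 (row length is 10)   */
    int r3 = j - 1;      /* subI  rj, 1   => r3 */
    int r4 = r2 + r3;    /* add   r2, r3  => r4 (element index)      */
    int r5 = r4 * 4;     /* multI r4, 4   => r5 (scale by size)      */
    return r5;           /* adding @A (r6) would yield the address   */
}
```

For instance, aij_offset(1, 1) is 0 (the first element) and aij_offset(2, 3) is (1 × 10 + 2) × 4 = 48.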
In a compiler that tries to determine when two different references can touch the same memory location, the source-level tree makes it easy to find and compare references. By contrast, the ILOC code makes those tasks hard. Optimization only makes the situation worse; in the ILOC code, optimization might move parts of the address computation elsewhere. The tree node will remain intact under optimization.

On the other hand, if the goal is to optimize the target-machine code generated for the array access, the ILOC code lets the compiler optimize details that remain implicit in the source-level tree. For this purpose, a low-level IR may prove better.

Not all tree-based IRs use a near-source level of abstraction.
To be sure, parse trees are implicitly related to the source code, but trees with other levels of abstraction have been used in many compilers. Many C compilers, for example, have used low-level expression trees. Similarly, linear IRs can have relatively high-level constructs, such as a max or a min operator, or a string-copy operation.

The third axis of our IR taxonomy deals with the name space used to represent values in the code. In translating source code to a lower-level form, the compiler must choose names for a variety of distinct values.
For example, to evaluate a - 2 × b in a low-level IR, the compiler might generate a sequence of operations such as:

    t1 ← b
    t2 ← 2 × t1
    t3 ← a
    t4 ← t3 - t2

Here, the compiler has used four names, t1 through t4. An equally valid scheme would replace the occurrences of t2 and t4 with t1, which cuts the number of names in half.

The choice of a naming scheme has a strong effect on how optimization can improve the code. If the subexpression 2 × b has a unique name, the compiler might find other evaluations of 2 × b that it can replace with a reference to the value produced here. If the name is reused, the current value may not be available at the subsequent, redundant evaluation. The choice of a naming scheme also has an impact on compile time, because it determines the sizes of many compile-time data structures.

As a practical matter, the costs of generating and manipulating an IR should concern the compiler writer, since they directly affect a compiler's speed. The data-space requirements of different IRs vary over a wide range.
Since the compiler typically touches all of the space that it allocates, data space usually has a direct relationship to running time. To make this discussion concrete, consider the IRs used in two different research systems that we built at Rice University.

- The Rn Programming Environment built an abstract syntax tree for FORTRAN. Nodes in the tree occupied 92 bytes each. The parser built an average of eleven nodes per FORTRAN source line, for a size of just over 1,000 bytes per source-code line.

- The MSCP research compiler used a full-scale implementation of ILOC. (The ILOC in this book is a simple subset.) ILOC operations occupy 23 to 25 bytes. The compiler generates an average of roughly fifteen ILOC operations per source-code line, or about 375 bytes per source-code line. Optimization reduces the size to just over three operations per source-code line, or fewer than 100 bytes per source-code line.

Finally, the compiler writer should consider the expressiveness of the IR: its ability to accommodate all the facts that the compiler needs to record.