Cooper, Engineering a Compiler (Second Edition) (Rice), page 44
Consider implementing addition in FORTRAN 77. The compiler can completely determine the types of all expressions, so it can consult a table similar to the one in Figure 4.2. The code on the right shows the ILOC operation for the addition, along with the conversions specified in the FORTRAN standard for each mixed-type expression. The full table would include all the cases from Figure 4.1. In a language with types that cannot be wholly determined at compile time, some of this checking might be deferred until runtime. To accomplish this, the compiler would need to emit code similar to the pseudo-code in Figure 4.3.
The figure only shows the code for two numeric types, integer and real. An actual implementation would need to cover the entire set of possibilities. While this approach ensures runtime safety, it adds significant overhead to each operation.

Operator overloading: an operator that has different meanings based on the types of its arguments is "overloaded."

CHAPTER 4 Context-Sensitive Analysis

    Type of a   Type of b   Type of a+b   Code
    integer     integer     integer       iADD ra, rb ⇒ ra+b
    integer     real        real          i2f ra ⇒ raf; fADD raf, rb ⇒ ra+b
    integer     double      double        i2d ra ⇒ rad; dADD rad, rb ⇒ ra+b
    real        real        real          fADD ra, rb ⇒ ra+b
    real        double      double        r2d ra ⇒ rad; dADD rad, rb ⇒ ra+b
    double      double      double        dADD ra, rb ⇒ ra+b

FIGURE 4.2 Implementing Addition in FORTRAN 77.
One goal of compile-time checking is to provide such safety without the runtime cost.

The benefit of keeping a in a register comes from speed of access. If a's tag is in RAM, that benefit is lost. An alternative is to use part of the space in a to store the tag and to reduce the range of values that a can hold.

Notice that runtime type checking requires a runtime representation for type. Thus, each variable has both a value field and a tag field.
The code that performs runtime checking—the nested if-then-else structure in Figure 4.3—relies on the tag fields, while the arithmetic uses the value fields. With tags, each data item needs more space, that is, more bytes in memory. If a variable is stored in a register, both its value and its tag will need registers. Finally, tags must be initialized, read, compared, and written at runtime.
All of those activities add overhead to a simple addition operation.

Runtime type checking imposes a large overhead on simple arithmetic and on other operations that manipulate data. Replacing a single addition, or a conversion and an addition, with the nest of if-then-else code in Figure 4.3 has a significant performance impact. The size of the code in Figure 4.3 strongly suggests that operators such as addition be implemented as procedures and that each instance of an operator be treated as a procedure call. In a language that requires runtime type checking, the costs of runtime checking can easily overwhelm the costs of the actual operations.

Performing type inference and checking at compile time eliminates this kind of overhead. It can replace the complex code of Figure 4.3 with the fast, compact code of Figure 4.2.
From a performance perspective, compile-time checking is always preferable. However, language design determines whether or not that is possible.

4.2 An Introduction to Type Systems

    // partial code for "a+b ⇒ c"
    if (tag(a) = integer) then
        if (tag(b) = integer) then
            value(c) = value(a) + value(b);
            tag(c) = integer;
        else if (tag(b) = real) then
            temp = ConvertToReal(a);
            value(c) = temp + value(b);
            tag(c) = real;
        else if (tag(b) = ...) then
            // handle all other types ...
        else
            signal runtime type fault
    else if (tag(a) = real) then
        if (tag(b) = integer) then
            temp = ConvertToReal(b);
            value(c) = value(a) + temp;
            tag(c) = real;
        else if (tag(b) = real) then
            value(c) = value(a) + value(b);
            tag(c) = real;
        else if (tag(b) = ...) then
            // handle all other types ...
        else
            signal runtime type fault
    else if (tag(a) = ...) then
        // handle all other types ...
    else
        signal illegal tag value;

FIGURE 4.3 Schema for Implementing Addition with Runtime Type Checking.

Type Checking

To avoid the overhead of runtime type checking, the compiler must analyze the program and assign a type to each name and each expression. It must check these types to ensure that they are used in contexts where they are legal. Taken together, these activities are often called type checking. This is an unfortunate misnomer, because it lumps together the separate activities of type inference and identifying type-related errors.

The programmer should understand how type checking is performed in a given language and compiler.
A strongly typed, statically checkable language might be implemented with runtime checking (or with no checking). An untyped language might be implemented in a way that catches certain kinds of errors. Both ML and Modula-3 are good examples of strongly typed languages that can be statically checked. Common Lisp has a strong type system that must be checked dynamically. ANSI C is a typed language, but some implementations do a poor job of identifying type errors.

The theory underlying type systems encompasses a large and complex body of knowledge. This section provides an overview of type systems and introduces some simple problems in type checking. Subsequent sections use simple problems of type inference as examples of context-sensitive computations.

4.2.2 Components of a Type System

A type system for a typical modern language has four major components: a set of base types, or built-in types; rules for constructing new types from the existing types; a method for determining if two types are equivalent or compatible; and rules for inferring the type of each source-language expression. Many languages also include rules for the implicit conversion of values from one type to another based on context.
This section describes each of these in more detail, with examples from popular programming languages.

Base Types

Most programming languages include base types for some, if not all, of the following kinds of data: numbers, characters, and booleans. These types are directly supported by most processors. Numbers typically come in several forms, such as integers and floating-point numbers. Individual languages add other base types.
Lisp includes both a rational number type and a recursive type cons. Rational numbers are, essentially, pairs of integers interpreted as ratios. A cons is defined as either the designated value nil or (cons first rest), where first is an object, rest is a cons, and cons creates a list from its arguments.

The precise definitions for base types, and the operators defined for them, vary across languages. Some languages refine these base types to create more; for example, many languages distinguish between several types of numbers in their type systems. Other languages lack one or more of these base types. For example, C has no string type, so C programmers use an array of characters instead.
Almost all languages include facilities to construct more complex types from their base types.

Numbers

Almost all programming languages include one or more kinds of numbers as base types. Typically, they support both limited-range integers and approximate real numbers, often called floating-point numbers. Many programming languages expose the underlying hardware implementation by creating distinct types for different hardware implementations. For example, C, C++, and Java distinguish between signed and unsigned integers. FORTRAN, PL/I, and C expose the size of numbers.
Both C and FORTRAN specify the length of data items in relative terms. For example, a double in FORTRAN is twice the length of a real. Both languages, however, give the compiler control over the length of the smallest category of number. In contrast, PL/I declarations specify a length in bits. The compiler maps this desired length onto one of the hardware representations. Thus, the IBM 370 implementation of PL/I mapped both a fixed binary(12) and a fixed binary(15) variable to a 16-bit integer, while a fixed binary(31) became a 32-bit integer.

Some languages specify implementations in detail.
For example, Java defines distinct types for signed integers with lengths of 8, 16, 32, and 64 bits. Respectively, they are byte, short, int, and long. Similarly, Java's float type specifies a 32-bit IEEE floating-point number, while its double type specifies a 64-bit IEEE floating-point number. This approach ensures identical behavior on different architectures.

Scheme takes a different approach. The language defines a hierarchy of number types but lets the implementor select a subset to support. However, the standard draws a careful distinction between exact and inexact numbers and specifies a set of operations that should return an exact number when all of its arguments are exact.
This provides a degree of flexibility to the implementer, while allowing the programmer to reason about when and where approximation can occur.

Characters

Many languages include a character type. In the abstract, a character is a single letter. For years, due to the limited size of the Western alphabets, this led to a single-byte (8-bit) representation for characters, usually mapped into the ASCII character set.