K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), страница 44
Текст из файла (страница 44)
In truth, thetype system cannot catch all ill-formed programs; the set of ill-formed programs is not computable. Some runtime errors, such as dereferencing anout-of-bounds pointer, have obvious (and often catastrophic) effects. Others, such as mistakenly interpreting an integer as a floating-point number,can have subtle and cumulative effects. The compiler should eliminate asmany runtime errors as it can using type-checking techniques.To accomplish this, the compiler must first infer a type for each expression.These inferred types expose situations in which a value is incorrectly interpreted, such as using a floating-point number in place of a boolean value.Second, the compiler must check the types of the operands of each operatoragainst the rules that define what the language allows.
In some cases, theserules might require the compiler to convert values from one representationto another. In other circumstances, they may forbid such a conversion andsimply declare that the program is ill formed and, therefore, not executable.Type inferencethe process of determining a type for each nameand each expression in the code166 CHAPTER 4 Context-Sensitive Analysis+integerintegerrealdoublecomplexintegerrealdoublecomplexrealrealrealdoublecomplexdoublecomplexdoubledoubledoublecomplexcomplexillegalcomplexillegaln FIGURE 4.1 Result Types for Addition in FORTRAN 77.Implicit conversionMany languages specify rules that allow anoperator to combine values of different type andrequire that the compiler insert conversions asneeded.The alternative is to require the programmer towrite an explicit conversion or cast.In many languages, the compiler can infer a type for every expression. fortran 77 has a particularly simple type system with just a handful of types.Figure 4.1 shows all the cases that can arise for the + operator.
Given anexpression a + b and the types of a and b, the table specifies the type of a + b.For an integer a and a double-precision b, a + b produces a double-precisionresult. If, instead, a were complex, a + b would be illegal. The compilershould detect this situation and report it before the program executes—asimple example of type safety.For some languages, the compiler cannot infer types for all expressions. apl,for example, lacks declarations, allows a variable’s type to change at anyassignment, and lets the user enter arbitrary code at input prompts. While thismakes apl powerful and expressive, it ensures that the implementation mustdo some amount of runtime type inference and checking.
The alternative, ofcourse, is to assume that the program behaves well and ignore such checking.In general, this leads to bad behavior when a program goes awry. In apl,many of the advanced features rely heavily on the availability of type anddimension information.Safety is a strong reason for using typed languages. A language implementation that guarantees to catch most type-related errors before they execute cansimplify the design and implementation of programs. A language in whichevery expression can be assigned an unambiguous type is called a stronglytyped language.
If every expression can be typed at compile time, the language is statically typed; if some expressions can only be typed at runtime,the language is dynamically typed. Two alternatives exist: an untyped language, such as assembly code or bcpl, and a weakly typed language—onewith a poor type system.Improving ExpressivenessA well-constructed type system allows the language designer to specifybehavior more precisely than is possible with context-free rules.
This capability lets the language designer include features that would be impossible4.2 An Introduction to Type Systems 167to specify in a context-free grammar. An excellent example is operatoroverloading, which gives context-dependent meanings to an operator. Manyprogramming languages use + to signify several kinds of addition. The interpretation of + depends on the types of its operands.
In typed languages,many operators are overloaded. The alternative, in an untyped language, isto provide lexically different operators for each case.For example, in bcpl, the only type is a “cell.” A cell can hold any bitpattern; the interpretation of that bit pattern is determined by the operatorapplied to the cell. Because cells are essentially untyped, operators cannot beoverloaded. Thus, bcpl uses + for integer addition and #+ for floating-pointaddition. Given two cells a and b, both a + b and a #+ b are valid expressions,neither of which performs any conversion on its operands.In contrast, even the oldest typed languages use overloading to specify complex behavior.
As described in the previous section, fortran has a singleaddition operator, +, and uses type information to determine how it shouldbe implemented. ansi c uses function prototypes—declarations of the number and type of a function’s parameters and the type of its returned value—toconvert arguments to the appropriate types. Type information determines theeffect of autoincrementing a pointer in c; the amount of the increment isdetermined by the pointer’s type. Object-oriented languages use type information to select the appropriate implementation at each procedure call.
Forexample, Java selects between a default constructor and a specialized one byexamining the constructor’s argument list.Generating Better CodeA well-designed type system provides the compiler with detailed information about every expression in the program—information that can often beused to produce more efficient translations. Consider implementing addition in fortran 77. The compiler can completely determine the types ofall expressions, so it can consult a table similar to the one in Figure 4.2.The code on the right shows the iloc operation for the addition, alongwith the conversions specified in the fortran standard for each mixed-typeexpression. The full table would include all the cases from Figure 4.1.In a language with types that cannot be wholly determined at compiletime, some of this checking might be deferred until runtime.
To accomplishthis, the compiler would need to emit code similar to the pseudo-code inFigure 4.3. The figure only shows the code for two numeric types, integerand real. An actual implementation would need to cover the entire set ofpossibilities. While this approach ensures runtime safety, it adds significantOperator overloadingAn operator that has different meanings basedon the types of its arguments is "overloaded."168 CHAPTER 4 Context-Sensitive AnalysisType ofaCodeba+bintegerintegerintegerintegerrealrealintegerdoubledoublerealrealrealrealdoubledoubledoubledoubledoubleiADD ra , rb ⇒ ra+bi2f fa ⇒ raffADD raf , rb ⇒ ra +bfi2d ra ⇒ raddADD rad , rb ⇒ ra +bdfADD ra , rb ⇒ ra+br2d ra ⇒ raddADD rad , rb ⇒ ra +bddADD ra , rb ⇒ ra+bn FIGURE 4.2 Implementing Addition in FORTRAN 77.overhead to each operation.
One goal of compile-time checking is to providesuch safety without the runtime cost.The benefit of keeping a in a register comes fromspeed of access. If a’s tag is in RAM, that benefit islost.An alternative is to use part of the space in a tostore the tag and to reduce the range of valuesthat a can hold.Notice that runtime type checking requires a runtime representation for type.Thus, each variable has both a value field and a tag field. The code that performs runtime checking—the nested if-then-else structure in Figure 4.3—relies on the tag fields, while the arithmetic uses the value fields.
With tags,each data item needs more space, that is, more bytes in memory. If a variableis stored in a register, both its value and its tag will need registers. Finally,tags must be initialized, read, compared, and written at runtime. All of thoseactivities add overhead to a simple addition operation.Runtime type checking imposes a large overhead on simple arithmetic andon other operations that manipulate data. Replacing a single addition, or aconversion and an addition, with the nest of if-then-else code in Figure 4.3has a significant performance impact. The size of the code in Figure 4.3strongly suggests that operators such as addition be implemented as procedures and that each instance of an operator be treated as a procedure call.
In alanguage that requires runtime type checking, the costs of runtime checkingcan easily overwhelm the costs of the actual operations.Performing type inference and checking at compile time eliminates thiskind of overhead. It can replace the complex code of Figure 4.3 with thefast, compact code of Figure 4.2. From a performance perspective, compiletime checking is always preferable. However, language design determineswhether or not that is possible.4.2 An Introduction to Type Systems 169// partial code for "a+b ⇒ c"if (tag(a) = integer) thenif (tag(b) = integer) thenvalue(c) = value(a) + value(b);tag(c) = integer;else if (tag(b) = real) thentemp = ConvertToReal(a);value(c) = temp + value(b);tag(c) = real;else if (tag(b) = .