Consider the two declarations in C shown in the margin. Are Tree and STree the same type? Are they equivalent? Any programming language with a nontrivial type system must include an unambiguous rule to answer this question for arbitrary types. Historically, two general approaches have been tried.

The first, name equivalence, asserts that two types are equivalent if and only if they have the same name. Philosophically, this rule assumes that the programmer can select any name for a type; if the programmer chooses different names, the language and its implementation should honor that deliberate act.
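The margin declarations themselves are in C and are not reproduced here, but the same question can be posed in Java, which applies name equivalence to its reference types. The sketch below is only an illustration; the class bodies and the helper routine are invented for this example.

    // Two structurally identical classes. Under name equivalence, the rule
    // that Java applies to reference types, Tree and STree are distinct types.
    class Tree  { Tree left, right;   int value; }
    class STree { STree left, right;  int value; }

    class NameEquivalenceDemo {
        static int valueOf(Tree t) { return t.value; }

        public static void main(String[] args) {
            Tree t = new Tree();
            t.value = 42;
            System.out.println(valueOf(t));   // fine: the declared name matches

            STree s = new STree();
            s.value = 42;
            // valueOf(s);   // compile-time error: STree is not a Tree, even though
            //               // the two classes have exactly the same structure
        }
    }

Under structural equivalence, the same two declarations would describe a single type.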
Unfortunately, the difficulty of maintaining consistent names grows with the size of the program, the number of authors, and the number of distinct files of code.

The second approach, structural equivalence, asserts that two types are equivalent if and only if they have the same structure. Philosophically, this rule asserts that two objects are interchangeable if they consist of the same set of fields, in the same order, and those fields all have equivalent types. Structural equivalence examines the essential properties that define the type.

REPRESENTING TYPES
As with most objects that a compiler must manipulate, types need an internal representation. Some languages, such as FORTRAN 77, have a small fixed set of types. For these languages, a small integer tag is both efficient and sufficient. However, many modern languages have open-ended type systems. For these languages, the compiler writer needs to design a structure that can represent arbitrary types.

If the type system is based on name equivalence, any number of simple representations will suffice, as long as the compiler can use the representation to trace back to a representation of the actual structure. If the type system is based on structural equivalence, the representation of the type must encode its structure. Most such systems build trees to represent types. They construct a tree for each type declaration and compare tree structures to test for equivalence.

Each policy has strengths and weaknesses. Name equivalence assumes that identical names occur as a deliberate act; in a large programming project, this requires discipline to avoid unintentional clashes. Structural equivalence assumes that interchangeable objects can be used safely in place of one another; if some of the values have “special” meanings, this can create problems. (Imagine two hypothetical, structurally identical types. The first holds a system I/O control block, while the second holds the collection of information about a bit-mapped image on the screen. Treating them as distinct types would allow the compiler to detect a misuse—passing the I/O control block to a screen refresh routine—while treating them as the same type would not.)

Inference Rules

In general, type inference rules specify, for each operator, the mapping between the operand types and the result type. For some cases, the mapping is simple. An assignment, for example, has one operand and one result. The result, or left-hand side, must have a type that is compatible with the type of the operand, or right-hand side.
(In Pascal, the subrange 1..100 is compatible with the integers since any element of the subrange can be assigned safely to an integer.) This rule allows assignment of an integer value to an integer variable. It forbids assignment of a structure to an integer variable, without an explicit conversion that makes sense of the operation.

The relationship between operand types and result types is often specified as a recursive function on the type of the expression tree. The function computes the result type of an operation as a function of the types of its operands.
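As an illustration only, such a function might look like the following Java sketch; the three base types and the single widening rule are assumptions made for this example, not the rules of any particular language.

    // A sketch of a result-type function for binary arithmetic operators.
    class ResultType {
        enum Type { INT, LONG, DOUBLE, ERROR }

        // The result type is a function of the operand types: any error
        // propagates, otherwise the wider of the two operand types wins.
        static Type arithmetic(Type left, Type right) {
            if (left == Type.ERROR || right == Type.ERROR) return Type.ERROR;
            if (left == Type.DOUBLE || right == Type.DOUBLE) return Type.DOUBLE;
            if (left == Type.LONG || right == Type.LONG) return Type.LONG;
            return Type.INT;
        }

        public static void main(String[] args) {
            System.out.println(arithmetic(Type.INT, Type.LONG));     // LONG
            System.out.println(arithmetic(Type.INT, Type.DOUBLE));   // DOUBLE
        }
    }

Applied recursively over an expression tree, the same function yields a type for every interior node.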
The functions might be specified in tabular form, similar to the table in Figure 4.1. Sometimes, the relationship between operand types and result types is specified by a simple rule. In Java, for example, adding two integer types of different precision produces a result of the more precise (longer) type.

The inference rules point out type errors. Mixed-type expressions may be illegal. In FORTRAN 77, a program cannot add a double and a complex. In Java, a program cannot assign a number to a character. These combinations should produce a type error at compile time, along with a message that indicates how the program is ill formed.

Some languages require the compiler to perform implicit conversions. The compiler must recognize certain combinations of mixed-type expressions and handle them by inserting the appropriate conversions. In FORTRAN, adding an integer and a floating-point number forces conversion of the integer to floating-point form before the addition. Similarly, Java mandates implicit conversions for integer addition of values with different precision. The compiler must coerce the less precise value to the form of the more precise value before addition.
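A small Java fragment makes the coercion visible; the variable names and values are arbitrary.

    // Mixed-precision integer addition in Java: the compiler widens the less
    // precise operand, so the result takes the more precise type.
    class ImplicitConversion {
        public static void main(String[] args) {
            int  i = 3;
            long l = 4_000_000_000L;

            long sum = i + l;              // i is widened to long before the add
            System.out.println(sum);       // prints 4000000003

            // int bad = i + l;            // would not compile: the sum has type long,
            //                             // which does not narrow to int without a cast
        }
    }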
A similar situation arises in Java with integer assignment. If the right-hand side is less precise, it is converted to the more precise type of the left-hand side. If, however, the left-hand side is less precise than the right-hand side, the assignment produces a type error unless the programmer inserts an explicit cast operation to change its type and coerce its value.

Declarations and Inference

As previously mentioned, many programming languages include a "declare before use" rule. With mandatory declarations, each variable has a well-defined type. The compiler needs a way to assign types to constants. Two approaches are common. Either a constant's form implies a specific type—for example, 2 is an integer and 2.0 is a floating-point number—or the compiler infers a constant's type from its usage—for example, sin(2) implies that 2 is a floating-point number, while x ← 2, for integer x, implies that 2 is an integer. (This scheme overloads 2 with different meanings in different contexts. Experience suggests that programmers are good at understanding this kind of overloading.) With declared types for variables, implied types for constants, and a complete set of type-inference rules, the compiler can assign types to any expression over variables and constants. Function calls complicate the picture, as we shall see.

Some languages absolve the programmer from writing any declarations. In these languages, the problem of type inference becomes substantially more intricate.
Section 4.5 describes some of the problems that this creates and some of the techniques that compilers use to address them.

CLASSIFYING TYPE SYSTEMS
Many terms are used to describe type systems. In the text, we have introduced the terms strongly typed, untyped, and weakly typed languages. Other distinctions between type systems and their implementations are important.

Checked versus Unchecked Implementations The implementation of a programming language may elect to perform enough checking to detect and to prevent all runtime errors that result from misuse of a type. (This may actually exclude some value-specific errors, such as division by zero.) Such an implementation is called strongly checked. The opposite of a strongly checked implementation is an unchecked implementation—one that assumes a well-formed program. Between these poles lies a spectrum of weakly checked implementations that perform partial checking.

Compile Time versus Runtime Activity A strongly typed language may have the property that all inference and all checking can be done at compile time. An implementation that actually does all this work at compile time is called statically typed and statically checked. Some languages have constructs that must be typed and checked at runtime. We term these languages dynamically typed and dynamically checked. To confuse matters further, of course, a compiler writer can implement a strongly typed, statically typed language with dynamic checking. Java is an example of a language that could be statically typed and checked, except for an execution model that keeps the compiler from seeing all the source code at once. This forces it to perform type inference as classes are loaded and to perform some of the checking at runtime.

Inferring Types for Expressions

The goal of type inference is to assign a type to each expression that occurs in a program. The simplest case for type inference occurs when the compiler can assign a type to each base element in an expression—that is, to each leaf in the parse tree for an expression. This requires declarations for all variables, inferred types for all constants, and type information about all functions.

Conceptually, the compiler can assign a type to each value in the expression during a simple postorder tree walk. This should let the compiler detect every violation of an inference rule, and report it at compile time.
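The following Java sketch illustrates the idea under simplifying assumptions: a toy expression tree with typed leaves, a single operator, and a single widening rule. It is meant only to show the shape of the postorder computation, not a real compiler's representation.

    class ExprTyping {
        enum Type { INT, DOUBLE, ERROR }

        static abstract class Expr { abstract Type type(); }

        // A leaf already carries its type: a declared variable or a typed constant.
        static class Leaf extends Expr {
            final Type declared;
            Leaf(Type declared) { this.declared = declared; }
            @Override Type type() { return declared; }
        }

        // An interior node computes its type from its children's types; because
        // both children are typed before the parent, the walk is postorder.
        static class Add extends Expr {
            final Expr left, right;
            Add(Expr left, Expr right) { this.left = left; this.right = right; }
            @Override Type type() {
                Type l = left.type(), r = right.type();
                if (l == Type.ERROR || r == Type.ERROR) return Type.ERROR;
                if (l == Type.DOUBLE || r == Type.DOUBLE) return Type.DOUBLE;
                return Type.INT;
            }
        }

        public static void main(String[] args) {
            // (x + 2) + y, with x declared INT, the constant 2 inferred INT, and y DOUBLE
            Expr e = new Add(new Add(new Leaf(Type.INT), new Leaf(Type.INT)),
                             new Leaf(Type.DOUBLE));
            System.out.println(e.type());   // DOUBLE
        }
    }

A real front end would also record, at each node, any conversions that the inference rules require it to insert.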
If the language lacks one or more of the features that make this simple style of inference possible, the compiler will need to use more sophisticated techniques. If compile-time type inference becomes too difficult, the compiler writer may need to move some of the analysis and checking to runtime.

Type inference for expressions, in this simple case, directly follows the expression's structure. The inference rules describe the problem in terms of the source language. The evaluation strategy operates bottom up on the parse tree. For these reasons, type inference for expressions has become a classic example problem to illustrate context-sensitive analysis.

Interprocedural Aspects of Type Inference

Type inference for expressions depends, inherently, on the other procedures that form the executable program. Even in the simplest type systems, expressions contain function calls.
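A hypothetical Java fragment shows why: to assign a type to a call site, the compiler must already know the callee's declared signature, information that comes from outside the expression itself. The method and types below are invented for the example.

    // To type the expression f(x) + 1, the compiler needs f's signature.
    class Callee {
        static double f(int x) { return x * 0.5; }
    }

    class Caller {
        public static void main(String[] args) {
            int x = 4;
            double r = Callee.f(x) + 1;   // typed as double only because f's
                                          // declared return type is known
            System.out.println(r);        // prints 3.0
        }
    }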