K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), страница 45
Текст из файла (страница 45)
. . ) then// handle all other types . . .elsesignal runtime type faultelse if (tag(a) = real) thenif (tag(b) = integer) thentemp = ConvertToReal(b);value(c) = value(a) + temp;tag(c) = real;else if (tag(b) = real) thenvalue(c) = value(a) + value(b);tag(c) = real;else if (tag(b) = . . . ) then// handle all other types . . .elsesignal runtime type faultelse if (tag(a) = . .
. ) then// handle all other types . . .elsesignal illegal tag value;n FIGURE 4.3 Schema for Implementing Addition with Runtime Type Checking.Type CheckingTo avoid the overhead of runtime type checking, the compiler must analyzethe program and assign a type to each name and each expression. It mustcheck these types to ensure that they are used in contexts where they arelegal. Taken together, these activities are often called type checking.
This isan unfortunate misnomer, because it lumps together the separate activities oftype inference and identifying type-related errors.170 CHAPTER 4 Context-Sensitive AnalysisThe programmer should understand how type checking is performed ina given language and compiler. A strongly typed, statically checkablelanguage might be implemented with runtime checking (or with no checking). An untyped language might be implemented in a way that catchescertain kinds of errors.
Both ml and Modula-3 are good examples of stronglytyped languages that can be statically checked. Common Lisp has a strongtype system that must be checked dynamically. ansi c is a typed language,but some implementations do a poor job of identifying type errors.The theory underlying type systems encompasses a large and complex bodyof knowledge. This section provides an overview of type systems andintroduces some simple problems in type checking. Subsequent sectionsuse simple problems of type inference as examples of context-sensitivecomputations.4.2.2 Components of a Type SystemA type system for a typical modern language has four major components: aset of base types, or built-in types; rules for constructing new types from theexisting types; a method for determining if two types are equivalent or compatible; and rules for inferring the type of each source-language expression.Many languages also include rules for the implicit conversion of values fromone type to another based on context.
This section describes each of these inmore detail, with examples from popular programming languages.Base TypesMost programming languages include base types for some, if not all, of thefollowing kinds of data: numbers, characters, and booleans. These types aredirectly supported by most processors. Numbers typically come in severalforms, such as integers and floating-point numbers. Individual languages addother base types. Lisp includes both a rational number type and a recursivetype cons.
Rational numbers are, essentially, pairs of integers interpreted asratios. A cons is defined as either the designated value nil or (cons firstrest) where first is an object, rest is a cons, and cons creates a list fromits arguments.The precise definitions for base types, and the operators defined for them,vary across languages.
Some languages refine these base types to createmore; for example, many languages distinguish between several types ofnumbers in their type systems. Other languages lack one or more of thesebase types. For example, c has no string type, so c programmers use an arrayof characters instead.
Almost all languages include facilities to constructmore complex types from their base types.4.2 An Introduction to Type Systems 171NumbersAlmost all programming languages include one or more kinds of numbers asbase types. Typically, they support both limited-range integers and approximate real numbers, often called floating-point numbers.
Many programminglanguages expose the underlying hardware implementation by creating distinct types for different hardware implementations. For example, c, c++, andJava distinguish between signed and unsigned integers.fortran, pl/i, and c expose the size of numbers. Both c and fortranspecify the length of data items in relative terms. For example, a doublein fortran is twice the length of a real. Both languages, however, givethe compiler control over the length of the smallest category of number.In contrast, pl/i declarations specify a length in bits.
The compiler mapsthis desired length onto one of the hardware representations. Thus, theibm 370 implementation of pl/i mapped both a fixed binary(12) and afixed binary(15) variable to a 16-bit integer, while a fixed binary(31)became a 32-bit integer.Some languages specify implementations in detail. For example, Javadefines distinct types for signed integers with lengths of 8, 16, 32, and 64bits. Respectively, they are byte, short, int, and long. Similarly, Java’sfloat type specifies a 32-bit ieee floating-point number, while its doubletype specifies a 64-bit ieee floating-point number.
This approach ensuresidentical behavior on different architectures.Scheme takes a different approach. The language defines a hierarchy of number types but lets the implementor select a subset to support. However, thestandard draws a careful distinction between exact and inexact numbers andspecifies a set of operations that should return an exact number when all ofits arguments are exact. This provides a degree of flexibility to the implementer, while allowing the programmer to reason about when and whereapproximation can occur.CharactersMany languages include a character type. In the abstract, a character is a single letter. For years, due to the limited size of the Western alphabets, this ledto a single-byte (8-bit) representation for characters, usually mapped into theascii character set.
Recently, more implementations—both operating system and programming language—have begun to support larger charactersets expressed in the Unicode standard format, which requires 16 bits. Mostlanguages assume that the character set is ordered, so that standard comparison operators, such as <, =, and >, work intuitively, enforcing lexicographicordering. Conversion between a character and an integer appears in somelanguages.
Few other operations make sense on character data.172 CHAPTER 4 Context-Sensitive AnalysisBooleansMost programming languages include a boolean type that takes on two values: true and false. Standard operations provided for booleans includeand, or, xor, and not. Boolean values, or boolean-valued expressions, areoften used to determine the flow of control. c considers boolean values as asubrange of the unsigned integers, restricted to the values zero (false) andone (true).Compound and Constructed TypesWhile the base types of a programming language usually provide an adequate abstraction of the actual kinds of data handled directly by the hardware, they are often inadequate to represent the information domain neededby programs. Programs routinely deal with more complex data structures,such as graphs, trees, tables, arrays, records, lists, and stacks.
These structures consist of one or more objects, each with its own type. The ability toconstruct new types for these compound or aggregate objects is an essentialfeature of many programming languages. It lets the programmer organizeinformation in novel, program-specific ways. Tying these organizations tothe type system improves the compiler’s ability to detect ill-formed programs. It also lets the language express higher-level operations, such as awhole-structure assignment.Take, for example, Lisp, which provides extensive support for programmingwith lists.
Lisp’s list is a constructed type. A list is either the designated valuenil or (cons first rest) where first is an object, rest is a list, andcons is a constructor that creates a list from its two arguments. A Lisp implementation can check each call to cons to ensure that its second argument is,in fact, a list.ArraysArrays are among the most widely used aggregate objects. An array groupstogether multiple objects of the same type and gives each a distinct name—albeit an implicit, computed name rather than an explicit, programmerdesignated, name.
The c declaration int a[100][200]; sets aside space for100 × 200 = 20,000 integers and ensures that they can be addressed usingthe name a. The references a[1][17] and a[2][30] access distinct andindependent memory locations. The essential property of an array is thatthe program can compute names for each of its elements by using numbers(or some other ordered, discrete type) as subscripts.Support for operations on arrays varies widely. fortran 90, pl/i, and aplall support assignment of whole or partial arrays.