Главная » Все файлы » Просмотр файлов из архивов » PDF-файлы » A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition)

A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition), страница 28

Описание файла

PDF-файл из архива "A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition)", который расположен в категории "книги и методические указания". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .

Просмотр PDF-файла онлайн

Текст 28 страницы из PDF

All the variables and lvalues in MiniJava are scalar. Even a MiniJava array or object variable is really a pointer (akind of scalar); the Java Language Reference Manual may not say so explicitly, because it istalking about Java semantics instead of Java implementation.In C or Pascal there are structured l-values - structs in C, arrays and records in Pascal - thatare not scalar. To implement a language with "large" variables such as the arrays and recordsin C or Pascal requires a bit of extra work. In a C compiler, the access type would requireinformation about the size of the variable.

Then, the MEM operator of the TREE intermediatelanguage would need to be extended with a notion of size:package Tree;abstract class ExpMEM(Exp exp, int size)The translation of a local variable into an IR tree would look likeMEM(+(TEMP fp, CONST kn), S)128where the S indicates the size of the object to be fetched or stored (depending on whether thistree appears on the left or right of a MOVE).Leaving out the size on MEM nodes makes the MiniJava compiler easier to implement, butlimits the generality of its intermediate representation.SUBSCRIPTING AND FIELD SELECTIONTo subscript an array in Pascal or C (to compute a[i]), just calculate the address of the ithelement of a: (i −l) × s + a, where l is the lower bound of the index range, s is the size (inbytes) of each array element, and a is the base address of the array elements.

If a is global,with a compile-time constant address, then the subtraction a − s × l can be done at compiletime.Similarly, to select field f of a record l-value a (to calculate a. f), simply add the constant fieldoffset of f to the address a.An array variable a is an l-value; so is an array subscript expression a[i], even if i is not an lvalue.

To calculate the l-value a[i] from a, we do arithmetic on the address of a. Thus, in aPascal compiler, the translation of an l-value (particularly a structured l-value) should not besomething likebut should instead be the Tree expression representing the base address of the array:What could happen to this l-value?••A particular element might be subscripted, yielding a (smaller) l-value. A "+" nodewould add the index times the element size to the l-value for the base of the array.The l-value (representing the entire array) might be used in a context where an r-valueis required (e.g., passed as a by-value parameter, or assigned to another arrayvariable). Then the l-value is coerced into an r-value by applying the MEM operator toit.In the MiniJava language, there are no structured, or "large", l-values.

This is because allobject and array values are really pointers to object and array structures. The "base address"of the array is really the contents of a pointer variable, so MEM is required to fetch this baseaddress.Thus, if a is a memory-resident array variable represented as MEM(e), then the contents ofaddress e will be a one-word pointer value p. The contents of addresses p, p + W, p + 2W, …(where W is the word size) will be the elements of the array (all elements are one word long).Thus, a[i] is just129l-values and MEM nodes.

Technically, an l-value (or assignable variable) should berepresented as an address (without the top MEM node in the diagram above). Converting an lvalue to an r-value (when it is used in an expression) means fetching from that address;assigning to an l-value means storing to that address. We are attaching the MEM node to thel-value before knowing whether it is to be fetched or stored; this works only because in theTree intermediate representation, MEM means both store (when used as the left child of aMOVE)and fetch (when used elsewhere).A SERMON ON SAFETYLife is too short to spend time chasing down irreproducible bugs, and money is too valuableto waste on the purchase of flaky software.

When a program has a bug, it should detect thatfact as soon as possible and announce that fact (or take corrective action) before the bugcauses any harm.Some bugs are very subtle. But it should not take a genius to detect an outof-bounds arraysubscript: If the array bounds are L ..H, and the subscript is i, then i < L or i > H is an arraybounds error.

Furthermore, computers are well-equipped with hardware able to compute thecondition i > H. For several decades now, we have known that compilers can automaticallyemit the code to test this condition. There is no excuse for a compiler that is unable to emitcode for checking array bounds. Optimizing compilers can often safely remove the checks bycompile-time analysis; see Section 18.4.One might say, by way of excuse, "but the language in which I program has the kind ofaddress arithmetic that makes it impossible to know the bounds of an array." Yes, and the manwho shot his mother and father threw himself upon the mercy of the court because he was anorphan.In some rare circumstances, a portion of a program demands blinding speed, and the timingbudget does not allow for bounds checking.

In such a case, it would be best if the optimizingcompiler could analyze the subscript expressions and prove that the index will always bewithin bounds, so that an explicit bounds check is not necessary. If that is not possible,perhaps it is reasonable in these rare cases to allow the programmer to explicitly specify anunchecked subscript operation. But this does not excuse the compiler from checking all theother subscript expressions in the program.Needless to say, the compiler should check pointers for nil before dereferencing them, too.[1]ARITHMETICInteger arithmetic is easy to translate: Each arithmetic operator corresponds to a Treeoperator.130The Tree language has no unary arithmetic operators.

Unary negation of integers can beimplemented as subtraction from zero; unary complement can be implemented as XOR withall ones.Unary floating-point negation cannot be implemented as subtraction from zero, because manyfloating-point representations allow a negative zero. The negation of negative zero is positivezero, and vice versa. Thus, the Tree language does not support unary negation very well.Fortunately, the MiniJava language doesn't support floating-point numbers; but in a realcompiler, a new operator would have to be added for floating negation.CONDITIONALSThe result of a comparison operator will be a Cx expression: a statement s that will jump toany true-destination and false-destination you specify.Making "simple" Cx expressions from comparison operators is easy with the CJUMPoperator.

However, the whole point of the Cx representation is that conditional expressionscan be combined easily with the MiniJava operator&&. Therefore, an expression such as x<5 will be translated as Cx(s1), wherefor any labels t and f.To do this, we extend the Cx class to make a subclass RelCx that has private fields to hold theleft and right expressions (in this case x and 5) and the comparison operator (in this caseTree.CJUMP.LT). Then we override the unCx method to generate the CJUMP from these data.It is not necessary to make unEx and unNx methods, since these will be inherited from theparent Cx class.The most straightforward thing to do with an if-expressionif e1 then e2 else e3is to treat e1 as a Cx expression, and e2 and e3 as Ex expressions. That is, use the unCx methodof e1 and the unEx of e2 and e3.

Make two labels t and f to which the conditional will branch.Allocate a temporary r, and after label t, move e2 to r; after label f, move e3 to r. Bothbranches should finish by jumping to a newly created "join" label.This will produce perfectly correct results. However, the translated code may not be veryefficient at all. If e2 and e3 are both "statements" (expressions that return no value), then theirrepresentation is likely to be Nx, not Ex.

Applying unEx to them will work - a coercion willautomatically be applied - but it might be better to recognize this case specially.Even worse, if e2 or e3 is a Cx expression, then applying the unEx coercion to it will yield ahorrible tangle of jumps and labels. It is much better to recognize this case specially.For example, considerif x < 5 then a > b else 0131As shown above, x < 5 translates into Cx(s1); similarly, a > b will be translated as Cx(s2) forsome s2. The whole if-statement should come out approximately asfor some new label z.Therefore, the translation of an if requires a new subclass of Exp:class IfThenElseExp extends Exp {Exp cond, a, b;Label t = new Label();Label f = new Label();Label join = new Label();IfThenElseExp(Exp cc, Exp aa, Exp bb) {cond=cc; a=aa; b=bb;}Tree.Stm unCx(Label tt, Label ff) { ... }Tree.Exp unEx() { ...

}Tree.Stm unNx() { ... }}The labels t and f indicate the beginning of the then-clause and elseclause, respectively. Thelabels tt and ff are quite different: These are the places to which conditions inside the thenclause (or else-clause) must jump, depending on the truth of those subexpressions.STRINGSA string literal is typically implemented as the constant address of a segment of memoryinitialized to the proper characters. In assembly language, a label is used to refer to thisaddress from the middle of some sequence of instructions. At some other place in theassembly-language program, the definition of that label appears, followed by the assemblylanguage pseudo-instruction to reserve and initialize a block of memory to the appropriatecharacters.For each string literal lit, a translator must make a new label lab, and return the treeTree.NAME(lab).

Свежие статьи
Популярно сейчас