A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition), страница 49
PDF-файл из архива "A.W. Appel, J. Palsberg - Modern Compiler Implementation in Java (Second Edition)", который расположен в категории "книги и методические указания". Всё это находится в предмете "конструирование компиляторов" из седьмого семестра, которые можно найти в файловом архиве МГУ им. Ломоносова. Не смотря на прямую связь этого архива с МГУ им. Ломоносова, его также можно найти и в других разделах. .
Просмотр PDF-файла онлайн
Текст 49 страницы из PDF
By keeping next and limit in registers, steps 2 and 5 can bedone in a total of three instructions.By this combination of techniques, the cost of allocating a record - and then eventuallygarbage collecting it - can be brought down to about four instructions. This means thatprogramming techniques such as the persistent binary search tree (page 108) can be efficientenough for everyday use.230DESCRIBING DATA LAYOUTSThe collector must be able to operate on records of all types: list, tree, or whatever theprogram has declared. It must be able to determine the number of fields in each record, andwhether each field is a pointer.For statically typed languages such as Pascal, or for object-oriented languages such as Java orModula-3, the simplest way to identify heap objects is to have the first word of every objectpoint to a special type- or class-descriptor record.
This record tells the total size of the objectand the location of each pointer field.For statically typed languages this is an overhead of one word per record to serve the garbagecollector. But object-oriented languages need this descriptor pointer in every object just toimplement dynamic method lookup, so that there is no additional per-object overheadattributable to garbage collection.The type- or class-descriptor must be generated by the compiler from the static typeinformation calculated by the semantic analysis phase of the compiler.
The descriptor pointerwill be the argument to the runtime system's alloc function.In addition to describing every heap record, the compiler must identify to the collector everypointer-containing temporary and local variable, whether it is in a register or in an activationrecord. Because the set of live temporaries can change at every instruction, the pointer map isdifferent at every point in the program. Therefore, it is simpler to describe the pointer maponly at points where a new garbage collection can begin.
These are at calls to the allocfunction; and also, since any function call might be calling a function which in turn callsalloc, the pointer map must be described at each function call.The pointer map is best keyed by return addresses: A function call at location a is bestdescribed by its return address immediately after a, because the return address is what thecollector will see in the very next activation record.
The data structure maps return addressesto live-pointer sets; for each pointer that is live immediately after the call, the pointer maptells its register or frame location.To find all the roots, the collector starts at the top of the stack and scans downward, frame byframe. Each return address keys the pointer-map entry that describes the next frame.
In eachframe, the collector marks (or forwards, if copying collection) from the pointers in that frame.Callee-save registers need special handling. Suppose function f calls g, which calls h.Function h knows that it saved some of the callee-save registers in its frame and mentions thisfact in its pointer map; but h does not know which of these registers are pointers. Thereforethe pointer map for g must describe which of its callee-save registers contain pointers at thecall to h and which are "inherited" from f.DERIVED POINTERSSometimes a compiled program has a pointer that points into the middle of a heap record, orthat points before or after the record.
For example, the expression a[i-2000] can becalculated internally as M[a-2000+i]:231If the expression a[i-2000] occurs inside a loop, the compiler might choose to hoist t1 ← a− 2000 outside the loop to avoid recalculating it in each iteration. If the loop also contains analloc, and a garbage collection occurs while t1 is live, will the collector be confused by apointer t1 that does not point to the beginning of an object, or (worse yet) that points to anunrelated object?We say that the t1 is derived from the base pointer a. The pointer map must identify eachderived pointer and tell the base pointer from which it is derived. Then, when the collectorrelocates a to address a′, it must adjust t1 to point to address t1 + a′ − a.Of course, this means that a must remain live as long as t1 is live.
Consider the loop at left,implemented as shown at right:r1 ← 100letvar a := intarray of 0r2 ← 0call alloca ← r1t1 ← a - 2000infor i := 1930 to 1990do f(a[i-2000])endL1i ← 1930L1 : r1 ← M[t1 + i]call fL2 : if i ≤ 1990 gotoIf there are no other uses of a, then the temporary a appears dead after the assignment to t1.But then the pointer map associated with the return address L2 would not be able to "explain"t1 adequately. Therefore, for purposes of the compiler's liveness analysis, a derived pointerimplicitly keeps its base pointer live.PROGRAM DESCRIPTORSImplement record descriptors and pointer maps for the MiniJava compiler.For each record-type declaration, make a string literal to serve as the record descriptor.
Thelength of the string should be equal to the number of fields in the record. The ith byte of thestring should be p if the ith field of the record is a pointer (string, record, or array), or n if theith field is a nonpointer.The allocRecord function should now take the record descriptor string (pointer) instead of alength; the allocator can obtain the length from the string literal. Then allocRecord shouldstore this descriptor pointer at field zero of the record. Modify the runtime systemappropriately.The user-visible fields of the record will now be at offsets 1, 2, 3,… instead of 0, 1, 2,…;adjust the compiler appropriately.Design a descriptor format for arrays, and implement it in the compiler and runtime system.232Implement a temp-map with a boolean for each temporary: Is it a pointer or not? Also make asimilar map for the offsets in each stack frame, for frame-resident pointer variables.
You willnot need to handle derived pointers, as your MiniJava compiler probably does not keepderived pointers live across function calls.For each procedure call, put a new return-address label Lret immediately after the callinstruction. For each one, make a data fragment of the formLptrmap327 :.word.word.wordLptrmap326Lret327…link to previous ptr-map entrykey for this entrypointer map for this return address⋮and then the runtime system can traverse this linked list of pointer-map entries, and perhapsbuild it into a data structure of its own choosing for fast lookup of return addresses. The datalayout pseudo-instructions (.word, etc.) are, of course, machine-dependent.PROGRAM GARBAGE COLLECTIONImplement a mark-sweep or copying garbage collector in the C language, and link it into theruntime system.
Invoke the collector from allocRecord or initArray when the free space isexhausted.FURTHER READINGReference counting [Collins 1960] and mark-sweep collection [McCarthy 1960] are almost asold as languages with pointers. The pointer-reversal idea is attributed by Knuth  toPeter Deutsch and to Herbert Schorr and W. M. Waite.Fenichel and Yochelson  designed the first two-space copying collector, using depthfirst search; Cheney  designed the algorithm that uses the unscanned nodes in to-spaceas the queue of a breadth-first search, and also the semi-depth-first copying that improves thelocality of a linked list.Steele  designed the first concurrent mark-and-sweep algorithm.
Dijkstra et al. formalized the notion of tricolor marking, and designed a concurrent algorithm that they couldprove correct, trying to keep the synchronization requirements as weak as possible. Baker invented the incremental copying algorithm in which the mutator sees only to-spacepointers.Generational garbage collection, taking advantage of the fact that newer objects die quicklyand that there are few old-to-new pointers, was invented by Lieberman and Hewitt ;Ungar  developed a simpler and more efficient remembered set mechanism.The Symbolics Lisp Machine [Moon 1984] had special hardware to assist with incrementaland generational garbage collection.
The microcoded memory-fetch instructions enforced theinvariant of Baker's algorithm; the microcoded memory-store instructions maintained theremembered set for generational collection. This collector was the first to explicitly improvelocality of reference by keeping related objects on the same virtual-memory page.233As modern computers rarely use microcode, and a modern general-purpose processorembedded in a general-purpose memory hierarchy tends to be an order of magnitude fasterand cheaper than a computer with special-purpose instructions and memory tags, attentionturned in the late 1980s to algorithms that could be implemented with standard RISCinstructions and standard virtual-memory hardware. Appel et al.  use virtual memory toimplement a read barrier in a truly concurrent variant of Baker's algorithm.
Shaw  usesvirtual-memory dirty bits to implement a write barrier for generational collection, and Boehmet al.  make the same simple write barrier serve for concurrent generational mark-andsweep. Write barriers are cheaper to implement than read barriers, because stores to old pagesare rarer than fetches from to-space, and a write barrier merely needs to set a dirty bit andcontinue with minimal interruption of the mutator.
Sobalvarro  invented the cardmarking technique, which uses ordinary RISC instructions without requiring interaction withthe virtual-memory system.Appel and Shao  describe techniques for fast allocation of heap records and discussseveral other efficiency issues related to garbage-collected systems.Branquart and Lewi  describe pointer maps communicated from a compiler to itsgarbage collector; Diwan et al.  tie pointer maps to return addresses, show how tohandle derived pointers, and compress the maps to save space.Appel [1992, Chapter 12] shows that compilers for functional languages must be carefulabout closure representations; using simple static links (for example) can keep enormousamounts of data reachable, preventing the collector from reclaiming it.Boehm and Weiser  describe conservative collection, where the compiler does notinform the collector which variables and record fields contain pointers, so the collector must"guess." Any bit pattern pointing into the allocated heap is assumed to be a possible pointerand keeps the pointed-to record live.