K. Cooper, L. Torczon - Engineering a Compiler (2011 - 2nd edition) (798440), страница 81
Текст из файла (страница 81)
If an allocated block has that pointerset to an invalid value, then the deallocation routine can check that fieldand report a runtime error when the program attempts to deallocate a freeblock or an illegal address—a pointer to anything other than the start of anallocated block.For a modest additional overhead, heap-management software can provideadditional help. By linking together allocated blocks, the allocator can createan environment for memory-allocation debugging tools. A snapshot tool canwalk the list of allocated blocks.
Tagging blocks by the call site that createdthem lets the tool expose memory leaks. Timestamping them allows the tool6.6 Advanced Topics 317to provide the programmer with detailed information about memory use.Tools of this sort can provide invaluable help in locating blocks that arenever deallocated.6.6.2 Implicit DeallocationMany programming languages support implicit deallocation of heap objects.The implementation deallocates memory objects automatically when theyare no longer in use. This requires some care in the implementation ofboth the allocator and the compiled code. To perform implicit deallocation,or garbage collection, the compiler and runtime system must include amechanism for determining when an object is no longer of interest, or dead,and a mechanism for reclaiming and recycling the dead space.The work associated with garbage collection can be performed incrementally, for individual statements, or it can be performed as a batch-orientedtask that runs on demand, when the free-space pool is exhausted.
Reference counting is a classic way to perform incremental garbage collection.Mark-sweep collection is a classic approach to performing batch-orientedcollection.Reference CountingThis technique adds a counter to each heap-allocated object. The countertracks the number of outstanding pointers that refer to the object.
When theallocator creates the object, it sets the reference count to one. Each assignment to a pointer variable adjusts two reference counts. It decrements thereference count of the pointer’s preassignment value and increments thereference count of the pointer’s postassignment value. When an object’s reference count drops to zero, no pointer exists that can reach the object, sothe system may safely free the object. Freeing an object can, in turn, discardpointers to other objects.
This must decrement the reference counts of thoseobjects. Thus, discarding the last pointer to an abstract syntax tree shouldfree the entire tree. When the root node’s reference count drops to zero, it isfreed and its descendant’s reference counts are decremented. This, in turn,should free the descendants, decrementing the counts of their children. Thisprocess continues until the entire ast has been freed.The presence of pointers in allocated objects creates problems for referencecounting schemes, as follows:1.
The running code needs a mechanism to distinguish pointers from otherdata. It may either store extra information in the header field for eachobject or limit the range of pointers to less than a full word and use theGarbage collectionthe implicit deallocation of objects that reside onthe runtime heap318 CHAPTER 6 The Procedure Abstractionremaining bits to “tag” the pointer.
Batch collectors face the sameproblem and use the same solutions.2. The amount of work done for a single decrement can grow quite large.If external constraints require bounded deallocation times, the runtimesystem can adopt a more complex protocol that limits the number ofobjects deallocated for each pointer assignment. By keeping a queue ofobjects that must be freed and limiting the number handled on eachreference-count adjustment, the system can distribute the cost offreeing objects over a larger set of operations.
This amortizes the cost offreeing over the set of all assignments to heap-allocated objects andbounds the work done per assignment.3. The program might form cyclic graphs with pointers. The referencecounts for a cyclic data structure cannot be decremented to zero. Whenthe last external pointer is discarded, the cycle becomes bothunreachable and nonrecyclable. To ensure that all such objects are freed,the programmer must break the cycle before discarding the last pointerto the cycle. (The alternative, to perform reachability analysis on thepointers at runtime, would make reference counting prohibitivelyexpensive.) Many categories of heap-allocated objects, such asvariable-length strings and activation records, cannot be involved insuch cycles.Reference counting incurs additional cost on every pointer assignment.
Theamount of work done for a specific pointer assignment can be bounded; inany well-designed scheme, the total cost can be limited to some constantfactor times the number of pointer assignments executed plus the numberof objects allocated. Proponents of reference counting argue that these overheads are small enough and that the pattern of reuse in reference-countingsystems produces good program locality. Opponents of reference counting argue that real programs do more pointer assignments than allocations,so that garbage collection achieves equivalent functionality with less totalwork.Batch CollectorsIf the collector cannot free any space, then itmust request additional space from the system.If none is available, allocation fails.Batch collectors consider deallocation only when the free-space pool hasbeen exhausted.
When the allocator fails to find needed space, it invokesthe batch collector. The collector pauses the program’s execution, examines the pool of allocated memory to discover unused objects, and reclaimstheir space. When the collector terminates, the free-space pool is usuallynonempty. The allocator can finish its original task and return a newly allocated object to the caller.
(As with reference counting, schemes exist thatperform collection incrementally to amortize the cost over longer periods ofexecution.)6.6 Advanced Topics 319Logically, batch collectors proceed in two phases. The first phase discovers the set of objects that can be reached from pointers stored in programvariables and compiler-generated temporaries. The collector conservativelyassumes that any object reachable in this manner is live and that the remainder are dead.
The second phase deallocates and recycles dead objects.Two commonly used techniques are mark-sweep collectors and copyingcollectors. They differ in their implementation of the second phase ofcollection—recycling.Identifying Live DataCollecting allocators discover live objects by using a marking algorithm.The collector needs a bit for each object in the heap, called a mark bit. Thisbit can be stored in the object’s header, alongside tag information used torecord pointer locations or object size.
Alternatively, the collector can createa dense bit map for the heap when needed. The initial step clears all the markbits and builds a worklist that contains all the pointers stored in registers andin variables accessible to current or pending procedures. The second phaseof the algorithm walks forward from these pointers and marks every objectthat is reachable from this set of visible pointers.Figure 6.12 presents a high-level sketch of a marking algorithm. It is a simplefixed-point computation that halts because the heap is finite and the marksprevent a pointer contained in the heap from entering the Worklist morethan once.
The cost of marking is, in the worst case, proportional to thenumber of pointers contained in program variables and temporaries plus thesize of the heap.The marking algorithm can be either precise or conservative. The differencelies in how the algorithm determines that a specific data value is a pointer inthe final line of the while loop.nnIn a precise collector, the compiler and runtime system know the typeand layout of each object.
This information can be recorded in objectheaders, or it can be known implicitly from the type system. Either way,the marking phase only follows real pointers.In a conservative marking phase, the compiler and runtime system maybe unsure about the type and layout of some objects. Thus, when anobject is marked, the system considers each field that may be a possiblepointer. If its value might be a pointer, it is treated as a pointer. Anyvalue that does not represent a word-aligned address might be excluded,as might values that fall outside the known boundaries of the heap.Conservative collectors have limitations.
They fail to reclaim some objectsthat a precise collector would find. Nonetheless, conservative collectors have320 CHAPTER 6 The Procedure AbstractionClear all marksWorklist ← { pointer values from activation records & registers }while (Worklist 6 = ∅)remove p from the Worklistif (p→object is unmarked)mark p→objectadd pointers from p→object to Worklistn FIGURE 6.12 A Simple Marking Algorithm.been successfully retrofitted into implementations for languages such as cthat do not normally support garbage collection.When the marking algorithm halts, any unmarked object must be unreachable from the program.