Garbage Collection
Very simple idea: when an object is created, unused space is automatically allocated. We leverage the observation that a program can use only the objects that it could find:
for instance
let x:A <- new A in { x <- y }
- allocate space as needed for new objects
- when spaces runs out… a) compute what objects might be used again (trace from “root”) b) free the space not found in
Mark and Sweep
when memory runs out…
- mark: traces reachabel objects
- sweep: collects garbage objects
Every object has a new bit, the “mark bit”. Every object start with the mark bit set to 0, and we set it to 1 for reachable objects in the mark phase.
At the sweep phrase, we reset everything to 0.
mark
BFS through starting at all roots.
todo = graph_roots
while len(todo) > 0:
if mark(v) == 0: # unmarked, but reachable
set!(mark(v) <- 1)
todo = todo ++ heap_ptrs(v)
sweep
- scan the heap looking fort objects with mark bit 0
- an object is added to the free list
- the objects with a mark bit 1 have their mark bit reset to 0
p = bottom of heap
while p < top of heap:
if mark(p) == 1:
mark(p) = 0
else:
freelist.append((p, p+sizeof(p)))
p += sizeof(p)
catch
Noticeably, the mark phase requires an arbitrary-length todo list. We will therefore have to stick stuff onto the heap.
- free list: easy, just stick it onto the freed variables themselves
- todo list: instead of storing an explicit todo list, reverse each pointer to come back and descent down new branches
con
no object pointer updates during GC (pointers stay where they are).
pros
Importantly, this prohibits parallel garbage collection.
Also, lots of fragmentation.
Stop and Copy
organize memory into two areas
- old space: used for allocation
- new space: use as a reserve for GC
no fragmentation!
- copy reachable objects into new space from old space
- update pointers
- reverse the roles of old space and new space
as we copy an object, we save a “forwarding pointer” so we know stuff has been copied.
catch
We still have to figure out how to implement reference counting. To do this, in the new space, we maintain two pointers scan and alloc. The distance between scan and alloc is our BFS list. We move alloc forward whenever we copy something (when we encounter a reference), and w
pros
stop and copy is generally believed to be the fastest GC scheme.
- allocation is very cheap (we just use the new space)
- collection is decently cheap (only need to touch reachable objects)
Reference Counting
Instead of waiting for memory to be exhausted, collect an object when there are no more pointers to it.
Insight: store in every object the number of pointers to that object.
On new, store in each object the number of references to it.
On assignment x <- y
, where x points to p
and y points to o
:
- rc(p) <- rc(p) + 1
- rc(o) <- rc(o) - 1
- if (rc(o) == 0), mark o as free
- x <- y
catch
Such scheme cannot delete stuff that is in a cycle. For instance, if an object points to another in a cycle, both of them will forever have a reference counter of 1.
To deal with this, we can just use another garbage collection scheme.