Last Updated: February 25, 2016
· devtripper

Understanding Garbage Collection (p1/2)

Garbage Collection is a fundamental feature of modern programming languages. Its objective is to provide a way to manage memory. What GC does is to try to frees memory occupied by objects that no longer will be used.
GC will try to find that objects that cant be accessed in the future.


GC Pro and Cons

The main advantage of using GC is avoiding to deal with the innate complexity of memory deallocation. Indeed, memory deallocation could lead us to several bugs and a more complex code.
On the other hand, GC execution cant be predicted, so its possible to suffer stalls scattered throughout execution. Plus, garbage collected programs have demonstraded poor locality of reference, that is, all in all, that garbage collected programs uses more memory addresses.

Types of Garbage Collectors

Mainly, we will find two GC categories:

+ Tracing garbage collectors

+ Reference counting collectors

Tracing garbage collectors (TGC)

TGC are related to the concept of object reachability. An object is reachable, basically, when the program, at some point of the execution, can lead to it. On any call stack based language (such as Java), any object in stack is reachable, plus global variables. Additionally, reachability of an object is transitive, then referenced objects in a reachable object are also reachables.

TGC will try to liberate memory assigned to objects that are not reachables.
So, basically, tracing collectors will begin execution of garbage collection algorithm at some point ( in Java it could be urged to be executed ), iterating over the whole stack looking for references that could be recycled. The simplest collector algorithm is a simple mark-and-sweep. This algorithm uses a bit to set the reachability of object. When collection iteration begins, root set of objects plus transitively accesible objects to root set are marked as reacheables. After iteration, all objects not marked are candidates to be eliminated.
mark-and-sweep is clearly not efficient, due to the need to stop execution at some (not determined) time. Worst, it will require to analyze the full stack.
tri colour marking is a more efficient tracing collector algorithm. Its based on the idea of assigning a color to each reference. When tri colour inits its execution, all references are white. Plus, the gray is assigned to the root set. Then, the idea its simple: Algorithm begins checking every reference connected to grays, changing any white reference to gray. When all transitive references of a gray reference are gray, that reference is changed to black. When finally there is no more gray references, the remaining white references are not reachables.

Reference counting collectors(RCC)

RCC are based on a simple idea: each object has a count of the number of references to it. When number of references to object is zero, then is not rechable, and candidate to be collected. Main disadvantage of RCC are mutual references. In that scenario, an object referrs to another object, so reference count will never be zero for that object. This situation can lead to memory leaks if there are lots of mutual references at runtime.

On part 2, ill be focused on Java Garbage Collection system. Stay connected!