Skip to content

Garbage Collection Techniques

Garbage collection is a memory management technique that automatically frees memory that is no longer being used by a program. While it's most commonly associated with languages like Java, Python, and JavaScript, understanding garbage collection is crucial for systems programmers who need to make informed decisions about memory management strategies.

Let's explore the different garbage collection techniques and understand their trade-offs.

The Problem: Manual Memory Management

In languages like C and C++, you must manually manage memory:

cpp
void process_data() {
    int* data = malloc(1000 * sizeof(int));  // Allocate memory
    // Use data...
    free(data);  // Must remember to free
}

Problems with manual management:

  • Memory leaks: Forgetting to free allocated memory
  • Use-after-free: Using memory after it's been freed
  • Double free: Freeing the same memory twice
  • Complexity: Managing memory in complex data structures

Garbage collection solves these problems by automatically detecting and freeing unused memory.

Mark-and-Sweep: The Classic Approach

Mark-and-sweep is one of the most fundamental garbage collection algorithms. It works in two phases: marking and sweeping.

How Mark-and-Sweep Works

Phase 1: Mark The garbage collector starts from "root" objects (global variables, stack variables, registers) and marks all objects that are reachable by following references. Phase 2: Sweep The collector scans through all allocated memory and frees any objects that weren't marked as reachable. Example:

cpp
# Before garbage collection
root -> A -> B -> C
       |
       -> D -> E
       |
       -> F (unreachable)

# After mark phase
root -> A* -> B* -> C*
       |
       -> D* -> E*
       |
       -> F (unmarked)

# After sweep phase
root -> A -> B -> C
       |
       -> D -> E
       # F has been freed

Advantages and Disadvantages

Advantages:

  • Simple concept: Easy to understand and implement
  • Handles cycles: Can detect and free circular references
  • Complete: Finds all unreachable objects

Disadvantages:

  • Stop-the-world: Must pause the program during collection
  • Memory fragmentation: Can leave memory fragmented
  • Performance: Can be slow for large heaps

Reference Counting: Immediate Cleanup

Reference counting keeps track of how many references point to each object. When the reference count reaches zero, the object is immediately freed.

How Reference Counting Works

cpp
class Object:
    def __init__(self):
        self.ref_count = 0

    def add_ref(self):
        self.ref_count += 1

    def remove_ref(self):
        self.ref_count -= 1
        if self.ref_count == 0:
            self.cleanup()  # Free the object

Example:

cpp
# Create object A
A = Object()  # A.ref_count = 1

# Create reference to A
B = A         # A.ref_count = 2

# Remove one reference
B = None      # A.ref_count = 1

# Remove last reference
A = None      # A.ref_count = 0, object is freed

The Circular Reference Problem

Reference counting has a fundamental flaw: it cannot handle circular references.

cpp
class Node:
    def __init__(self):
        self.ref_count = 0
        self.next = None

# Create circular reference
A = Node()  # A.ref_count = 1
B = Node()  # B.ref_count = 1
A.next = B  # B.ref_count = 2
B.next = A  # A.ref_count = 2

# Remove external references
A = None    # A.ref_count = 1 (still referenced by B)
B = None    # B.ref_count = 1 (still referenced by A)

# Both objects are unreachable but have ref_count > 0!

Solutions to circular references:

  • Weak references: Don't count toward reference count
  • Cycle detection: Periodic mark-and-sweep to find cycles
  • Hybrid approaches: Combine reference counting with other techniques

Advantages and Disadvantages

Advantages:

  • Immediate cleanup: Objects freed as soon as they become unreachable
  • Predictable: No unexpected pauses
  • Incremental: Can be done in small steps

Disadvantages:

  • Circular references: Cannot handle cycles without additional techniques
  • Performance overhead: Every reference operation must update counts
  • Memory overhead: Each object needs a reference count field

Generational Garbage Collection: Exploiting Object Lifetimes

Generational garbage collection is based on the observation that most objects die young. It divides objects into generations and collects young generations more frequently.

The Generational Hypothesis

Empirical observation:

  • Young objects: Most objects are short-lived
  • Old objects: Objects that survive tend to live for a long time
  • Memory locality: Young objects are often allocated together

How Generational GC Works

Generations:

  • Young generation (Eden): Newly allocated objects
  • Old generation: Objects that have survived several collections
  • Permanent generation: Long-lived objects (classes, methods)

Collection process:

  1. Minor collection: Collect only the young generation
  2. Promotion: Move surviving objects to the old generation
  3. Major collection: Collect the entire heap (less frequent)

Example:

cpp
Initial state:
Young: [A][B][C][D][E][F]
Old:   [X][Y][Z]

After minor collection (A, C, E are dead):
Young: [B][D][F]  (survivors)
Old:   [X][Y][Z][B][D][F]  (promoted survivors)

Advantages and Disadvantages

Advantages:

  • Efficient: Most collections are fast (only young generation)
  • Good performance: Exploits object lifetime patterns
  • Reduced fragmentation: Objects in same generation tend to be similar size

Disadvantages:

  • Complexity: More complex to implement
  • Write barriers: Must track references between generations
  • Tuning: Requires tuning of generation sizes and collection frequencies

Stop-the-World vs Concurrent Garbage Collection

Stop-the-World Collection

In stop-the-world collection, the program is completely paused during garbage collection.

How it works:

cpp
Program execution: [=====][PAUSE][=====][PAUSE][=====]
                  ^      ^      ^      ^      ^
                  |      |      |      |      |
                  |      |      |      |      Program resumes
                  |      |      |      GC completes
                  |      |      Program resumes
                  |      GC starts
                  Program starts

Advantages:

  • Simple: Easy to implement and reason about
  • Complete: Can collect the entire heap at once
  • No interference: No race conditions with program execution

Disadvantages:

  • Pause times: Can cause noticeable pauses in program execution
  • Poor responsiveness: Not suitable for real-time applications
  • User experience: Can make applications feel unresponsive

Concurrent Garbage Collection

Concurrent garbage collection runs alongside the program, minimizing pause times.

How it works:

cpp
Program execution: [========================================]
GC execution:      [    ][    ][    ][    ][    ][    ]
Pause times:       [PAUSE]                    [PAUSE]

Types of concurrent collection:

  • Concurrent mark: Marking happens while program runs
  • Concurrent sweep: Sweeping happens while program runs
  • Incremental collection: Collection happens in small increments

Advantages:

  • Low pause times: Minimal interruption to program execution
  • Better responsiveness: Suitable for interactive applications
  • Real-time friendly: Can be used in real-time systems

Disadvantages:

  • Complexity: Much more complex to implement correctly
  • Overhead: Additional overhead during program execution
  • Race conditions: Must handle concurrent access to heap

Garbage Collection in Practice

When to Use Garbage Collection

Good candidates:

  • High-level applications: Web servers, desktop applications
  • Rapid development: When development speed is more important than performance
  • Complex data structures: When manual management becomes error-prone
  • Multi-threaded applications: When manual management is complex

Poor candidates:

  • Real-time systems: Where predictable performance is critical
  • Embedded systems: Where memory is limited
  • High-performance systems: Where every cycle counts
  • Systems programming: Where control over memory is essential

Tuning Garbage Collection

Key parameters:

  • Heap size: Total memory available for allocation
  • Generation sizes: Size of young and old generations
  • Collection frequency: How often to trigger collections
  • Pause time goals: Maximum acceptable pause times

Example tuning:

cpp
// JVM garbage collection tuning
-XX:+UseG1GC                    // Use G1 garbage collector
-XX:MaxGCPauseMillis=200        // Target 200ms pause time
-XX:G1HeapRegionSize=16m        // 16MB regions
-XX:InitiatingHeapOccupancyPercent=45  // Start GC at 45% heap usage

Garbage Collection Algorithms in Modern Systems

Java:

  • G1GC: Low-pause garbage collector for large heaps
  • ZGC: Ultra-low-latency garbage collector
  • Shenandoah: Low-pause garbage collector

Go:

  • Concurrent mark-and-sweep: Simple but effective
  • Tri-color marking: Efficient concurrent marking

JavaScript (V8):

  • Generational collection: Young and old generations
  • Incremental marking: Reduces pause times

Performance Considerations

Garbage Collection Overhead

CPU overhead:

  • Marking: Must traverse all reachable objects
  • Sweeping: Must scan all allocated memory
  • Write barriers: Overhead for tracking references

Memory overhead:

  • Reference counts: Extra field per object
  • Mark bits: Extra bit per object for marking
  • Generation metadata: Tracking generation information

Optimizing for Garbage Collection

Reduce allocation:

  • Object pooling: Reuse objects instead of creating new ones
  • Stack allocation: Use stack for temporary objects
  • Value types: Use value types instead of reference types

Improve locality:

  • Allocate together: Objects used together should be allocated together
  • Cache-friendly: Design data structures for good cache performance
  • Memory layout: Consider how objects are laid out in memory

The Bottom Line

Garbage collection is a powerful tool for automatic memory management, but it comes with trade-offs:

  • Mark-and-sweep: Simple but can cause pauses
  • Reference counting: Immediate cleanup but can't handle cycles
  • Generational: Efficient for most applications
  • Concurrent: Low pause times but complex implementation

The key is choosing the right garbage collection strategy for your specific use case. In high-performance systems, you might still prefer manual memory management for the control and predictability it provides. In other systems, garbage collection can significantly reduce development time and eliminate many memory-related bugs.

Remember: Garbage collection is not a silver bullet - it's a tool that trades complexity and control for convenience and safety. Understanding how it works helps you make informed decisions about when and how to use it.