False Sharing & Cache Performance

False sharing is one of the most insidious performance problems in multithreaded programming. It occurs when different threads access different variables that happen to be on the same cache line, causing unnecessary cache invalidations.

The Problem: Cache Line Sharing

Modern CPUs transfer data between memory and cache in units of cache lines (typically 64 bytes). When one thread modifies a variable, every other core's copy of the containing cache line is invalidated, even if those cores only use other variables on that line.

Example: False Sharing

```cpp
#include <atomic>

struct SharedData {
    std::atomic<int> counter1{0};  // Thread 1 writes here
    std::atomic<int> counter2{0};  // Thread 2 writes here
};

SharedData data;

// Thread 1
void thread1() {
    for (int i = 0; i < 1000000; ++i) {
        data.counter1.fetch_add(1);  // Invalidates the line in thread 2's core
    }
}

// Thread 2
void thread2() {
    for (int i = 0; i < 1000000; ++i) {
        data.counter2.fetch_add(1);  // Also invalidates the same cache line!
    }
}
```

Problem: Both counters are on the same cache line, causing constant cache invalidations.

Cache Line Basics

What is a Cache Line?

  • Size: Typically 64 bytes on modern processors
  • Purpose: Unit of data transfer between memory and cache
  • Behavior: When one core writes any byte in a cache line, copies of the entire line in other cores' caches are invalidated

Memory Layout Example

```
Cache Line 1 (64 bytes):
[ counter1 (4 bytes) | counter2 (4 bytes) | rest of line (56 bytes) ]

Cache Line 2 (64 bytes):
[ other data... ]
```
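The unpadded layout can also be checked in code; a minimal sketch, assuming a 64-byte cache line:

```cpp
#include <atomic>
#include <cstddef>

// Unpadded struct from the example: both counters land on one line.
struct SharedData {
    std::atomic<int> counter1{0};
    std::atomic<int> counter2{0};
};

// Both members fall within the first 64 bytes, i.e. one typical cache line.
static_assert(offsetof(SharedData, counter2) < 64, "counter2 shares counter1's line");
static_assert(sizeof(SharedData) <= 64, "the whole struct fits in one line");
```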

Using Alignment to Solve False Sharing

The simplest fix is to align each independently written variable to the cache line size, so every thread writes to its own cache line.

C++ provides the alignas specifier to align a variable to a given byte boundary. Since C++17, <new> defines std::hardware_destructive_interference_size, a compile-time constant giving the minimum recommended offset between two objects to avoid false sharing; on most platforms it equals the cache line size. Combined, alignas(std::hardware_destructive_interference_size) places each shared variable on its own cache line.

```cpp
#include <atomic>
#include <new>  // std::hardware_destructive_interference_size

struct CacheLineAligned {
    alignas(std::hardware_destructive_interference_size) std::atomic<int> counter1{0};
    alignas(std::hardware_destructive_interference_size) std::atomic<int> counter2{0};
};
```
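To see the difference, you can time the two layouts side by side. A minimal harness sketch (wrap the `hammer` calls in std::chrono timing to compare; the speedup varies by machine and core count and is not guaranteed, so the code below only checks correctness; the 64-byte `kLine` is an assumption):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

constexpr std::size_t kLine = 64;  // assumed cache-line size

struct Unpadded {              // both counters share one line
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

struct Padded {                // one line per counter
    alignas(kLine) std::atomic<long> a{0};
    alignas(kLine) std::atomic<long> b{0};
};

// Hammers both counters from two threads; returns true if the totals match.
template <typename S>
bool hammer(S& s, long iters) {
    std::thread t1([&] { for (long i = 0; i < iters; ++i) s.a.fetch_add(1); });
    std::thread t2([&] { for (long i = 0; i < iters; ++i) s.b.fetch_add(1); });
    t1.join();
    t2.join();
    return s.a.load() == iters && s.b.load() == iters;
}
```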

Real-World Examples

1. Per-Thread Counters

```cpp
#include <atomic>
#include <new>

// WRONG - aligning the whole struct still packs all 16 counters
// (4 bytes each = 64 bytes) onto a single cache line
struct alignas(std::hardware_destructive_interference_size) ThreadCounters {
    std::atomic<int> counters[16];
};

// CORRECT - each counter gets its own cache line
struct ThreadCounters {
    struct alignas(std::hardware_destructive_interference_size) Counter {
        std::atomic<int> value{0};
    };
    Counter counters[16];
};
```
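Driving the corrected version might look like the sketch below, with a hardcoded 64-byte `kLine` standing in for std::hardware_destructive_interference_size (library support for the constant varies):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kLine = 64;  // stand-in for the interference-size constant

struct ThreadCounters {
    struct alignas(kLine) Counter {
        std::atomic<int> value{0};
    };
    Counter counters[16];
};

static_assert(sizeof(ThreadCounters::Counter) == kLine, "one full line per counter");

// Each thread increments only its own slot, so no two threads share a line.
void count_all(ThreadCounters& tc, int iters) {
    std::vector<std::thread> workers;
    for (int t = 0; t < 16; ++t) {
        workers.emplace_back([&tc, t, iters] {
            for (int i = 0; i < iters; ++i)
                tc.counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
}
```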

2. Lock-Free Queue

```cpp
#include <atomic>
#include <new>

template<typename T>
class LockFreeQueue {
private:
    struct alignas(std::hardware_destructive_interference_size) Node {
        T data;
        std::atomic<Node*> next{nullptr};
    };  // Two nodes can never share a cache line

    // head and tail are written by different threads,
    // so keep them on different cache lines
    alignas(std::hardware_destructive_interference_size) std::atomic<Node*> head{nullptr};
    alignas(std::hardware_destructive_interference_size) std::atomic<Node*> tail{nullptr};
};
```
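The head/tail spacing can be sanity-checked at runtime. `HeadTail` below is a hypothetical public mirror of the two hot members (the real ones are private), with a 64-byte line assumed:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLine = 64;  // assumed cache-line size

// Public mirror of the queue's two hot members so addresses can be inspected.
struct HeadTail {
    alignas(kLine) std::atomic<void*> head{nullptr};
    alignas(kLine) std::atomic<void*> tail{nullptr};
};

// True when the two members sit on different 64-byte lines.
bool on_separate_lines(const HeadTail& q) {
    auto h = reinterpret_cast<std::uintptr_t>(&q.head);
    auto t = reinterpret_cast<std::uintptr_t>(&q.tail);
    return (h / kLine) != (t / kLine);
}
```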

Questions

Q: What is false sharing?

False sharing occurs when different threads access different variables that happen to be on the same cache line, causing unnecessary cache invalidations and performance degradation.

Q: What is the typical size of a cache line on modern processors?

Most modern processors use 64-byte cache lines. This means that when one thread modifies a variable, the entire 64-byte cache line is invalidated for other threads.

Q: How can you prevent false sharing?

Adding padding between variables ensures they're on different cache lines, preventing false sharing. This is often done using alignas or manual padding.
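The manual-padding variant mentioned above can be sketched as follows (64-byte line assumed; alignas is usually preferable because it also aligns the enclosing object itself):

```cpp
#include <atomic>
#include <cstddef>

constexpr std::size_t kLine = 64;  // assumed cache-line size

// Manual padding: trailing bytes push counter2 a full line past counter1.
struct PaddedCounters {
    std::atomic<int> counter1{0};
    char pad[kLine - sizeof(std::atomic<int>)]{};  // fills out counter1's line
    std::atomic<int> counter2{0};
};

static_assert(offsetof(PaddedCounters, counter2) == kLine, "counter2 starts a new line");
```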

Q: Which of the following is most likely to cause false sharing?

Two atomic counters in the same struct without padding are most likely to be on the same cache line, causing false sharing when accessed by different threads.

Q: What is the performance impact of false sharing?

False sharing can cause significant performance degradation (2-10x slower) due to constant cache invalidations and memory bus contention between cores.

Q: Which alignment is typically used to prevent false sharing?

64-byte alignment is typically used to prevent false sharing, as it ensures variables are on different cache lines (which are usually 64 bytes).

Q: What is the purpose of std::hardware_destructive_interference_size?

std::hardware_destructive_interference_size provides the minimum distance between variables to avoid false sharing, typically equal to the cache line size.
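Note that library support lags the standard (e.g. libstdc++ only ships the constant from GCC 12), so code often guards on the feature-test macro and falls back to a guessed value. A sketch, where the 64-byte fallback is an assumption:

```cpp
#include <atomic>
#include <cstddef>
#include <new>

// Use the C++17 constant when the library provides it; otherwise guess 64.
#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t kDestructive = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t kDestructive = 64;  // assumed cache-line size
#endif

struct Slots {
    alignas(kDestructive) std::atomic<int> a{0};
    alignas(kDestructive) std::atomic<int> b{0};
};
```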

Q: Which data structure is most vulnerable to false sharing?

An array of counters accessed by different threads is most vulnerable to false sharing because adjacent array elements are likely to be on the same cache line.