
Virtual Memory Management

Video: But, what is Virtual Memory?

Virtual memory is a memory management technique that gives each program the illusion of having its own complete memory space, even when physical memory is limited.

The Problem Virtual Memory Solves

Imagine you have a computer with only 4GB of RAM, but you want to run:

  • A web browser (needs a maximum of 2GB)
  • A video editor (needs a maximum of 3GB)
  • A database (needs a maximum of 2GB)
  • An email client (needs a maximum of 1GB)

Total needed: 8GB
Available: 4GB
Without virtual memory: Only 2 programs could run at once
With virtual memory: All programs can run simultaneously!

How Virtual Memory Works

Think of virtual memory like a banking system:

  1. Each program gets its own "account numbers" (virtual address space)
  2. Money (data) is stored in vaults (physical memory + disk)
  3. When you need money, the bank retrieves it from the vault (page fault handling)
  4. If the vaults are full, some money goes to long-term storage (swapping to disk)

Virtual Pages and Physical Frames

What are Virtual Pages?

A virtual page is a fixed-size block of memory that your program thinks it owns. Think of it like a bank account number - it's an abstract identifier (like account #12345) that represents where money is held, but the actual money might be stored in different physical locations.

What are Physical Frames?

A physical frame is the actual piece of physical memory (RAM) where your data is stored. Think of it like a vault or safe - it's the real physical location where money (data) is actually kept.

How They Work Together

Virtual pages are like bank account numbers - abstract identifiers that programs use to reference their data. Physical frames are the actual vaults where the real data is stored. Key Point: Multiple programs can share the same physical memory! When one program finishes using a physical frame, another program can reuse it for its own virtual pages - just as a vault can be reassigned to a new account once the old account is closed.

The Connection: Page Tables

Page tables are like a bank's account ledger that maps virtual pages to physical frames:

cpp
Virtual Page 1 → Physical Frame 5
Virtual Page 2 → Physical Frame 12
Virtual Page 3 → Physical Frame 3
Virtual Page 4 → Not in memory (on disk)

How it works: When your program tries to access a virtual page (account number), the page table (ledger) tells the system which physical frame (vault) actually contains the data, or whether the data needs to be loaded from disk first.

If a page is not being used for some time, the OS will swap it to disk to make room for other pages. This allows the program to continue running even if it is not using all of its memory. It also enables the program to access more memory than is physically available.

Page Table Entry Structure

Each page table entry contains:

  • Physical Frame Number (52 bits): Points to the actual physical memory location
  • Control Bits (12 bits total): Represents the state of the page. For example, is this page in physical memory? Can this page be modified? Can user programs access this?

Multi-Level Page Tables

Modern systems use a 4-level 512-ary tree to manage large address spaces efficiently:

Level 4: 512 entries → Points to Level 3 tables
Level 3: 512 entries → Points to Level 2 tables
Level 2: 512 entries → Points to Level 1 tables
Level 1: 512 entries → Points to physical memory pages

Why 4 levels?

  • Efficiency: Only create page tables for memory regions you actually use
  • Space savings: Don't waste memory on unused address ranges
  • Scalability: Can handle massive address spaces (256TB on x86_64)

Address Translation: How It Works

Step-by-Step Process

When your program accesses virtual address 0x00007F8486856DEF:

  1. Split the address into parts (each table index is 9 bits, so it ranges from 0 to 511):
    • Level 4 index: 0x0FF (bits 39-47)
    • Level 3 index: 0x012 (bits 30-38)
    • Level 2 index: 0x034 (bits 21-29)
    • Level 1 index: 0x056 (bits 12-20)
    • Page offset: 0xDEF (bits 0-11)
  2. Follow the chain:
    • Look up entry 0x0FF in the Level 4 table
    • Follow the pointer to the Level 3 table, look up entry 0x012
    • Follow the pointer to the Level 2 table, look up entry 0x034
    • Follow the pointer to the Level 1 table, look up entry 0x056
    • Get the physical frame number from the Level 1 entry
  3. Calculate the final address:
    • Combine the physical frame number with page offset 0xDEF

What Happens During Page Faults?

If any step fails, a page fault occurs:

Minor Page Fault: The page is already in physical memory but is not yet mapped in this process's page table (for example, a page shared with another process, or a newly allocated page that just needs zeroing)

  • Map the existing (or freshly zeroed) frame into the page table
  • Resume program execution - no disk I/O is required

Major Page Fault: The page must be fetched from disk (a swapped-out page, or a file-backed page not yet in memory)

  • Load the page from disk into physical memory
  • Update the page table entry
  • Resume program execution - slow, because it waits on disk I/O

Protection Fault: Access violation

  • Program tried to access memory it doesn't have permission for
  • Operating system typically terminates the program (e.g., with a segmentation fault)

Page Replacement: When Memory is Full

The Problem

When physical memory is full and a new page is needed, the OS must decide which existing page to remove.

1. First-In-First-Out (FIFO)

How it works: Remove the page that has been in memory the longest. Real-world analogy: Like a queue at a restaurant - first person in line gets served first. Example:

cpp
Memory: [Page A, Page B, Page C, Page D]
New page needed: Page E
FIFO removes: Page A (oldest)
Result: [Page B, Page C, Page D, Page E]

Pros: Simple to implement Cons: Poor performance - it doesn't consider whether a page is still in use, so an actively used page can be evicted and must then be loaded right back into memory.

2. Least Recently Used (LRU)

How it works: Remove the page that hasn't been accessed for the longest time. Real-world analogy: Like a bank - accounts that haven't been accessed recently get moved to long-term storage. Example:

cpp
Memory: [Page A, Page B, Page C, Page D]
Access pattern: A, B, C, A, B, D
LRU removes: Page C (least recently used)

Pros: Good performance, considers actual usage Cons: Complex to implement, requires tracking access times

3. Clock Algorithm (Second Chance)

How it works: Give each page a "second chance" before removing it. Real-world analogy: Like a clock hand that sweeps around memory, giving each page a chance to prove it's still needed. The steps:

  1. Each page has a "reference bit" (R)
  2. When page is accessed, set R = 1
  3. Clock hand moves around memory
  4. If R = 1, set R = 0 and move to next page
  5. If R = 0, remove this page

Example:

cpp
Memory: [Page A(R=1), Page B(R=0), Page C(R=1), Page D(R=0)]
Clock hand at Page A
Action: A has R=1 → set R=0, advance to Page B
Next: B has R=0 → evict Page B

Pros: Good performance, simple implementation Cons: Only approximates LRU's decisions, though it is much simpler to implement

Performance Comparison

Algorithm   Performance   Complexity   Real-world Use
FIFO        Poor          Simple       Rarely used
LRU         Excellent     Complex      Used in theory
Clock       Good          Medium       Used in practice

Thrashing: When the System Gets Overwhelmed

What is Thrashing?

Thrashing occurs when the system spends more time swapping pages in and out of memory than doing actual work.

Real-World Analogy

Think of a bank during a financial crisis:

  • Customers keep withdrawing and depositing the same money
  • Bank tellers spend all day moving money between vaults and storage
  • Very little actual banking gets done!

How to Detect Thrashing

Symptoms:

  • High page fault rate: 100+ page faults per second
  • Low CPU utilization: CPU is idle while waiting for disk I/O
  • Poor system responsiveness: Everything feels slow
  • High disk activity: Disk light constantly blinking

Example:

cpp
Normal system: 5 page faults/second, 80% CPU usage
Thrashing system: 200 page faults/second, 10% CPU usage

How to Prevent Thrashing

1. Reduce the Number of Active Processes

Example: Instead of running 50 processes, run only 20.

2. Increase Physical Memory

Example: Upgrade from 8GB to 16GB RAM.

3. Optimize Memory Usage

Example: Close unnecessary browser tabs, reduce database cache size.

4. Use Working Set Management

Working set: The set of pages a process needs to run efficiently.

Example:

  • Process A working set: 50MB
  • Process B working set: 30MB
  • Process C working set: 20MB
  • Total needed: 100MB
  • Available memory: 80MB
  • Solution: Suspend Process C until more memory is available

Fragmentation: Memory Wastage

1. External Fragmentation

What it is: Free memory exists but is scattered in small, non-contiguous blocks. Real-world analogy: Like a parking lot where cars are parked randomly, leaving small gaps that can't fit a large truck. Example:

cpp
Memory layout: [Process A][ ][Process B][ ][Process C][ ][Process D]
Free spaces: 30MB + 25MB + 10MB = 65MB total
Problem: Can't allocate 40MB because no single block is large enough

Impact:

  • Memory appears full even when plenty is available
  • Allocation requests fail
  • System performance degrades

2. Internal Fragmentation

What it is: Allocated memory blocks are larger than what was requested. Real-world analogy: Like buying a small item that comes in a large box - you pay for empty space. Example:

cpp
System uses 4KB pages:
- Request 1000 bytes → Get 4096 bytes (3096 bytes wasted)
- Request 5000 bytes → Get 8192 bytes (3192 bytes wasted)
- Request 2000 bytes → Get 4096 bytes (2096 bytes wasted)
Total waste: 8,384 bytes out of 16,384 bytes (~51% waste)

Impact:

  • Reduced memory efficiency
  • Higher memory costs
  • Slower performance

Solutions to Fragmentation

1. Memory Compaction

What it is: Moving allocated blocks together to create large free spaces. Real-world analogy: Like reorganizing books on a shelf to create one large empty space. Example:

cpp
Before: [A][ ][B][ ][C][ ][D]
After:  [A][B][C][D][        ]

Pros: Eliminates external fragmentation Cons: Expensive operation, requires updating all pointers

2. Buddy Allocation

What it is: Always allocate memory in power-of-2 sizes. Real-world analogy: Like cutting paper - always cut in half, then half again. Example:

cpp
Request 3KB → Round up to 4KB → Split 8KB block → Allocate 4KB

Pros: Predictable, easy to merge adjacent blocks Cons: Can cause internal fragmentation

3. Slab Allocation

What it is: Pre-allocate pools for commonly used sizes. Real-world analogy: Like a restaurant that pre-cooks popular dishes. Example:

cpp
Pool 1: 1000 blocks of 64 bytes each
Pool 2: 500 blocks of 128 bytes each
Pool 3: 250 blocks of 256 bytes each

Pros: Fast allocation, no fragmentation within pools Cons: Only works for common object sizes

Performance Optimization

Spatial Locality

What it is: Tendency to access memory locations near recently accessed locations. Real-world analogy: Like reading a book - you're more likely to read the next page than jump to a random page. Good examples:

  • Sequential array access: for (int i = 0; i < 1000; i++) array[i] = 0;
  • Matrix operations: Accessing elements row by row
  • File reading: Reading consecutive bytes

Performance impact:

  • Good spatial locality: Fewer page faults, better cache performance
  • Poor spatial locality: More page faults, slower execution

Temporal Locality

What it is: Tendency to access the same memory locations repeatedly. Real-world analogy: Like your desk - you keep frequently used items close at hand. Good examples:

  • Loop variables: The counter i is accessed repeatedly
  • Function parameters: Accessed multiple times within a function
  • Global variables: Frequently used data

Performance impact:

  • Good temporal locality: Data stays in cache, fast access
  • Poor temporal locality: Data gets evicted from cache, slow access

Optimization Strategies

In systems where performance is critical, it is important to avoid TLB misses and page faults. A page fault can cause milliseconds of latency where your program is waiting for the page to be loaded into memory and sitting idle. That is wasted time that could be used to process more data! Therefore you must be aware of the following strategies to optimize your program's memory usage.

1. Page Size Selection

Large Pages (2MB, 1GB):

  • Pros: Fewer page table entries, better for large datasets
  • Cons: More internal fragmentation

Small Pages (4KB):

  • Pros: Less internal fragmentation, better memory utilization
  • Cons: More page table entries, more TLB misses

When to use each:

  • Large pages: Databases, scientific computing
  • Small pages: General applications, web servers

1.5. Hugepages

What are hugepages? Hugepages are memory pages that are significantly larger than the standard 4KB pages. Common sizes include 2MB and 1GB pages on x86_64 systems. How hugepages help performance:

  1. Reduced TLB Pressure: The Translation Lookaside Buffer (TLB) is a small, fast cache that stores recent virtual-to-physical address translations. With standard 4KB pages, the TLB can only cache a limited number of translations. Hugepages dramatically increase the amount of memory that can be cached in the TLB.

Example: If your TLB can hold 64 entries:

  • With 4KB pages: 64 × 4KB = 256KB of memory cached
  • With 2MB hugepages: 64 × 2MB = 128MB of memory cached
  2. Fewer Page Table Entries: Large pages require fewer page table entries, reducing memory overhead and improving page table walk performance.
  3. Better Memory Locality: Hugepages ensure that large data structures are stored in contiguous physical memory, improving cache performance and reducing memory access latency.

When to use hugepages:

  • Databases: Large database buffers and working sets
  • Scientific Computing: Large matrices and datasets
  • High-Performance Computing: Applications with large memory footprints
  • Virtual Machines: Large guest memory allocations

Example - Database Performance:

cpp
Standard 4KB pages:
- Database buffer: 8GB
- TLB entries needed: ~2 million
- TLB misses: Frequent, causing page table walks

2MB hugepages:
- Database buffer: 8GB  
- TLB entries needed: ~4,000
- TLB misses: Rare, much better performance

How to use hugepages:

  1. Reserve hugepages at boot time:
shell
# In /etc/default/grub
GRUB_CMDLINE_LINUX="hugepagesz=2M hugepages=1024"
  2. Mount the hugepage filesystem:
shell
mount -t hugetlbfs nodev /mnt/huge
  3. Allocate hugepages in your application:
cpp
// C example
void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
                 -1, 0);

Trade-offs:

  • Pros: Better performance for large memory workloads, reduced TLB misses
  • Cons: More internal fragmentation, requires system configuration, not suitable for small allocations

2. Memory Mapping

What it is: Using memory-mapped files instead of traditional file I/O. Real-world analogy: Instead of copying a book page by page, put the entire book on a table and read directly from it. Benefits:

  • Faster access (no system call overhead)
  • Shared memory between processes
  • Lazy loading (pages loaded only when accessed)

Examples:

  • Database files for fast query processing
  • Configuration files for quick access
  • Large data files to avoid loading entire file

3. Prefetching

What it is: Loading data into memory before it's needed. Real-world analogy: Like a waiter who brings your next course before you finish the current one. Types:

  • Hardware prefetching: CPU automatically detects patterns
  • Software prefetching: Program explicitly requests data
  • OS prefetching: Operating system predicts and loads pages

Examples:

  • Sequential file reading: Prefetch next few pages
  • Array processing: Prefetch next array elements
  • Database queries: Prefetch related data pages

Key Takeaways

Virtual Memory Concepts

  1. Virtual pages provide abstraction and protection
  2. Page tables map virtual to physical addresses
  3. Multi-level page tables reduce memory overhead
  4. TLB caches page table entries for performance

Page Replacement

  1. FIFO: Simple but poor performance
  2. LRU: Good performance but complex implementation
  3. Clock: Good compromise between performance and complexity
  4. Working set: Important for thrashing prevention

Thrashing

  1. High page fault rate with low CPU utilization
  2. Working set exceeds available physical memory
  3. Prevention: Reduce processes, increase memory, optimize patterns
  4. Detection: Monitor page fault rates and CPU utilization

Fragmentation

  1. External: Free memory scattered in small blocks
  2. Internal: Allocated blocks larger than requested
  3. Mitigation: Compaction, buddy allocation, slab allocation
  4. Impact: Reduced memory utilization and performance

Performance Optimization

  1. Spatial locality: Access nearby memory locations
  2. Temporal locality: Reuse recently accessed data
  3. Page size: Choose appropriate page size for workload
  4. Memory mapping: Use mmap() for large I/O operations
  5. NUMA awareness: Allocate memory on local nodes