
Zero-Copy Techniques

Let's say we're building a high-performance web server that needs to serve thousands of files per second. Every time a client requests a file, our server reads it from disk, copies it to a buffer, then copies it to the network socket. That's a lot of copying! What if we could eliminate these copies?

Zero-copy techniques do exactly that. They allow data to move from one place to another without being copied through intermediate buffers. Let's explore how this works and why it matters for performance.

The Problem: Too Much Copying

Let's start with a simple example. You want to send a file over the network using traditional methods:

Traditional File Transfer (Multiple Copies)

cpp
// Traditional approach - lots of copying
FILE* file = fopen("data.txt", "rb");
char buffer[4096];
size_t bytes_read;
// Loop on fread()'s return value: a feof()-only loop never terminates if a read error occurs
while ((bytes_read = fread(buffer, 1, sizeof(buffer), file)) > 0) {  // kernel page cache to user buffer
    send(socket, buffer, bytes_read, 0);                             // user buffer to kernel socket buffer
}
fclose(file);

What happens here:

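  1. Copy 1: Data moves from disk to the kernel page cache (DMA, triggered by fread)
  2. Copy 2: Data moves from the page cache to the user buffer (CPU copy)
  3. Copy 3: Data moves from the user buffer to the kernel socket buffer (CPU copy, via send)
  4. Copy 4: Data moves from the socket buffer to the network interface (DMA)

That's 4 copies for a single file transfer! Copies 2 and 3 are performed by the CPU, so they consume CPU cycles and memory bandwidth, and each chunk also costs two transitions between user and kernel mode.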
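
To make the two CPU copies concrete, here is a minimal sketch of the traditional path written against raw file descriptors instead of FILE*. The names copy_via_user_buffer, write_all, and the demo function are hypothetical helpers made up for this illustration, and the demo uses two pipes as stand-ins for the file and the socket.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <sys/types.h>
#include <unistd.h>

// Write exactly len bytes, retrying on short writes and EINTR.
static bool write_all(int fd, const char* data, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        data += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Copy everything from in_fd to out_fd through a user-space buffer.
// Returns the total number of bytes copied, or -1 on error.
ssize_t copy_via_user_buffer(int in_fd, int out_fd) {
    assert(in_fd >= 0 && out_fd >= 0);
    char buffer[4096];
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buffer, sizeof(buffer));  // kernel -> user copy
        if (n == 0) return total;                          // end of input
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (!write_all(out_fd, buffer, static_cast<size_t>(n)))  // user -> kernel copy
            return -1;
        total += n;
    }
}

// Self-check (hypothetical): two pipes stand in for the file and the socket.
bool copy_via_user_buffer_demo() {
    int in_pipe[2], out_pipe[2];
    if (pipe(in_pipe) != 0 || pipe(out_pipe) != 0) return false;
    const char msg[] = "hello";
    if (write(in_pipe[1], msg, sizeof(msg)) != static_cast<ssize_t>(sizeof(msg))) return false;
    close(in_pipe[1]);  // signal end of input

    ssize_t copied = copy_via_user_buffer(in_pipe[0], out_pipe[1]);
    char got[16] = {0};
    ssize_t n = read(out_pipe[0], got, sizeof(got));
    close(in_pipe[0]);
    close(out_pipe[0]);
    close(out_pipe[1]);
    return copied == static_cast<ssize_t>(sizeof(msg)) && n == copied &&
           std::memcmp(got, msg, sizeof(msg)) == 0;
}
```

Every byte crosses the user/kernel boundary twice here, once in read() and once in write(), which is exactly the work the zero-copy calls avoid.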

The Solution: Zero-Copy

Zero-copy techniques eliminate these intermediate copies by allowing the kernel to transfer data directly between sources and destinations.

Zero-Copy File Transfer

cpp
// Zero-copy approach - direct transfer
int file_fd = open("data.txt", O_RDONLY);
struct stat file_stat;
fstat(file_fd, &file_stat);
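// sendfile() may transfer fewer bytes than requested; production code checks the return value and loops
sendfile(socket, file_fd, NULL, file_stat.st_size);  // Direct transfer: file to socket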

What happens here:

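  1. Kernel-side transfer: Data moves from disk to the page cache, and from there to the network interface
  2. No user buffer: No copying to user space
  3. Kernel handles everything: Single system call does the work

The two CPU copies through user space are gone. What remains is the DMA transfer into the page cache and, on network hardware that supports scatter-gather DMA, a second DMA transfer straight from the page cache to the network interface; without that support the kernel still makes one CPU copy into the socket buffer. The performance improvement can be dramatic.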
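
One detail the short snippet glosses over: a single sendfile() call may transfer fewer bytes than requested. A minimal sketch of a loop that keeps going until the whole file is sent might look like this. It assumes Linux; send_file_all and the demo function are hypothetical names, and the demo uses a Unix socket pair in place of a real network connection.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

// Send the whole file behind in_fd to out_fd, looping because one sendfile()
// call may transfer fewer bytes than requested.
// Returns the number of bytes sent, or -1 on error.
ssize_t send_file_all(int out_fd, int in_fd) {
    assert(out_fd >= 0 && in_fd >= 0);
    struct stat st;
    if (fstat(in_fd, &st) != 0) return -1;

    off_t offset = 0;  // sendfile() advances this as it transfers data
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             static_cast<size_t>(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) break;  // file shrank while we were sending
    }
    return static_cast<ssize_t>(offset);
}

// Self-check (hypothetical): push a small temporary file through a socket pair.
bool send_file_all_demo() {
    char path[] = "/tmp/zc_sendfile_XXXXXX";
    int file_fd = mkstemp(path);
    if (file_fd < 0) return false;
    unlink(path);  // the open descriptor keeps the file alive

    const std::string payload = "hello from sendfile\n";
    if (write(file_fd, payload.data(), payload.size()) !=
        static_cast<ssize_t>(payload.size())) return false;

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;

    ssize_t sent = send_file_all(sv[0], file_fd);
    char received[64] = {0};
    ssize_t got = read(sv[1], received, sizeof(received));
    close(file_fd);
    close(sv[0]);
    close(sv[1]);
    return sent == static_cast<ssize_t>(payload.size()) && got == sent &&
           std::string(received, static_cast<size_t>(got)) == payload;
}
```

Passing an explicit offset means the loop does not depend on the file's current position, so the same descriptor can be reused for several transfers.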

Why Zero-Copy Matters

Why Copying is Expensive

  • Memory bandwidth: Modern systems have limited memory bandwidth. Each copy consumes this precious resource.
  • CPU cycles: Copying requires CPU time to move data around.
  • Cache pollution: Copies pollute the CPU cache with data that might not be needed again.
  • Context switches: Traditional I/O often requires multiple system calls and context switches.

Performance Numbers

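The figures below are illustrative orders of magnitude rather than measurements; actual numbers depend on file size, hardware, and kernel version.

Traditional approach: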

  • Latency: ~100-1000 μs per file
  • Throughput: Limited by memory bandwidth
  • CPU usage: High (lots of copying)

Zero-copy approach:

  • Latency: ~10-100 μs per file
  • Throughput: Limited by disk/network bandwidth
  • CPU usage: Low (minimal copying)

Zero-Copy Techniques

1. sendfile(): File to Socket Transfer

sendfile() is the most common zero-copy technique for file transfers. How it works:

cpp
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
  • out_fd: Destination file descriptor (usually a socket)
  • in_fd: Source file descriptor (usually a file)
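  • offset: Pointer to the file offset to start reading from; it is updated on return. If NULL, data is read from in_fd's current file position, which is then advanced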
  • count: How many bytes to transfer
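
The offset parameter is what makes partial transfers convenient, for example when a client asks for a byte range. Here is a small sketch, assuming Linux; send_file_range and the demo function are hypothetical names. It also checks that the file's own position is left alone when an explicit offset is passed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

// Send `length` bytes of in_fd starting at byte `start`.
// Because a non-null offset is passed, in_fd's own file position is untouched.
// Returns the number of bytes sent (may be short at end of file), or -1 on error.
ssize_t send_file_range(int out_fd, int in_fd, off_t start, size_t length) {
    assert(start >= 0);
    off_t offset = start;
    size_t remaining = length;
    while (remaining > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, remaining);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) break;  // reached end of file before `length` bytes
        remaining -= static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(length - remaining);
}

// Self-check (hypothetical): send bytes 3..6 of "0123456789".
bool send_file_range_demo() {
    char path[] = "/tmp/zc_range_XXXXXX";
    int file_fd = mkstemp(path);
    if (file_fd < 0) return false;
    unlink(path);

    const std::string content = "0123456789";
    if (write(file_fd, content.data(), content.size()) !=
        static_cast<ssize_t>(content.size())) return false;
    off_t pos_before = lseek(file_fd, 0, SEEK_CUR);

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;

    ssize_t sent = send_file_range(sv[0], file_fd, 3, 4);
    char got[16] = {0};
    ssize_t n = read(sv[1], got, sizeof(got));
    off_t pos_after = lseek(file_fd, 0, SEEK_CUR);
    close(file_fd);
    close(sv[0]);
    close(sv[1]);
    return sent == 4 && n == 4 && std::string(got, 4) == "3456" &&
           pos_before == pos_after;
}
```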

Example: Web server serving files

cpp
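// Serve a file directly to a client
int file_fd = open("index.html", O_RDONLY);
struct stat file_stat;
fstat(file_fd, &file_stat);  // the file size tells sendfile() how much to send
sendfile(client_socket, file_fd, NULL, file_stat.st_size);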

Benefits:

  • Eliminates user buffer copies
  • Single system call
  • Kernel optimizes the transfer
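  • out_fd can be any file descriptor since Linux 2.6.33 (earlier kernels require a socket); in_fd must support mmap()-like operations, so it cannot be a socket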

2. splice(): File Descriptor to File Descriptor

splice() is more general than sendfile(): it moves data between two file descriptors, provided at least one of them is a pipe. How it works:

cpp
ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);

Example: Pipe to socket transfer

cpp
// Transfer data from a pipe to a socket
splice(pipe_fd, NULL, socket_fd, NULL, data_size, SPLICE_F_MOVE);

Benefits:

  • Works with any file descriptor on one side, as long as the other side is a pipe
  • Can move data between pipes and sockets or files
  • Supports non-blocking operations (SPLICE_F_NONBLOCK)
  • More flexible than sendfile()
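
Because one end of every splice() call has to be a pipe, moving data from a file to a socket takes two steps with a pipe in the middle. Here is a minimal sketch of that relay, assuming Linux; splice_relay and the demo function are hypothetical names, and the demo uses a Unix socket pair in place of a network connection.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>  // splice(), SPLICE_F_MOVE
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Move up to len bytes from in_fd to out_fd through an intermediate pipe.
// Returns the number of bytes delivered, or -1 on error.
ssize_t splice_relay(int in_fd, int out_fd, size_t len) {
    assert(in_fd >= 0 && out_fd >= 0);
    int p[2];
    if (pipe(p) != 0) return -1;

    ssize_t total = 0;
    bool ok = true;
    while (ok && len > 0) {
        // Step 1: source -> pipe (at most one pipe's worth per call).
        ssize_t queued = splice(in_fd, nullptr, p[1], nullptr, len, SPLICE_F_MOVE);
        if (queued < 0) { ok = (errno == EINTR); continue; }
        if (queued == 0) break;  // end of input

        // Step 2: pipe -> destination, until everything queued is drained.
        ssize_t left = queued;
        while (left > 0) {
            ssize_t n = splice(p[0], nullptr, out_fd, nullptr,
                               static_cast<size_t>(left), SPLICE_F_MOVE);
            if (n < 0 && errno == EINTR) continue;
            if (n <= 0) { ok = false; break; }
            left -= n;
        }
        total += queued - left;
        len -= static_cast<size_t>(queued);
    }
    close(p[0]);
    close(p[1]);
    return ok ? total : -1;
}

// Self-check (hypothetical): relay a small temporary file into a socket pair.
bool splice_relay_demo() {
    char path[] = "/tmp/zc_splice_XXXXXX";
    int file_fd = mkstemp(path);
    if (file_fd < 0) return false;
    unlink(path);

    const std::string payload = "log line 1\nlog line 2\n";
    if (write(file_fd, payload.data(), payload.size()) !=
        static_cast<ssize_t>(payload.size())) return false;
    lseek(file_fd, 0, SEEK_SET);  // a null offset means "read from the current position"

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;

    ssize_t moved = splice_relay(file_fd, sv[0], payload.size());
    char got[64] = {0};
    ssize_t n = read(sv[1], got, sizeof(got));
    close(file_fd);
    close(sv[0]);
    close(sv[1]);
    return moved == static_cast<ssize_t>(payload.size()) && n == moved &&
           std::string(got, static_cast<size_t>(n)) == payload;
}
```

A long-lived server would normally create the pipe once and reuse it across transfers instead of opening a new one per call as this sketch does.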

3. tee(): Duplicate Data in a Pipe

tee() duplicates data from one pipe into another without consuming it from the source; both file descriptors must refer to pipes. How it works:

cpp
ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags);

Example: Broadcasting data

cpp
// Send the same data to multiple destinations
// src_pipe and copy_pipe come from pipe(): index 0 is the read end, index 1 the write end
tee(src_pipe[0], copy_pipe[1], data_size, 0);                // Duplicate into second pipe (source not consumed)
splice(src_pipe[0], NULL, socket1_fd, NULL, data_size, 0);   // Send to client 1
splice(copy_pipe[0], NULL, socket2_fd, NULL, data_size, 0);  // Send to client 2

Benefits:

  • Efficient broadcasting
  • No data copying to user space
  • Maintains data integrity
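
To see the "without consuming it" part in action, here is a small sketch, assuming Linux, that duplicates a message from one pipe into another and then reads it back from both. The names duplicate_pipe_data and the demo function are hypothetical, as is the sample message.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>  // tee()
#include <string>
#include <unistd.h>

// Duplicate up to len bytes queued in src_read_fd into dup_write_fd.
// The data stays readable from the source pipe afterwards.
// Returns the number of bytes duplicated, or -1 on error.
ssize_t duplicate_pipe_data(int src_read_fd, int dup_write_fd, size_t len) {
    assert(len > 0);
    for (;;) {
        ssize_t n = tee(src_read_fd, dup_write_fd, len, 0);
        if (n < 0 && errno == EINTR) continue;
        return n;
    }
}

// Self-check (hypothetical): the same message is readable from both pipes.
bool duplicate_pipe_data_demo() {
    int src_pipe[2], copy_pipe[2];
    if (pipe(src_pipe) != 0 || pipe(copy_pipe) != 0) return false;

    const std::string tick = "sample tick 42\n";
    if (write(src_pipe[1], tick.data(), tick.size()) !=
        static_cast<ssize_t>(tick.size())) return false;

    ssize_t copied = duplicate_pipe_data(src_pipe[0], copy_pipe[1], tick.size());

    char from_src[32] = {0};
    char from_copy[32] = {0};
    ssize_t a = read(src_pipe[0], from_src, sizeof(from_src));    // original is still there
    ssize_t b = read(copy_pipe[0], from_copy, sizeof(from_copy)); // and so is the duplicate
    close(src_pipe[0]);
    close(src_pipe[1]);
    close(copy_pipe[0]);
    close(copy_pipe[1]);
    return copied == static_cast<ssize_t>(tick.size()) && a == copied && b == copied &&
           std::string(from_src, static_cast<size_t>(a)) == tick &&
           std::string(from_copy, static_cast<size_t>(b)) == tick;
}
```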

4. vmsplice(): User Memory to Kernel

vmsplice() allows efficient transfer of user memory to kernel space. How it works:

cpp
ssize_t vmsplice(int fd, const struct iovec *iov, unsigned long nr_segs, unsigned int flags);

Example: Efficient data transmission

cpp
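struct iovec iov[2];
iov[0].iov_base = header_data;
iov[0].iov_len = header_size;
iov[1].iov_base = payload_data;
iov[1].iov_len = payload_size;

// pipe_fds comes from pipe(): [0] is the read end, [1] the write end.
// SPLICE_F_GIFT requires page-aligned buffers and lengths; the memory must not be modified afterwards.
vmsplice(pipe_fds[1], iov, 2, SPLICE_F_GIFT);  // Transfer to kernel
splice(pipe_fds[0], NULL, socket_fd, NULL, header_size + payload_size, 0);  // Send to network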

Benefits:

  • Efficient user-to-kernel transfer
  • Works with scattered data (iovec)
  • Can "gift" pages to the kernel with SPLICE_F_GIFT, which lets a later splice() with SPLICE_F_MOVE try to move them instead of copying
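
Here is a minimal sketch of the gather step on its own, assuming Linux and leaving out SPLICE_F_GIFT so that no page alignment is needed. The names gather_into_pipe and the demo function are hypothetical, and the demo simply reads the pipe back instead of splicing it to a socket.

```cpp
#include <cassert>
#include <fcntl.h>    // vmsplice()
#include <string>
#include <sys/uio.h>  // struct iovec
#include <unistd.h>

// Queue a header and a payload into a pipe with one vmsplice() call.
// Without SPLICE_F_GIFT the pipe may still reference the caller's pages, so the
// buffers should stay unchanged until the reader has consumed them.
// Returns the number of bytes queued, or -1 on error.
ssize_t gather_into_pipe(int pipe_write_fd, const std::string& header,
                         const std::string& payload) {
    assert(pipe_write_fd >= 0);
    struct iovec iov[2];
    iov[0].iov_base = const_cast<char*>(header.data());
    iov[0].iov_len = header.size();
    iov[1].iov_base = const_cast<char*>(payload.data());
    iov[1].iov_len = payload.size();
    return vmsplice(pipe_write_fd, iov, 2, 0);
}

// Self-check (hypothetical): both segments arrive back to back, in order.
bool gather_into_pipe_demo() {
    int p[2];
    if (pipe(p) != 0) return false;

    const std::string header = "HDR:";
    const std::string payload = "payload bytes";
    ssize_t queued = gather_into_pipe(p[1], header, payload);

    char got[64] = {0};
    ssize_t n = read(p[0], got, sizeof(got));
    close(p[0]);
    close(p[1]);
    return queued == static_cast<ssize_t>(header.size() + payload.size()) &&
           n == queued &&
           std::string(got, static_cast<size_t>(n)) == header + payload;
}
```

In a real sender the read() in the demo would be replaced by a splice() from the pipe's read end to the socket.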

Real-World Applications

1. High-Performance Web Servers

Traditional web server:

cpp
// Read file into buffer, then send
FILE* file = fopen("large_file.dat", "rb");
char buffer[8192];
size_t bytes_read;
while ((bytes_read = fread(buffer, 1, sizeof(buffer), file)) > 0) {
    send(client_socket, buffer, bytes_read, 0);
}

Zero-copy web server:

cpp
// Direct file to socket transfer
int file_fd = open("large_file.dat", O_RDONLY);
struct stat file_stat;
fstat(file_fd, &file_stat);
sendfile(client_socket, file_fd, NULL, file_stat.st_size);

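Performance improvement: roughly 2-5x faster with 50-80% less CPU usage for large files, though the exact gain depends on workload and hardware and should be measured.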

2. Network File Systems

Example: NFS or SMB server

cpp
// Serve file data directly from storage to network
// file_size is obtained beforehand, e.g. from fstat()
int file_fd = open(file_path, O_RDONLY);
sendfile(client_socket, file_fd, NULL, file_size);

Benefits: Reduced server load, better scalability.

3. Data Processing Pipelines

Example: Log processing

cpp
// Efficient log forwarding
int log_fd = open("app.log", O_RDONLY);
splice(log_fd, NULL, processing_pipe_fd, NULL, chunk_size, 0);

Benefits: High-throughput data movement between processes.

4. High-Frequency Trading Systems

Example: Market data distribution

cpp
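// Efficient market data broadcasting.
// splice() consumes pipe data, so tee() duplicates it for every client but the last.
// broadcast_pipe[0]/[1] are the read/write ends; client_pipes[i] is a per-client pipe.
vmsplice(broadcast_pipe[1], market_data_iov, num_segments, 0);
for (int i = 0; i < num_clients - 1; i++) {
    tee(broadcast_pipe[0], client_pipes[i][1], data_size, 0);
    splice(client_pipes[i][0], NULL, client_sockets[i], NULL, data_size, 0);
}
splice(broadcast_pipe[0], NULL, client_sockets[num_clients - 1], NULL, data_size, 0);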

Benefits: Ultra-low latency data distribution.

Limitations and Considerations

1. Not Always Applicable

When zero-copy doesn't help:

  • Small data transfers (setup overhead outweighs the cost of copying a few bytes)
  • Data that needs processing (must copy to user space anyway)
  • Complex protocols requiring packetization

2. Platform Dependencies

  • Linux: Excellent zero-copy support (sendfile, splice, tee, vmsplice)
  • Windows: Limited support (mostly through TransmitFile)
  • macOS: Good support (sendfile, with a different signature than on Linux)
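
One common way to cope with these differences is to try the zero-copy call first and fall back to read()/write() when it is not supported; the Linux man page suggests doing so when sendfile() fails with EINVAL or ENOSYS. Here is a minimal sketch of that idea. The names transfer_chunk and the demo function are hypothetical, and the 16 KB fallback buffer is an arbitrary choice.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>
#include <sys/socket.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/sendfile.h>
#endif

// Move up to count bytes from in_fd (at its current position) to out_fd.
// Returns bytes moved, 0 at end of input, or -1 on error.
ssize_t transfer_chunk(int out_fd, int in_fd, size_t count) {
    assert(count > 0);
#if defined(__linux__)
    ssize_t sent = sendfile(out_fd, in_fd, nullptr, count);
    if (sent >= 0) return sent;
    if (errno != EINVAL && errno != ENOSYS) return -1;  // a real error, not "unsupported"
#endif
    // Fallback: one read()/write() round trip through a user buffer.
    char buffer[16384];
    if (count > sizeof(buffer)) count = sizeof(buffer);
    ssize_t n;
    do { n = read(in_fd, buffer, count); } while (n < 0 && errno == EINTR);
    if (n <= 0) return n;
    ssize_t written = 0;
    while (written < n) {
        ssize_t w = write(out_fd, buffer + written, static_cast<size_t>(n - written));
        if (w < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        written += w;
    }
    return written;
}

// Send msg from src_fd into a fresh socket pair and check that it arrives intact.
static bool arrives_intact(int src_fd, const std::string& msg) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;
    ssize_t moved = transfer_chunk(sv[0], src_fd, msg.size());
    char got[64] = {0};
    ssize_t n = read(sv[1], got, sizeof(got));
    close(sv[0]);
    close(sv[1]);
    return moved == static_cast<ssize_t>(msg.size()) && n == moved &&
           std::string(got, static_cast<size_t>(n)) == msg;
}

// Self-check (hypothetical): a regular file source, then a pipe source,
// which sendfile() is expected to reject so that the fallback runs.
bool transfer_chunk_demo() {
    const std::string msg = "portable transfer\n";

    char path[] = "/tmp/zc_fallback_XXXXXX";
    int file_fd = mkstemp(path);
    if (file_fd < 0) return false;
    unlink(path);
    if (write(file_fd, msg.data(), msg.size()) != static_cast<ssize_t>(msg.size())) return false;
    lseek(file_fd, 0, SEEK_SET);
    bool file_ok = arrives_intact(file_fd, msg);
    close(file_fd);

    int p[2];
    if (pipe(p) != 0) return false;
    if (write(p[1], msg.data(), msg.size()) != static_cast<ssize_t>(msg.size())) return false;
    bool pipe_ok = arrives_intact(p[0], msg);
    close(p[0]);
    close(p[1]);
    return file_ok && pipe_ok;
}
```

The caller sees the same interface either way, so the zero-copy path becomes an optimization rather than a requirement.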

3. Memory Management

With vmsplice(..., SPLICE_F_GIFT):

  • Pages are "gifted" to the kernel
  • The application must never modify that memory afterwards
  • Buffers must be page-aligned in both address and length, so allocate them accordingly

4. Error Handling

Zero-copy operations can fail:

  • Insufficient kernel memory
  • File descriptor limitations
  • Network buffer full

Performance Measurement

Measuring Zero-Copy Benefits

What to measure:

  1. Throughput: MB/s transferred
  2. Latency: Time per transfer
  3. CPU usage: % CPU consumed
  4. Memory usage: Memory bandwidth consumed

Example benchmark:

cpp
// Measure traditional vs zero-copy performance
auto start = std::chrono::high_resolution_clock::now();

// Traditional approach
FILE* file = fopen("large_file.dat", "rb");
char buffer[8192];
size_t bytes_read;
while ((bytes_read = fread(buffer, 1, sizeof(buffer), file)) > 0) {
    send(socket, buffer, bytes_read, 0);
}
fclose(file);

auto end = std::chrono::high_resolution_clock::now();
auto traditional_time = std::chrono::duration_cast<std::chrono::microseconds>(end - start);

// Zero-copy approach
start = std::chrono::high_resolution_clock::now();
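int file_fd = open("large_file.dat", O_RDONLY);  // note: the page cache is already warm from the first run
struct stat file_stat;
fstat(file_fd, &file_stat);
sendfile(socket, file_fd, NULL, file_stat.st_size);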
end = std::chrono::high_resolution_clock::now();
auto zerocopy_time = std::chrono::duration_cast<std::chrono::microseconds>(end - start);

printf("Traditional: %ld μs, Zero-copy: %ld μs, Speedup: %.2fx\n", 
       traditional_time.count(), zerocopy_time.count(), 
       (double)traditional_time.count() / zerocopy_time.count());
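
Wall-clock time alone hides the CPU savings, which are often the bigger win. Here is a small sketch of a helper that records both, using std::chrono::steady_clock for elapsed time and getrusage() for CPU time. The names Measurement, measure, and the demo function are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

struct Measurement {
    long long wall_us;  // elapsed wall-clock time
    long long cpu_us;   // user + system CPU time consumed by this process
};

// Total user + system CPU time recorded in a rusage snapshot, in microseconds.
static long long cpu_time_us(const struct rusage& ru) {
    return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) * 1000000LL +
           ru.ru_utime.tv_usec + ru.ru_stime.tv_usec;
}

// Run `work` once and report how long it took and how much CPU it burned.
template <typename Work>
Measurement measure(Work&& work) {
    struct rusage before{}, after{};
    getrusage(RUSAGE_SELF, &before);
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    getrusage(RUSAGE_SELF, &after);
    assert(end >= start);  // steady_clock never goes backwards
    return {
        std::chrono::duration_cast<std::chrono::microseconds>(end - start).count(),
        cpu_time_us(after) - cpu_time_us(before)
    };
}

// Self-check (hypothetical): sleeping takes wall-clock time but almost no CPU.
bool measure_demo() {
    Measurement m = measure([] { usleep(20000); });  // sleep for about 20 ms
    return m.wall_us >= 15000 && m.cpu_us >= 0;
}
```

Each transfer variant can then be wrapped in a lambda, for example `measure([&] { sendfile(socket, file_fd, NULL, file_size); })`, and the two Measurement results compared. Running each variant several times and alternating the order reduces the effect of a warm page cache.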

Best Practices

1. Choose the Right Tool

Use sendfile() for:

  • File to socket transfers
  • Web servers
  • Simple file serving

Use splice() for:

  • Complex data pipelines
  • Non-file transfers
  • When you need more control

Use tee() for:

  • Broadcasting data
  • Data duplication without copying

Use vmsplice() for:

  • Efficient user-to-kernel transfer
  • Scattered data (iovec)

2. Handle Errors Gracefully

cpp
ssize_t result = sendfile(socket_fd, file_fd, NULL, file_size);
if (result == -1) {
    if (errno == EAGAIN) {
        // Handle non-blocking case
        // Retry or use select/poll
    } else {
        // Handle other errors
        perror("sendfile failed");
    }
}
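
Filling in the "retry or use select/poll" comment, here is a minimal sketch of a sender for a non-blocking socket that waits with poll() whenever the socket buffer is full. It assumes Linux; send_file_nonblocking, its timeout parameter, and the demo function are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <poll.h>
#include <string>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

// Send count bytes of file_fd (starting at offset 0) over a non-blocking socket.
// Returns bytes sent, or -1 on error; errno is ETIMEDOUT if the wait timed out.
ssize_t send_file_nonblocking(int sock_fd, int file_fd, size_t count, int timeout_ms) {
    assert(sock_fd >= 0 && file_fd >= 0);
    off_t offset = 0;
    while (static_cast<size_t>(offset) < count) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset,
                             count - static_cast<size_t>(offset));
        if (n > 0) continue;  // progress made; offset was advanced by the kernel
        if (n == 0) break;    // end of file
        if (errno == EINTR) continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK) return -1;

        // Socket buffer is full: wait until it becomes writable again.
        struct pollfd pfd = {sock_fd, POLLOUT, 0};
        int ready;
        do { ready = poll(&pfd, 1, timeout_ms); } while (ready < 0 && errno == EINTR);
        if (ready < 0) return -1;
        if (ready == 0) { errno = ETIMEDOUT; return -1; }
    }
    return static_cast<ssize_t>(offset);
}

// Self-check (hypothetical): a small file over a non-blocking socket pair.
bool send_file_nonblocking_demo() {
    char path[] = "/tmp/zc_nonblock_XXXXXX";
    int file_fd = mkstemp(path);
    if (file_fd < 0) return false;
    unlink(path);

    const std::string payload = "non-blocking sendfile\n";
    if (write(file_fd, payload.data(), payload.size()) !=
        static_cast<ssize_t>(payload.size())) return false;

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0) return false;

    ssize_t sent = send_file_nonblocking(sv[0], file_fd, payload.size(), 1000);
    char got[64] = {0};
    ssize_t n = read(sv[1], got, sizeof(got));
    close(file_fd);
    close(sv[0]);
    close(sv[1]);
    return sent == static_cast<ssize_t>(payload.size()) && n == sent &&
           std::string(got, static_cast<size_t>(n)) == payload;
}
```

An event-driven server would usually return to its main loop on EAGAIN and resume from the saved offset later, rather than blocking in poll() as this sketch does.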

3. Consider Buffer Sizes

  • Too small: More system calls, lower performance
  • Too large: Memory pressure, potential blocking
  • Rule of thumb: 64KB to 1MB chunks work well for most cases.

4. Monitor Performance

Key metrics:

  • System calls per second
  • Memory bandwidth usage
  • CPU usage per transfer
  • Network throughput

The Bottom Line

Zero-copy techniques are essential for high-performance systems:

  • Eliminate unnecessary data copying
  • Reduce CPU usage and memory bandwidth
  • Improve throughput and reduce latency
  • Scale better under load

The key is understanding when to use them and which technique fits your specific use case. For file serving, sendfile() is often the best choice. For complex pipelines, splice() provides more flexibility. For broadcasting, tee() is invaluable.

Remember: zero-copy isn't always the answer, but when it applies, the performance benefits can be dramatic. In high-frequency trading, web serving, and data processing, these techniques can make the difference between a system that scales and one that doesn't.

Questions

Q: What is the main benefit of zero-copy techniques?

Zero-copy reduces CPU usage, memory bandwidth consumption, and latency by eliminating data copies through user space.

Q: Which system call enables zero-copy file transfer?

sendfile() enables zero-copy transfer from file to socket.

Q: What is the purpose of splice()?

splice() enables zero-copy transfer between two file descriptors when at least one of them is a pipe.