Zero-Copy Techniques
Let's say we're building a high-performance web server that needs to serve thousands of files per second. Every time a client requests a file, our server reads it from disk, copies it to a buffer, then copies it to the network socket. That's a lot of copying! What if we could eliminate these copies?
Zero-copy techniques do exactly that. They allow data to move from one place to another without being copied through intermediate buffers. Let's explore how this works and why it matters for performance.
The Problem: Too Much Copying
Let's start with a simple example. You want to send a file over the network using traditional methods:
Traditional File Transfer (Multiple Copies)
cpp
// Traditional approach - lots of copying
FILE* file = fopen("data.txt", "rb");
char buffer[4096];
size_t bytes_read;
// Loop on fread()'s return value; checking feof() first would resend a stale buffer at EOF
while ((bytes_read = fread(buffer, 1, sizeof(buffer), file)) > 0) { // Copy 1: disk to buffer
    send(socket, buffer, bytes_read, 0); // Copy 2: buffer to socket
}
fclose(file);
What happens here:
- Copy 1: Data moves from disk to kernel buffer (via fread)
- Copy 2: Data moves from kernel buffer to user buffer
- Copy 3: Data moves from user buffer to socket buffer (via send)
- Copy 4: Data moves from socket buffer to network interface
That's 4 copies for a single file transfer! Copies 1 and 4 are typically done by DMA hardware, but copies 2 and 3 are performed by the CPU, and all four consume memory bandwidth.
The Solution: Zero-Copy
Zero-copy techniques eliminate these intermediate copies by allowing the kernel to transfer data directly between sources and destinations.
Zero-Copy File Transfer
cpp
// Zero-copy approach - direct transfer
int file_fd = open("data.txt", O_RDONLY);
struct stat file_stat;
fstat(file_fd, &file_stat);
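sendfile(socket, file_fd, NULL, file_stat.st_size); // Direct transfer: file to socket
What happens here:
- Direct transfer: Data moves from the kernel's page cache to the socket without passing through user space
- No user buffer: No copying to user space
- Kernel handles everything: Single system call does the work
That removes both user-space copies. With network hardware that supports scatter-gather DMA, the CPU does not copy the payload at all; otherwise one kernel-internal copy into the socket buffer remains. The performance improvement can be dramatic.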
Why Zero-Copy Matters
Why Copying is Expensive
- Memory bandwidth: Modern systems have limited memory bandwidth. Each copy consumes this precious resource.
- CPU cycles: Copying requires CPU time to move data around.
- Cache pollution: Copies pollute the CPU cache with data that might not be needed again.
- Context switches: Traditional I/O often requires multiple system calls and context switches.
Performance Numbers
Traditional approach:
- Latency: ~100-1000 μs per file
- Throughput: Limited by memory bandwidth
- CPU usage: High (lots of copying)
Zero-copy approach:
- Latency: ~10-100 μs per file
- Throughput: Limited by disk/network bandwidth
- CPU usage: Low (minimal copying)
Zero-Copy Techniques
1. sendfile(): File to Socket Transfer
sendfile() is the most common zero-copy technique for file transfers.
How it works:
cpp
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
- out_fd: Destination file descriptor (usually a socket)
- in_fd: Source file descriptor (usually a file)
- offset: Where to start reading from the file (NULL means the file's current position)
- count: How many bytes to transfer
Example: Web server serving files
cpp
// Serve a file directly to a client
int file_fd = open("index.html", O_RDONLY);
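struct stat file_stat;
fstat(file_fd, &file_stat); // Look up the file size for the byte count
sendfile(client_socket, file_fd, NULL, file_stat.st_size);
Benefits:
- Eliminates user buffer copies
- Single system call
- Kernel optimizes the transfer
- Works from any source that supports mmap()-like access (regular files, not sockets); since Linux 2.6.33 the destination can be any file descriptor, not just a socket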
2. splice(): File Descriptor to File Descriptor
splice() is more general than sendfile(): it moves data between two file descriptors, as long as at least one of them is a pipe.
How it works:
cpp
ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);
Example: Pipe to socket transfer
cpp
// Transfer data from a pipe to a socket
splice(pipe_fd, NULL, socket_fd, NULL, data_size, SPLICE_F_MOVE);
Benefits:
- Moves data between pipes, sockets, and files (one end of each call must be a pipe)
- Supports non-blocking operations (SPLICE_F_NONBLOCK)
- More flexible than sendfile()
3. tee(): Duplicate Data in a Pipe
tee() duplicates data from one pipe into another without consuming it from the source. Both file descriptors must refer to pipes.
How it works:
cpp
ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags);
Example: Broadcasting data
cpp
// Send the same data to multiple destinations
// (broadcast_pipe is assumed to be an int[2] from pipe(): [0] is the read end, [1] the write end)
tee(pipe_fd, broadcast_pipe[1], data_size, 0); // Duplicate into the broadcast pipe
splice(pipe_fd, NULL, socket1_fd, NULL, data_size, 0); // Send to client 1
splice(broadcast_pipe[0], NULL, socket2_fd, NULL, data_size, 0); // Send to client 2
Benefits:
- Efficient broadcasting
- No data copying to user space
- Maintains data integrity
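To see the "without consuming it" part in action, here is a small self-contained sketch for Linux: it queues a message in one pipe, duplicates it with tee(), and then reads the same bytes from both pipes. `tee_roundtrip` is a hypothetical demo function, and it assumes the message is small enough to fit in a pipe buffer.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // tee() is a GNU/Linux extension
#endif
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <string>
#include <utility>

// Hypothetical demo: returns {bytes read from the source pipe, bytes read from the copy}.
// Both strings are empty if any step fails.
std::pair<std::string, std::string> tee_roundtrip(const std::string& message) {
    int source[2], copy[2];
    if (pipe(source) == -1) return {};
    if (pipe(copy) == -1) { close(source[0]); close(source[1]); return {}; }

    std::string from_source, from_copy;
    ssize_t queued = write(source[1], message.data(), message.size());
    if (queued == static_cast<ssize_t>(message.size())) {
        // Duplicate the queued bytes into the second pipe; the source keeps them
        ssize_t duplicated = tee(source[0], copy[1], message.size(), 0);
        if (duplicated > 0) {
            std::string buffer(static_cast<size_t>(duplicated), '\0');
            ssize_t n = read(copy[0], &buffer[0], buffer.size());
            if (n > 0) from_copy.assign(buffer.data(), static_cast<size_t>(n));
            n = read(source[0], &buffer[0], buffer.size());  // still there after tee()
            if (n > 0) from_source.assign(buffer.data(), static_cast<size_t>(n));
        }
    }
    for (int fd : {source[0], source[1], copy[0], copy[1]}) close(fd);
    return {from_source, from_copy};
}
```

The reads here copy into user space only so the result can be inspected; in a real broadcast path each pipe would be drained with splice() instead.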
4. vmsplice(): User Memory to Kernel
vmsplice() maps ranges of user memory into a pipe, so the kernel can reference those pages instead of copying them.
How it works:
cpp
ssize_t vmsplice(int fd, const struct iovec *iov, unsigned long nr_segs, unsigned int flags);
Example: Efficient data transmission
cpp
struct iovec iov[2];
iov[0].iov_base = header_data;
iov[0].iov_len = header_size;
iov[1].iov_base = payload_data;
iov[1].iov_len = payload_size;
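// pipe_fds is assumed to come from pipe(): [1] is the write end, [0] the read end
vmsplice(pipe_fds[1], iov, 2, SPLICE_F_GIFT); // Hand the pages to the kernel (gifted memory must be page-aligned)
splice(pipe_fds[0], NULL, socket_fd, NULL, total_size, 0); // Send to network
Benefits: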
- Efficient user-to-kernel transfer
- Works with scattered data (iovec)
- Can "gift" memory to kernel (avoid copying back)
Real-World Applications
1. High-Performance Web Servers
Traditional web server:
cpp
// Read file into buffer, then send
FILE* file = fopen("large_file.dat", "rb");
char buffer[8192];
size_t bytes_read;
while ((bytes_read = fread(buffer, 1, sizeof(buffer), file)) > 0) {
    send(client_socket, buffer, bytes_read, 0);
}
Zero-copy web server:
cpp
// Direct file to socket transfer
int file_fd = open("large_file.dat", O_RDONLY);
sendfile(client_socket, file_fd, NULL, file_size);
Performance improvement: 2-5x faster, 50-80% less CPU usage.
2. Network File Systems
Example: NFS or SMB server
cpp
// Serve file data directly from storage to network
int file_fd = open(file_path, O_RDONLY);
sendfile(client_socket, file_fd, NULL, file_size);
Benefits: Reduced server load, better scalability.
3. Data Processing Pipelines
Example: Log processing
cpp
// Efficient log forwarding
int log_fd = open("app.log", O_RDONLY);
splice(log_fd, NULL, processing_pipe_fd, NULL, chunk_size, 0);
Benefits: High-throughput data movement between processes.
4. High-Frequency Trading Systems
Example: Market data distribution
cpp
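// Efficient market data broadcasting
// Assumed setup: broadcast_pipe and each client_pipes[i] are int[2] pairs from pipe()
vmsplice(broadcast_pipe[1], market_data_iov, num_segments, 0);
for (int i = 0; i < num_clients; i++) {
    // splice() consumes pipe data, so tee() a copy into each client's own pipe first
    tee(broadcast_pipe[0], client_pipes[i][1], data_size, 0);
    splice(client_pipes[i][0], NULL, client_sockets[i], NULL, data_size, 0);
}
// The original data is still queued in broadcast_pipe and must be drained afterwards
Benefits: Ultra-low latency data distribution.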
Limitations and Considerations
1. Not Always Applicable
When zero-copy doesn't help:
- Small data transfers (overhead of system call > copying)
- Data that needs processing (must copy to user space anyway)
- Complex protocols requiring packetization
2. Platform Dependencies
- Linux: Excellent zero-copy support (sendfile, splice, tee, vmsplice)
- Windows: Limited support (mostly through TransmitFile)
- macOS: Good support (sendfile, with a different signature than on Linux)
3. Memory Management
With vmsplice(..., SPLICE_F_GIFT):
- Memory is "gifted" to kernel
- Cannot be reused by application
- Must be careful about memory allocation
4. Error Handling
Zero-copy operations can fail:
- Insufficient kernel memory
- File descriptor limitations
- Network buffer full
Performance Measurement
Measuring Zero-Copy Benefits
What to measure:
- Throughput: MB/s transferred
- Latency: Time per transfer
- CPU usage: % CPU consumed
- Memory usage: Memory bandwidth consumed
Example benchmark:
cpp
// Measure traditional vs zero-copy performance
auto start = std::chrono::high_resolution_clock::now();
// Traditional approach
FILE* file = fopen("large_file.dat", "rb");
char buffer[8192];
size_t bytes_read;
while ((bytes_read = fread(buffer, 1, sizeof(buffer), file)) > 0) {
    send(socket, buffer, bytes_read, 0);
}
auto end = std::chrono::high_resolution_clock::now();
auto traditional_time = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
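// Zero-copy approach
// (this second run reads from a warm page cache; alternate the order or repeat runs for a fair comparison)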
start = std::chrono::high_resolution_clock::now();
int file_fd = open("large_file.dat", O_RDONLY);
sendfile(socket, file_fd, NULL, file_size);
end = std::chrono::high_resolution_clock::now();
auto zerocopy_time = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
printf("Traditional: %lld μs, Zero-copy: %lld μs, Speedup: %.2fx\n",
       (long long)traditional_time.count(), (long long)zerocopy_time.count(),
       (double)traditional_time.count() / zerocopy_time.count());
Best Practices
1. Choose the Right Tool
Use sendfile() for:
- File to socket transfers
- Web servers
- Simple file serving
Use splice() for:
- Complex data pipelines
- Non-file transfers
- When you need more control
Use tee() for:
- Broadcasting data
- Data duplication without copying
Use vmsplice() for:
- Efficient user-to-kernel transfer
- Scattered data (iovec)
2. Handle Errors Gracefully
cpp
ssize_t result = sendfile(socket_fd, file_fd, NULL, file_size);
if (result == -1) {
    if (errno == EAGAIN) {
        // Handle non-blocking case
        // Retry or use select/poll
    } else {
        // Handle other errors
        perror("sendfile failed");
    }
}
3. Consider Buffer Sizes
- Too small: More system calls, lower performance
- Too large: Memory pressure, potential blocking
- Rule of thumb: 64KB to 1MB chunks work well for most cases.
4. Monitor Performance
Key metrics:
- System calls per second
- Memory bandwidth usage
- CPU usage per transfer
- Network throughput
The Bottom Line
Zero-copy techniques are essential for high-performance systems:
- Eliminate unnecessary data copying
- Reduce CPU usage and memory bandwidth
- Improve throughput and reduce latency
- Scale better under load
The key is understanding when to use them and which technique fits your specific use case. For file serving, sendfile() is often the best choice. For complex pipelines, splice() provides more flexibility. For broadcasting, tee() is invaluable.
Remember: zero-copy isn't always the answer, but when it applies, the performance benefits can be dramatic. In high-frequency trading, web serving, and data processing, these techniques can make the difference between a system that scales and one that doesn't.
Questions
Q: What is the main benefit of zero-copy techniques?
Zero-copy reduces CPU usage, memory bandwidth consumption, and latency by eliminating intermediate data copies.
Q: Which system call enables zero-copy file transfer?
sendfile() enables zero-copy transfer from file to socket.
Q: What is the purpose of splice()?
splice() enables zero-copy transfer between two file descriptors, provided at least one of them is a pipe.