Process Creation and Lifecycle

Processes are the fundamental units of execution in operating systems. Understanding how they are created, how they relate to each other, and how they are properly terminated is crucial for writing robust, efficient programs. This knowledge is especially important for server applications, system utilities, and any program that needs to manage multiple processes.

Process Creation: The fork() System Call

What is fork()?

The fork() system call is the primary mechanism for creating new processes in Unix-like operating systems. It creates a nearly exact copy of the calling process, resulting in two processes: the parent (original) and the child (newly created). The two differ only in a few attributes, such as the process ID, which are listed below.

How fork() Works

cpp
#include <unistd.h>
#include <sys/wait.h>
#include <cstdlib>   // exit()
#include <iostream>

int main() {
    std::cout << "Parent process starting (PID: " << getpid() << ")" << std::endl;

    // Create a child process
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // CHILD PROCESS
        std::cout << "Child process created (PID: " << getpid() << ")" << std::endl;
        std::cout << "Child's parent PID: " << getppid() << std::endl;

        // Child has its own copy of variables
        int child_var = 42;
        std::cout << "Child variable: " << child_var << std::endl;

        // Child process exits
        exit(0);
    } else if (child_pid > 0) {
        // PARENT PROCESS
        std::cout << "Parent process continuing (PID: " << getpid() << ")" << std::endl;
        std::cout << "Created child with PID: " << child_pid << std::endl;

        // Parent has its own copy of variables
        int parent_var = 100;
        std::cout << "Parent variable: " << parent_var << std::endl;

        // Wait for child to finish
        wait(nullptr);
        std::cout << "Child process completed!" << std::endl;
    } else {
        // FORK FAILED
        std::cerr << "Fork failed!" << std::endl;
        return 1;
    }

    return 0;
}

What Gets Copied During fork()

When fork() is called, the kernel creates a copy of the parent process with:

Copied Resources:

  • Memory space: Code, data, heap, and stack segments
  • File descriptors: Open files, sockets, pipes
  • Process attributes: User ID, group ID, working directory
  • Signal handlers: Signal disposition settings
  • Environment variables: Process environment

Not Copied (Unique to Child):

  • Process ID (PID): Child gets a new unique PID
  • Parent PID (PPID): Child's PPID is set to parent's PID
  • File locks: Child doesn't inherit record locks set with fcntl() (flock() locks belong to the open file and are inherited)
  • Pending signals: Child starts with empty signal queue
  • Memory locks: Child doesn't inherit memory locks
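
The difference between a copied memory space and an inherited file descriptor can be seen in a small experiment. The following is only a sketch: the helper name observe_fork_inheritance and the ForkObservation struct are made up for illustration. The child changes a variable and writes to a descriptor it inherited; the parent then checks which of the two changes it can see.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct ForkObservation {
    int parent_counter;    // parent's variable after the child changed its own copy
    off_t parent_offset;   // parent's file offset after the child wrote to the file
};

// Hypothetical helper: forks once and reports what the parent observes afterwards.
// Returns {-1, -1} if the temporary file or the child could not be created.
ForkObservation observe_fork_inheritance() {
    int counter = 1;
    FILE* tmp = tmpfile();                    // anonymous temporary file
    if (tmp == nullptr) return {-1, -1};
    int fd = fileno(tmp);

    pid_t pid = fork();
    if (pid < 0) {
        fclose(tmp);
        return {-1, -1};
    }
    if (pid == 0) {
        counter = 999;                        // modifies only the child's copy
        ssize_t n = write(fd, "hello", 5);    // advances the shared file offset
        _exit(n == 5 ? 0 : 1);
    }

    waitpid(pid, nullptr, 0);
    assert(counter == 1);                     // the child's assignment never reaches the parent
    ForkObservation result{counter, lseek(fd, 0, SEEK_CUR)};
    fclose(tmp);
    return result;
}
```

Calling observe_fork_inheritance() from main() should report a counter of 1 and an offset of 5: memory is private to each process after fork(), while both descriptors refer to the same open file and therefore share one offset.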

fork() Return Values

The fork() system call has a unique return mechanism:

cpp
pid_t fork(void);

// Return values:
// > 0: Parent process - returns child's PID
// = 0: Child process - returns 0
// < 0: Error - returns -1

Why this design?

  • Parent needs child's PID: To track and manage the child
  • Child needs to know it's the child: To execute different code
  • Error handling: Negative values indicate failure
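
One way to make the three-way return value harder to misuse is to turn the error case into an exception. The wrapper below is a sketch, not an established API: fork_or_throw and run_child_and_collect are names invented for this example. It relies on fork() setting errno (for example EAGAIN or ENOMEM) when it returns -1.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical wrapper: returns 0 in the child and the child's PID in the parent.
// Throws std::system_error carrying errno if fork() fails.
pid_t fork_or_throw() {
    pid_t pid = fork();
    if (pid < 0) {
        throw std::system_error(errno, std::generic_category(), "fork");
    }
    return pid;
}

// Starts a short-lived child that exits with child_exit_code and returns the
// code the parent collected, or -1 if the child did not exit normally.
int run_child_and_collect(int child_exit_code) {
    assert(child_exit_code >= 0 && child_exit_code <= 255);  // only 8 bits reach the parent

    pid_t pid = fork_or_throw();
    if (pid == 0) {
        _exit(child_exit_code);   // child: leave immediately
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        return -1;
    }
    return WEXITSTATUS(status);
}
```

With this shape the caller only has to distinguish "I am the child" (0) from "I am the parent" (a PID); the failure path can no longer be silently treated as the parent branch.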

Copy-on-Write Optimization

The Problem with Traditional fork()

In the traditional implementation, fork() would immediately copy all of the parent's memory to the child. This is expensive:

cpp
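// Without copy-on-write, fork() would copy all of this data up front
#include <cstdlib>
#include <iostream>
#include <numeric>
#include <vector>
#include <sys/wait.h>
#include <unistd.h>

// Sums the elements; the child only reads the data
long calculate_sum(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0L);
}

int main() {
    std::vector<int> large_array(1000000, 1);  // About 4MB of data

    pid_t child_pid = fork();  // An eager fork() would copy all 4MB here

    if (child_pid == 0) {
        // Child only reads the data, so an eager copy would have been wasted
        std::cout << "Sum: " << calculate_sum(large_array) << std::endl;
        exit(0);
    } else if (child_pid > 0) {
        // Parent continues
        wait(nullptr);
    }
    return 0;
}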

How Copy-on-Write Works

Copy-on-write (CoW) is an optimization where memory pages are shared between parent and child until one of them tries to write to a page. Only then is the page actually copied.

Step-by-step process:

  1. fork() called: Parent and child share the same physical memory pages
  2. Pages marked read-only: The kernel write-protects the shared pages in both processes' page tables; reads proceed normally
  3. Write attempt: When either process tries to write, a page fault occurs
  4. Page copied: Kernel copies the page and marks it writable for the writing process
  5. Execution continues: Both processes now have their own copy of that page
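
The page faults described in steps 3 and 4 can be observed from user space. The sketch below assumes a system where getrusage() reports minor page faults in ru_minflt (Linux does); the function name faults_from_child_writes is made up for this example. A child writes one byte to every page of a buffer it inherited and reports how many minor faults that caused.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Minor page faults taken by the calling process so far.
static long minor_faults() {
    struct rusage usage {};
    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_minflt;
}

// Hypothetical experiment: returns the number of minor faults a child takes
// while writing to every page of an inherited buffer, or -1 on error.
long faults_from_child_writes(std::size_t pages) {
    assert(pages > 0);
    const std::size_t page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::vector<char> buffer(pages * page_size, 1);   // touched, so resident in the parent

    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        close(fds[0]);
        volatile char* bytes = buffer.data();         // volatile: keep the stores
        long before = minor_faults();
        for (std::size_t i = 0; i < buffer.size(); i += page_size) {
            bytes[i] = 2;                             // first write to each shared page
        }
        long delta = minor_faults() - before;
        ssize_t n = write(fds[1], &delta, sizeof(delta));
        _exit(n == static_cast<ssize_t>(sizeof(delta)) ? 0 : 1);
    }

    close(fds[1]);
    long delta = -1;
    ssize_t n = read(fds[0], &delta, sizeof(delta));
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return n == static_cast<ssize_t>(sizeof(delta)) ? delta : -1;
}
```

The exact count depends on the kernel and on features such as transparent huge pages, so the result is best read as "greater than zero and growing with the number of pages written" rather than as exactly one fault per page.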

Implementation Details

cpp
// Copy-on-write example
int main() {
    int data[1000] = {1, 2, 3, 4, 5};  // After fork(), these pages are shared until one side writes

    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child reading: " << data[0] << std::endl;  // No copy needed
        data[0] = 100;  // This triggers copy-on-write!
        std::cout << "Child modified: " << data[0] << std::endl;
        exit(0);
    } else {
        // Parent process
        std::cout << "Parent reading: " << data[0] << std::endl;  // No copy needed
        // Parent doesn't modify data, so no copy occurs
        wait(nullptr);
    }
    return 0;
}

Performance Benefits

Memory savings:

  • Without CoW: 8GB parent process → 16GB total after fork()
  • With CoW: 8GB parent process → 8GB total after fork() (until writes occur)

Time savings:

  • Without CoW: fork() takes time proportional to memory size
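  • With CoW: fork() copies only page tables and kernel bookkeeping, so it is much faster, although its cost still grows with the size of the address space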

Example - Database server:

cpp
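// Database with large buffer pool
// (perform_backup and continue_serving are placeholders defined elsewhere)
#include <cstddef>
#include <cstdlib>
#include <vector>
#include <unistd.h>

void perform_backup(const std::vector<char>& pool);
void continue_serving();

class Database {
    // 8GB buffer on the heap. The size is computed in size_t because
    // 8 * 1024 * 1024 * 1024 overflows int.
    std::vector<char> buffer_pool =
        std::vector<char>(std::size_t{8} * 1024 * 1024 * 1024);

public:
    void start_backup_process() {
        pid_t backup_pid = fork();  // Cheap with CoW: no page contents are copied

        if (backup_pid == 0) {
            // Backup process - reads buffer pool, doesn't modify
            perform_backup(buffer_pool);
            exit(0);
        } else {
            // Main process continues serving requests
            continue_serving();
        }
    }
};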
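
The backup pattern works because the child keeps seeing memory as it was at the moment of fork(), even if the parent keeps writing. The following sketch illustrates that with a small vector standing in for the buffer pool; child_sum_after_parent_update is a hypothetical name, and the two pipes exist only to make the ordering deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: the parent overwrites its data after fork(); the child
// then sums its own view. Returns the child's sum, or -1 on error.
long child_sum_after_parent_update(std::size_t count) {
    assert(count > 0);
    std::vector<int> data(count, 1);           // stands in for the buffer pool

    int go[2], result[2];
    if (pipe(go) != 0 || pipe(result) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        close(go[1]);
        close(result[0]);
        char token = 0;
        if (read(go[0], &token, 1) != 1) _exit(1);   // wait until the parent has written
        long sum = std::accumulate(data.begin(), data.end(), 0L);
        ssize_t n = write(result[1], &sum, sizeof(sum));
        _exit(n == static_cast<ssize_t>(sizeof(sum)) ? 0 : 1);
    }

    close(go[0]);
    close(result[1]);
    std::fill(data.begin(), data.end(), 5);    // parent modifies its own pages
    char token = 'g';
    long sum = -1;
    bool ok = write(go[1], &token, 1) == 1 &&
              read(result[0], &sum, sizeof(sum)) == static_cast<ssize_t>(sizeof(sum));
    close(go[1]);
    close(result[0]);
    waitpid(pid, nullptr, 0);
    return ok ? sum : -1;
}
```

For count elements initialized to 1, the child reports count, not 5 * count: the parent's writes went to private copies of the pages, so the child's snapshot stayed consistent.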

When Copy-on-Write Occurs

Triggers for CoW:

  • Any write operation: Modifying variables, arrays, or structures
  • Stack modifications: Function calls that modify stack variables
  • Heap modifications: malloc/free operations that modify heap metadata

What doesn't trigger CoW:

  • Read-only operations: Reading variables, arrays, or structures
  • Code execution: Running functions (code pages are typically read-only)
  • File operations: The I/O itself happens in the kernel, although read() still writes into a user-space buffer and can trigger CoW on that buffer's pages

Trade-offs

Advantages:

  • Fast fork(): Nearly instantaneous process creation
  • Memory efficient: Only copy what's actually modified
  • Transparent: Applications don't need to change

Disadvantages:

  • Unpredictable performance: First write to shared page is slower
  • Memory fragmentation: Can lead to fragmented memory layout
  • Complexity: Kernel must handle page faults and memory management

Real-World Impact

Web servers:

  • Apache: The prefork MPM uses fork() to maintain a pool of child processes, each handling one connection at a time
  • Nginx: Uses fork() for worker processes
  • Without CoW: Would be prohibitively expensive

Database systems:

  • PostgreSQL: Uses fork() for connection handling
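  • Redis: Uses fork() to write point-in-time snapshots while the parent keeps serving requests (MySQL, by contrast, handles connections with threads rather than fork())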
  • With CoW: Can handle thousands of connections efficiently

Container systems:

  • Docker: Container runtimes start container processes with clone(), a generalization of fork() that also sets up namespaces
  • Kubernetes: Manages many containerized applications
  • CoW enables: Efficient container spawning and management

Process Synchronization: wait() and waitpid()

Why Wait for Children?

When a child process terminates, the kernel keeps its exit status until the parent collects it. If the parent doesn't collect this information, the child becomes a zombie process.

The wait() System Call

cpp
#include <sys/wait.h>

pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);

Basic wait() example:

cpp
int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child working..." << std::endl;
        sleep(2);
        exit(42);  // Exit with status 42
    } else {
        // Parent process
        int status;
        pid_t terminated_pid = wait(&status);

        std::cout << "Child " << terminated_pid << " terminated" << std::endl;

        if (WIFEXITED(status)) {
            std::cout << "Exit status: " << WEXITSTATUS(status) << std::endl;
        }
    }

    return 0;
}

wait() vs waitpid()

Feature       wait()          waitpid()
Target        Any child       Specific child by PID
Blocking      Always blocks   Can be non-blocking
Options       None            Multiple options
Flexibility   Limited         High

waitpid() with options:

cpp
// Wait for specific child
waitpid(child_pid, &status, 0);

// Non-blocking wait
waitpid(child_pid, &status, WNOHANG);

// Wait for any child (pid -1; use pid 0 for any child in the caller's process group)
waitpid(-1, &status, 0);
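
The non-blocking form is typically used in a loop so the parent can keep working while the child runs. The sketch below shows one possible shape of such a loop; poll_child and PollResult are names invented for this example, and the 10 ms pause stands in for real work.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct PollResult {
    int polls;       // how many times waitpid() reported "still running"
    int exit_code;   // child's exit code, or -1 if it did not exit normally
};

// Hypothetical helper: starts a child that sleeps for child_ms milliseconds
// and exits with code, then polls it with WNOHANG instead of blocking.
PollResult poll_child(int child_ms, int code) {
    assert(child_ms >= 0 && child_ms < 1000);   // keep usleep() below one second
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0) return {0, -1};
    if (pid == 0) {
        usleep(static_cast<useconds_t>(child_ms) * 1000);
        _exit(code);
    }

    PollResult result{0, -1};
    int status = 0;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) break;          // child terminated and was reaped
        if (r < 0) return result;     // error, for example ECHILD
        ++result.polls;               // r == 0: child is still running
        usleep(10 * 1000);            // the parent could do useful work here
    }
    if (WIFEXITED(status)) result.exit_code = WEXITSTATUS(status);
    return result;
}
```

A return value of 0 from waitpid() with WNOHANG means "no state change yet", which is easy to confuse with the child branch of fork(); keeping the three cases in separate branches, as above, avoids that mistake.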

Status Macros

The status parameter contains encoded information about how the child terminated:

cpp
// Check how child terminated
if (WIFEXITED(status)) {
    // Normal termination
    int exit_code = WEXITSTATUS(status);
    std::cout << "Normal exit with code: " << exit_code << std::endl;
} else if (WIFSIGNALED(status)) {
    // Killed by signal
    int signal = WTERMSIG(status);
    std::cout << "Killed by signal: " << signal << std::endl;
} else if (WIFSTOPPED(status)) {
    // Stopped by signal (only reported if waitpid() was called with WUNTRACED)
    int signal = WSTOPSIG(status);
    std::cout << "Stopped by signal: " << signal << std::endl;
}
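
The macros can be wrapped in a small decoding function and exercised with children that end in different ways. This is a sketch with invented names (describe_status, run_and_describe); the signal case uses SIGKILL because it cannot be ignored or caught.

```cpp
#include <cassert>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Turns a wait status into a human-readable description.
std::string describe_status(int status) {
    if (WIFEXITED(status)) {
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        return "killed by signal " + std::to_string(WTERMSIG(status));
    }
    if (WIFSTOPPED(status)) {
        return "stopped by signal " + std::to_string(WSTOPSIG(status));
    }
    return "unknown status";
}

// Hypothetical helper: runs a child that exits with exit_code, or, if
// signal_number is non-zero, terminates itself with that signal instead.
std::string run_and_describe(int exit_code, int signal_number) {
    assert(exit_code >= 0 && exit_code <= 255);

    pid_t pid = fork();
    if (pid < 0) return "fork failed";
    if (pid == 0) {
        if (signal_number != 0) raise(signal_number);
        _exit(exit_code);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return "waitpid failed";
    return describe_status(status);
}
```

For example, run_and_describe(3, 0) yields "exited with code 3", while passing SIGKILL as the second argument yields a "killed by signal" description, showing that WEXITSTATUS is only meaningful when WIFEXITED is true.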

Zombie Processes

What is a Zombie Process?

A zombie process is a process that has completed execution but still has an entry in the process table. The process is "dead" but not yet "buried" - its exit status is waiting to be collected by its parent.

How Zombies Are Created

cpp
// This creates a zombie process
int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child exits immediately
        std::cout << "Child exiting..." << std::endl;
        exit(0);
    } else {
        // Parent doesn't wait for child
        std::cout << "Parent continuing without waiting..." << std::endl;
        sleep(10);  // Parent sleeps, child becomes zombie

        // Check for zombies: ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'
    }

    return 0;
}

Zombie Process Characteristics

  • Almost no memory usage: User-space memory is released; only a small kernel record (PID, exit status, resource usage) remains
  • Process table entry: Takes up a slot in the process table
  • Cannot be killed: SIGKILL has no effect on zombies
  • Limited number: System has a maximum number of processes
  • Cleanup: Reclaimed when the parent calls wait() or waitpid(), or when the parent exits and init reaps the adopted zombie
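
These properties can be checked programmatically. The sketch below is Linux-specific because it reads /proc/<pid>/stat, and its names (proc_state, observe_zombie, ZombieReport) are invented for this example. It uses waitid() with WNOWAIT to wait for the child's exit without reaping it, so the child is still a zombie when it is inspected.

```cpp
#include <cassert>
#include <fstream>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Reads the one-letter state from /proc/<pid>/stat (Linux only); '?' on failure.
char proc_state(pid_t pid) {
    assert(pid > 0);
    std::ifstream stat_file("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    if (!std::getline(stat_file, line)) return '?';
    // Format: "pid (comm) state ..."; comm may contain ')', so search from the right.
    std::string::size_type close_paren = line.rfind(')');
    if (close_paren == std::string::npos || close_paren + 2 >= line.size()) return '?';
    return line[close_paren + 2];
}

struct ZombieReport {
    char state_before_reap;   // expected to be 'Z' on Linux
    int kill_result;          // return value of kill() sent to the zombie
    bool gone_after_reap;     // true if a second waitpid() fails
};

ZombieReport observe_zombie() {
    ZombieReport report{'?', -1, false};
    pid_t pid = fork();
    if (pid < 0) return report;
    if (pid == 0) _exit(0);                      // child terminates immediately

    siginfo_t info{};
    // WNOWAIT waits for the exit but leaves the child unreaped (still a zombie).
    if (waitid(P_PID, static_cast<id_t>(pid), &info, WEXITED | WNOWAIT) != 0) {
        return report;
    }

    report.state_before_reap = proc_state(pid);
    report.kill_result = kill(pid, SIGKILL);     // accepted, but changes nothing
    waitpid(pid, nullptr, 0);                    // reap: frees the process table entry
    report.gone_after_reap = waitpid(pid, nullptr, 0) == -1;
    return report;
}
```

On Linux this should report state 'Z' before reaping, a successful but ineffective kill(), and a failing second waitpid() once the entry is gone.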

Detecting Zombie Processes

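bash
# Check for zombie processes (state column starts with Z)
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

# Count zombie processes
ps -eo stat= | grep -c '^Z'

# Show the parent of each zombie, since the parent is what must call wait()
ps -eo ppid=,stat= | awk '$2 ~ /^Z/ {print $1}' | sort -u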

Orphaned Processes

What Happens When Parent Dies First?

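When a parent process terminates before its children, the children become orphaned processes. The kernel automatically reparents them to the init process (PID 1) or, on Linux, to the nearest ancestor registered as a subreaper (for example a systemd user instance).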

cpp
// This creates orphaned processes
int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child starting (PID: " << getpid() << ")" << std::endl;
        std::cout << "Child's parent: " << getppid() << std::endl;

        sleep(5);  // Child sleeps

        std::cout << "Child after parent died (PID: " << getpid() << ")" << std::endl;
        std::cout << "Child's new parent: " << getppid() << std::endl;  // Usually 1 (init), or a subreaper's PID

        exit(0);
    } else {
        // Parent process
        std::cout << "Parent exiting..." << std::endl;
        exit(0);  // Parent dies, child becomes orphaned
    }
}

Init Process Adoption

The init process (PID 1) is responsible for:

  • Adopting orphans: Automatically becomes parent of orphaned processes
  • Cleaning up zombies: Collects exit status of adopted children
  • System stability: Ensures no processes are left without a parent
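
Adoption can be observed without guessing how long to sleep. In the sketch below (all names are invented for illustration), an intermediate process creates the future orphan and exits at once; the orphan polls getppid() until it changes and reports both PIDs through a pipe. No assumption is made that the new parent is PID 1, since a subreaper may adopt the orphan instead.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct OrphanReport {
    pid_t original_parent;   // PID of the process that created the orphan
    pid_t adopted_by;        // PID returned by getppid() after that process died
};

// Hypothetical demo: returns {-1, -1} if anything goes wrong.
OrphanReport observe_orphan_adoption() {
    OrphanReport report{-1, -1};
    int fds[2];
    if (pipe(fds) != 0) return report;

    pid_t middle = fork();
    if (middle < 0) return report;
    if (middle == 0) {
        pid_t self = getpid();            // remembered before the second fork()
        pid_t orphan = fork();
        if (orphan == 0) {
            close(fds[0]);
            while (getppid() == self) {
                usleep(1000);             // wait until the original parent is gone
            }
            pid_t pids[2] = {self, getppid()};
            ssize_t n = write(fds[1], pids, sizeof(pids));
            _exit(n == static_cast<ssize_t>(sizeof(pids)) ? 0 : 1);
        }
        _exit(0);                         // the parent dies first
    }

    close(fds[1]);
    pid_t reaped = waitpid(middle, nullptr, 0);   // reap the intermediate process
    assert(reaped == middle);

    pid_t pids[2] = {-1, -1};
    if (read(fds[0], pids, sizeof(pids)) == static_cast<ssize_t>(sizeof(pids))) {
        report = {pids[0], pids[1]};
    }
    close(fds[0]);
    return report;
}
```

The two PIDs in the report differ: the orphan's parent changed while the orphan itself kept running, and the adopter becomes responsible for collecting its exit status.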

Preventing Zombie Processes

Method 1: Explicit wait()

cpp
int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child working..." << std::endl;
        sleep(2);
        exit(0);
    } else {
        // Parent waits for child
        wait(nullptr);
        std::cout << "Child completed, no zombie created" << std::endl;
    }

    return 0;
}

Method 2: Signal Handler for SIGCHLD

cpp
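#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>

void child_handler(int) {
    int saved_errno = errno;  // waitpid() may overwrite errno

    // Collect all terminated children; several may exit before the handler runs
    while (waitpid(-1, nullptr, WNOHANG) > 0) {
        // Only async-signal-safe functions are allowed here: write(), not std::cout
        const char msg[] = "Child reaped\n";
        write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    }

    errno = saved_errno;
}

int main() {
    // Set up signal handler with sigaction(), whose semantics are well defined
    struct sigaction sa {};
    sa.sa_handler = child_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, nullptr);

    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child working..." << std::endl;
        sleep(2);
        exit(0);
    } else if (child_pid > 0) {
        // Parent continues without explicit wait
        std::cout << "Parent continuing..." << std::endl;
        // sleep() returns early when SIGCHLD arrives, so resume with the time left
        unsigned int remaining = 5;
        while ((remaining = sleep(remaining)) > 0) {}
    }

    return 0;
}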

Method 3: Double fork() (Daemon Creation)

cpp
int main() {
    pid_t first_child = fork();

    if (first_child == 0) {
        // First child
        pid_t second_child = fork();

        if (second_child == 0) {
            // Second child (grandchild) - this becomes the daemon
            std::cout << "Daemon process (PID: " << getpid() << ")" << std::endl;
            std::cout << "Parent: " << getppid() << std::endl;  // Usually 1 (init) once the first child has exited; may still be the first child's PID if this runs earlier

            // Daemon work here
            sleep(10);
            exit(0);
        } else {
            // First child exits immediately
            exit(0);
        }
    } else {
        // Parent waits for first child
        wait(nullptr);
        std::cout << "First child completed, grandchild adopted by init" << std::endl;
    }

    return 0;
}

Process Lifecycle States

Complete Process State Machine

text
NEW → READY → RUNNING → WAITING → READY → RUNNING → TERMINATED
  ↓      ↓        ↓         ↓        ↓        ↓         ↓
Created  Ready   Executing  Blocked  Ready   Executing  Zombie

State Transitions

  1. NEW → READY: Process created, waiting for CPU
  2. READY → RUNNING: Scheduler selects process
  3. RUNNING → WAITING: Process blocks (I/O, sleep, wait)
  4. WAITING → READY: Event occurs, process unblocked
  5. RUNNING → READY: Time slice expired, preempted
  6. RUNNING → TERMINATED: Process exits
  7. TERMINATED → ZOMBIE: Exit status not collected
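
The transition list can be written down as a small table-like function, which makes it easy to check whether a sequence of states is legal. This is a teaching sketch of the model described above, not how any kernel represents process state; ProcState, can_transition and is_valid_path are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ProcState { New, Ready, Running, Waiting, Terminated, Zombie };

// Returns true if the model above allows a direct move from one state to another.
bool can_transition(ProcState from, ProcState to) {
    switch (from) {
        case ProcState::New:
            return to == ProcState::Ready;
        case ProcState::Ready:
            return to == ProcState::Running;
        case ProcState::Running:
            return to == ProcState::Waiting ||   // blocks on I/O, sleep, wait
                   to == ProcState::Ready ||     // preempted
                   to == ProcState::Terminated;  // exits
        case ProcState::Waiting:
            return to == ProcState::Ready;       // event occurred
        case ProcState::Terminated:
            return to == ProcState::Zombie;      // exit status not yet collected
        case ProcState::Zombie:
            return false;                        // final state until reaped
    }
    return false;
}

// Checks that every consecutive pair in the path is an allowed transition.
bool is_valid_path(const std::vector<ProcState>& path) {
    assert(!path.empty());
    for (std::size_t i = 1; i < path.size(); ++i) {
        if (!can_transition(path[i - 1], path[i])) return false;
    }
    return true;
}
```

Note that a blocked process cannot go straight back to RUNNING in this model: it must pass through READY and be selected by the scheduler again.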

Monitoring Process States

cpp
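// Get process state (Linux-specific: reads /proc/<pid>/status)
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>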

int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child state: Running" << std::endl;
        sleep(5);
        exit(0);
    } else {
        // Parent process
        std::cout << "Parent monitoring child..." << std::endl;

        // Check child state
        char status_path[256];
        snprintf(status_path, sizeof(status_path), "/proc/%d/status", static_cast<int>(child_pid));

        FILE* status_file = fopen(status_path, "r");
        if (status_file) {
            char line[256];
            while (fgets(line, sizeof(line), status_file)) {
                if (strncmp(line, "State:", 6) == 0) {
                    std::cout << "Child " << line;
                    break;
                }
            }
            fclose(status_file);
        }

        wait(nullptr);
    }

    return 0;
}

Real-World Examples

Web Server Process Management

cpp
// Simple web server with process-per-connection
// (create_server_socket and handle_client are placeholders defined elsewhere)
int main() {
    int server_socket = create_server_socket(8080);

    while (true) {
        int client_socket = accept(server_socket, nullptr, nullptr);
        if (client_socket < 0) {
            continue;  // accept() failed (for example EINTR); try again
        }

        pid_t child_pid = fork();

        if (child_pid == 0) {
            // Child process handles client
            close(server_socket);  // Child doesn't need server socket

            handle_client(client_socket);
            close(client_socket);
            exit(0);
        } else {
            // Parent continues accepting connections
            // (also reached if fork() failed, in which case the client is dropped)
            close(client_socket);  // Parent doesn't need client socket

            // Clean up completed children without blocking
            int status;
            while (waitpid(-1, &status, WNOHANG) > 0) {
                // Child completed, continue
            }
        }
    }

    return 0;
}

Process Pool Pattern

cpp
// Process pool for handling multiple tasks
class ProcessPool {
private:
    std::vector<pid_t> workers;
    int pool_size;

public:
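    ProcessPool(int size) : pool_size(size) {
        for (int i = 0; i < size; i++) {
            pid_t worker = fork();
            if (worker == 0) {
                // Worker process
                worker_loop();
                exit(0);
            } else if (worker > 0) {
                workers.push_back(worker);
            }
            // worker == -1 means fork() failed. Never store it: kill(-1, SIGTERM)
            // in the destructor would signal every process the caller may signal.
        }
    }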

    ~ProcessPool() {
        // Clean up all workers
        for (pid_t worker : workers) {
            kill(worker, SIGTERM);
            waitpid(worker, nullptr, 0);
        }
    }

private:
    void worker_loop() {
        while (true) {
            // Wait for work
            Task task = get_next_task();
            if (task.is_valid()) {
                process_task(task);
            }
        }
    }
};
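
The class above leaves Task, get_next_task() and process_task() undefined. As one possible way to fill that gap, the sketch below distributes work over a shared pipe and collects results over a second pipe. Everything in it (the function names, squaring integers as the "task", the 1024-item limit) is an assumption made for illustration, not part of the original design.

```cpp
#include <cassert>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Worker: read integers from task_fd until EOF, write each square to result_fd.
static void worker_loop(int task_fd, int result_fd) {
    int value = 0;
    while (read(task_fd, &value, sizeof(value)) == static_cast<ssize_t>(sizeof(value))) {
        long square = static_cast<long>(value) * value;
        if (write(result_fd, &square, sizeof(square)) !=
            static_cast<ssize_t>(sizeof(square))) {
            break;
        }
    }
}

// Hypothetical pool: squares every input using worker processes and returns
// the sum of the squares, or -1 on error.
long sum_of_squares_with_pool(const std::vector<int>& inputs, int workers) {
    // Small inputs only: all tasks are written before any result is read,
    // so everything must fit in the pipe buffers.
    assert(workers > 0 && inputs.size() <= 1024);

    int tasks[2], results[2];
    if (pipe(tasks) != 0 || pipe(results) != 0) return -1;

    std::vector<pid_t> pids;
    for (int i = 0; i < workers; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            close(tasks[1]);                  // otherwise the worker never sees EOF
            close(results[0]);
            worker_loop(tasks[0], results[1]);
            _exit(0);
        }
        if (pid > 0) pids.push_back(pid);     // skip failed forks
    }
    close(tasks[0]);
    close(results[1]);

    if (pids.empty()) {                       // no reader: writing would raise SIGPIPE
        close(tasks[1]);
        close(results[0]);
        return -1;
    }

    for (int value : inputs) {
        if (write(tasks[1], &value, sizeof(value)) !=
            static_cast<ssize_t>(sizeof(value))) {
            break;
        }
    }
    close(tasks[1]);                          // EOF tells the workers to finish

    long total = 0, square = 0;
    while (read(results[0], &square, sizeof(square)) ==
           static_cast<ssize_t>(sizeof(square))) {
        total += square;
    }
    close(results[0]);

    for (pid_t pid : pids) waitpid(pid, nullptr, 0);   // reap every worker
    return total;
}
```

Closing the write end of the task pipe is what shuts the pool down here, which avoids sending SIGTERM to workers that may be in the middle of a task. For example, sum_of_squares_with_pool({1, 2, 3, 4}, 2) returns 30.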

Key Takeaways

Process Creation Best Practices

  1. Always wait for children: Use wait() or waitpid() to prevent zombies
  2. Handle fork() failures: Check return values and handle errors
  3. Clean up resources: Close file descriptors in appropriate processes
  4. Use signal handlers: Set up SIGCHLD handlers for automatic cleanup
  5. Monitor process states: Understand what each state means

Common Pitfalls

  1. Zombie processes: Not calling wait() after fork()
  2. Resource leaks: Not closing file descriptors in child processes
  3. Race conditions: Not handling concurrent child termination
  4. Orphaned processes: Parent dying before children complete
  5. Signal conflicts: A process has only one SIGCHLD handler, so a handler installed by a library can silently replace the application's

Performance Considerations

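  1. fork() overhead: Creating a process is far more expensive than a function call, and the cost grows with the size of the parent's address space because its page tables must be copied
  2. Memory duplication: Child gets a logical copy of parent's memory; with copy-on-write, physical pages are duplicated only when written
  3. Context switching: Multiple processes increase scheduling overhead
  4. Resource sharing: The child's file descriptors refer to the same open files as the parent's, so file offsets and status flags are shared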
  5. Cleanup timing: Zombies consume process table entries