Process Creation and Lifecycle

Processes are the fundamental units of execution in operating systems. Understanding how they are created, how they relate to each other, and how they are properly terminated is crucial for writing robust, efficient programs. This knowledge is especially important for server applications, system utilities, and any program that needs to manage multiple processes.

Process Creation: The fork() System Call

What is fork()?

The fork() system call is the primary mechanism for creating new processes in Unix-like operating systems. It creates a near-identical copy of the calling process, resulting in two processes: the parent (original) and the child (newly created). The two differ only in a few attributes, such as the PID and the value that fork() returns.

How fork() Works

cpp

#include <unistd.h>
#include <sys/wait.h>
#include <iostream>

int main() {
    std::cout << "Parent process starting (PID: " << getpid() << ")" << std::endl;

    // Create a child process
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // CHILD PROCESS
        std::cout << "Child process created (PID: " << getpid() << ")" << std::endl;
        std::cout << "Child's parent PID: " << getppid() << std::endl;

        // Child has its own copy of variables
        int child_var = 42;
        std::cout << "Child variable: " << child_var << std::endl;

        // Child process exits
        exit(0);
    } else if (child_pid > 0) {
        // PARENT PROCESS
        std::cout << "Parent process continuing (PID: " << getpid() << ")" << std::endl;
        std::cout << "Created child with PID: " << child_pid << std::endl;

        // Parent has its own copy of variables
        int parent_var = 100;
        std::cout << "Parent variable: " << parent_var << std::endl;

        // Wait for child to finish
        wait(nullptr);
        std::cout << "Child process completed!" << std::endl;
    } else {
        // FORK FAILED
        std::cerr << "Fork failed!" << std::endl;
        return 1;
    }

    return 0;
}

What Gets Copied During fork()

When fork() is called, the kernel creates a copy of the parent process with:

Copied Resources:

Memory space: Code, data, heap, and stack segments
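File descriptors: Open files, sockets, pipes (each copied descriptor refers to the same open file description as the parent's, so file offsets and status flags are shared)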
Process attributes: User ID, group ID, working directory
Signal handlers: Signal disposition settings
Environment variables: Process environment

Not Copied (Unique to Child):

Process ID (PID): Child gets a new unique PID
Parent PID (PPID): Child's PPID is set to parent's PID
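File locks: Child doesn't inherit fcntl() record locks (flock() locks belong to the open file description, so they remain shared with the parent)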
Pending signals: Child starts with empty signal queue
Memory locks: Child doesn't inherit memory locks
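
The two lists above can be checked with a small experiment. The sketch below is illustrative only: `ForkInheritance` and `observe_fork_inheritance()` are names invented for this example, and the temporary file path is an assumption. It shows that a variable changed by the child stays unchanged in the parent, while the file offset of an inherited descriptor is shared.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// What the parent observes after the child has run (hypothetical type).
struct ForkInheritance {
    int parent_counter;       // parent's view of a variable the child modified
    std::string parent_read;  // bytes the parent read after the child read first
};

// Hypothetical helper: forks once and reports what the parent sees.
ForkInheritance observe_fork_inheritance() {
    char path[] = "/tmp/fork_demo_XXXXXX";
    int fd = mkstemp(path);            // temporary file, opened read/write
    assert(fd >= 0);
    unlink(path);                      // file is removed once fd is closed
    ssize_t written = write(fd, "ABCDEF", 6);
    assert(written == 6);
    lseek(fd, 0, SEEK_SET);            // rewind before forking

    int counter = 1;
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        counter = 99;                  // changes only the child's copy
        char buf[3];
        ssize_t n = read(fd, buf, 3);  // advances the shared file offset
        _exit(n == 3 ? 0 : 1);
    }

    int status = 0;
    waitpid(pid, &status, 0);          // child has finished reading by now

    char buf[3] = {0};
    ssize_t n = read(fd, buf, 3);      // continues where the child stopped
    close(fd);
    return {counter, std::string(buf, n > 0 ? static_cast<std::size_t>(n) : 0)};
}
```

Calling `observe_fork_inheritance()` should yield `parent_counter == 1` and `parent_read == "DEF"`: memory is private to each process, but the descriptor's offset is not.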

fork() Return Values

The fork() system call has a unique return mechanism:

cpp

pid_t fork(void);

// Return values:
// > 0: Parent process - returns child's PID
// = 0: Child process - returns 0
// < 0: Error - returns -1

Why this design?

Parent needs child's PID: To track and manage the child
Child needs to know it's the child: To execute different code
Error handling: Negative values indicate failure

Copy-on-Write Optimization

The Problem with Traditional fork()

In the traditional implementation, fork() would immediately copy all of the parent's memory to the child. This is expensive:

cpp

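// Traditional fork() - expensive
#include <iostream>
#include <numeric>
#include <vector>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    // About 4 MB of data (heap-allocated so it cannot overflow the stack)
    std::vector<int> large_array(1000000, 1);

    pid_t child_pid = fork();  // Without CoW, this would copy ~4 MB immediately!

    if (child_pid == 0) {
        // Child only reads the data, it never modifies it
        long sum = std::accumulate(large_array.begin(), large_array.end(), 0L);
        std::cout << "Sum: " << sum << std::endl;
        _exit(0);
    } else if (child_pid > 0) {
        // Parent continues
        wait(nullptr);
    } else {
        std::cerr << "Fork failed!" << std::endl;
        return 1;
    }
    return 0;
}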

How Copy-on-Write Works

Copy-on-write (CoW) is an optimization where memory pages are shared between parent and child until one of them tries to write to a page. Only then is the page actually copied.

Step-by-step process:

fork() called: Parent and child share the same physical memory pages
Memory marked read-only: The kernel write-protects the shared pages in both processes' page tables, so reads proceed normally and writes trap into the kernel
Write attempt: When either process tries to write, a page fault occurs
Page copied: Kernel copies the page and marks it writable for the writing process
Execution continues: Both processes now have their own copy of that page

Implementation Details

cpp

// Copy-on-write example
int main() {
    int data[1000] = {1, 2, 3, 4, 5};  // Ordinary private memory; its pages are shared with the child until one side writes

    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child reading: " << data[0] << std::endl;  // No copy needed
        data[0] = 100;  // This triggers copy-on-write!
        std::cout << "Child modified: " << data[0] << std::endl;
        exit(0);
    } else {
        // Parent process
        std::cout << "Parent reading: " << data[0] << std::endl;  // No copy needed
        // Parent doesn't modify data, so no copy occurs
        wait(nullptr);
    }
    return 0;
}

Performance Benefits

Memory savings:

Without CoW: 8GB parent process → 16GB total after fork()
With CoW: 8GB parent process → 8GB total after fork() (until writes occur)

Time savings:

Without CoW: fork() takes time proportional to memory size
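With CoW: fork() copies only page tables and kernel bookkeeping, so it is far faster, although its cost still grows with the size of the address space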
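
The time claim is easy to check on your own machine. The following is a rough measurement sketch, not a rigorous benchmark; `average_fork_microseconds()` is a name made up for this example, and the results depend heavily on the kernel, hardware, and memory settings. It times only the fork() call in the parent while the process holds a configurable amount of touched memory.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Average wall-clock time, in microseconds, that the parent spends inside
// fork() while holding `resident_bytes` of touched heap memory.
// Returns -1.0 if fork() fails. (Hypothetical helper for illustration.)
double average_fork_microseconds(std::size_t resident_bytes, int iterations) {
    assert(iterations > 0);
    std::vector<char> ballast(resident_bytes, 1);  // writing 1s makes the pages resident
    double total_us = 0.0;

    for (int i = 0; i < iterations; ++i) {
        auto start = std::chrono::steady_clock::now();
        pid_t pid = fork();
        if (pid == 0) {
            _exit(0);                 // child: leave immediately
        }
        auto stop = std::chrono::steady_clock::now();
        if (pid < 0) {
            return -1.0;              // fork failed
        }
        waitpid(pid, nullptr, 0);     // reap outside the timed region
        total_us += std::chrono::duration<double, std::micro>(stop - start).count();
    }

    // Read the ballast once so it is clearly in use for the whole measurement.
    volatile char sink = ballast.empty() ? 0 : ballast.back();
    (void)sink;
    return total_us / iterations;
}
```

Comparing, for example, `average_fork_microseconds(1 << 20, 50)` with a call that uses a much larger `resident_bytes` gives a feel for how fork() latency scales with address-space size even when no data pages are copied.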

Example - Database server:

cpp

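// Database with large buffer pool
// perform_backup() and continue_serving() stand in for application code
#include <cstddef>
#include <iostream>
#include <unistd.h>
#include <vector>

class Database {
    // 8ULL forces 64-bit arithmetic; 8 * 1024 * 1024 * 1024 overflows int
    static constexpr std::size_t kPoolSize = 8ULL * 1024 * 1024 * 1024;
    std::vector<char> buffer_pool = std::vector<char>(kPoolSize);  // 8 GB on the heap

public:
    void start_backup_process() {
        pid_t backup_pid = fork();  // Copies page tables, not the 8 GB of data

        if (backup_pid == 0) {
            // Backup process - reads buffer pool, doesn't modify
            perform_backup(buffer_pool.data());
            _exit(0);
        } else if (backup_pid > 0) {
            // Main process continues serving requests
            // (it must still reap the backup child later, e.g. via SIGCHLD)
            continue_serving();
        } else {
            std::cerr << "fork failed; backup skipped" << std::endl;
        }
    }
};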

When Copy-on-Write Occurs

Triggers for CoW:

Any write operation: Modifying variables, arrays, or structures
Stack modifications: Function calls that modify stack variables
Heap modifications: malloc/free operations that modify heap metadata

What doesn't trigger CoW:

Read-only operations: Reading variables, arrays, or structures
Code execution: Running functions (code pages are typically read-only)
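File operations: Calling write() on a file only reads the source buffer, so no page is copied (read() into a buffer does modify that buffer and can trigger CoW)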

Trade-offs

Advantages:

Fast fork(): Process creation no longer waits for data pages to be copied
Memory efficient: Only copy what's actually modified
Transparent: Applications don't need to change

Disadvantages:

Unpredictable performance: First write to shared page is slower
Memory fragmentation: Can lead to fragmented memory layout
Complexity: Kernel must handle page faults and memory management

Real-World Impact

Web servers:

Apache: With the prefork MPM, forks a pool of worker processes, each handling one connection at a time
Nginx: Uses fork() for worker processes
Without CoW: Would be prohibitively expensive

Database systems:

PostgreSQL: Uses fork() for connection handling
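Redis: Uses fork() to write background snapshots while the parent keeps serving requests
With CoW: A forked backend or snapshot process shares unmodified pages with its parent instead of duplicating them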

Container systems:

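Docker: Its container runtime starts processes with clone(), a fork()-like call that also sets up namespace isolation
Kubernetes: Manages many containerized applications and leaves process creation to the container runtime
CoW enables: Cheap process creation; container images additionally rely on copy-on-write at the filesystem layer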

Process Synchronization: wait() and waitpid()

Why Wait for Children?

When a child process terminates, the kernel keeps its exit status until the parent collects it. If the parent doesn't collect this information, the child becomes a zombie process.

The wait() System Call

cpp

#include <sys/wait.h>

pid_t wait(int *status);
pid_t waitpid(pid_t pid, int *status, int options);

Basic wait() example:

cpp

int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child working..." << std::endl;
        sleep(2);
        exit(42);  // Exit with status 42
    } else {
        // Parent process
        int status;
        pid_t terminated_pid = wait(&status);

        std::cout << "Child " << terminated_pid << " terminated" << std::endl;

        if (WIFEXITED(status)) {
            std::cout << "Exit status: " << WEXITSTATUS(status) << std::endl;
        }
    }

    return 0;
}

wait() vs waitpid()

Feature	wait()	waitpid()
Target	Any child	Specific child by PID
Blocking	Always blocks	Can be non-blocking
Options	None	Multiple options
Flexibility	Limited	High

waitpid() with options:

cpp

// Wait for specific child
waitpid(child_pid, &status, 0);

// Non-blocking wait
waitpid(child_pid, &status, WNOHANG);

// Wait for any child (same as wait(&status)); a pid of 0 would mean any child in the caller's process group
waitpid(-1, &status, 0);
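
A non-blocking wait is usually placed in a loop so the parent can keep working between checks. The sketch below is one possible shape, with `PollResult` and `poll_child()` invented for this example and the 10 ms polling interval chosen arbitrarily. waitpid() with WNOHANG returns 0 while the child is still running and the child's PID once it has terminated.

```cpp
#include <cassert>
#include <sys/wait.h>
#include <unistd.h>

// Outcome of polling one child (hypothetical type).
struct PollResult {
    int exit_code;    // child's exit code, or -1 on error or abnormal termination
    int empty_polls;  // number of polls during which the child was still running
};

// Forks a child that sleeps and exits, then polls it without blocking.
PollResult poll_child(unsigned child_sleep_ms, int child_exit_code) {
    assert(child_sleep_ms < 1000);  // usleep() may reject values of one second or more

    pid_t pid = fork();
    if (pid < 0) {
        return {-1, 0};
    }
    if (pid == 0) {
        usleep(child_sleep_ms * 1000);  // simulate work in the child
        _exit(child_exit_code);
    }

    int empty_polls = 0;
    for (;;) {
        int status = 0;
        pid_t result = waitpid(pid, &status, WNOHANG);
        if (result == 0) {
            // Child still running: the parent could do useful work here.
            ++empty_polls;
            usleep(10 * 1000);
            continue;
        }
        if (result == pid && WIFEXITED(status)) {
            return {WEXITSTATUS(status), empty_polls};
        }
        return {-1, empty_polls};  // waitpid() failed or the child died abnormally
    }
}
```

For instance, `poll_child(200, 7)` should report exit code 7 after a number of empty polls. A real server would typically replace the fixed sleep with its event loop.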

Status Macros

The status parameter contains encoded information about how the child terminated:

cpp

// Check how child terminated
if (WIFEXITED(status)) {
    // Normal termination
    int exit_code = WEXITSTATUS(status);
    std::cout << "Normal exit with code: " << exit_code << std::endl;
} else if (WIFSIGNALED(status)) {
    // Killed by signal
    int term_sig = WTERMSIG(status);
    std::cout << "Killed by signal: " << term_sig << std::endl;
} else if (WIFSTOPPED(status)) {
    // Stopped by signal (only reported when waitpid() is called with WUNTRACED)
    int stop_sig = WSTOPSIG(status);
    std::cout << "Stopped by signal: " << stop_sig << std::endl;
}
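
To see the macros decode real statuses, the following sketch forks a child that either exits normally or kills itself with a signal. `describe_wait_status()` and `run_child_and_describe()` are helper names invented for this example, not part of any library.

```cpp
#include <cassert>
#include <csignal>
#include <string>
#include <sys/wait.h>
#include <unistd.h>

// Turns a wait status into a human-readable description.
std::string describe_wait_status(int status) {
    if (WIFEXITED(status)) {
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        return "killed by signal " + std::to_string(WTERMSIG(status));
    }
    if (WIFSTOPPED(status)) {
        return "stopped by signal " + std::to_string(WSTOPSIG(status));
    }
    return "unknown status";
}

// Forks a child that exits with `exit_code`, or raises `signal_number`
// on itself first when that argument is non-zero.
std::string run_child_and_describe(int exit_code, int signal_number) {
    assert(exit_code >= 0 && exit_code <= 255);  // only the low 8 bits are reported

    pid_t pid = fork();
    if (pid < 0) {
        return "fork failed";
    }
    if (pid == 0) {
        if (signal_number != 0) {
            raise(signal_number);  // terminates the child if the signal is fatal
        }
        _exit(exit_code);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return "waitpid failed";
    }
    return describe_wait_status(status);
}
```

`run_child_and_describe(42, 0)` reports a normal exit with code 42, while `run_child_and_describe(0, SIGKILL)` reports termination by SIGKILL.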

Zombie Processes

What is a Zombie Process?

A zombie process is a process that has completed execution but still has an entry in the process table. The process is "dead" but not yet "buried" - its exit status is waiting to be collected by its parent.

How Zombies Are Created

cpp

// This creates a zombie process
int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child exits immediately
        std::cout << "Child exiting..." << std::endl;
        exit(0);
    } else {
        // Parent doesn't wait for child
        std::cout << "Parent continuing without waiting..." << std::endl;
        sleep(10);  // Parent sleeps, child becomes zombie

        // Check for zombies: ps aux | grep Z
    }

    return 0;
}

Zombie Process Characteristics

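No memory usage: Zombies have released their memory; only a small kernel record (PID, exit status, resource usage) remains
Process table entry: Takes up a slot in the process table
Cannot be killed: SIGKILL has no effect on zombies, because the process has already terminated
Limited number: System has a maximum number of processes, so many zombies can prevent new processes from being created
Cleanup: The entry is released when the parent calls wait() or waitpid(); if the parent exits first, init adopts and reaps the zombie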
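
On Linux, the zombie state can be observed directly. This sketch is Linux-specific because it reads /proc; `proc_state()`, `ZombieObservation`, and `observe_zombie()` are names invented for the example. It uses waitid() with WNOWAIT, which blocks until the child has exited but leaves it unreaped, so the parent can inspect the child while it is still a zombie.

```cpp
#include <cassert>
#include <csignal>
#include <fstream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns the one-letter state from /proc/<pid>/stat, or '?' if unavailable.
char proc_state(pid_t pid) {
    assert(pid > 0);
    std::ifstream stat_file("/proc/" + std::to_string(pid) + "/stat");
    std::string line;
    if (!std::getline(stat_file, line)) {
        return '?';
    }
    // Format is "pid (comm) S ..."; comm may contain ')' so search from the right.
    std::size_t close_paren = line.rfind(')');
    if (close_paren == std::string::npos || close_paren + 2 >= line.size()) {
        return '?';
    }
    return line[close_paren + 2];
}

struct ZombieObservation {
    char before_reap;  // state while the exit status is still uncollected
    char after_reap;   // state after waitpid(); '?' once the entry is gone
};

ZombieObservation observe_zombie() {
    pid_t pid = fork();
    if (pid < 0) {
        return {'?', '?'};
    }
    if (pid == 0) {
        _exit(0);  // child terminates immediately
    }

    siginfo_t info{};
    // WNOWAIT: wait until the child has exited, but do not reap it yet.
    waitid(P_PID, static_cast<id_t>(pid), &info, WEXITED | WNOWAIT);
    char before = proc_state(pid);

    waitpid(pid, nullptr, 0);  // now collect the status, releasing the entry
    char after = proc_state(pid);
    return {before, after};
}
```

On a typical Linux system `before_reap` is `'Z'` and `after_reap` is `'?'`, matching the characteristics listed above: the entry exists until the parent waits, then disappears.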

Detecting Zombie Processes

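sh

# Check for zombie processes (STAT column starts with Z)
ps aux | awk '$8 ~ /^Z/'

# Count zombie processes
ps -eo stat= | grep -c '^Z'

# More detailed zombie information, including the parent that must call wait()
ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'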

Orphaned Processes

What Happens When Parent Dies First?

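When a parent process terminates before its children, the children become orphaned processes. The kernel automatically reassigns them to the init process (PID 1) or, on Linux, to the nearest ancestor that has registered itself as a subreaper (for example a per-user service manager).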

cpp

// This creates orphaned processes
int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child starting (PID: " << getpid() << ")" << std::endl;
        std::cout << "Child's parent: " << getppid() << std::endl;

        sleep(5);  // Child sleeps

        std::cout << "Child after parent died (PID: " << getpid() << ")" << std::endl;
        std::cout << "Child's new parent: " << getppid() << std::endl;  // Typically 1 (init), or a subreaper's PID

        exit(0);
    } else {
        // Parent process
        std::cout << "Parent exiting..." << std::endl;
        exit(0);  // Parent dies, child becomes orphaned
    }
}

Init Process Adoption

The init process (PID 1) is responsible for:

Adopting orphans: Automatically becomes parent of orphaned processes
Cleaning up zombies: Collects exit status of adopted children
System stability: Ensures no processes are left without a parent
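
Adoption can be observed from the orphan's side. In the sketch below, `Reparenting` and `observe_reparenting()` are hypothetical names and the 2-second polling bound is an arbitrary choice. An intermediate process exits right after forking; the grandchild then waits until getppid() changes and reports the new value through a pipe.

```cpp
#include <cassert>
#include <sys/wait.h>
#include <unistd.h>

struct Reparenting {
    pid_t original_parent;  // PID of the intermediate process that exited
    pid_t new_parent;       // what getppid() reported afterwards, or -1 on failure
};

Reparenting observe_reparenting() {
    int fds[2];
    if (pipe(fds) != 0) {
        return {-1, -1};
    }

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return {-1, -1};
    }
    if (middle == 0) {
        pid_t middle_pid = getpid();
        pid_t grandchild = fork();
        if (grandchild == 0) {
            close(fds[0]);
            pid_t parent_now = getppid();
            // Poll until the kernel has reparented us (bounded to about 2 seconds).
            for (int i = 0; i < 2000 && parent_now == middle_pid; ++i) {
                usleep(1000);
                parent_now = getppid();
            }
            ssize_t n = write(fds[1], &parent_now, sizeof parent_now);
            _exit(n == static_cast<ssize_t>(sizeof parent_now) ? 0 : 1);
        }
        _exit(0);  // intermediate process exits, orphaning the grandchild
    }

    assert(middle > 0);
    close(fds[1]);                // keep only the read end in the original process
    waitpid(middle, nullptr, 0);  // reap the intermediate process

    pid_t new_parent = -1;
    ssize_t n = read(fds[0], &new_parent, sizeof new_parent);
    close(fds[0]);
    if (n != static_cast<ssize_t>(sizeof new_parent)) {
        new_parent = -1;
    }
    return {middle, new_parent};
}
```

The reported `new_parent` differs from `original_parent`; it is commonly 1, but on systems that use subreapers it can be another PID.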

Preventing Zombie Processes

Method 1: Explicit wait()

cpp

int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child working..." << std::endl;
        sleep(2);
        exit(0);
    } else {
        // Parent waits for child
        wait(nullptr);
        std::cout << "Child completed, no zombie created" << std::endl;
    }

    return 0;
}

Method 2: Signal Handler for SIGCHLD

cpp

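#include <cerrno>
#include <csignal>
#include <iostream>
#include <sys/wait.h>
#include <unistd.h>

void child_handler(int /*sig*/) {
    int saved_errno = errno;  // waitpid() may overwrite errno

    // Collect all terminated children; several exits can be merged into one SIGCHLD.
    // Only async-signal-safe functions may be used here, so write() replaces std::cout.
    while (waitpid(-1, nullptr, WNOHANG) > 0) {
        const char msg[] = "Child reaped\n";
        ssize_t ignored = write(STDOUT_FILENO, msg, sizeof(msg) - 1);
        (void)ignored;
    }

    errno = saved_errno;
}

int main() {
    // Set up signal handler with sigaction(), which behaves consistently across systems
    struct sigaction sa {};
    sa.sa_handler = child_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, nullptr) == -1) {
        std::cerr << "sigaction failed" << std::endl;
        return 1;
    }

    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child working..." << std::endl;
        sleep(2);
        _exit(0);
    } else if (child_pid > 0) {
        // Parent continues without explicit wait
        std::cout << "Parent continuing..." << std::endl;
        // sleep() returns early when SIGCHLD arrives, so keep sleeping for the remainder
        unsigned remaining = 5;
        while ((remaining = sleep(remaining)) > 0) {
        }
    } else {
        std::cerr << "Fork failed!" << std::endl;
        return 1;
    }

    return 0;
}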

Method 3: Double fork() (Daemon Creation)

cpp

int main() {
    pid_t first_child = fork();

    if (first_child == 0) {
        // First child
        pid_t second_child = fork();

        if (second_child == 0) {
            // Second child (grandchild) - this becomes the daemon
            std::cout << "Daemon process (PID: " << getpid() << ")" << std::endl;
            std::cout << "Parent: " << getppid() << std::endl;  // Typically 1 (init) once the first child has exited; may still show the first child's PID if printed earlier

            // Daemon work here
            sleep(10);
            exit(0);
        } else {
            // First child exits immediately
            exit(0);
        }
    } else {
        // Parent waits for first child
        wait(nullptr);
        std::cout << "First child completed, grandchild adopted by init" << std::endl;
    }

    return 0;
}

Process Lifecycle States

Complete Process State Machine

text

NEW → READY → RUNNING → WAITING → READY → RUNNING → TERMINATED
  ↓      ↓        ↓         ↓        ↓        ↓         ↓
Created  Ready   Executing  Blocked  Ready   Executing  Zombie

State Transitions

NEW → READY: Process created, waiting for CPU
READY → RUNNING: Scheduler selects process
RUNNING → WAITING: Process blocks (I/O, sleep, wait)
WAITING → READY: Event occurs, process unblocked
RUNNING → READY: Time slice expired, preempted
RUNNING → TERMINATED: Process exits
TERMINATED → ZOMBIE: Exit status not collected
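
The transition list above can be encoded as a small table-like function. This is a teaching model only: `ProcState`, `can_transition()`, `advance()`, and `is_valid_path()` are names invented for this sketch and do not correspond to any kernel data structure.

```cpp
#include <cassert>
#include <initializer_list>

enum class ProcState { New, Ready, Running, Waiting, Terminated, Zombie };

// True if `from -> to` is one of the transitions listed above.
bool can_transition(ProcState from, ProcState to) {
    switch (from) {
        case ProcState::New:
            return to == ProcState::Ready;
        case ProcState::Ready:
            return to == ProcState::Running;
        case ProcState::Running:
            return to == ProcState::Waiting || to == ProcState::Ready ||
                   to == ProcState::Terminated;
        case ProcState::Waiting:
            return to == ProcState::Ready;
        case ProcState::Terminated:
            return to == ProcState::Zombie;
        case ProcState::Zombie:
            return false;  // a zombie only disappears when it is reaped
    }
    return false;
}

// Moves to `to`, asserting that the move is a listed transition.
ProcState advance(ProcState from, ProcState to) {
    assert(can_transition(from, to));
    return to;
}

// True if every consecutive pair in `path` is a listed transition.
bool is_valid_path(std::initializer_list<ProcState> path) {
    const ProcState* previous = nullptr;
    for (const ProcState& state : path) {
        if (previous != nullptr && !can_transition(*previous, state)) {
            return false;
        }
        previous = &state;
    }
    return true;
}
```

The path shown in the diagram (NEW, READY, RUNNING, WAITING, READY, RUNNING, TERMINATED) passes `is_valid_path()`, whereas a jump such as READY to WAITING is rejected because a process must be running in order to block.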

Monitoring Process States

cpp

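// Get process state (Linux-specific: reads /proc)
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>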

int main() {
    pid_t child_pid = fork();

    if (child_pid == 0) {
        // Child process
        std::cout << "Child state: Running" << std::endl;
        sleep(5);
        exit(0);
    } else {
        // Parent process
        std::cout << "Parent monitoring child..." << std::endl;

        // Check child state (typically R or S while the child is running or sleeping)
        char status_path[256];
        snprintf(status_path, sizeof(status_path), "/proc/%d/status", static_cast<int>(child_pid));

        FILE* status_file = fopen(status_path, "r");
        if (status_file) {
            char line[256];
            while (fgets(line, sizeof(line), status_file)) {
                if (strncmp(line, "State:", 6) == 0) {
                    std::cout << "Child " << line;
                    break;
                }
            }
            fclose(status_file);
        }

        wait(nullptr);
    }

    return 0;
}

Real-World Examples

Web Server Process Management

cpp

// Simple web server with process-per-connection
int main() {
    int server_socket = create_server_socket(8080);

    while (true) {
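        int client_socket = accept(server_socket, nullptr, nullptr);
        if (client_socket < 0) {
            continue;  // accept() failed (for example interrupted by a signal); try again
        }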

        pid_t child_pid = fork();

        if (child_pid == 0) {
            // Child process handles client
            close(server_socket);  // Child doesn't need server socket

            handle_client(client_socket);
            close(client_socket);
            exit(0);
        } else {
            // Parent continues accepting connections
            close(client_socket);  // Parent doesn't need client socket

            // Clean up completed children
            int status;
            while (waitpid(-1, &status, WNOHANG) > 0) {
                // Child completed, continue
            }
        }
    }

    return 0;
}

Process Pool Pattern

cpp

// Process pool for handling multiple tasks
class ProcessPool {
private:
    std::vector<pid_t> workers;
    int pool_size;

public:
    ProcessPool(int size) : pool_size(size) {
        for (int i = 0; i < size; i++) {
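            pid_t worker = fork();
            if (worker == 0) {
                // Worker process
                worker_loop();
                _exit(0);  // Skip the parent's atexit handlers and static destructors
            } else if (worker > 0) {
                workers.push_back(worker);
            } else {
                // fork() failed: never store -1, because kill(-1, SIGTERM) would
                // signal every process the caller is allowed to signal
                std::cerr << "fork failed, pool is smaller than requested" << std::endl;
            }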
        }
    }

    ~ProcessPool() {
        // Clean up all workers
        for (pid_t worker : workers) {
            kill(worker, SIGTERM);
            waitpid(worker, nullptr, 0);
        }
    }

private:
    void worker_loop() {
        while (true) {
            // Wait for work
            Task task = get_next_task();
            if (task.is_valid()) {
                process_task(task);
            }
        }
    }
};

Key Takeaways

Process Creation Best Practices

Always wait for children: Use wait() or waitpid() to prevent zombies
Handle fork() failures: Check return values and handle errors
Clean up resources: Close file descriptors in appropriate processes
Use signal handlers: Set up SIGCHLD handlers for automatic cleanup
Monitor process states: Understand what each state means

Common Pitfalls

Zombie processes: Not calling wait() after fork()
Resource leaks: Not closing file descriptors in child processes
Race conditions: Not handling concurrent child termination
Orphaned processes: Parent dying before children complete
Signal conflicts: A process has only one SIGCHLD handler, so code that installs its own can silently replace yours

Performance Considerations

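fork() overhead: Creating processes is expensive compared with creating threads, and the cost grows with the size of the parent's page tables (treat the 1-3 ms figure as a rough order of magnitude for large processes and measure on your own system)
Memory duplication: Child gets a logical copy of the parent's memory; with copy-on-write, physical pages are duplicated only when written
Context switching: Multiple processes increase scheduling overhead
Resource sharing: File descriptors are duplicated, but each pair refers to the same open file description, so offsets and flags are shared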
Cleanup timing: Zombies consume process table entries
