Pipelined Processors and Instruction Execution
Video: CPU Pipeline - Computerphile
A pipelined processor divides instruction execution into multiple stages, allowing different instructions to be processed simultaneously. This creates an assembly line effect where multiple instructions are in different stages of execution at the same time.
Pipeline Benefits
- Increased Throughput: Multiple instructions in flight simultaneously
- Better Resource Utilization: Different pipeline stages use different CPU resources
- Higher Clock Frequencies: Shorter stages allow faster clock rates
- Improved Performance: Instructions per cycle (IPC) increases
Classic 5-Stage Pipeline
The classic RISC pipeline consists of five stages:
```
Instruction Fetch → Instruction Decode → Execute → Memory Access → Write Back
      (IF)               (ID)             (EX)        (MEM)          (WB)
```

1. Instruction Fetch (IF)
```cpp
// Instruction fetch stage
struct IF_Stage {
    uint64_t pc;           // Program counter
    uint32_t instruction;  // Fetched instruction
    bool valid;            // Is instruction valid?

    void fetch() {
        instruction = memory_read(pc);  // Read from instruction memory
        pc += 4;                        // Increment program counter
        valid = true;
    }
};
```

What happens:
- Read the instruction from memory at the current PC (program counter: the register that holds the address of the next instruction to execute)
- Increment program counter
- Handle instruction cache misses
2. Instruction Decode (ID)
```cpp
// Instruction decode stage (MIPS-style R/I-type instruction fields)
struct ID_Stage {
    uint32_t instruction;
    uint8_t opcode;
    uint8_t rs, rt, rd;    // Register specifiers
    uint16_t immediate;
    uint32_t funct;        // Function field

    void decode() {
        opcode    = (instruction >> 26) & 0x3F;
        rs        = (instruction >> 21) & 0x1F;
        rt        = (instruction >> 16) & 0x1F;
        rd        = (instruction >> 11) & 0x1F;
        immediate = instruction & 0xFFFF;
        funct     = instruction & 0x3F;
    }
};
```

What happens:
- Parse instruction fields (opcode, registers, immediate)
- Read register values
- Determine instruction type and control signals
3. Execute (EX)
```cpp
// Execute stage
struct EX_Stage {
    uint64_t alu_result;
    uint64_t operand1, operand2;
    uint8_t alu_op;

    void execute() {
        switch (alu_op) {
            case ALU_ADD:
                alu_result = operand1 + operand2;
                break;
            case ALU_SUB:
                alu_result = operand1 - operand2;
                break;
            case ALU_AND:
                alu_result = operand1 & operand2;
                break;
            case ALU_OR:
                alu_result = operand1 | operand2;
                break;
            // ... other operations
        }
    }
};
```

What happens:
- Perform arithmetic/logical operations
- Calculate branch targets
- Handle ALU operations
4. Memory Access (MEM)
```cpp
// Memory access stage
struct MEM_Stage {
    uint64_t address;
    uint64_t data;
    bool mem_read, mem_write;

    void memory_access() {
        if (mem_read) {
            data = memory_read(address);
        }
        if (mem_write) {
            memory_write(address, data);
        }
    }
};
```

What happens:
- Access data memory (load/store instructions)
- Handle data cache misses
- Use the effective address calculated in the EX stage
5. Write Back (WB)
```cpp
// Write back stage (registers[] is the architectural register file,
// defined elsewhere)
struct WB_Stage {
    uint64_t result;
    uint8_t write_reg;
    bool reg_write;

    void write_back() {
        if (reg_write) {
            registers[write_reg] = result;
        }
    }
};
```

What happens:
- Write results back to register file
- Complete instruction execution (branch outcomes are resolved earlier, in the EX/MEM stages, not here)
Instruction Parsing and Decoding
Instruction Format
Modern x86_64 instructions use a complex variable-length format:
```cpp
// x86_64 instruction format
struct x86_instruction {
    // Prefixes (0-4 bytes)
    uint8_t prefixes[4];
    uint8_t prefix_count;

    // Opcode (1-3 bytes)
    uint8_t opcode[3];
    uint8_t opcode_length;

    // ModR/M byte (if present)
    uint8_t modrm;

    // SIB byte (if present)
    uint8_t sib;

    // Displacement (0-4 bytes)
    int32_t displacement;

    // Immediate (0-8 bytes)
    int64_t immediate;
};
```

Instruction Decoding Process
```cpp
// Simplified x86 instruction decoder
class InstructionDecoder {
private:
    uint8_t* instruction_stream;
    size_t position;

public:
    x86_instruction decode() {
        x86_instruction instr = {};

        // 1. Decode prefixes
        instr.prefix_count = decode_prefixes(&instr.prefixes[0]);
        position += instr.prefix_count;

        // 2. Decode opcode
        instr.opcode_length = decode_opcode(&instr.opcode[0]);
        position += instr.opcode_length;

        // 3. Decode ModR/M (if needed)
        if (needs_modrm(instr.opcode[0])) {
            instr.modrm = instruction_stream[position++];
        }

        // 4. Decode SIB (if needed)
        if (needs_sib(instr.modrm)) {
            instr.sib = instruction_stream[position++];
        }

        // 5. Decode displacement
        instr.displacement = decode_displacement(instr.modrm);

        // 6. Decode immediate
        instr.immediate = decode_immediate(instr.opcode[0]);

        return instr;
    }

private:
    // Look ahead without advancing position; decode() advances it afterwards
    // (advancing in both places would skip bytes)
    uint8_t decode_prefixes(uint8_t* prefixes) {
        uint8_t count = 0;
        while (count < 4 && is_prefix(instruction_stream[position + count])) {
            prefixes[count] = instruction_stream[position + count];
            count++;
        }
        return count;
    }

    // Same convention: read at position without advancing
    uint8_t decode_opcode(uint8_t* opcode) {
        opcode[0] = instruction_stream[position];
        // Handle two-byte opcodes (0x0F escape byte)
        if (opcode[0] == 0x0F) {
            opcode[1] = instruction_stream[position + 1];
            return 2;
        }
        return 1;
    }
};
```

Common Instruction Patterns
```cpp
// Example: MOV instruction decoding
void decode_mov_instruction(x86_instruction& instr) {
    switch (instr.opcode[0]) {
        case 0x88: // MOV r/m8, r8
            // Move register to memory/register
            break;
        case 0x89: // MOV r/m32, r32
            // Move register to memory/register (32-bit)
            break;
        case 0x8A: // MOV r8, r/m8
            // Move memory/register to register
            break;
        case 0x8B: // MOV r32, r/m32
            // Move memory/register to register (32-bit)
            break;
        case 0xC7: // MOV r/m32, imm32
            // Move immediate to memory/register
            break;
    }
}
```

Pipeline Hazards and Solutions
Data Hazards
Data hazards occur when an instruction depends on the result of a previous instruction that hasn't completed yet.
```asm
; Data hazard example
add rax, rbx    ; Instruction 1: rax = rax + rbx
sub rcx, rax    ; Instruction 2: rcx = rcx - rax (depends on Instruction 1)
```

Solutions:
1. Forwarding (Bypassing)
```cpp
// Pipeline forwarding logic
struct ForwardingUnit {
    bool forward_from_ex_to_ex;
    bool forward_from_mem_to_ex;

    void check_hazards(uint8_t rs_ex, uint8_t rt_ex,
                       uint8_t rd_mem, uint8_t rd_wb,
                       bool reg_write_mem, bool reg_write_wb) {
        // Forward from EX/MEM to EX (register 0 is hard-wired to zero
        // in MIPS-style ISAs, so it is never forwarded)
        forward_from_ex_to_ex = reg_write_mem && rd_mem != 0 &&
                                (rd_mem == rs_ex || rd_mem == rt_ex);

        // Forward from MEM/WB to EX, but only when the newer EX/MEM
        // result is not already being forwarded
        forward_from_mem_to_ex = reg_write_wb && rd_wb != 0 &&
                                 !forward_from_ex_to_ex &&
                                 (rd_wb == rs_ex || rd_wb == rt_ex);
    }
};
```

2. Pipeline Stalling
```cpp
// Pipeline stall logic (if_id and id_ex are the pipeline latch
// registers between stages, defined elsewhere)
struct HazardUnit {
    bool stall_if_id;
    bool stall_id_ex;
    bool flush_if_id;

    void detect_hazards() {
        // Load-use hazard: a load's result is not available until MEM,
        // so a dependent instruction immediately behind it must stall
        if (is_load_instruction(id_ex.opcode) &&
            (id_ex.rt == if_id.rs || id_ex.rt == if_id.rt)) {
            stall_if_id = true;
            stall_id_ex = true;
        }
    }
};
```

Control Hazards
Control hazards occur when the pipeline doesn't yet know the outcome (or target) of a branch instruction, so it cannot be sure which instructions to fetch next.
```asm
; Control hazard example
cmp rax, 0      ; Compare rax with 0
je  target      ; Jump if equal (branch)
add rbx, rcx    ; Might be the wrong path
sub rdx, rsi    ; Might be the wrong path

target:
mov rax, 1      ; Target of branch
```

Solutions:
1. Branch Prediction
```cpp
// Simple branch predictor
class BranchPredictor {
private:
    uint8_t prediction_table[1024] = {};  // 2-bit saturating counters

public:
    bool predict(uint64_t pc) {
        uint32_t index = (pc >> 2) & 0x3FF;        // Use low PC bits as index
        return (prediction_table[index] >> 1) & 1; // Predict taken if counter >= 2
    }

    void update(uint64_t pc, bool taken) {
        uint32_t index = (pc >> 2) & 0x3FF;
        if (taken && prediction_table[index] < 3) {
            prediction_table[index]++;
        } else if (!taken && prediction_table[index] > 0) {
            prediction_table[index]--;
        }
    }
};
```

2. Speculative Execution
```cpp
// Speculative execution with rollback (current_pc and the registers[16]
// array are the architectural state, defined elsewhere)
struct SpeculativeState {
    uint64_t checkpoint_pc;
    uint64_t checkpoint_registers[16];
    bool speculative;

    void checkpoint() {
        checkpoint_pc = current_pc;
        memcpy(checkpoint_registers, registers, sizeof(checkpoint_registers));
        speculative = true;
    }

    void rollback() {
        if (speculative) {
            current_pc = checkpoint_pc;
            memcpy(registers, checkpoint_registers, sizeof(checkpoint_registers));
            speculative = false;
        }
    }
};
```

Performance Analysis
Pipeline Performance Metrics
```cpp
// Pipeline performance calculation
struct PipelineMetrics {
    uint64_t total_instructions;
    uint64_t total_cycles;
    uint64_t stalls;
    uint64_t mispredictions;

    // Cycles per instruction (lower is better)
    double calculate_cpi() {
        return (double)total_cycles / total_instructions;
    }

    // Instructions per cycle (higher is better)
    double calculate_ipc() {
        return (double)total_instructions / total_cycles;
    }

    // Fraction of cycles spent doing useful work
    double calculate_efficiency() {
        return (double)(total_cycles - stalls) / total_cycles;
    }
};
```

Performance Optimization
```cpp
// Code optimization for pipeline efficiency
// (result, a, b, and data are assumed to be global arrays)

// Good: no data dependencies between iterations
void optimized_loop() {
    for (int i = 0; i < 1000; i++) {
        result[i] = a[i] + b[i];  // Independent operations
    }
}

// Bad: data dependency between iterations
void unoptimized_loop() {
    for (int i = 1; i < 1000; i++) {
        result[i] = result[i-1] + data[i];  // Depends on previous iteration
    }
}
```

Key Concepts Summary
Pipeline Stages
- IF: Instruction fetch from memory
- ID: Instruction decode and register read
- EX: Execute arithmetic/logical operations
- MEM: Memory access for load/store
- WB: Write results back to registers
Instruction Parsing
- Variable-length: x86 instructions can be 1-15 bytes
- Complex decoding: Multiple prefixes, opcodes, operands
- ModR/M byte: Specifies addressing modes
- SIB byte: Scaled indexed addressing
Pipeline Hazards
- Data hazards: Dependencies between instructions
- Control hazards: Branch prediction and speculation
- Structural hazards: Resource conflicts
Performance Optimization
- Forwarding: Bypass pipeline stages for data
- Branch prediction: Predict branch outcomes
- Speculation: Execute instructions before knowing if they're needed
- Code optimization: Minimize dependencies and branches
Questions
Q: Which of the following is NOT a stage in the classic 5-stage RISC pipeline?
The classic 5-stage RISC pipeline consists of: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). Cache Access (CA) is not a standard pipeline stage - cache access is typically handled within the IF and MEM stages.
Q: What type of pipeline hazard occurs when an instruction depends on the result of a previous instruction that hasn't completed yet?
A data hazard occurs when an instruction needs a result from a previous instruction that is still in the pipeline. For example, if instruction A writes to register R1 and instruction B reads from R1, instruction B must wait for A to complete the write-back stage before it can read the correct value.
Q: Which technique is used to resolve data hazards by sending results directly from one pipeline stage to another?
Instruction forwarding (also called bypassing) resolves data hazards by sending results directly from the execute or memory stage to dependent instructions, bypassing the write-back stage. This allows dependent instructions to continue without waiting for the result to be written to registers.
Q: What is the purpose of branch prediction in a pipelined processor?
Branch prediction is used to avoid control hazards by guessing whether a branch will be taken or not taken. This allows the pipeline to continue executing instructions speculatively instead of stalling until the branch outcome is known. Correct predictions improve performance, while incorrect predictions require flushing the pipeline.
Q: Which of the following code patterns is most pipeline-friendly?
A loop with independent operations in each iteration is most pipeline-friendly because it allows the pipeline to execute multiple iterations simultaneously without data hazards. Operations that don't depend on previous results can be executed in parallel, maximizing pipeline utilization and throughput.
Q: What happens when a branch prediction is incorrect in a pipelined processor?
When a branch prediction is incorrect, the pipeline must be flushed (all instructions in the pipeline are discarded) and execution must restart from the correct branch target. This causes a performance penalty as the pipeline must refill with the correct instructions.
Q: Which pipeline stage is responsible for calculating memory addresses for load/store instructions?
The Execute (EX) stage is responsible for calculating memory addresses for load/store instructions. The address calculation typically involves adding a base register value and an offset, which is performed by the ALU in the execute stage. The Memory Access (MEM) stage then uses this calculated address to access memory.
Q: What is the main benefit of pipelining in processors?
The main benefit of pipelining is increased instruction throughput. By allowing multiple instructions to be in different stages of execution simultaneously, the pipeline can complete more instructions per clock cycle, improving overall performance even though individual instructions still take the same number of cycles to complete.