Skip to content

Binary Representation of Types

Integer Representation: Two's Complement

Computers represent integers using two's complement, which allows a single representation for both positive and negative numbers while enabling efficient arithmetic operations.

How Two's Complement Works

For an n-bit integer, the range is [-2^(n-1), 2^(n-1) - 1]. To represent a negative number:

  1. Flip all bits (convert 0s to 1s and 1s to 0s)
  2. Add 1 to the result

Example for 8-bit representation:

cpp
// Represent -5 in 8-bit two's complement
Original:  5 = 00000101
Flip bits:    11111010
Add 1:        11111011 = -5

// Verify: 11111011 + 00000101 = 100000000 (overflow, result is 0)

Why Two's Complement?

Two's complement has several advantages:

  • Single representation: No need for separate positive/negative zero
  • Efficient arithmetic: Addition and subtraction work the same for positive and negative numbers
  • Sign detection: The leftmost bit indicates the sign (0 = positive, 1 = negative)

Overflow and Underflow

Range and Overflow

For different integer sizes:

cpp
int8_t:   [-128, 127]     // 8 bits
int16_t:  [-32768, 32767] // 16 bits  
int32_t:  [-2147483648, 2147483647] // 32 bits
int64_t:  [-9223372036854775808, 9223372036854775807] // 64 bits

Floating Point: IEEE 754

Floating point numbers use the IEEE 754 standard, which represents numbers in scientific notation: (-1)^sign × 1.mantissa × 2^(exponent - bias)

IEEE 754 Format

Single Precision (32-bit)

cpp
Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
     S              E                    M

Double Precision (64-bit)

cpp
Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits)
     S              E                     M

Special Values

  • Zero: Exponent = 0, Mantissa = 0
  • Infinity: Exponent = all 1s, Mantissa = 0
  • NaN: Exponent = all 1s, Mantissa ≠ 0
  • Denormalized: Exponent = 0, Mantissa ≠ 0

Example: Converting 3.75 to IEEE 754

cpp
// Convert 3.75 to binary
3.75 = 11.11= 1.111₂ × 2¹

// IEEE 754 representation (single precision)
Sign: 0 (positive)
Exponent: 1 + 127 = 128 = 10000000
Mantissa: 11100000000000000000000₂ (23 bits)

Result: 0 10000000 11100000000000000000000

Integer and Float Overflow/Underflow

Integer Overflow

Integer overflow occurs when a calculation exceeds the representable range:

cpp
int8_t a = 127;
int8_t b = 1;
int8_t c = a + b;  // Overflow! Result is -128, not 128

// Unsigned overflow wraps around
uint8_t x = 255;
uint8_t y = x + 1;  // Result is 0, not 256

Floating Point Overflow

Floating point overflow occurs when the result is too large to represent:

cpp
float f1 = 1e38f;
float f2 = 1e38f;
float result = f1 * f2;  // Results in infinity

Floating Point Underflow

Underflow occurs when the result is too small to represent as a normalized number:

cpp
float tiny = 1e-45f;
float smaller = tiny / 2.0f;  // Results in 0.0 (underflow)

Detecting Overflow

cpp
// Check for integer overflow
int32_t a, b, result;
if (a > 0 && b > 0 && a > INT32_MAX - b) {
    // Overflow will occur
}

// Check for floating point overflow
if (std::isinf(result)) {
    // Overflow occurred
}

Precision and Rounding

Floating Point Precision

Floating point numbers have limited precision due to finite mantissa bits:

cpp
float f = 0.1f;
double d = 0.1;
// Neither f nor d exactly represents 0.1
// f ≈ 0.100000001490116119384765625
// d ≈ 0.1000000000000000055511151231257827021181583404541015625

Rounding Modes

IEEE 754 defines several rounding modes:

  • Round to nearest, ties to even (default)
  • Round toward zero
  • Round toward positive infinity
  • Round toward negative infinity
cpp
#include <cfenv>
#include <iostream>

// Set rounding mode
std::fesetround(FE_TONEAREST);  // Default
std::fesetround(FE_TOWARDZERO); // Round toward zero
std::fesetround(FE_UPWARD);     // Round toward positive infinity
std::fesetround(FE_DOWNWARD);   // Round toward negative infinity

Comparing Floating Point Numbers

Never use == to compare floating point numbers:

cpp
// Wrong
if (a == b) { /* ... */ }

// Correct
const float epsilon = 1e-6f;
if (std::abs(a - b) < epsilon) { /* ... */ }

Accumulating Floating Point Numbers

Accumulating many floating point numbers can lead to precision loss:

cpp
// Bad: Accumulating in order
float sum = 0.0f;
for (int i = 0; i < 1000000; i++) {
    sum += 0.1f;  // Accumulates rounding errors
}

// Better: Use double for intermediate calculations
double sum = 0.0;
for (int i = 0; i < 1000000; i++) {
    sum += 0.1;
}
float result = static_cast<float>(sum);

Practical Implications

Bit Manipulation with Floating Point

cpp
// Extract components of a float
float f = 3.75f;
uint32_t bits = *reinterpret_cast<uint32_t*>(&f);

uint32_t sign = (bits >> 31) & 1;
uint32_t exponent = (bits >> 23) & 0xFF;
uint32_t mantissa = bits & 0x7FFFFF;

// Reconstruct float
uint32_t reconstructed = (sign << 31) | (exponent << 23) | mantissa;
float f2 = *reinterpret_cast<float*>(&reconstructed);

Performance Considerations

Integer vs Floating Point

  • Integer arithmetic is typically faster than floating point
  • Floating point operations may have higher latency
  • SIMD instructions can accelerate both types
cpp
// Integer arithmetic (faster)
int32_t a = 10, b = 20;
int32_t c = a + b;

// Floating point arithmetic (slower)
float x = 10.0f, y = 20.0f;
float z = x + y;

Memory Layout

cpp
// Struct padding affects memory layout
struct BadLayout {
    char a;     // 1 byte
    int b;      // 4 bytes (3 bytes padding)
    char c;     // 1 byte (3 bytes padding)
}; // Total: 12 bytes

struct GoodLayout {
    int b;      // 4 bytes
    char a;     // 1 byte
    char c;     // 1 byte (2 bytes padding)
}; // Total: 8 bytes

Understanding binary representation is crucial for:

  • Debugging numerical issues
  • Optimizing performance-critical code
  • Writing portable code across different architectures
  • Implementing low-level algorithms

Given an array where every element appears exactly twice except for one element that appears only once, find the single element using XOR operations.

cpp
// Find the single element that appears only once in an array
// Every other element appears exactly twice

int findSingleElement(const std::vector<int>& nums) {
    // TODO: Implement the solution using XOR

    return 0; // Replace with your implementation
}