Binary Representation of Types
Integer Representation: Two's Complement
Computers represent signed integers using two's complement, which gives every value (including zero) a single representation and lets the same hardware circuits add and subtract both positive and negative numbers.
How Two's Complement Works
For an n-bit integer, the range is [-2^(n-1), 2^(n-1) - 1]. To represent a negative number:
- Flip all bits (convert 0s to 1s and 1s to 0s)
- Add 1 to the result
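These two steps translate directly to code; a minimal sketch using an unsigned 8-bit type (the helper name `twos_complement` is ours, for illustration):

```cpp
#include <cstdint>

// Negate an 8-bit value by flipping the bits and adding 1
uint8_t twos_complement(uint8_t value) {
    return static_cast<uint8_t>(~value + 1); // e.g. 5 -> 0b11111011
}
```

Reinterpreted as a signed `int8_t`, the bit pattern `0b11111011` reads back as -5 on two's complement platforms (which is all mainstream hardware, and guaranteed since C++20).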
Example for 8-bit representation:
```cpp
// Represent -5 in 8-bit two's complement
// Original:  5 = 00000101
// Flip bits:     11111010
// Add 1:         11111011 = -5
// Verify: 11111011 + 00000101 = 100000000 (the ninth bit is discarded, so the 8-bit result is 0)
```

Why Two's Complement?
Two's complement has several advantages:
- Single representation: No need for separate positive/negative zero
- Efficient arithmetic: Addition and subtraction work the same for positive and negative numbers
- Sign detection: The leftmost bit indicates the sign (0 = positive, 1 = negative)
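A short sketch illustrating these properties (reinterpreting the bits as signed follows two's complement on all mainstream platforms, and is guaranteed modular since C++20):

```cpp
#include <cstdint>

// The same bit pattern 0b11111011 means 251 unsigned but -5 signed
uint8_t bits = 0b11111011;
int8_t  as_signed = static_cast<int8_t>(bits);   // -5

// Addition is the same circuit for signed and unsigned: -5 + 5 wraps to 0
int8_t sum = static_cast<int8_t>(as_signed + 5); // 0

// The leftmost bit indicates the sign
bool negative = (bits >> 7) & 1;                 // true
```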
Overflow and Underflow
Range and Overflow
For different integer sizes:
```cpp
int8_t:  [-128, 127]                                  // 8 bits
int16_t: [-32768, 32767]                              // 16 bits
int32_t: [-2147483648, 2147483647]                    // 32 bits
int64_t: [-9223372036854775808, 9223372036854775807]  // 64 bits
```

Floating Point: IEEE 754
Floating point numbers use the IEEE 754 standard, which represents numbers in scientific notation: (-1)^sign × 1.mantissa × 2^(exponent - bias)
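The formula can be evaluated directly from the stored fields; a sketch for single precision (bias 127, 23 mantissa bits) using `std::ldexp`, which computes x × 2^e — the helper name `decode` is ours:

```cpp
#include <cmath>
#include <cstdint>

// Rebuild a normalized float from its IEEE 754 fields:
// value = (-1)^sign * (1 + mantissa / 2^23) * 2^(exponent - 127)
float decode(uint32_t sign, uint32_t exponent, uint32_t mantissa) {
    float significand = 1.0f + static_cast<float>(mantissa) / (1u << 23);
    float value = std::ldexp(significand, static_cast<int>(exponent) - 127);
    return sign ? -value : value;
}
```

For example, `decode(0, 128, 0x700000)` reproduces 3.75, matching the worked conversion later in this section.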
IEEE 754 Format
Single Precision (32-bit)
```
Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
S | EEEEEEEE | MMMMMMMMMMMMMMMMMMMMMMM
```

Double Precision (64-bit)
```
Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits)
S | EEEEEEEEEEE | MMMM...MMMM (52 bits)
```

Special Values
- Zero: Exponent = 0, Mantissa = 0
- Infinity: Exponent = all 1s, Mantissa = 0
- NaN: Exponent = all 1s, Mantissa ≠ 0
- Denormalized: Exponent = 0, Mantissa ≠ 0
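Each of these special values can be produced and classified with `<cmath>` and `<limits>`; a quick sketch (assumes the platform supports denormals, as IEEE 754 hardware does):

```cpp
#include <cmath>
#include <limits>

float inf    = std::numeric_limits<float>::infinity();
float qnan   = std::nanf("");                            // quiet NaN
float denorm = std::numeric_limits<float>::denorm_min(); // smallest denormal

bool is_inf  = std::isinf(inf);                          // true
bool is_nan  = std::isnan(qnan);                         // true
bool unequal = (qnan != qnan);                           // true: NaN compares unequal even to itself
bool is_sub  = std::fpclassify(denorm) == FP_SUBNORMAL;  // true
```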
Example: Converting 3.75 to IEEE 754
```cpp
// Convert 3.75 to binary
// 3.75 = 11.11₂ = 1.111₂ × 2¹
// IEEE 754 representation (single precision)
// Sign:     0 (positive)
// Exponent: 1 + 127 = 128 = 10000000₂
// Mantissa: 11100000000000000000000₂ (23 bits)
// Result:   0 10000000 11100000000000000000000
```

Integer and Float Overflow/Underflow
Integer Overflow
Integer overflow occurs when a calculation exceeds the representable range:
```cpp
int8_t a = 127;
int8_t b = 1;
int8_t c = a + b; // Overflow! c is -128, not 128

// Unsigned overflow wraps around (well-defined behavior)
uint8_t x = 255;
uint8_t y = x + 1; // y is 0, not 256
```

Floating Point Overflow
Floating point overflow occurs when the result is too large to represent:
```cpp
float f1 = 1e38f;
float f2 = 1e38f;
float result = f1 * f2; // Result is +infinity
```

Floating Point Underflow
Underflow occurs when the result is too small to represent as a normalized number:
```cpp
float tiny = 1e-45f;         // Rounds to the smallest denormal, ~1.4e-45
float smaller = tiny / 2.0f; // Rounds to 0.0f (underflow)
```

Detecting Overflow
```cpp
// Check for signed integer overflow before adding (requires <cstdint>)
int32_t a, b;
if (a > 0 && b > 0 && a > INT32_MAX - b) {
    // Overflow would occur
}

// Check for floating point overflow after the fact (requires <cmath>)
float result;
if (std::isinf(result)) {
    // Overflow occurred
}
```

Precision and Rounding
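The precision of each floating point type can be queried directly: `std::numeric_limits<T>::epsilon()` is the gap between 1.0 and the next representable value. A quick sketch:

```cpp
#include <limits>

// Machine epsilon: the smallest step away from 1.0 that the format can represent
float  feps = std::numeric_limits<float>::epsilon();  // 2^-23 ≈ 1.19e-7
double deps = std::numeric_limits<double>::epsilon(); // 2^-52 ≈ 2.22e-16

// Adding much less than feps to 1.0f is lost to rounding
float unchanged = 1.0f + feps / 4.0f;                 // still exactly 1.0f
```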
Floating Point Precision
Floating point numbers have limited precision due to finite mantissa bits:
```cpp
float f = 0.1f;
double d = 0.1;
// Neither f nor d exactly represents 0.1:
// f ≈ 0.100000001490116119384765625
// d ≈ 0.1000000000000000055511151231257827021181583404541015625
```

Rounding Modes
IEEE 754 defines several rounding modes:
- Round to nearest, ties to even (default)
- Round toward zero
- Round toward positive infinity
- Round toward negative infinity
```cpp
#include <cfenv>

// Select the current rounding mode (one at a time)
std::fesetround(FE_TONEAREST);  // Round to nearest, ties to even (default)
std::fesetround(FE_TOWARDZERO); // Round toward zero
std::fesetround(FE_UPWARD);     // Round toward positive infinity
std::fesetround(FE_DOWNWARD);   // Round toward negative infinity
```

Comparing Floating Point Numbers
Avoid using == to compare floating point numbers; rounding makes exact equality unreliable after arithmetic:
```cpp
// Wrong: exact equality rarely survives rounding
if (a == b) { /* ... */ }

// Better: compare within a tolerance (requires <cmath>)
const float epsilon = 1e-6f;
if (std::abs(a - b) < epsilon) { /* ... */ }
```

Note that a fixed absolute epsilon only suits values of moderate magnitude; very large or very small operands call for a relative tolerance.

Accumulating Floating Point Numbers
Accumulating many floating point numbers can lead to precision loss:
```cpp
// Bad: accumulating a million floats compounds rounding error
float fsum = 0.0f;
for (int i = 0; i < 1000000; i++) {
    fsum += 0.1f; // Accumulated error: the result drifts far from 100000
}

// Better: use double for the intermediate accumulation
double dsum = 0.0;
for (int i = 0; i < 1000000; i++) {
    dsum += 0.1;
}
float result = static_cast<float>(dsum);
```

Practical Implications
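One practical remedy when the accumulator must remain a float is Kahan (compensated) summation, which tracks the rounding error of each addition in a correction term. A minimal sketch (the function name `kahanSum` is ours):

```cpp
#include <vector>

// Kahan summation: recover the low-order bits each addition loses
float kahanSum(const std::vector<float>& values) {
    float sum = 0.0f;
    float c   = 0.0f;        // running compensation
    for (float v : values) {
        float y = v - c;     // apply the correction to the next term
        float t = sum + y;   // sum grows; low-order bits of y may be lost...
        c = (t - sum) - y;   // ...measure exactly what was lost
        sum = t;
    }
    return sum;
}
```

Note that aggressive compiler flags such as -ffast-math may reorder these operations and defeat the compensation.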
Bit Manipulation with Floating Point
```cpp
#include <cstdint>
#include <cstring>

// Extract the components of a float
// (std::memcpy avoids the strict-aliasing violation of reinterpret_cast)
float f = 3.75f;
uint32_t bits;
std::memcpy(&bits, &f, sizeof(bits));
uint32_t sign     = (bits >> 31) & 1;
uint32_t exponent = (bits >> 23) & 0xFF;
uint32_t mantissa = bits & 0x7FFFFF;

// Reconstruct the float from its components
uint32_t reconstructed = (sign << 31) | (exponent << 23) | mantissa;
float f2;
std::memcpy(&f2, &reconstructed, sizeof(f2));
```

Performance Considerations
Integer vs Floating Point
- Integer arithmetic is typically faster than floating point
- Floating point operations may have higher latency
- SIMD instructions can accelerate both types
```cpp
// Integer arithmetic (typically lower latency)
int32_t a = 10, b = 20;
int32_t c = a + b;

// Floating point arithmetic (often higher latency)
float x = 10.0f, y = 20.0f;
float z = x + y;
```

Memory Layout
```cpp
// Struct padding affects memory layout (assuming 4-byte, 4-byte-aligned int)
struct BadLayout {
    char a; // 1 byte (+ 3 bytes padding before b)
    int b;  // 4 bytes
    char c; // 1 byte (+ 3 bytes tail padding)
};          // Total: 12 bytes

struct GoodLayout {
    int b;  // 4 bytes
    char a; // 1 byte
    char c; // 1 byte (+ 2 bytes tail padding)
};          // Total: 8 bytes
```

Understanding binary representation is crucial for:
- Debugging numerical issues
- Optimizing performance-critical code
- Writing portable code across different architectures
- Implementing low-level algorithms
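Returning to the layout example, the claimed sizes can be verified with sizeof and alignof; a quick check (the figures assume a typical platform with a 4-byte, 4-byte-aligned int):

```cpp
struct BadLayout  { char a; int b; char c; }; // padding after a and after c
struct GoodLayout { int b; char a; char c; }; // padding only at the tail

// A struct is aligned to its most-aligned member
static_assert(alignof(BadLayout) == alignof(int), "aligned like int");
// On a typical platform sizeof(BadLayout) is 12 and sizeof(GoodLayout) is 8
```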
Given an array where every element appears exactly twice except for one element that appears only once, find the single element using XOR operations.
```cpp
#include <vector>

// Find the single element that appears only once in an array
// Every other element appears exactly twice
int findSingleElement(const std::vector<int>& nums) {
    int result = 0;
    for (int num : nums) {
        result ^= num; // x ^ x == 0, so paired elements cancel out
    }
    return result;
}
```
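The trick rests on three XOR properties: every value cancels itself (x ^ x == 0), zero is the identity (x ^ 0 == x), and XOR is commutative and associative, so all paired elements cancel regardless of order. A quick compile-time check:

```cpp
// XOR identities that make the single-element trick work
static_assert((7 ^ 7) == 0, "an element cancels itself");
static_assert((7 ^ 0) == 7, "XOR with 0 is the identity");
static_assert((2 ^ 3 ^ 2) == 3, "order doesn't matter; pairs cancel");
```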