Hashing Containers
Hash-based containers provide the fastest average lookup times in the STL, offering O(1) average complexity for insertions, deletions, and searches. Understanding how hashing works and when to use these containers is crucial for building high-performance applications.
What is Hashing?
Hashing is a technique that maps data of arbitrary size to fixed-size values (hash codes). A hash function takes an input and produces a hash value that determines where the data is stored in a hash table.
Basic Hashing Concept
```cpp
// Simple hash function for integers
size_t simpleHash(int value) {
    return value % 1000; // Maps to buckets 0-999
}

// Hash function for strings
size_t stringHash(const std::string& str) {
    size_t hash = 0;
    for (char c : str) {
        hash = hash * 31 + c; // Simple but effective
    }
    return hash;
}

// Example usage
int num = 42;
size_t hash1 = simpleHash(num); // hash1 = 42

std::string text = "hello";
size_t hash2 = stringHash(text); // hash2 = some large value
```

Key Properties of Good Hash Functions:
- Deterministic: Same input always produces same output
- Uniform Distribution: Outputs spread evenly across range
- Fast Computation: Should be quick to calculate
- Minimal Collisions: Different inputs should rarely produce same output
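These properties can be checked concretely. A minimal sketch using the stringHash function from above (the bucket count of 10 is an arbitrary choice for illustration):

```cpp
#include <string>

// Polynomial string hash, as in the example above
size_t stringHash(const std::string& str) {
    size_t hash = 0;
    for (char c : str) {
        hash = hash * 31 + c;
    }
    return hash;
}

// Deterministic: stringHash("apple") always yields the same value
// Bucket mapping: stringHash("apple") % 10 is always in [0, 9]
```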
How Hash Tables Work
```cpp
// Conceptual hash table structure
struct HashTable {
    std::vector<std::list<std::pair<std::string, int>>> buckets;
    size_t size;
    float load_factor;
};

// Simplified example with 10 buckets
std::vector<std::list<std::pair<std::string, int>>> hash_table(10);

// Inserting "apple" -> 5
size_t hash = stringHash("apple") % 10; // suppose this is 3
hash_table[3].push_back({"apple", 5});

// Looking up "apple"
hash = stringHash("apple") % 10; // deterministic: 3 again
for (const auto& pair : hash_table[3]) {
    if (pair.first == "apple") {
        std::cout << "Found: " << pair.second << std::endl; // 5
        break;
    }
}
```

Hash Table Components:
- Buckets: Array of containers (usually linked lists)
- Hash Function: Maps keys to bucket indices
- Collision Resolution: Handles multiple keys mapping to same bucket
- Load Factor: Ratio of elements to buckets
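The standard containers expose these components through their bucket interface. A short sketch (the actual bucket index a given key lands in is implementation-defined):

```cpp
#include <unordered_map>
#include <string>

// Inspect how the table distributes keys across buckets
std::unordered_map<std::string, int> inventory = {
    {"apple", 5}, {"banana", 3}, {"cherry", 7}
};

// Which bucket holds "apple"? (implementation-defined index)
size_t bucket_index = inventory.bucket("apple");

// How many keys share that bucket? (>= 1; more if keys collided)
size_t chain_length = inventory.bucket_size(bucket_index);

// Load factor = size() / bucket_count()
float lf = inventory.load_factor();
```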
std::unordered_map
Basic Usage
```cpp
#include <unordered_map>
#include <string>
#include <iostream>

// Create an unordered map
std::unordered_map<std::string, int> scores;

// Insert elements
scores["Alice"] = 95;
scores["Bob"] = 87;
scores["Charlie"] = 92;

// Or use insert method
scores.insert({"David", 89});
scores.emplace("Eve", 91);

// Access elements
std::cout << "Alice's score: " << scores["Alice"] << std::endl; // 95

// Check if key exists
if (scores.find("Frank") != scores.end()) {
    std::cout << "Frank's score: " << scores["Frank"] << std::endl;
} else {
    std::cout << "Frank not found" << std::endl;
}

// Caution: operator[] inserts missing keys with a default value
int frank_score = scores["Frank"]; // Creates the entry {"Frank", 0}
```

Iterating Over unordered_map
```cpp
std::unordered_map<std::string, int> scores = {
    {"Alice", 95}, {"Bob", 87}, {"Charlie", 92}
};

// Range-based for loop (C++17 structured bindings)
for (const auto& [name, score] : scores) {
    std::cout << name << ": " << score << std::endl;
}

// Iterator-based loop
for (auto it = scores.begin(); it != scores.end(); ++it) {
    std::cout << it->first << ": " << it->second << std::endl;
}

// Note: Order is not guaranteed!
// Output might be: Charlie: 92, Alice: 95, Bob: 87
```

Advanced Operations
```cpp
std::unordered_map<std::string, int> scores;

// Insert with duplicate handling
auto result = scores.insert({"Alice", 95});
if (!result.second) {
    std::cout << "Alice already exists with score: " << result.first->second << std::endl;
}

// Update existing value
scores["Alice"] = 98; // Overwrites existing value

// Remove elements
scores.erase("Bob");

// Get size and check emptiness
std::cout << "Size: " << scores.size() << std::endl;
std::cout << "Empty: " << scores.empty() << std::endl;

// Clear all elements
scores.clear();
```

std::unordered_set
Basic Usage
```cpp
#include <unordered_set>
#include <string>
#include <iostream>

// Create an unordered set
std::unordered_set<std::string> unique_names;

// Insert elements
unique_names.insert("Alice");
unique_names.emplace("Bob");
unique_names.insert("Charlie");

// Check membership
if (unique_names.find("Alice") != unique_names.end()) {
    std::cout << "Alice is in the set" << std::endl;
}

// Count occurrences (0 or 1 for sets)
size_t count = unique_names.count("Bob"); // 1

// Remove elements
unique_names.erase("Charlie");

// Get size
std::cout << "Number of unique names: " << unique_names.size() << std::endl;
```

Set Operations
```cpp
std::unordered_set<int> set1 = {1, 2, 3, 4, 5};
std::unordered_set<int> set2 = {4, 5, 6, 7, 8};

// Check if element exists
bool has_three = set1.contains(3);         // C++20
bool has_six = set2.find(6) != set2.end(); // pre-C++20

// Iterate over elements
for (const auto& num : set1) {
    std::cout << num << " "; // Order not guaranteed
}
std::cout << std::endl;
```

Performance and Memory
Understanding Load Factor
The load factor is the ratio of the number of elements to the number of buckets:
```cpp
std::unordered_map<std::string, int> scores;

// Get current load factor
float current_load = scores.load_factor();
std::cout << "Current load factor: " << current_load << std::endl;

// Get maximum load factor
float max_load = scores.max_load_factor();
std::cout << "Maximum load factor: " << max_load << std::endl;

// Set custom maximum load factor
scores.max_load_factor(0.5f); // More aggressive rehashing

// Reserve capacity up front to avoid rehashing during insertion
scores.reserve(1000); // Enough buckets for at least 1000 elements
```

How Rehashing Works
```cpp
std::unordered_map<std::string, int> scores;

// Insert elements and watch rehashing
for (int i = 0; i < 1000; ++i) {
    std::string name = "User" + std::to_string(i);
    scores[name] = i;

    // Print stats periodically to see bucket growth
    if (i % 100 == 0) {
        std::cout << "Elements: " << scores.size()
                  << ", Buckets: " << scores.bucket_count()
                  << ", Load Factor: " << scores.load_factor() << std::endl;
    }
}
```

Rehashing Process:
- Trigger: Load factor exceeds maximum load factor
- Growth: Number of buckets increases (usually doubles)
- Redistribution: All elements are rehashed and redistributed
- Cost: O(n) operation, but amortized over many insertions
Time Complexity
| Operation | Average Case | Worst Case | Notes |
|---|---|---|---|
| Insertion | O(1) | O(n) | Depends on hash function quality |
| Lookup | O(1) | O(n) | Collisions can degrade performance |
| Deletion | O(1) | O(n) | Same as lookup |
| Iteration | O(n) | O(n) | Must visit all elements |
Memory Overhead
```cpp
// Compare memory usage
std::vector<std::pair<std::string, int>> vector_storage;
std::unordered_map<std::string, int> hash_storage;

// Add same elements to both
for (int i = 0; i < 1000; ++i) {
    std::string key = "key" + std::to_string(i);
    vector_storage.push_back({key, i});
    hash_storage[key] = i;
}

// Hash containers have overhead:
// - Bucket array
// - Hash table metadata
// - Potential unused buckets
// - Linked list overhead for collisions
```

Custom Hash Functions
For Custom Types
```cpp
struct Person {
    std::string name;
    int age;

    bool operator==(const Person& other) const {
        return name == other.name && age == other.age;
    }
};

// Custom hash function for Person
struct PersonHash {
    size_t operator()(const Person& person) const {
        size_t name_hash = std::hash<std::string>{}(person.name);
        size_t age_hash = std::hash<int>{}(person.age);
        return name_hash ^ (age_hash << 1); // Combine hashes
    }
};

// Use custom hash function
std::unordered_set<Person, PersonHash> people;
people.insert({"Alice", 25});
people.insert({"Bob", 30});
```

Alternatively, you can specialize std::hash for Person; then the default template argument applies and the extra PersonHash parameter can be omitted.

For Enums
```cpp
enum class Color { Red, Green, Blue };

// Hash function for enums (only needed before C++14;
// since C++14, std::hash supports enumeration types directly)
struct ColorHash {
    size_t operator()(Color color) const {
        return static_cast<size_t>(color);
    }
};

std::unordered_map<Color, std::string, ColorHash> color_names = {
    {Color::Red, "Red"},
    {Color::Green, "Green"},
    {Color::Blue, "Blue"}
};
```

When to Use Hash Containers
Use std::unordered_map/unordered_set when:
- Fast lookup is critical: O(1) average access time
- Order doesn't matter: Elements can be in any order
- Frequent insertions/deletions: Good average performance
- Large datasets: Hash tables scale well
- Key-based access: Need to find elements by key quickly
Consider alternatives when:
- Ordered iteration needed: Use std::map or std::set
- Memory is critical: Hash containers have overhead
- Predictable performance needed: Hash table operations can degrade to O(n)
- Small datasets: Overhead might not be worth it
Summary
Hash containers (std::unordered_map and std::unordered_set) provide:
Advantages:
- Fast average access: O(1) insertion, lookup, and deletion
- Scalable performance: Good performance with large datasets
- Flexible key types: Can hash any type with proper hash function
Trade-offs:
- No ordering guarantee: Elements are not sorted
- Variable performance: Depends on hash function quality and load factor
- Memory overhead: Additional storage for buckets and metadata
- Rehashing cost: Periodic O(n) rehashing operations
Best Use Cases:
- Fast lookups: When you need O(1) average access time
- Unordered data: When element order doesn't matter
- Large datasets: When the benefits of hashing outweigh overhead
- Key-value storage: When you need efficient key-based access
Questions
Q: What is the average time complexity of insertion and lookup in std::unordered_map?
std::unordered_map provides average O(1) insertion and lookup time. This is because hash tables use hashing to distribute elements across buckets, allowing direct access to elements in most cases. However, worst-case can be O(n) if there are many hash collisions.
Q: What happens when the load factor of a hash container exceeds the maximum load factor?
When the load factor exceeds the maximum load factor, the container automatically rehashes by increasing the number of buckets and redistributing all elements. This maintains good performance but can be expensive as it requires rehashing all existing elements.
Q: What is the main difference between std::map and std::unordered_map?
std::map maintains elements in sorted order (using a balanced tree), while std::unordered_map does not guarantee any specific order. This is the fundamental difference - std::map provides ordered iteration, std::unordered_map provides faster average access.
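The ordering difference is easy to demonstrate. A sketch (the helper name sortedKeys is ours): std::map visits keys in sorted order no matter how they were inserted, while an unordered_map's iteration order is unspecified.

```cpp
#include <map>
#include <string>
#include <vector>

// std::map iterates keys in sorted order, regardless of insertion order
std::vector<std::string> sortedKeys() {
    std::map<std::string, int> ordered = {
        {"cherry", 3}, {"apple", 1}, {"banana", 2}
    };
    std::vector<std::string> keys;
    for (const auto& [key, value] : ordered) {
        keys.push_back(key);
    }
    return keys; // always {"apple", "banana", "cherry"}
}
```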
Q: When should you use std::unordered_set over std::set?
Use std::unordered_set when you need fast average O(1) lookup and don't care about the order of elements. std::set provides O(log n) lookup but maintains sorted order. Choose based on whether you need ordering or speed.
Q: What is the purpose of a hash function in hash containers?
The hash function's purpose is to distribute elements evenly across buckets. A good hash function minimizes collisions by spreading elements uniformly across the available buckets, which maintains O(1) average performance for insertions and lookups.