Go Data Structures & Algorithms
Hash Tables in Go
Table of Contents
Hash tables are fundamental data structures that provide efficient key-value storage and retrieval. They are widely used in various applications due to their average-case constant time complexity for basic operations. In this comprehensive guide, we'll explore hash tables in Go, from basic concepts to advanced implementations and practical applications.
What are Hash Tables?
A hash table (also known as a hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
The key features of hash tables include:
- Fast lookups: Average O(1) time complexity for search, insert, and delete operations
- Key-value storage: Each value is associated with a unique key
- Dynamic size: Can grow or shrink as needed
- Unordered: Elements are not stored in any particular order
Hash Functions
A hash function is a function that maps data of arbitrary size to fixed-size values. In the context of hash tables, it converts keys into array indices.
A good hash function should have the following properties:
- Deterministic: The same key should always produce the same hash value
- Uniform distribution: Hash values should be evenly distributed across the array
- Efficiency: The hash function should be computationally inexpensive
- Minimal collisions: Different keys should rarely produce the same hash value
Common hash functions include:
- Division method: h(k) = k mod m, where m is the size of the hash table
- Multiplication method: h(k) = ⌊m(kA mod 1)⌋, where A is a constant between 0 and 1
- Universal hashing: A family of hash functions chosen at random
- Cryptographic hash functions: Such as SHA-256, which are designed to be secure but are often too slow for hash tables
For non-integer keys (like strings), we first need to convert them to integers before applying the hash function:
// Simple hash function for strings
func hashString(s string, tableSize int) int {
hash := 0
for i := 0; i < len(s); i++ {
hash = (hash*31 + int(s[i])) % tableSize
}
return hash
}
Collision Resolution
A collision occurs when two different keys hash to the same index. There are several strategies to handle collisions:
Separate Chaining
In separate chaining, each bucket contains a linked list (or another data structure) of all key-value pairs that hash to the same index. When a collision occurs, the new key-value pair is simply added to the list.
Advantages:
- Simple to implement
- Hash table never fills up
- Less sensitive to the hash function or load factors
Disadvantages:
- Requires additional memory for linked list pointers
- Cache performance can be poor due to the linked list traversal
Open Addressing
In open addressing, all key-value pairs are stored in the hash table array itself. When a collision occurs, the algorithm searches for the next available slot according to a probing sequence.
Common probing methods include:
- Linear Probing: Check the next slot sequentially (h(k) + i) mod m
- Quadratic Probing: Check slots according to a quadratic function (h(k) + i^2) mod m
- Double Hashing: Use a second hash function to determine the step size (h1(k) + i*h2(k)) mod m
Advantages:
- Better cache performance
- No extra memory for pointers
Disadvantages:
- More sensitive to the hash function and load factor
- More complex deletion (typically using tombstones)
- Performance degrades as the table fills up
Robin Hood Hashing
Robin Hood hashing is a variant of open addressing that aims to reduce the variance of probe sequence lengths by moving elements during insertion.
Cuckoo Hashing
Cuckoo hashing uses multiple hash functions and allows a key to be stored in one of several possible locations. When a collision occurs, it may move existing keys to their alternate locations.
Hash Table Operations
The primary operations on a hash table are:
- Insert: Add a new key-value pair to the hash table
- Search: Find the value associated with a given key
- Delete: Remove a key-value pair from the hash table
- Resize: Increase or decrease the size of the hash table to maintain performance
Load Factor
The load factor of a hash table is the ratio of the number of stored elements to the number of buckets. It's a critical parameter that affects performance:
Load Factor = Number of Elements / Number of Buckets
As the load factor increases, the probability of collisions increases, which can degrade performance. Hash tables typically resize (rehash) when the load factor exceeds a certain threshold (commonly 0.7 or 0.75).
Go's Built-in Maps
Go provides a built-in map type that implements a hash table. It's highly optimized and suitable for most use cases.
Creating and Using Maps
package main
import (
"fmt"
)
func main() {
// Create a map with string keys and int values
scores := make(map[string]int)
// Add key-value pairs
scores["Alice"] = 95
scores["Bob"] = 82
scores["Charlie"] = 88
// Create a map with initialization
ages := map[string]int{
"Alice": 30,
"Bob": 25,
"Charlie": 35,
}
// Access values
fmt.Println("Alice's score:", scores["Alice"])
// Check if a key exists
bobAge, exists := ages["Bob"]
if exists {
fmt.Println("Bob's age:", bobAge)
} else {
fmt.Println("Bob's age is unknown")
}
// Accessing a non-existent key returns the zero value
daveScore := scores["Dave"]
fmt.Println("Dave's score:", daveScore) // Prints 0 (zero value for int)
// Delete a key-value pair
delete(scores, "Bob")
// Iterate over a map
fmt.Println("Scores:")
for name, score := range scores {
fmt.Printf("%s: %d\n", name, score)
}
// Get the number of elements
fmt.Println("Number of scores:", len(scores))
// Clear a map
for k := range scores {
delete(scores, k)
}
fmt.Println("After clearing, number of scores:", len(scores))
}
Map Internals
Go's built-in map is implemented as a hash table with buckets containing key-value pairs. It uses a variant of separate chaining for collision resolution.
Key features of Go's map implementation:
- Automatically grows when the load factor exceeds a threshold
- Uses a good hash function that provides uniform distribution
- Not safe for concurrent use (requires external synchronization)
- Iteration order is not guaranteed (and deliberately randomized in some Go versions)
Maps with Composite Keys
Sometimes you need to use a composite key (multiple values) as a map key. In Go, any comparable type can be a key, including structs:
package main
import (
"fmt"
)
// Define a composite key type
type Point struct {
X, Y int
}
func main() {
// Map with struct keys
grid := make(map[Point]string)
// Add key-value pairs
grid[Point{1, 2}] = "A"
grid[Point{3, 4}] = "B"
// Access values
fmt.Println("Value at (1,2):", grid[Point{1, 2}])
// Iterate over the map
for point, value := range grid {
fmt.Printf("(%d,%d): %s\n", point.X, point.Y, value)
}
}
For non-comparable types (like slices or maps), you need to convert them to a comparable type first, typically by serializing them to a string:
package main
import (
"encoding/json"
"fmt"
)
func main() {
// Map with string keys (serialized slices)
sliceMap := make(map[string]string)
// Add key-value pairs with slice keys
key1 := []int{1, 2, 3}
key2 := []int{4, 5, 6}
// Serialize slices to strings
key1Str, _ := json.Marshal(key1)
key2Str, _ := json.Marshal(key2)
sliceMap[string(key1Str)] = "Value for [1,2,3]"
sliceMap[string(key2Str)] = "Value for [4,5,6]"
// Access values
fmt.Println("Value for [1,2,3]:", sliceMap[string(key1Str)])
// To look up a value, you need to serialize the key again
searchKey := []int{1, 2, 3}
searchKeyStr, _ := json.Marshal(searchKey)
fmt.Println("Found value:", sliceMap[string(searchKeyStr)])
}
Thread-Safe Maps
Go's built-in maps are not safe for concurrent use. For concurrent access, you can use sync.Map or implement your own thread-safe map using mutexes:
package main
import (
"fmt"
"sync"
)
// ThreadSafeMap is a thread-safe map implementation
type ThreadSafeMap struct {
data map[string]interface{}
mu sync.RWMutex
}
// NewThreadSafeMap creates a new thread-safe map
func NewThreadSafeMap() *ThreadSafeMap {
return &ThreadSafeMap{
data: make(map[string]interface{}),
}
}
// Set adds or updates a key-value pair
func (m *ThreadSafeMap) Set(key string, value interface{}) {
m.mu.Lock()
defer m.mu.Unlock()
m.data[key] = value
}
// Get retrieves a value by key
func (m *ThreadSafeMap) Get(key string) (interface{}, bool) {
m.mu.RLock()
defer m.mu.RUnlock()
value, exists := m.data[key]
return value, exists
}
// Delete removes a key-value pair
func (m *ThreadSafeMap) Delete(key string) {
m.mu.Lock()
defer m.mu.Unlock()
delete(m.data, key)
}
// Len returns the number of elements
func (m *ThreadSafeMap) Len() int {
m.mu.RLock()
defer m.mu.RUnlock()
return len(m.data)
}
// Keys returns all keys in the map
func (m *ThreadSafeMap) Keys() []string {
m.mu.RLock()
defer m.mu.RUnlock()
keys := make([]string, 0, len(m.data))
for k := range m.data {
keys = append(keys, k)
}
return keys
}
func main() {
tsm := NewThreadSafeMap()
// Concurrent writes
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
key := fmt.Sprintf("key%d", i)
tsm.Set(key, i)
}(i)
}
wg.Wait()
// Concurrent reads
for i := 0; i < 10; i++ {
wg.Add(1)
go func(i int) {
defer wg.Done()
key := fmt.Sprintf("key%d", i)
value, exists := tsm.Get(key)
if exists {
fmt.Printf("Key: %s, Value: %v\n", key, value)
}
}(i)
}
wg.Wait()
fmt.Println("Map size:", tsm.Len())
fmt.Println("Keys:", tsm.Keys())
}
Alternatively, you can use Go's sync.Map which is optimized for specific use cases:
package main
import (
"fmt"
"sync"
)
func main() {
var m sync.Map
// Store key-value pairs
m.Store("key1", "value1")
m.Store("key2", "value2")
// Load a value
value, exists := m.Load("key1")
if exists {
fmt.Println("Value for key1:", value)
}
// LoadOrStore returns the existing value if the key exists,
// otherwise it stores and returns the given value
value, loaded := m.LoadOrStore("key3", "value3")
if loaded {
fmt.Println("Key3 already exists with value:", value)
} else {
fmt.Println("Stored new key3 with value:", value)
}
// Delete a key
m.Delete("key2")
// Iterate over all key-value pairs
m.Range(func(key, value interface{}) bool {
fmt.Printf("%v: %v\n", key, value)
return true // continue iteration
})
}
sync.Map is optimized for two common use cases:
- When the entry for a given key is only ever written once but read many times
- When multiple goroutines read, write, and overwrite entries for disjoint sets of keys
For other use cases, using a mutex-protected map might perform better.
Implementing a Hash Table
While Go's built-in map is sufficient for most use cases, implementing your own hash table can be educational and sometimes necessary for specialized requirements.
Let's implement a simple hash table with separate chaining:
package main
import (
"fmt"
)
// KeyValue represents a key-value pair
type KeyValue struct {
Key string
Value interface{}
Next *KeyValue // For chaining
}
// HashTable represents a hash table
type HashTable struct {
Buckets []*KeyValue
Size int
}
// NewHashTable creates a new hash table with the specified number of buckets
func NewHashTable(bucketCount int) *HashTable {
return &HashTable{
Buckets: make([]*KeyValue, bucketCount),
Size: 0,
}
}
// hash computes the hash value for a key
func (ht *HashTable) hash(key string) int {
hash := 0
for i := 0; i < len(key); i++ {
hash = (hash*31 + int(key[i])) % len(ht.Buckets)
}
return hash
}
// Put adds or updates a key-value pair
func (ht *HashTable) Put(key string, value interface{}) {
index := ht.hash(key)
// Check if key already exists
current := ht.Buckets[index]
for current != nil {
if current.Key == key {
// Key exists, update value
current.Value = value
return
}
current = current.Next
}
// Key doesn't exist, add new node at the beginning of the chain
newNode := &KeyValue{
Key: key,
Value: value,
Next: ht.Buckets[index],
}
ht.Buckets[index] = newNode
ht.Size++
// Check if rehashing is needed
if float64(ht.Size)/float64(len(ht.Buckets)) > 0.75 {
ht.rehash()
}
}
// Get retrieves a value by key
func (ht *HashTable) Get(key string) (interface{}, bool) {
index := ht.hash(key)
current := ht.Buckets[index]
for current != nil {
if current.Key == key {
return current.Value, true
}
current = current.Next
}
return nil, false
}
// Remove deletes a key-value pair
func (ht *HashTable) Remove(key string) bool {
index := ht.hash(key)
// Special case: first node matches
if ht.Buckets[index] != nil && ht.Buckets[index].Key == key {
ht.Buckets[index] = ht.Buckets[index].Next
ht.Size--
return true
}
// Check the rest of the chain
current := ht.Buckets[index]
for current != nil && current.Next != nil {
if current.Next.Key == key {
current.Next = current.Next.Next
ht.Size--
return true
}
current = current.Next
}
return false
}
// rehash increases the size of the hash table and redistributes the elements
func (ht *HashTable) rehash() {
oldBuckets := ht.Buckets
// Create a new hash table with double the size
newBucketCount := len(ht.Buckets) * 2
ht.Buckets = make([]*KeyValue, newBucketCount)
ht.Size = 0
// Rehash all existing key-value pairs
for _, bucket := range oldBuckets {
current := bucket
for current != nil {
ht.Put(current.Key, current.Value)
current = current.Next
}
}
}
// Keys returns all keys in the hash table
func (ht *HashTable) Keys() []string {
keys := make([]string, 0, ht.Size)
for _, bucket := range ht.Buckets {
current := bucket
for current != nil {
keys = append(keys, current.Key)
current = current.Next
}
}
return keys
}
// Values returns all values in the hash table
func (ht *HashTable) Values() []interface{} {
values := make([]interface{}, 0, ht.Size)
for _, bucket := range ht.Buckets {
current := bucket
for current != nil {
values = append(values, current.Value)
current = current.Next
}
}
return values
}
// Len returns the number of elements in the hash table
func (ht *HashTable) Len() int {
return ht.Size
}
// Clear removes all elements from the hash table
func (ht *HashTable) Clear() {
ht.Buckets = make([]*KeyValue, len(ht.Buckets))
ht.Size = 0
}
func main() {
ht := NewHashTable(16)
// Add key-value pairs
ht.Put("apple", 5)
ht.Put("banana", 10)
ht.Put("cherry", 15)
// Get values
apple, exists := ht.Get("apple")
if exists {
fmt.Println("Apple count:", apple)
}
// Update a value
ht.Put("apple", 7)
apple, _ = ht.Get("apple")
fmt.Println("Updated apple count:", apple)
// Remove a key
ht.Remove("banana")
// Check if key exists
_, exists = ht.Get("banana")
fmt.Println("Banana exists:", exists)
// Get all keys and values
fmt.Println("Keys:", ht.Keys())
fmt.Println("Values:", ht.Values())
fmt.Println("Size:", ht.Len())
// Clear the hash table
ht.Clear()
fmt.Println("Size after clearing:", ht.Len())
}
This implementation provides a basic hash table with separate chaining for collision resolution and automatic rehashing when the load factor exceeds 0.75. It supports the standard operations: put, get, remove, and clear.
Performance Analysis
Let's analyze the time complexity of hash table operations:
| Operation | Average Case | Worst Case |
|---|---|---|
| Insert (Put) | O(1) | O(n) |
| Search (Get) | O(1) | O(n) |
| Delete (Remove) | O(1) | O(n) |
| Rehash | O(n) | O(n) |
The worst-case scenario occurs when all keys hash to the same bucket, creating a long chain that must be traversed linearly. This is rare with a good hash function and proper load factor management.
The space complexity of a hash table is O(n), where n is the number of key-value pairs stored.
Factors Affecting Performance
- Hash Function: A good hash function distributes keys uniformly, minimizing collisions
- Load Factor: A lower load factor reduces collisions but increases memory usage
- Collision Resolution Strategy: Different strategies have different trade-offs
- Initial Capacity: Setting an appropriate initial capacity can reduce the need for rehashing
Use Cases
Hash tables are versatile data structures with numerous applications:
- Caching: Storing computed results for quick retrieval
- Database Indexing: Speeding up data retrieval operations
- Symbol Tables: Used in compilers and interpreters to store variable information
- Associative Arrays: Implementing key-value mappings
- Counting Frequencies: Counting occurrences of elements in a collection
- De-duplication: Identifying and removing duplicate elements
- Graph Representation: Storing adjacency lists
- Implementing Sets: Using hash tables with dummy values to implement sets
Best Practices
Here are some best practices for working with hash tables in Go:
Use Built-in Maps When Possible
Go's built-in map type is highly optimized and suitable for most use cases. Implement your own hash table only when you have specific requirements that the built-in map doesn't satisfy.
Choose Appropriate Key Types
In Go, map keys must be comparable types (boolean, numeric, string, pointer, channel, interface, or structs/arrays containing only comparable types). For non-comparable types, convert them to a comparable form first.
Pre-allocate Maps When Size is Known
If you know the approximate size of your map, pre-allocate it to avoid frequent resizing:
// Pre-allocate a map with capacity for 100 elements
m := make(map[string]int, 100)
Handle Concurrent Access
Go's built-in maps are not safe for concurrent use. Use sync.Map or implement your own thread-safe map using mutexes for concurrent access.
Check for Key Existence
Always check if a key exists before using its value, especially when the zero value of the value type is meaningful:
value, exists := m["key"]
if exists {
// Use value
} else {
// Handle missing key
}
Clear Maps Efficiently
To clear a map, either create a new one or delete all keys:
// Option 1: Create a new map
m = make(map[string]int)
// Option 2: Delete all keys
for k := range m {
delete(m, k)
}
Use Maps for Lookups, Not Iteration
Maps are optimized for lookups, not for ordered iteration. If you need to iterate over elements in a specific order, consider using a separate slice to maintain the order.
Conclusion
Hash tables are powerful data structures that provide efficient key-value storage and retrieval. In Go, the built-in map type offers a high-performance implementation suitable for most use cases. Understanding the principles behind hash tables, including hash functions, collision resolution, and performance characteristics, allows you to use them effectively in your applications.
In the next section, we'll explore specialized data structures that are designed for specific use cases, such as tries, bloom filters, and skip lists.