Go Data Structures & Algorithms

code

Hash Tables in Go

Hash tables are fundamental data structures that provide efficient key-value storage and retrieval. They are widely used in various applications due to their average-case constant time complexity for basic operations. In this comprehensive guide, we'll explore hash tables in Go, from basic concepts to advanced implementations and practical applications.

What are Hash Tables?

A hash table (also known as a hash map) is a data structure that implements an associative array abstract data type, a structure that can map keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.

The key features of hash tables include:

  • Fast lookups: Average O(1) time complexity for search, insert, and delete operations
  • Key-value storage: Each value is associated with a unique key
  • Dynamic size: Can grow or shrink as needed
  • Unordered: Elements are not stored in any particular order

Hash Functions

A hash function is a function that maps data of arbitrary size to fixed-size values. In the context of hash tables, it converts keys into array indices.

A good hash function should have the following properties:

  • Deterministic: The same key should always produce the same hash value
  • Uniform distribution: Hash values should be evenly distributed across the array
  • Efficiency: The hash function should be computationally inexpensive
  • Minimal collisions: Different keys should rarely produce the same hash value

Common hash functions include:

  • Division method: h(k) = k mod m, where m is the size of the hash table
  • Multiplication method: h(k) = ⌊m(kA mod 1)⌋, where A is a constant between 0 and 1
  • Universal hashing: A family of hash functions chosen at random
  • Cryptographic hash functions: Such as SHA-256, which are designed to be secure but are often too slow for hash tables

For non-integer keys (like strings), we first need to convert them to integers before applying the hash function:

// Simple hash function for strings
func hashString(s string, tableSize int) int {
    hash := 0
    for i := 0; i < len(s); i++ {
        hash = (hash*31 + int(s[i])) % tableSize
    }
    return hash
}

Collision Resolution

A collision occurs when two different keys hash to the same index. There are several strategies to handle collisions:

Separate Chaining

In separate chaining, each bucket contains a linked list (or another data structure) of all key-value pairs that hash to the same index. When a collision occurs, the new key-value pair is simply added to the list.

Advantages:

  • Simple to implement
  • Hash table never fills up
  • Less sensitive to the hash function or load factors

Disadvantages:

  • Requires additional memory for linked list pointers
  • Cache performance can be poor due to the linked list traversal

Open Addressing

In open addressing, all key-value pairs are stored in the hash table array itself. When a collision occurs, the algorithm searches for the next available slot according to a probing sequence.

Common probing methods include:

  • Linear Probing: Check the next slot sequentially (h(k) + i) mod m
  • Quadratic Probing: Check slots according to a quadratic function (h(k) + i^2) mod m
  • Double Hashing: Use a second hash function to determine the step size (h1(k) + i*h2(k)) mod m

Advantages:

  • Better cache performance
  • No extra memory for pointers

Disadvantages:

  • More sensitive to the hash function and load factor
  • More complex deletion (typically using tombstones)
  • Performance degrades as the table fills up

Robin Hood Hashing

Robin Hood hashing is a variant of open addressing that aims to reduce the variance of probe sequence lengths by moving elements during insertion.

Cuckoo Hashing

Cuckoo hashing uses multiple hash functions and allows a key to be stored in one of several possible locations. When a collision occurs, it may move existing keys to their alternate locations.

Hash Table Operations

The primary operations on a hash table are:

  • Insert: Add a new key-value pair to the hash table
  • Search: Find the value associated with a given key
  • Delete: Remove a key-value pair from the hash table
  • Resize: Increase or decrease the size of the hash table to maintain performance

Load Factor

The load factor of a hash table is the ratio of the number of stored elements to the number of buckets. It's a critical parameter that affects performance:

Load Factor = Number of Elements / Number of Buckets

As the load factor increases, the probability of collisions increases, which can degrade performance. Hash tables typically resize (rehash) when the load factor exceeds a certain threshold (commonly 0.7 or 0.75).

Go's Built-in Maps

Go provides a built-in map type that implements a hash table. It's highly optimized and suitable for most use cases.

Creating and Using Maps

package main

import (
    "fmt"
)

func main() {
    // Create a map with string keys and int values
    scores := make(map[string]int)
    
    // Add key-value pairs
    scores["Alice"] = 95
    scores["Bob"] = 82
    scores["Charlie"] = 88
    
    // Create a map with initialization
    ages := map[string]int{
        "Alice":   30,
        "Bob":     25,
        "Charlie": 35,
    }
    
    // Access values
    fmt.Println("Alice's score:", scores["Alice"])
    
    // Check if a key exists
    bobAge, exists := ages["Bob"]
    if exists {
        fmt.Println("Bob's age:", bobAge)
    } else {
        fmt.Println("Bob's age is unknown")
    }
    
    // Accessing a non-existent key returns the zero value
    daveScore := scores["Dave"]
    fmt.Println("Dave's score:", daveScore) // Prints 0 (zero value for int)
    
    // Delete a key-value pair
    delete(scores, "Bob")
    
    // Iterate over a map
    fmt.Println("Scores:")
    for name, score := range scores {
        fmt.Printf("%s: %d\n", name, score)
    }
    
    // Get the number of elements
    fmt.Println("Number of scores:", len(scores))
    
    // Clear a map
    for k := range scores {
        delete(scores, k)
    }
    
    fmt.Println("After clearing, number of scores:", len(scores))
}

Map Internals

Go's built-in map is implemented as a hash table with buckets containing key-value pairs. It uses a variant of separate chaining for collision resolution.

Key features of Go's map implementation:

  • Automatically grows when the load factor exceeds a threshold
  • Uses a good hash function that provides uniform distribution
  • Not safe for concurrent use (requires external synchronization)
  • Iteration order is not guaranteed (and deliberately randomized in some Go versions)

Maps with Composite Keys

Sometimes you need to use a composite key (multiple values) as a map key. In Go, any comparable type can be a key, including structs:

package main

import (
    "fmt"
)

// Define a composite key type
type Point struct {
    X, Y int
}

func main() {
    // Map with struct keys
    grid := make(map[Point]string)
    
    // Add key-value pairs
    grid[Point{1, 2}] = "A"
    grid[Point{3, 4}] = "B"
    
    // Access values
    fmt.Println("Value at (1,2):", grid[Point{1, 2}])
    
    // Iterate over the map
    for point, value := range grid {
        fmt.Printf("(%d,%d): %s\n", point.X, point.Y, value)
    }
}

For non-comparable types (like slices or maps), you need to convert them to a comparable type first, typically by serializing them to a string:

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // Map with string keys (serialized slices)
    sliceMap := make(map[string]string)
    
    // Add key-value pairs with slice keys
    key1 := []int{1, 2, 3}
    key2 := []int{4, 5, 6}
    
    // Serialize slices to strings
    key1Str, _ := json.Marshal(key1)
    key2Str, _ := json.Marshal(key2)
    
    sliceMap[string(key1Str)] = "Value for [1,2,3]"
    sliceMap[string(key2Str)] = "Value for [4,5,6]"
    
    // Access values
    fmt.Println("Value for [1,2,3]:", sliceMap[string(key1Str)])
    
    // To look up a value, you need to serialize the key again
    searchKey := []int{1, 2, 3}
    searchKeyStr, _ := json.Marshal(searchKey)
    fmt.Println("Found value:", sliceMap[string(searchKeyStr)])
}

Thread-Safe Maps

Go's built-in maps are not safe for concurrent use. For concurrent access, you can use sync.Map or implement your own thread-safe map using mutexes:

package main

import (
    "fmt"
    "sync"
)

// ThreadSafeMap is a thread-safe map implementation
type ThreadSafeMap struct {
    data map[string]interface{}
    mu   sync.RWMutex
}

// NewThreadSafeMap creates a new thread-safe map
func NewThreadSafeMap() *ThreadSafeMap {
    return &ThreadSafeMap{
        data: make(map[string]interface{}),
    }
}

// Set adds or updates a key-value pair
func (m *ThreadSafeMap) Set(key string, value interface{}) {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.data[key] = value
}

// Get retrieves a value by key
func (m *ThreadSafeMap) Get(key string) (interface{}, bool) {
    m.mu.RLock()
    defer m.mu.RUnlock()
    value, exists := m.data[key]
    return value, exists
}

// Delete removes a key-value pair
func (m *ThreadSafeMap) Delete(key string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    delete(m.data, key)
}

// Len returns the number of elements
func (m *ThreadSafeMap) Len() int {
    m.mu.RLock()
    defer m.mu.RUnlock()
    return len(m.data)
}

// Keys returns all keys in the map
func (m *ThreadSafeMap) Keys() []string {
    m.mu.RLock()
    defer m.mu.RUnlock()
    
    keys := make([]string, 0, len(m.data))
    for k := range m.data {
        keys = append(keys, k)
    }
    return keys
}

func main() {
    tsm := NewThreadSafeMap()
    
    // Concurrent writes
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            key := fmt.Sprintf("key%d", i)
            tsm.Set(key, i)
        }(i)
    }
    
    wg.Wait()
    
    // Concurrent reads
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            key := fmt.Sprintf("key%d", i)
            value, exists := tsm.Get(key)
            if exists {
                fmt.Printf("Key: %s, Value: %v\n", key, value)
            }
        }(i)
    }
    
    wg.Wait()
    
    fmt.Println("Map size:", tsm.Len())
    fmt.Println("Keys:", tsm.Keys())
}

Alternatively, you can use Go's sync.Map which is optimized for specific use cases:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var m sync.Map
    
    // Store key-value pairs
    m.Store("key1", "value1")
    m.Store("key2", "value2")
    
    // Load a value
    value, exists := m.Load("key1")
    if exists {
        fmt.Println("Value for key1:", value)
    }
    
    // LoadOrStore returns the existing value if the key exists,
    // otherwise it stores and returns the given value
    value, loaded := m.LoadOrStore("key3", "value3")
    if loaded {
        fmt.Println("Key3 already exists with value:", value)
    } else {
        fmt.Println("Stored new key3 with value:", value)
    }
    
    // Delete a key
    m.Delete("key2")
    
    // Iterate over all key-value pairs
    m.Range(func(key, value interface{}) bool {
        fmt.Printf("%v: %v\n", key, value)
        return true // continue iteration
    })
}

sync.Map is optimized for two common use cases:

  1. When the entry for a given key is only ever written once but read many times
  2. When multiple goroutines read, write, and overwrite entries for disjoint sets of keys

For other use cases, using a mutex-protected map might perform better.

Implementing a Hash Table

While Go's built-in map is sufficient for most use cases, implementing your own hash table can be educational and sometimes necessary for specialized requirements.

Let's implement a simple hash table with separate chaining:

package main

import (
    "fmt"
)

// KeyValue represents a key-value pair
type KeyValue struct {
    Key   string
    Value interface{}
    Next  *KeyValue // For chaining
}

// HashTable represents a hash table
type HashTable struct {
    Buckets []*KeyValue
    Size    int
}

// NewHashTable creates a new hash table with the specified number of buckets
func NewHashTable(bucketCount int) *HashTable {
    return &HashTable{
        Buckets: make([]*KeyValue, bucketCount),
        Size:    0,
    }
}

// hash computes the hash value for a key
func (ht *HashTable) hash(key string) int {
    hash := 0
    for i := 0; i < len(key); i++ {
        hash = (hash*31 + int(key[i])) % len(ht.Buckets)
    }
    return hash
}

// Put adds or updates a key-value pair
func (ht *HashTable) Put(key string, value interface{}) {
    index := ht.hash(key)
    
    // Check if key already exists
    current := ht.Buckets[index]
    for current != nil {
        if current.Key == key {
            // Key exists, update value
            current.Value = value
            return
        }
        current = current.Next
    }
    
    // Key doesn't exist, add new node at the beginning of the chain
    newNode := &KeyValue{
        Key:   key,
        Value: value,
        Next:  ht.Buckets[index],
    }
    ht.Buckets[index] = newNode
    ht.Size++
    
    // Check if rehashing is needed
    if float64(ht.Size)/float64(len(ht.Buckets)) > 0.75 {
        ht.rehash()
    }
}

// Get retrieves a value by key
func (ht *HashTable) Get(key string) (interface{}, bool) {
    index := ht.hash(key)
    
    current := ht.Buckets[index]
    for current != nil {
        if current.Key == key {
            return current.Value, true
        }
        current = current.Next
    }
    
    return nil, false
}

// Remove deletes a key-value pair
func (ht *HashTable) Remove(key string) bool {
    index := ht.hash(key)
    
    // Special case: first node matches
    if ht.Buckets[index] != nil && ht.Buckets[index].Key == key {
        ht.Buckets[index] = ht.Buckets[index].Next
        ht.Size--
        return true
    }
    
    // Check the rest of the chain
    current := ht.Buckets[index]
    for current != nil && current.Next != nil {
        if current.Next.Key == key {
            current.Next = current.Next.Next
            ht.Size--
            return true
        }
        current = current.Next
    }
    
    return false
}

// rehash increases the size of the hash table and redistributes the elements
func (ht *HashTable) rehash() {
    oldBuckets := ht.Buckets
    
    // Create a new hash table with double the size
    newBucketCount := len(ht.Buckets) * 2
    ht.Buckets = make([]*KeyValue, newBucketCount)
    ht.Size = 0
    
    // Rehash all existing key-value pairs
    for _, bucket := range oldBuckets {
        current := bucket
        for current != nil {
            ht.Put(current.Key, current.Value)
            current = current.Next
        }
    }
}

// Keys returns all keys in the hash table
func (ht *HashTable) Keys() []string {
    keys := make([]string, 0, ht.Size)
    
    for _, bucket := range ht.Buckets {
        current := bucket
        for current != nil {
            keys = append(keys, current.Key)
            current = current.Next
        }
    }
    
    return keys
}

// Values returns all values in the hash table
func (ht *HashTable) Values() []interface{} {
    values := make([]interface{}, 0, ht.Size)
    
    for _, bucket := range ht.Buckets {
        current := bucket
        for current != nil {
            values = append(values, current.Value)
            current = current.Next
        }
    }
    
    return values
}

// Len returns the number of elements in the hash table
func (ht *HashTable) Len() int {
    return ht.Size
}

// Clear removes all elements from the hash table
func (ht *HashTable) Clear() {
    ht.Buckets = make([]*KeyValue, len(ht.Buckets))
    ht.Size = 0
}

func main() {
    ht := NewHashTable(16)
    
    // Add key-value pairs
    ht.Put("apple", 5)
    ht.Put("banana", 10)
    ht.Put("cherry", 15)
    
    // Get values
    apple, exists := ht.Get("apple")
    if exists {
        fmt.Println("Apple count:", apple)
    }
    
    // Update a value
    ht.Put("apple", 7)
    apple, _ = ht.Get("apple")
    fmt.Println("Updated apple count:", apple)
    
    // Remove a key
    ht.Remove("banana")
    
    // Check if key exists
    _, exists = ht.Get("banana")
    fmt.Println("Banana exists:", exists)
    
    // Get all keys and values
    fmt.Println("Keys:", ht.Keys())
    fmt.Println("Values:", ht.Values())
    fmt.Println("Size:", ht.Len())
    
    // Clear the hash table
    ht.Clear()
    fmt.Println("Size after clearing:", ht.Len())
}

This implementation provides a basic hash table with separate chaining for collision resolution and automatic rehashing when the load factor exceeds 0.75. It supports the standard operations: put, get, remove, and clear.

Performance Analysis

Let's analyze the time complexity of hash table operations:

Operation Average Case Worst Case
Insert (Put) O(1) O(n)
Search (Get) O(1) O(n)
Delete (Remove) O(1) O(n)
Rehash O(n) O(n)

The worst-case scenario occurs when all keys hash to the same bucket, creating a long chain that must be traversed linearly. This is rare with a good hash function and proper load factor management.

The space complexity of a hash table is O(n), where n is the number of key-value pairs stored.

Factors Affecting Performance

  • Hash Function: A good hash function distributes keys uniformly, minimizing collisions
  • Load Factor: A lower load factor reduces collisions but increases memory usage
  • Collision Resolution Strategy: Different strategies have different trade-offs
  • Initial Capacity: Setting an appropriate initial capacity can reduce the need for rehashing

Use Cases

Hash tables are versatile data structures with numerous applications:

  • Caching: Storing computed results for quick retrieval
  • Database Indexing: Speeding up data retrieval operations
  • Symbol Tables: Used in compilers and interpreters to store variable information
  • Associative Arrays: Implementing key-value mappings
  • Counting Frequencies: Counting occurrences of elements in a collection
  • De-duplication: Identifying and removing duplicate elements
  • Graph Representation: Storing adjacency lists
  • Implementing Sets: Using hash tables with dummy values to implement sets

Best Practices

Here are some best practices for working with hash tables in Go:

Use Built-in Maps When Possible

Go's built-in map type is highly optimized and suitable for most use cases. Implement your own hash table only when you have specific requirements that the built-in map doesn't satisfy.

Choose Appropriate Key Types

In Go, map keys must be comparable types (boolean, numeric, string, pointer, channel, interface, or structs/arrays containing only comparable types). For non-comparable types, convert them to a comparable form first.

Pre-allocate Maps When Size is Known

If you know the approximate size of your map, pre-allocate it to avoid frequent resizing:

// Pre-allocate a map with capacity for 100 elements
m := make(map[string]int, 100)

Handle Concurrent Access

Go's built-in maps are not safe for concurrent use. Use sync.Map or implement your own thread-safe map using mutexes for concurrent access.

Check for Key Existence

Always check if a key exists before using its value, especially when the zero value of the value type is meaningful:

value, exists := m["key"]
if exists {
    // Use value
} else {
    // Handle missing key
}

Clear Maps Efficiently

To clear a map, either create a new one or delete all keys:

// Option 1: Create a new map
m = make(map[string]int)

// Option 2: Delete all keys
for k := range m {
    delete(m, k)
}

Use Maps for Lookups, Not Iteration

Maps are optimized for lookups, not for ordered iteration. If you need to iterate over elements in a specific order, consider using a separate slice to maintain the order.

Conclusion

Hash tables are powerful data structures that provide efficient key-value storage and retrieval. In Go, the built-in map type offers a high-performance implementation suitable for most use cases. Understanding the principles behind hash tables, including hash functions, collision resolution, and performance characteristics, allows you to use them effectively in your applications.

In the next section, we'll explore specialized data structures that are designed for specific use cases, such as tries, bloom filters, and skip lists.