Go Data Structures & Algorithms

code

Sets in Go

Sets are collections of unique elements with no specific order. They are fundamental data structures used in various algorithms and applications where uniqueness and membership testing are important. In this comprehensive guide, we'll explore sets in Go, including their implementations, operations, and practical applications.

What are Sets?

A set is an abstract data type that represents a collection of distinct elements. Unlike arrays or slices, sets do not allow duplicate elements and typically do not maintain any specific order. The primary operations on sets include adding elements, removing elements, and testing for membership.

Sets are based on the mathematical concept of a set, which is a collection of distinct objects. They are particularly useful when you need to:

  • Ensure uniqueness of elements
  • Perform set operations like union, intersection, and difference
  • Quickly check if an element exists in a collection

Set Operations

The fundamental operations on sets include:

  • Add: Add an element to the set if it's not already present
  • Remove: Remove an element from the set
  • Contains: Check if an element exists in the set
  • Size: Get the number of elements in the set
  • Clear: Remove all elements from the set

Additionally, sets support several mathematical operations:

  • Union: Create a new set containing all elements from both sets
  • Intersection: Create a new set containing only elements that exist in both sets
  • Difference: Create a new set containing elements that exist in the first set but not in the second
  • Symmetric Difference: Create a new set containing elements that exist in either set but not in both
  • Subset: Check if every element in the first set exists in the second set
  • Superset: Check if every element in the second set exists in the first set
  • Disjoint: Check if the sets have no elements in common

Implementing Sets in Go

Go does not have a built-in set type in its standard library. However, there are several ways to implement sets in Go:

  1. Map-Based Sets: Using maps with empty struct values
  2. Slice-Based Sets: Using slices with search and insert operations
  3. BitSets: Using bit arrays for sets of integers within a limited range
  4. Third-Party Libraries: Using libraries like Gods (Go Data Structures)

Let's explore each of these implementations in detail.

Map-Based Set Implementation

The most common and efficient way to implement a set in Go is using a map with empty struct values. The keys of the map represent the elements of the set, and the values are empty structs (which take up no memory).

package main

import (
    "fmt"
    "sort"
    "strings"
)

// Set represents a set of unique elements
type Set struct {
    elements map[interface{}]struct{}
}

// NewSet creates a new empty set
func NewSet() *Set {
    return &Set{
        elements: make(map[interface{}]struct{}),
    }
}

// NewSetFromSlice creates a new set from a slice of elements
func NewSetFromSlice(items []interface{}) *Set {
    set := NewSet()
    for _, item := range items {
        set.Add(item)
    }
    return set
}

// Add adds an element to the set
func (s *Set) Add(element interface{}) {
    s.elements[element] = struct{}{}
}

// Remove removes an element from the set
func (s *Set) Remove(element interface{}) {
    delete(s.elements, element)
}

// Contains checks if an element exists in the set
func (s *Set) Contains(element interface{}) bool {
    _, exists := s.elements[element]
    return exists
}

// Size returns the number of elements in the set
func (s *Set) Size() int {
    return len(s.elements)
}

// Clear removes all elements from the set
func (s *Set) Clear() {
    s.elements = make(map[interface{}]struct{})
}

// Elements returns a slice of all elements in the set
func (s *Set) Elements() []interface{} {
    elements := make([]interface{}, 0, len(s.elements))
    for element := range s.elements {
        elements = append(elements, element)
    }
    return elements
}

// Union returns a new set containing all elements from both sets
func (s *Set) Union(other *Set) *Set {
    unionSet := NewSet()
    
    // Add elements from the first set
    for element := range s.elements {
        unionSet.Add(element)
    }
    
    // Add elements from the second set
    for element := range other.elements {
        unionSet.Add(element)
    }
    
    return unionSet
}

// Intersection returns a new set containing only elements that exist in both sets
func (s *Set) Intersection(other *Set) *Set {
    intersectionSet := NewSet()
    
    // Find the smaller set to iterate over
    var smaller, larger *Set
    if s.Size() <= other.Size() {
        smaller, larger = s, other
    } else {
        smaller, larger = other, s
    }
    
    // Add elements that exist in both sets
    for element := range smaller.elements {
        if larger.Contains(element) {
            intersectionSet.Add(element)
        }
    }
    
    return intersectionSet
}

// Difference returns a new set containing elements that exist in the first set but not in the second
func (s *Set) Difference(other *Set) *Set {
    differenceSet := NewSet()
    
    // Add elements from the first set that don't exist in the second set
    for element := range s.elements {
        if !other.Contains(element) {
            differenceSet.Add(element)
        }
    }
    
    return differenceSet
}

// SymmetricDifference returns a new set containing elements that exist in either set but not in both
func (s *Set) SymmetricDifference(other *Set) *Set {
    symmetricDifferenceSet := NewSet()
    
    // Add elements from the first set that don't exist in the second set
    for element := range s.elements {
        if !other.Contains(element) {
            symmetricDifferenceSet.Add(element)
        }
    }
    
    // Add elements from the second set that don't exist in the first set
    for element := range other.elements {
        if !s.Contains(element) {
            symmetricDifferenceSet.Add(element)
        }
    }
    
    return symmetricDifferenceSet
}

// IsSubset checks if every element in the set exists in the other set
func (s *Set) IsSubset(other *Set) bool {
    // If this set has more elements, it can't be a subset
    if s.Size() > other.Size() {
        return false
    }
    
    // Check if every element in this set exists in the other set
    for element := range s.elements {
        if !other.Contains(element) {
            return false
        }
    }
    
    return true
}

// IsSuperset checks if every element in the other set exists in this set
func (s *Set) IsSuperset(other *Set) bool {
    return other.IsSubset(s)
}

// IsDisjoint checks if the sets have no elements in common
func (s *Set) IsDisjoint(other *Set) bool {
    // Find the smaller set to iterate over
    var smaller, larger *Set
    if s.Size() <= other.Size() {
        smaller, larger = s, other
    } else {
        smaller, larger = other, s
    }
    
    // Check if any element in the smaller set exists in the larger set
    for element := range smaller.elements {
        if larger.Contains(element) {
            return false
        }
    }
    
    return true
}

// String returns a string representation of the set
func (s *Set) String() string {
    elements := s.Elements()
    
    // Convert elements to strings
    stringElements := make([]string, len(elements))
    for i, element := range elements {
        stringElements[i] = fmt.Sprintf("%v", element)
    }
    
    // Sort for consistent output
    sort.Strings(stringElements)
    
    return "{" + strings.Join(stringElements, ", ") + "}"
}

func main() {
    // Create sets
    set1 := NewSet()
    set1.Add(1)
    set1.Add(2)
    set1.Add(3)
    
    set2 := NewSet()
    set2.Add(3)
    set2.Add(4)
    set2.Add(5)
    
    fmt.Println("Set 1:", set1)
    fmt.Println("Set 2:", set2)
    
    // Demonstrate set operations
    fmt.Println("Union:", set1.Union(set2))
    fmt.Println("Intersection:", set1.Intersection(set2))
    fmt.Println("Difference (Set1 - Set2):", set1.Difference(set2))
    fmt.Println("Symmetric Difference:", set1.SymmetricDifference(set2))
    
    // Check relationships
    fmt.Println("Set1 is subset of Set2:", set1.IsSubset(set2))
    fmt.Println("Set1 is superset of Set2:", set1.IsSuperset(set2))
    fmt.Println("Set1 is disjoint with Set2:", set1.IsDisjoint(set2))
    
    // Create a subset
    subset := NewSet()
    subset.Add(1)
    subset.Add(2)
    
    fmt.Println("Subset:", subset)
    fmt.Println("Subset is subset of Set1:", subset.IsSubset(set1))
    
    // Demonstrate other operations
    fmt.Println("Set1 contains 2:", set1.Contains(2))
    fmt.Println("Set1 contains 5:", set1.Contains(5))
    
    set1.Remove(2)
    fmt.Println("Set1 after removing 2:", set1)
    
    fmt.Println("Set1 size:", set1.Size())
    
    set1.Clear()
    fmt.Println("Set1 after clearing:", set1)
}

This implementation provides a generic set that can store elements of any type. It uses interface{} as the key type in the map, allowing for flexibility but sacrificing some type safety. For a type-specific set, you can replace interface{} with the desired type.

Type-Specific Set

For better type safety and performance, you can create a set for a specific type:

// IntSet represents a set of integers
type IntSet struct {
    elements map[int]struct{}
}

// NewIntSet creates a new empty integer set
func NewIntSet() *IntSet {
    return &IntSet{
        elements: make(map[int]struct{}),
    }
}

// Add adds an integer to the set
func (s *IntSet) Add(element int) {
    s.elements[element] = struct{}{}
}

// Other methods would be similar to the generic Set implementation
// but with int instead of interface{}

Slice-Based Set Implementation

For small sets or when order matters, you can implement a set using a slice. This approach is less efficient for large sets but can be useful in specific scenarios.

package main

import (
    "fmt"
    "sort"
)

// SliceSet represents a set of integers using a slice
type SliceSet struct {
    elements []int
    sorted   bool
}

// NewSliceSet creates a new empty slice-based set
func NewSliceSet() *SliceSet {
    return &SliceSet{
        elements: make([]int, 0),
        sorted:   true,
    }
}

// Add adds an element to the set if it's not already present
func (s *SliceSet) Add(element int) {
    // Check if the element already exists
    if s.Contains(element) {
        return
    }
    
    // Add the element
    s.elements = append(s.elements, element)
    s.sorted = false
}

// Remove removes an element from the set
func (s *SliceSet) Remove(element int) {
    for i, e := range s.elements {
        if e == element {
            // Remove the element by replacing it with the last element
            // and truncating the slice
            s.elements[i] = s.elements[len(s.elements)-1]
            s.elements = s.elements[:len(s.elements)-1]
            s.sorted = false
            return
        }
    }
}

// Contains checks if an element exists in the set
func (s *SliceSet) Contains(element int) bool {
    // If the set is sorted, use binary search
    if s.sorted {
        index := sort.SearchInts(s.elements, element)
        return index < len(s.elements) && s.elements[index] == element
    }
    
    // Otherwise, use linear search
    for _, e := range s.elements {
        if e == element {
            return true
        }
    }
    return false
}

// Size returns the number of elements in the set
func (s *SliceSet) Size() int {
    return len(s.elements)
}

// Clear removes all elements from the set
func (s *SliceSet) Clear() {
    s.elements = make([]int, 0)
    s.sorted = true
}

// Elements returns a slice of all elements in the set
func (s *SliceSet) Elements() []int {
    // Ensure the elements are sorted
    s.ensureSorted()
    
    // Return a copy to prevent external modification
    result := make([]int, len(s.elements))
    copy(result, s.elements)
    return result
}

// ensureSorted sorts the elements if they are not already sorted
func (s *SliceSet) ensureSorted() {
    if !s.sorted {
        sort.Ints(s.elements)
        s.sorted = true
    }
}

// String returns a string representation of the set
func (s *SliceSet) String() string {
    s.ensureSorted()
    return fmt.Sprintf("%v", s.elements)
}

func main() {
    set := NewSliceSet()
    
    set.Add(3)
    set.Add(1)
    set.Add(4)
    set.Add(1) // Duplicate, should not be added
    
    fmt.Println("Set:", set)
    fmt.Println("Contains 3:", set.Contains(3))
    fmt.Println("Contains 2:", set.Contains(2))
    
    set.Remove(3)
    fmt.Println("After removing 3:", set)
    
    fmt.Println("Size:", set.Size())
    
    elements := set.Elements()
    fmt.Println("Elements:", elements)
    
    set.Clear()
    fmt.Println("After clearing:", set)
}

This slice-based implementation maintains elements in a sorted order when needed, allowing for efficient membership testing using binary search. However, insertions and deletions are less efficient than in a map-based implementation, especially for large sets.

BitSet Implementation

For sets of integers within a limited range, a BitSet (or bit array) can be an extremely memory-efficient implementation. Each bit in the array represents the presence or absence of a specific integer.

package main

import (
    "fmt"
    "strings"
)

// BitSet represents a set of non-negative integers using bits
type BitSet struct {
    bits []uint64
    size int
}

// NewBitSet creates a new bit set with capacity for integers up to maxElement
func NewBitSet(maxElement int) *BitSet {
    // Calculate the number of uint64 values needed
    numBits := (maxElement + 63) / 64
    return &BitSet{
        bits: make([]uint64, numBits),
        size: 0,
    }
}

// Add adds an element to the set
func (bs *BitSet) Add(element int) {
    if element < 0 {
        panic("BitSet only supports non-negative integers")
    }
    
    // Calculate the index and bit position
    index := element / 64
    position := uint(element % 64)
    
    // Ensure the slice is large enough
    for index >= len(bs.bits) {
        bs.bits = append(bs.bits, 0)
    }
    
    // Check if the bit is already set
    if (bs.bits[index] & (1 << position)) == 0 {
        bs.bits[index] |= (1 << position)
        bs.size++
    }
}

// Remove removes an element from the set
func (bs *BitSet) Remove(element int) {
    if element < 0 {
        return
    }
    
    // Calculate the index and bit position
    index := element / 64
    position := uint(element % 64)
    
    // Check if the index is valid
    if index >= len(bs.bits) {
        return
    }
    
    // Check if the bit is set
    if (bs.bits[index] & (1 << position)) != 0 {
        bs.bits[index] &= ^(1 << position)
        bs.size--
    }
}

// Contains checks if an element exists in the set
func (bs *BitSet) Contains(element int) bool {
    if element < 0 {
        return false
    }
    
    // Calculate the index and bit position
    index := element / 64
    position := uint(element % 64)
    
    // Check if the index is valid
    if index >= len(bs.bits) {
        return false
    }
    
    // Check if the bit is set
    return (bs.bits[index] & (1 << position)) != 0
}

// Size returns the number of elements in the set
func (bs *BitSet) Size() int {
    return bs.size
}

// Clear removes all elements from the set
func (bs *BitSet) Clear() {
    for i := range bs.bits {
        bs.bits[i] = 0
    }
    bs.size = 0
}

// Elements returns a slice of all elements in the set
func (bs *BitSet) Elements() []int {
    elements := make([]int, 0, bs.size)
    
    for i, bits := range bs.bits {
        if bits == 0 {
            continue
        }
        
        for j := 0; j < 64; j++ {
            if (bits & (1 << uint(j))) != 0 {
                elements = append(elements, i*64+j)
            }
        }
    }
    
    return elements
}

// Union returns a new bit set containing all elements from both sets
func (bs *BitSet) Union(other *BitSet) *BitSet {
    // Create a new bit set with the maximum capacity
    maxLen := len(bs.bits)
    if len(other.bits) > maxLen {
        maxLen = len(other.bits)
    }
    
    result := &BitSet{
        bits: make([]uint64, maxLen),
        size: 0,
    }
    
    // Perform bitwise OR for each uint64
    for i := 0; i < maxLen; i++ {
        var value uint64
        
        if i < len(bs.bits) {
            value |= bs.bits[i]
        }
        
        if i < len(other.bits) {
            value |= other.bits[i]
        }
        
        result.bits[i] = value
    }
    
    // Calculate the size
    for _, bits := range result.bits {
        result.size += countBits(bits)
    }
    
    return result
}

// Intersection returns a new bit set containing only elements that exist in both sets
func (bs *BitSet) Intersection(other *BitSet) *BitSet {
    // Create a new bit set with the minimum capacity
    minLen := len(bs.bits)
    if len(other.bits) < minLen {
        minLen = len(other.bits)
    }
    
    result := &BitSet{
        bits: make([]uint64, minLen),
        size: 0,
    }
    
    // Perform bitwise AND for each uint64
    for i := 0; i < minLen; i++ {
        result.bits[i] = bs.bits[i] & other.bits[i]
    }
    
    // Calculate the size
    for _, bits := range result.bits {
        result.size += countBits(bits)
    }
    
    return result
}

// countBits counts the number of set bits in a uint64
func countBits(n uint64) int {
    count := 0
    for n != 0 {
        count++
        n &= n - 1 // Clear the least significant bit set
    }
    return count
}

// String returns a string representation of the bit set
func (bs *BitSet) String() string {
    elements := bs.Elements()
    
    // Convert elements to strings
    stringElements := make([]string, len(elements))
    for i, element := range elements {
        stringElements[i] = fmt.Sprintf("%d", element)
    }
    
    return "{" + strings.Join(stringElements, ", ") + "}"
}

func main() {
    // Create a bit set with capacity for integers up to 100
    bs := NewBitSet(100)
    
    // Add elements
    bs.Add(1)
    bs.Add(5)
    bs.Add(10)
    bs.Add(50)
    
    fmt.Println("BitSet:", bs)
    fmt.Println("Contains 5:", bs.Contains(5))
    fmt.Println("Contains 7:", bs.Contains(7))
    
    // Create another bit set
    bs2 := NewBitSet(100)
    bs2.Add(5)
    bs2.Add(15)
    bs2.Add(25)
    
    fmt.Println("BitSet 2:", bs2)
    
    // Perform set operations
    union := bs.Union(bs2)
    intersection := bs.Intersection(bs2)
    
    fmt.Println("Union:", union)
    fmt.Println("Intersection:", intersection)
    
    // Remove an element
    bs.Remove(5)
    fmt.Println("After removing 5:", bs)
    
    // Clear the set
    bs.Clear()
    fmt.Println("After clearing:", bs)
}

BitSets are extremely memory-efficient for sets of integers within a limited range. They provide O(1) time complexity for add, remove, and contains operations, and set operations like union and intersection can be performed using fast bitwise operations.

Thread-Safe Sets

If you need to access a set from multiple goroutines concurrently, you should implement thread safety using mutexes or other synchronization primitives.

package main

import (
    "fmt"
    "sync"
)

// ThreadSafeSet represents a thread-safe set of elements
type ThreadSafeSet struct {
    elements map[interface{}]struct{}
    mutex    sync.RWMutex
}

// NewThreadSafeSet creates a new empty thread-safe set
func NewThreadSafeSet() *ThreadSafeSet {
    return &ThreadSafeSet{
        elements: make(map[interface{}]struct{}),
    }
}

// Add adds an element to the set
func (s *ThreadSafeSet) Add(element interface{}) {
    s.mutex.Lock()
    defer s.mutex.Unlock()
    
    s.elements[element] = struct{}{}
}

// Remove removes an element from the set
func (s *ThreadSafeSet) Remove(element interface{}) {
    s.mutex.Lock()
    defer s.mutex.Unlock()
    
    delete(s.elements, element)
}

// Contains checks if an element exists in the set
func (s *ThreadSafeSet) Contains(element interface{}) bool {
    s.mutex.RLock()
    defer s.mutex.RUnlock()
    
    _, exists := s.elements[element]
    return exists
}

// Size returns the number of elements in the set
func (s *ThreadSafeSet) Size() int {
    s.mutex.RLock()
    defer s.mutex.RUnlock()
    
    return len(s.elements)
}

// Clear removes all elements from the set
func (s *ThreadSafeSet) Clear() {
    s.mutex.Lock()
    defer s.mutex.Unlock()
    
    s.elements = make(map[interface{}]struct{})
}

// Elements returns a slice of all elements in the set
func (s *ThreadSafeSet) Elements() []interface{} {
    s.mutex.RLock()
    defer s.mutex.RUnlock()
    
    elements := make([]interface{}, 0, len(s.elements))
    for element := range s.elements {
        elements = append(elements, element)
    }
    return elements
}

// String returns a string representation of the set
func (s *ThreadSafeSet) String() string {
    s.mutex.RLock()
    defer s.mutex.RUnlock()
    
    return fmt.Sprintf("%v", s.elements)
}

func main() {
    set := NewThreadSafeSet()
    
    // Concurrent operations
    var wg sync.WaitGroup
    
    // Add elements concurrently
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(val int) {
            defer wg.Done()
            set.Add(val)
        }(i)
    }
    
    wg.Wait()
    
    fmt.Println("Set size:", set.Size())
    fmt.Println("Elements:", set.Elements())
    
    // Check elements concurrently
    for i := 0; i < 15; i++ {
        wg.Add(1)
        go func(val int) {
            defer wg.Done()
            if set.Contains(val) {
                fmt.Printf("Set contains %d\n", val)
            } else {
                fmt.Printf("Set does not contain %d\n", val)
            }
        }(i)
    }
    
    wg.Wait()
}

This thread-safe set implementation uses a read-write mutex to allow concurrent reads while ensuring exclusive access during writes. This approach is suitable for scenarios where the set is accessed by multiple goroutines.

Performance Analysis

Let's analyze the time complexity of common set operations for different implementations:

Operation Map-Based Set Slice-Based Set (Unsorted) Slice-Based Set (Sorted) BitSet
Add O(1) average O(n) (to check for duplicates) O(log n) for search + O(n) for insertion O(1)
Remove O(1) average O(n) O(log n) for search + O(n) for removal O(1)
Contains O(1) average O(n) O(log n) O(1)
Union O(n + m) O(n * m) O(n + m) O(k) where k is the number of words
Intersection O(min(n, m)) O(n * m) O(n + m) O(k) where k is the number of words
Space Complexity O(n) O(n) O(n) O(m) where m is the maximum element value

Based on this analysis:

  • Map-based sets offer the best overall performance for general-purpose use.
  • BitSets are extremely efficient for sets of integers within a limited range.
  • Slice-based sets can be useful for small sets or when order matters, but they don't scale well.

Use Cases

Sets are useful in a variety of scenarios:

  • Removing Duplicates: Converting a list to a set and back to remove duplicate elements.
  • Membership Testing: Efficiently checking if an element exists in a collection.
  • Mathematical Set Operations: Performing union, intersection, and difference operations.
  • Graph Algorithms: Tracking visited nodes in graph traversal algorithms.
  • Caching: Implementing simple caches or lookup tables.
  • Spell Checking: Storing dictionaries for efficient word lookup.
  • Unique Counting: Counting unique elements in a stream of data.

Best Practices

Here are some best practices for working with sets in Go:

Choose the Right Implementation

  • Use map-based sets for general-purpose use with good performance.
  • Use BitSets for sets of integers within a limited range.
  • Use slice-based sets only for small sets or when order matters.

Consider Type Safety

When possible, create type-specific sets (e.g., IntSet, StringSet) instead of using interface{} for better type safety and performance.

Implement Thread Safety When Needed

If a set will be accessed by multiple goroutines concurrently, implement proper synchronization using mutexes or other primitives.

Use Existing Libraries

Consider using established libraries like Gods (Go Data Structures) or golang-set instead of implementing sets from scratch.

Optimize Set Operations

When performing set operations like intersection, iterate over the smaller set to minimize the number of lookups.

Conclusion

Sets are versatile data structures that provide efficient solutions for managing unique collections of elements. In Go, you can implement sets using maps, slices, or bit arrays, each with its own advantages and trade-offs. By understanding these implementations and their performance characteristics, you can choose the right approach for your specific needs.

In the next section, we'll explore hash tables, which are closely related to sets but provide key-value storage rather than just element membership.