Go Data Structures & Algorithms
Sets in Go
Table of Contents
Sets are collections of unique elements with no specific order. They are fundamental data structures used in various algorithms and applications where uniqueness and membership testing are important. In this comprehensive guide, we'll explore sets in Go, including their implementations, operations, and practical applications.
What are Sets?
A set is an abstract data type that represents a collection of distinct elements. Unlike arrays or slices, sets do not allow duplicate elements and typically do not maintain any specific order. The primary operations on sets include adding elements, removing elements, and testing for membership.
Sets are based on the mathematical concept of a set, which is a collection of distinct objects. They are particularly useful when you need to:
- Ensure uniqueness of elements
- Perform set operations like union, intersection, and difference
- Quickly check if an element exists in a collection
Set Operations
The fundamental operations on sets include:
- Add: Add an element to the set if it's not already present
- Remove: Remove an element from the set
- Contains: Check if an element exists in the set
- Size: Get the number of elements in the set
- Clear: Remove all elements from the set
Additionally, sets support several mathematical operations:
- Union: Create a new set containing all elements from both sets
- Intersection: Create a new set containing only elements that exist in both sets
- Difference: Create a new set containing elements that exist in the first set but not in the second
- Symmetric Difference: Create a new set containing elements that exist in either set but not in both
- Subset: Check if every element in the first set exists in the second set
- Superset: Check if every element in the second set exists in the first set
- Disjoint: Check if the sets have no elements in common
Implementing Sets in Go
Go does not have a built-in set type in its standard library. However, there are several ways to implement sets in Go:
- Map-Based Sets: Using maps with empty struct values
- Slice-Based Sets: Using slices with search and insert operations
- BitSets: Using bit arrays for sets of integers within a limited range
- Third-Party Libraries: Using libraries like Gods (Go Data Structures)
Let's explore each of these implementations in detail.
Map-Based Set Implementation
The most common and efficient way to implement a set in Go is using a map with empty struct values. The keys of the map represent the elements of the set, and the values are empty structs (which take up no memory).
package main
import (
"fmt"
"sort"
"strings"
)
// Set represents a set of unique elements
type Set struct {
elements map[interface{}]struct{}
}
// NewSet creates a new empty set
func NewSet() *Set {
return &Set{
elements: make(map[interface{}]struct{}),
}
}
// NewSetFromSlice creates a new set from a slice of elements
func NewSetFromSlice(items []interface{}) *Set {
set := NewSet()
for _, item := range items {
set.Add(item)
}
return set
}
// Add adds an element to the set
func (s *Set) Add(element interface{}) {
s.elements[element] = struct{}{}
}
// Remove removes an element from the set
func (s *Set) Remove(element interface{}) {
delete(s.elements, element)
}
// Contains checks if an element exists in the set
func (s *Set) Contains(element interface{}) bool {
_, exists := s.elements[element]
return exists
}
// Size returns the number of elements in the set
func (s *Set) Size() int {
return len(s.elements)
}
// Clear removes all elements from the set
func (s *Set) Clear() {
s.elements = make(map[interface{}]struct{})
}
// Elements returns a slice of all elements in the set
func (s *Set) Elements() []interface{} {
elements := make([]interface{}, 0, len(s.elements))
for element := range s.elements {
elements = append(elements, element)
}
return elements
}
// Union returns a new set containing all elements from both sets
func (s *Set) Union(other *Set) *Set {
unionSet := NewSet()
// Add elements from the first set
for element := range s.elements {
unionSet.Add(element)
}
// Add elements from the second set
for element := range other.elements {
unionSet.Add(element)
}
return unionSet
}
// Intersection returns a new set containing only elements that exist in both sets
func (s *Set) Intersection(other *Set) *Set {
intersectionSet := NewSet()
// Find the smaller set to iterate over
var smaller, larger *Set
if s.Size() <= other.Size() {
smaller, larger = s, other
} else {
smaller, larger = other, s
}
// Add elements that exist in both sets
for element := range smaller.elements {
if larger.Contains(element) {
intersectionSet.Add(element)
}
}
return intersectionSet
}
// Difference returns a new set containing elements that exist in the first set but not in the second
func (s *Set) Difference(other *Set) *Set {
differenceSet := NewSet()
// Add elements from the first set that don't exist in the second set
for element := range s.elements {
if !other.Contains(element) {
differenceSet.Add(element)
}
}
return differenceSet
}
// SymmetricDifference returns a new set containing elements that exist in either set but not in both
func (s *Set) SymmetricDifference(other *Set) *Set {
symmetricDifferenceSet := NewSet()
// Add elements from the first set that don't exist in the second set
for element := range s.elements {
if !other.Contains(element) {
symmetricDifferenceSet.Add(element)
}
}
// Add elements from the second set that don't exist in the first set
for element := range other.elements {
if !s.Contains(element) {
symmetricDifferenceSet.Add(element)
}
}
return symmetricDifferenceSet
}
// IsSubset checks if every element in the set exists in the other set
func (s *Set) IsSubset(other *Set) bool {
// If this set has more elements, it can't be a subset
if s.Size() > other.Size() {
return false
}
// Check if every element in this set exists in the other set
for element := range s.elements {
if !other.Contains(element) {
return false
}
}
return true
}
// IsSuperset checks if every element in the other set exists in this set
func (s *Set) IsSuperset(other *Set) bool {
return other.IsSubset(s)
}
// IsDisjoint checks if the sets have no elements in common
func (s *Set) IsDisjoint(other *Set) bool {
// Find the smaller set to iterate over
var smaller, larger *Set
if s.Size() <= other.Size() {
smaller, larger = s, other
} else {
smaller, larger = other, s
}
// Check if any element in the smaller set exists in the larger set
for element := range smaller.elements {
if larger.Contains(element) {
return false
}
}
return true
}
// String returns a string representation of the set
func (s *Set) String() string {
elements := s.Elements()
// Convert elements to strings
stringElements := make([]string, len(elements))
for i, element := range elements {
stringElements[i] = fmt.Sprintf("%v", element)
}
// Sort for consistent output
sort.Strings(stringElements)
return "{" + strings.Join(stringElements, ", ") + "}"
}
func main() {
// Create sets
set1 := NewSet()
set1.Add(1)
set1.Add(2)
set1.Add(3)
set2 := NewSet()
set2.Add(3)
set2.Add(4)
set2.Add(5)
fmt.Println("Set 1:", set1)
fmt.Println("Set 2:", set2)
// Demonstrate set operations
fmt.Println("Union:", set1.Union(set2))
fmt.Println("Intersection:", set1.Intersection(set2))
fmt.Println("Difference (Set1 - Set2):", set1.Difference(set2))
fmt.Println("Symmetric Difference:", set1.SymmetricDifference(set2))
// Check relationships
fmt.Println("Set1 is subset of Set2:", set1.IsSubset(set2))
fmt.Println("Set1 is superset of Set2:", set1.IsSuperset(set2))
fmt.Println("Set1 is disjoint with Set2:", set1.IsDisjoint(set2))
// Create a subset
subset := NewSet()
subset.Add(1)
subset.Add(2)
fmt.Println("Subset:", subset)
fmt.Println("Subset is subset of Set1:", subset.IsSubset(set1))
// Demonstrate other operations
fmt.Println("Set1 contains 2:", set1.Contains(2))
fmt.Println("Set1 contains 5:", set1.Contains(5))
set1.Remove(2)
fmt.Println("Set1 after removing 2:", set1)
fmt.Println("Set1 size:", set1.Size())
set1.Clear()
fmt.Println("Set1 after clearing:", set1)
}
This implementation provides a generic set that can store elements of any type. It uses interface{} as the key type in the map, allowing for flexibility but sacrificing some type safety. For a type-specific set, you can replace interface{} with the desired type.
Type-Specific Set
For better type safety and performance, you can create a set for a specific type:
// IntSet represents a set of integers
type IntSet struct {
elements map[int]struct{}
}
// NewIntSet creates a new empty integer set
func NewIntSet() *IntSet {
return &IntSet{
elements: make(map[int]struct{}),
}
}
// Add adds an integer to the set
func (s *IntSet) Add(element int) {
s.elements[element] = struct{}{}
}
// Other methods would be similar to the generic Set implementation
// but with int instead of interface{}
Slice-Based Set Implementation
For small sets or when order matters, you can implement a set using a slice. This approach is less efficient for large sets but can be useful in specific scenarios.
package main
import (
"fmt"
"sort"
)
// SliceSet represents a set of integers using a slice
type SliceSet struct {
elements []int
sorted bool
}
// NewSliceSet creates a new empty slice-based set
func NewSliceSet() *SliceSet {
return &SliceSet{
elements: make([]int, 0),
sorted: true,
}
}
// Add adds an element to the set if it's not already present
func (s *SliceSet) Add(element int) {
// Check if the element already exists
if s.Contains(element) {
return
}
// Add the element
s.elements = append(s.elements, element)
s.sorted = false
}
// Remove removes an element from the set
func (s *SliceSet) Remove(element int) {
for i, e := range s.elements {
if e == element {
// Remove the element by replacing it with the last element
// and truncating the slice
s.elements[i] = s.elements[len(s.elements)-1]
s.elements = s.elements[:len(s.elements)-1]
s.sorted = false
return
}
}
}
// Contains checks if an element exists in the set
func (s *SliceSet) Contains(element int) bool {
// If the set is sorted, use binary search
if s.sorted {
index := sort.SearchInts(s.elements, element)
return index < len(s.elements) && s.elements[index] == element
}
// Otherwise, use linear search
for _, e := range s.elements {
if e == element {
return true
}
}
return false
}
// Size returns the number of elements in the set
func (s *SliceSet) Size() int {
return len(s.elements)
}
// Clear removes all elements from the set
func (s *SliceSet) Clear() {
s.elements = make([]int, 0)
s.sorted = true
}
// Elements returns a slice of all elements in the set
func (s *SliceSet) Elements() []int {
// Ensure the elements are sorted
s.ensureSorted()
// Return a copy to prevent external modification
result := make([]int, len(s.elements))
copy(result, s.elements)
return result
}
// ensureSorted sorts the elements if they are not already sorted
func (s *SliceSet) ensureSorted() {
if !s.sorted {
sort.Ints(s.elements)
s.sorted = true
}
}
// String returns a string representation of the set
func (s *SliceSet) String() string {
s.ensureSorted()
return fmt.Sprintf("%v", s.elements)
}
func main() {
set := NewSliceSet()
set.Add(3)
set.Add(1)
set.Add(4)
set.Add(1) // Duplicate, should not be added
fmt.Println("Set:", set)
fmt.Println("Contains 3:", set.Contains(3))
fmt.Println("Contains 2:", set.Contains(2))
set.Remove(3)
fmt.Println("After removing 3:", set)
fmt.Println("Size:", set.Size())
elements := set.Elements()
fmt.Println("Elements:", elements)
set.Clear()
fmt.Println("After clearing:", set)
}
This slice-based implementation maintains elements in a sorted order when needed, allowing for efficient membership testing using binary search. However, insertions and deletions are less efficient than in a map-based implementation, especially for large sets.
BitSet Implementation
For sets of integers within a limited range, a BitSet (or bit array) can be an extremely memory-efficient implementation. Each bit in the array represents the presence or absence of a specific integer.
package main
import (
"fmt"
"strings"
)
// BitSet represents a set of non-negative integers using bits
type BitSet struct {
bits []uint64
size int
}
// NewBitSet creates a new bit set with capacity for integers up to maxElement
func NewBitSet(maxElement int) *BitSet {
// Calculate the number of uint64 values needed
numBits := (maxElement + 63) / 64
return &BitSet{
bits: make([]uint64, numBits),
size: 0,
}
}
// Add adds an element to the set
func (bs *BitSet) Add(element int) {
if element < 0 {
panic("BitSet only supports non-negative integers")
}
// Calculate the index and bit position
index := element / 64
position := uint(element % 64)
// Ensure the slice is large enough
for index >= len(bs.bits) {
bs.bits = append(bs.bits, 0)
}
// Check if the bit is already set
if (bs.bits[index] & (1 << position)) == 0 {
bs.bits[index] |= (1 << position)
bs.size++
}
}
// Remove removes an element from the set
func (bs *BitSet) Remove(element int) {
if element < 0 {
return
}
// Calculate the index and bit position
index := element / 64
position := uint(element % 64)
// Check if the index is valid
if index >= len(bs.bits) {
return
}
// Check if the bit is set
if (bs.bits[index] & (1 << position)) != 0 {
bs.bits[index] &= ^(1 << position)
bs.size--
}
}
// Contains checks if an element exists in the set
func (bs *BitSet) Contains(element int) bool {
if element < 0 {
return false
}
// Calculate the index and bit position
index := element / 64
position := uint(element % 64)
// Check if the index is valid
if index >= len(bs.bits) {
return false
}
// Check if the bit is set
return (bs.bits[index] & (1 << position)) != 0
}
// Size returns the number of elements in the set
func (bs *BitSet) Size() int {
return bs.size
}
// Clear removes all elements from the set
func (bs *BitSet) Clear() {
for i := range bs.bits {
bs.bits[i] = 0
}
bs.size = 0
}
// Elements returns a slice of all elements in the set
func (bs *BitSet) Elements() []int {
elements := make([]int, 0, bs.size)
for i, bits := range bs.bits {
if bits == 0 {
continue
}
for j := 0; j < 64; j++ {
if (bits & (1 << uint(j))) != 0 {
elements = append(elements, i*64+j)
}
}
}
return elements
}
// Union returns a new bit set containing all elements from both sets
func (bs *BitSet) Union(other *BitSet) *BitSet {
// Create a new bit set with the maximum capacity
maxLen := len(bs.bits)
if len(other.bits) > maxLen {
maxLen = len(other.bits)
}
result := &BitSet{
bits: make([]uint64, maxLen),
size: 0,
}
// Perform bitwise OR for each uint64
for i := 0; i < maxLen; i++ {
var value uint64
if i < len(bs.bits) {
value |= bs.bits[i]
}
if i < len(other.bits) {
value |= other.bits[i]
}
result.bits[i] = value
}
// Calculate the size
for _, bits := range result.bits {
result.size += countBits(bits)
}
return result
}
// Intersection returns a new bit set containing only elements that exist in both sets
func (bs *BitSet) Intersection(other *BitSet) *BitSet {
// Create a new bit set with the minimum capacity
minLen := len(bs.bits)
if len(other.bits) < minLen {
minLen = len(other.bits)
}
result := &BitSet{
bits: make([]uint64, minLen),
size: 0,
}
// Perform bitwise AND for each uint64
for i := 0; i < minLen; i++ {
result.bits[i] = bs.bits[i] & other.bits[i]
}
// Calculate the size
for _, bits := range result.bits {
result.size += countBits(bits)
}
return result
}
// countBits counts the number of set bits in a uint64
func countBits(n uint64) int {
count := 0
for n != 0 {
count++
n &= n - 1 // Clear the least significant bit set
}
return count
}
// String returns a string representation of the bit set
func (bs *BitSet) String() string {
elements := bs.Elements()
// Convert elements to strings
stringElements := make([]string, len(elements))
for i, element := range elements {
stringElements[i] = fmt.Sprintf("%d", element)
}
return "{" + strings.Join(stringElements, ", ") + "}"
}
func main() {
// Create a bit set with capacity for integers up to 100
bs := NewBitSet(100)
// Add elements
bs.Add(1)
bs.Add(5)
bs.Add(10)
bs.Add(50)
fmt.Println("BitSet:", bs)
fmt.Println("Contains 5:", bs.Contains(5))
fmt.Println("Contains 7:", bs.Contains(7))
// Create another bit set
bs2 := NewBitSet(100)
bs2.Add(5)
bs2.Add(15)
bs2.Add(25)
fmt.Println("BitSet 2:", bs2)
// Perform set operations
union := bs.Union(bs2)
intersection := bs.Intersection(bs2)
fmt.Println("Union:", union)
fmt.Println("Intersection:", intersection)
// Remove an element
bs.Remove(5)
fmt.Println("After removing 5:", bs)
// Clear the set
bs.Clear()
fmt.Println("After clearing:", bs)
}
BitSets are extremely memory-efficient for sets of integers within a limited range. They provide O(1) time complexity for add, remove, and contains operations, and set operations like union and intersection can be performed using fast bitwise operations.
Thread-Safe Sets
If you need to access a set from multiple goroutines concurrently, you should implement thread safety using mutexes or other synchronization primitives.
package main
import (
"fmt"
"sync"
)
// ThreadSafeSet represents a thread-safe set of elements
type ThreadSafeSet struct {
elements map[interface{}]struct{}
mutex sync.RWMutex
}
// NewThreadSafeSet creates a new empty thread-safe set
func NewThreadSafeSet() *ThreadSafeSet {
return &ThreadSafeSet{
elements: make(map[interface{}]struct{}),
}
}
// Add adds an element to the set
func (s *ThreadSafeSet) Add(element interface{}) {
s.mutex.Lock()
defer s.mutex.Unlock()
s.elements[element] = struct{}{}
}
// Remove removes an element from the set
func (s *ThreadSafeSet) Remove(element interface{}) {
s.mutex.Lock()
defer s.mutex.Unlock()
delete(s.elements, element)
}
// Contains checks if an element exists in the set
func (s *ThreadSafeSet) Contains(element interface{}) bool {
s.mutex.RLock()
defer s.mutex.RUnlock()
_, exists := s.elements[element]
return exists
}
// Size returns the number of elements in the set
func (s *ThreadSafeSet) Size() int {
s.mutex.RLock()
defer s.mutex.RUnlock()
return len(s.elements)
}
// Clear removes all elements from the set
func (s *ThreadSafeSet) Clear() {
s.mutex.Lock()
defer s.mutex.Unlock()
s.elements = make(map[interface{}]struct{})
}
// Elements returns a slice of all elements in the set
func (s *ThreadSafeSet) Elements() []interface{} {
s.mutex.RLock()
defer s.mutex.RUnlock()
elements := make([]interface{}, 0, len(s.elements))
for element := range s.elements {
elements = append(elements, element)
}
return elements
}
// String returns a string representation of the set
func (s *ThreadSafeSet) String() string {
s.mutex.RLock()
defer s.mutex.RUnlock()
return fmt.Sprintf("%v", s.elements)
}
func main() {
set := NewThreadSafeSet()
// Concurrent operations
var wg sync.WaitGroup
// Add elements concurrently
for i := 0; i < 10; i++ {
wg.Add(1)
go func(val int) {
defer wg.Done()
set.Add(val)
}(i)
}
wg.Wait()
fmt.Println("Set size:", set.Size())
fmt.Println("Elements:", set.Elements())
// Check elements concurrently
for i := 0; i < 15; i++ {
wg.Add(1)
go func(val int) {
defer wg.Done()
if set.Contains(val) {
fmt.Printf("Set contains %d\n", val)
} else {
fmt.Printf("Set does not contain %d\n", val)
}
}(i)
}
wg.Wait()
}
This thread-safe set implementation uses a read-write mutex to allow concurrent reads while ensuring exclusive access during writes. This approach is suitable for scenarios where the set is accessed by multiple goroutines.
Performance Analysis
Let's analyze the time complexity of common set operations for different implementations:
| Operation | Map-Based Set | Slice-Based Set (Unsorted) | Slice-Based Set (Sorted) | BitSet |
|---|---|---|---|---|
| Add | O(1) average | O(n) (to check for duplicates) | O(log n) for search + O(n) for insertion | O(1) |
| Remove | O(1) average | O(n) | O(log n) for search + O(n) for removal | O(1) |
| Contains | O(1) average | O(n) | O(log n) | O(1) |
| Union | O(n + m) | O(n * m) | O(n + m) | O(k) where k is the number of words |
| Intersection | O(min(n, m)) | O(n * m) | O(n + m) | O(k) where k is the number of words |
| Space Complexity | O(n) | O(n) | O(n) | O(m) where m is the maximum element value |
Based on this analysis:
- Map-based sets offer the best overall performance for general-purpose use.
- BitSets are extremely efficient for sets of integers within a limited range.
- Slice-based sets can be useful for small sets or when order matters, but they don't scale well.
Use Cases
Sets are useful in a variety of scenarios:
- Removing Duplicates: Converting a list to a set and back to remove duplicate elements.
- Membership Testing: Efficiently checking if an element exists in a collection.
- Mathematical Set Operations: Performing union, intersection, and difference operations.
- Graph Algorithms: Tracking visited nodes in graph traversal algorithms.
- Caching: Implementing simple caches or lookup tables.
- Spell Checking: Storing dictionaries for efficient word lookup.
- Unique Counting: Counting unique elements in a stream of data.
Best Practices
Here are some best practices for working with sets in Go:
Choose the Right Implementation
- Use map-based sets for general-purpose use with good performance.
- Use BitSets for sets of integers within a limited range.
- Use slice-based sets only for small sets or when order matters.
Consider Type Safety
When possible, create type-specific sets (e.g., IntSet, StringSet) instead of using interface{} for better type safety and performance.
Implement Thread Safety When Needed
If a set will be accessed by multiple goroutines concurrently, implement proper synchronization using mutexes or other primitives.
Use Existing Libraries
Consider using established libraries like Gods (Go Data Structures) or golang-set instead of implementing sets from scratch.
Optimize Set Operations
When performing set operations like intersection, iterate over the smaller set to minimize the number of lookups.
Conclusion
Sets are versatile data structures that provide efficient solutions for managing unique collections of elements. In Go, you can implement sets using maps, slices, or bit arrays, each with its own advantages and trade-offs. By understanding these implementations and their performance characteristics, you can choose the right approach for your specific needs.
In the next section, we'll explore hash tables, which are closely related to sets but provide key-value storage rather than just element membership.