Top Related Projects
Probabilistic data structures for processing continuous, unbounded streams.
Go package implementing Bloom filters, used by Milvus and Beego.
Quick Overview
The seiflotfy/cuckoofilter repository is a Go implementation of Cuckoo Filters. Cuckoo filters are a probabilistic data structure used for approximate set membership queries, similar to Bloom filters but with better performance and support for deletion.
Pros
- Efficient space usage compared to Bloom filters
- Supports item deletion, unlike standard Bloom filters
- High performance for lookups and insertions
- Implements configurable false positive rates
Cons
- May have slightly higher false positive rates than Bloom filters in some cases
- Requires more complex implementation compared to simpler probabilistic data structures
- Limited documentation and examples in the repository
- Not actively maintained (last commit was in 2019)
Code Examples
- Creating a new Cuckoo Filter:
import "github.com/seiflotfy/cuckoofilter"
cf := cuckoofilter.NewFilter(1000000) // Create a filter with capacity for 1 million items
- Inserting and looking up items:
item := []byte("example-item")
cf.Insert(item)
isContained := cf.Lookup(item) // Returns true
notContained := cf.Lookup([]byte("non-existent-item")) // Returns false
- Deleting items from the filter:
item := []byte("example-item")
cf.Insert(item)
cf.Delete(item)
isContained := cf.Lookup(item) // Returns false after deletion
- Counting items and getting filter information:
count := cf.Count() // Get the number of items in the filter
info := cf.Info() // Get information about the filter (capacity, size, etc.)
Getting Started
To use the Cuckoo Filter in your Go project, follow these steps:
-
Install the package:
go get github.com/seiflotfy/cuckoofilter
-
Import the package in your Go code:
import "github.com/seiflotfy/cuckoofilter"
-
Create a new filter and start using it:
cf := cuckoofilter.NewFilter(1000000) // Create a filter with capacity for 1 million items cf.Insert([]byte("example-item")) isContained := cf.Lookup([]byte("example-item"))
Remember to handle errors and adjust the filter size based on your specific use case and desired false positive rate.
Competitor Comparisons
Probabilistic data structures for processing continuous, unbounded streams.
Pros of BoomFilters
- Offers a wider variety of probabilistic data structures, including Bloom filters, Cuckoo filters, Count-Min Sketch, and more
- Provides a unified interface for different filter types, making it easier to switch between implementations
- Includes benchmarking tools for performance comparison
Cons of BoomFilters
- May have slightly higher memory overhead due to the abstraction layer
- Less focused on a single implementation, potentially sacrificing some optimization for flexibility
- Written in Go, which might not be suitable for all projects (Cuckoofilter is in C)
Code Comparison
BoomFilters:
filter := boom.NewCuckooFilter(1000, 0.01)
filter.Add([]byte("item"))
exists := filter.Test([]byte("item"))
Cuckoofilter:
struct cuckoo_filter *filter = cuckoo_filter_new(1000);
cuckoo_filter_add(filter, "item", strlen("item"));
bool exists = cuckoo_filter_contains(filter, "item", strlen("item"));
Both implementations offer similar functionality, but BoomFilters provides a more Go-idiomatic interface, while Cuckoofilter offers a C-style API.
Go package implementing Bloom filters, used by Milvus and Beego.
Pros of bloom
- More comprehensive and feature-rich, offering various types of Bloom filters and related data structures
- Better documentation and examples, making it easier for developers to understand and implement
- Actively maintained with regular updates and improvements
Cons of bloom
- Potentially higher memory usage for simple use cases
- May have a steeper learning curve due to more options and configurations
- Slightly slower insertion and lookup times for basic Bloom filter operations
Code Comparison
bloom:
filter := bloom.NewWithEstimates(1000000, 0.01)
filter.Add([]byte("test"))
if filter.Test([]byte("test")) {
fmt.Println("Element may be in set")
}
cuckoofilter:
cf := cuckoo.NewFilter(1000000)
cf.Insert([]byte("test"))
if cf.Lookup([]byte("test")) {
fmt.Println("Element may be in set")
}
Both libraries provide similar basic functionality, but bloom offers more advanced features and customization options, while cuckoofilter focuses on a simpler, more specific implementation.
Convert designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual CopilotREADME
Cuckoo Filter
Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space.
Cuckoo ï¬lters provide the ï¬exibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom ï¬lters, for applications that require low false positive rates (< 3%).
For details about the algorithm and citations please use this article for now
"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky
Implementation details
The paper cited above leaves several parameters to choose. In this implementation
- Every element has 2 possible bucket indices
- Buckets have a static size of 4 fingerprints
- Fingerprints have a static size of 8 bits
1 and 2 are suggested to be the optimum by the authors. The choice of 3 comes down to the desired false positive rate. Given a target false positive rate of r
and a bucket size b
, they suggest choosing the fingerprint size f
using
f >= log2(2b/r) bits
With the 8 bit fingerprint size in this repository, you can expect r ~= 0.03
.
Other implementations use 16 bit, which correspond to a false positive rate of r ~= 0.0001
.
Example usage:
package main
import "fmt"
import cuckoo "github.com/seiflotfy/cuckoofilter"
func main() {
cf := cuckoo.NewFilter(1000)
cf.InsertUnique([]byte("geeky ogre"))
// Lookup a string (and it a miss) if it exists in the cuckoofilter
cf.Lookup([]byte("hello"))
count := cf.Count()
fmt.Println(count) // count == 1
// Delete a string (and it a miss)
cf.Delete([]byte("hello"))
count = cf.Count()
fmt.Println(count) // count == 1
// Delete a string (a hit)
cf.Delete([]byte("geeky ogre"))
count = cf.Count()
fmt.Println(count) // count == 0
cf.Reset() // reset
}
Documentation:
Top Related Projects
Probabilistic data structures for processing continuous, unbounded streams.
Go package implementing Bloom filters, used by Milvus and Beego.
Convert designs to code with AI
Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.
Try Visual Copilot