cuckoofilter

Cuckoo Filter: Practically Better Than Bloom

1,189

117

1,189

View on GitHub

Top Related Projects

BoomFilters

1,622

Probabilistic data structures for processing continuous, unbounded streams.

bloom

2,669

Go package implementing Bloom filters, used by Milvus and Beego.

Quick Overview

The seiflotfy/cuckoofilter repository is a Go implementation of Cuckoo Filters. Cuckoo filters are a probabilistic data structure used for approximate set membership queries, similar to Bloom filters but with better performance and support for deletion.

Pros

Efficient space usage compared to Bloom filters
Supports item deletion, unlike standard Bloom filters
High performance for lookups and insertions
Implements configurable false positive rates

Cons

May have slightly higher false positive rates than Bloom filters in some cases
Requires more complex implementation compared to simpler probabilistic data structures
Limited documentation and examples in the repository
Not actively maintained (last commit was in 2019)

Code Examples

Creating a new Cuckoo Filter:

import "github.com/seiflotfy/cuckoofilter"

cf := cuckoofilter.NewFilter(1000000) // Create a filter with capacity for 1 million items

Inserting and looking up items:

item := []byte("example-item")
cf.Insert(item)
isContained := cf.Lookup(item) // Returns true
notContained := cf.Lookup([]byte("non-existent-item")) // Returns false

Deleting items from the filter:

item := []byte("example-item")
cf.Insert(item)
cf.Delete(item)
isContained := cf.Lookup(item) // Returns false after deletion

Counting items and getting filter information:

count := cf.Count() // Get the number of items in the filter
info := cf.Info() // Get information about the filter (capacity, size, etc.)

Getting Started

To use the Cuckoo Filter in your Go project, follow these steps:

Install the package:

go get github.com/seiflotfy/cuckoofilter

Import the package in your Go code:

import "github.com/seiflotfy/cuckoofilter"

Create a new filter and start using it:

cf := cuckoofilter.NewFilter(1000000) // Create a filter with capacity for 1 million items
cf.Insert([]byte("example-item"))
isContained := cf.Lookup([]byte("example-item"))

Remember to handle errors and adjust the filter size based on your specific use case and desired false positive rate.

Competitor Comparisons

BoomFilters

1,622

Probabilistic data structures for processing continuous, unbounded streams.

Pros of BoomFilters

Offers a wider variety of probabilistic data structures, including Bloom filters, Cuckoo filters, Count-Min Sketch, and more
Provides a unified interface for different filter types, making it easier to switch between implementations
Includes benchmarking tools for performance comparison

Cons of BoomFilters

May have slightly higher memory overhead due to the abstraction layer
Less focused on a single implementation, potentially sacrificing some optimization for flexibility
Written in Go, which might not be suitable for all projects (Cuckoofilter is in C)

Code Comparison

BoomFilters:

filter := boom.NewCuckooFilter(1000, 0.01)
filter.Add([]byte("item"))
exists := filter.Test([]byte("item"))

Cuckoofilter:

struct cuckoo_filter *filter = cuckoo_filter_new(1000);
cuckoo_filter_add(filter, "item", strlen("item"));
bool exists = cuckoo_filter_contains(filter, "item", strlen("item"));

Both implementations offer similar functionality, but BoomFilters provides a more Go-idiomatic interface, while Cuckoofilter offers a C-style API.

bloom

2,669

Go package implementing Bloom filters, used by Milvus and Beego.

Pros of bloom

More comprehensive and feature-rich, offering various types of Bloom filters and related data structures
Better documentation and examples, making it easier for developers to understand and implement
Actively maintained with regular updates and improvements

Cons of bloom

Potentially higher memory usage for simple use cases
May have a steeper learning curve due to more options and configurations
Slightly slower insertion and lookup times for basic Bloom filter operations

Code Comparison

bloom:

filter := bloom.NewWithEstimates(1000000, 0.01)
filter.Add([]byte("test"))
if filter.Test([]byte("test")) {
    fmt.Println("Element may be in set")
}

cuckoofilter:

cf := cuckoo.NewFilter(1000000)
cf.Insert([]byte("test"))
if cf.Lookup([]byte("test")) {
    fmt.Println("Element may be in set")
}

Both libraries provide similar basic functionality, but bloom offers more advanced features and customization options, while cuckoofilter focuses on a simpler, more specific implementation.

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

Cuckoo Filter

Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space.

Cuckoo ï¬lters provide the ï¬exibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom ï¬lters, for applications that require low false positive rates (< 3%).

For details about the algorithm and citations please use this article for now

"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky

Implementation details

The paper cited above leaves several parameters to choose. In this implementation

Every element has 2 possible bucket indices
Buckets have a static size of 4 fingerprints
Fingerprints have a static size of 8 bits

1 and 2 are suggested to be the optimum by the authors. The choice of 3 comes down to the desired false positive rate. Given a target false positive rate of r and a bucket size b, they suggest choosing the fingerprint size f using

f >= log2(2b/r) bits

With the 8 bit fingerprint size in this repository, you can expect r ~= 0.03. Other implementations use 16 bit, which correspond to a false positive rate of r ~= 0.0001.

Example usage:

package main

import "fmt"
import cuckoo "github.com/seiflotfy/cuckoofilter"

func main() {
  cf := cuckoo.NewFilter(1000)
  cf.InsertUnique([]byte("geeky ogre"))

  // Lookup a string (and it a miss) if it exists in the cuckoofilter
  cf.Lookup([]byte("hello"))

  count := cf.Count()
  fmt.Println(count) // count == 1

  // Delete a string (and it a miss)
  cf.Delete([]byte("hello"))

  count = cf.Count()
  fmt.Println(count) // count == 1

  // Delete a string (a hit)
  cf.Delete([]byte("geeky ogre"))

  count = cf.Count()
  fmt.Println(count) // count == 0
  
  cf.Reset()    // reset
}

Documentation:

"Cuckoo Filter on GoDoc"

Top Related Projects

BoomFilters

1,622

Probabilistic data structures for processing continuous, unbounded streams.

bloom

2,669

Go package implementing Bloom filters, used by Milvus and Beego.

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot