Convert Figma logo to code with AI

seiflotfy logocuckoofilter

Cuckoo Filter: Practically Better Than Bloom

1,108
107
1,108
14

Top Related Projects

Probabilistic data structures for processing continuous, unbounded streams.

2,394

Go package implementing Bloom filters, used by Milvus and Beego.

Quick Overview

The seiflotfy/cuckoofilter repository is a Go implementation of Cuckoo Filters. Cuckoo filters are a probabilistic data structure used for approximate set membership queries, similar to Bloom filters but with better performance and support for deletion.

Pros

  • Efficient space usage compared to Bloom filters
  • Supports item deletion, unlike standard Bloom filters
  • High performance for lookups and insertions
  • Implements configurable false positive rates

Cons

  • May have slightly higher false positive rates than Bloom filters in some cases
  • Requires more complex implementation compared to simpler probabilistic data structures
  • Limited documentation and examples in the repository
  • Not actively maintained (last commit was in 2019)

Code Examples

  1. Creating a new Cuckoo Filter:
import "github.com/seiflotfy/cuckoofilter"

cf := cuckoofilter.NewFilter(1000000) // Create a filter with capacity for 1 million items
  1. Inserting and looking up items:
item := []byte("example-item")
cf.Insert(item)
isContained := cf.Lookup(item) // Returns true
notContained := cf.Lookup([]byte("non-existent-item")) // Returns false
  1. Deleting items from the filter:
item := []byte("example-item")
cf.Insert(item)
cf.Delete(item)
isContained := cf.Lookup(item) // Returns false after deletion
  1. Counting items and getting filter information:
count := cf.Count() // Get the number of items in the filter
info := cf.Info() // Get information about the filter (capacity, size, etc.)

Getting Started

To use the Cuckoo Filter in your Go project, follow these steps:

  1. Install the package:

    go get github.com/seiflotfy/cuckoofilter
    
  2. Import the package in your Go code:

    import "github.com/seiflotfy/cuckoofilter"
    
  3. Create a new filter and start using it:

    cf := cuckoofilter.NewFilter(1000000) // Create a filter with capacity for 1 million items
    cf.Insert([]byte("example-item"))
    isContained := cf.Lookup([]byte("example-item"))
    

Remember to handle errors and adjust the filter size based on your specific use case and desired false positive rate.

Competitor Comparisons

Probabilistic data structures for processing continuous, unbounded streams.

Pros of BoomFilters

  • Offers a wider variety of probabilistic data structures, including Bloom filters, Cuckoo filters, Count-Min Sketch, and more
  • Provides a unified interface for different filter types, making it easier to switch between implementations
  • Includes benchmarking tools for performance comparison

Cons of BoomFilters

  • May have slightly higher memory overhead due to the abstraction layer
  • Less focused on a single implementation, potentially sacrificing some optimization for flexibility
  • Written in Go, which might not be suitable for all projects (Cuckoofilter is in C)

Code Comparison

BoomFilters:

filter := boom.NewCuckooFilter(1000, 0.01)
filter.Add([]byte("item"))
exists := filter.Test([]byte("item"))

Cuckoofilter:

struct cuckoo_filter *filter = cuckoo_filter_new(1000);
cuckoo_filter_add(filter, "item", strlen("item"));
bool exists = cuckoo_filter_contains(filter, "item", strlen("item"));

Both implementations offer similar functionality, but BoomFilters provides a more Go-idiomatic interface, while Cuckoofilter offers a C-style API.

2,394

Go package implementing Bloom filters, used by Milvus and Beego.

Pros of bloom

  • More comprehensive and feature-rich, offering various types of Bloom filters and related data structures
  • Better documentation and examples, making it easier for developers to understand and implement
  • Actively maintained with regular updates and improvements

Cons of bloom

  • Potentially higher memory usage for simple use cases
  • May have a steeper learning curve due to more options and configurations
  • Slightly slower insertion and lookup times for basic Bloom filter operations

Code Comparison

bloom:

filter := bloom.NewWithEstimates(1000000, 0.01)
filter.Add([]byte("test"))
if filter.Test([]byte("test")) {
    fmt.Println("Element may be in set")
}

cuckoofilter:

cf := cuckoo.NewFilter(1000000)
cf.Insert([]byte("test"))
if cf.Lookup([]byte("test")) {
    fmt.Println("Element may be in set")
}

Both libraries provide similar basic functionality, but bloom offers more advanced features and customization options, while cuckoofilter focuses on a simpler, more specific implementation.

Convert Figma logo designs to code with AI

Visual Copilot

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

Cuckoo Filter

GoDoc CodeHunt.io

Cuckoo filter is a Bloom filter replacement for approximated set-membership queries. While Bloom filters are well-known space-efficient data structures to serve queries like "if item x is in a set?", they do not support deletion. Their variances to enable deletion (like counting Bloom filters) usually require much more space.

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (and therefore named as cuckoo filter). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, thus a cuckoo filter could use less space than conventional Bloom filters, for applications that require low false positive rates (< 3%).

For details about the algorithm and citations please use this article for now

"Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky

Implementation details

The paper cited above leaves several parameters to choose. In this implementation

  1. Every element has 2 possible bucket indices
  2. Buckets have a static size of 4 fingerprints
  3. Fingerprints have a static size of 8 bits

1 and 2 are suggested to be the optimum by the authors. The choice of 3 comes down to the desired false positive rate. Given a target false positive rate of r and a bucket size b, they suggest choosing the fingerprint size f using

f >= log2(2b/r) bits

With the 8 bit fingerprint size in this repository, you can expect r ~= 0.03. Other implementations use 16 bit, which correspond to a false positive rate of r ~= 0.0001.

Example usage:

package main

import "fmt"
import cuckoo "github.com/seiflotfy/cuckoofilter"

func main() {
  cf := cuckoo.NewFilter(1000)
  cf.InsertUnique([]byte("geeky ogre"))

  // Lookup a string (and it a miss) if it exists in the cuckoofilter
  cf.Lookup([]byte("hello"))

  count := cf.Count()
  fmt.Println(count) // count == 1

  // Delete a string (and it a miss)
  cf.Delete([]byte("hello"))

  count = cf.Count()
  fmt.Println(count) // count == 1

  // Delete a string (a hit)
  cf.Delete([]byte("geeky ogre"))

  count = cf.Count()
  fmt.Println(count) // count == 0
  
  cf.Reset()    // reset
}

Documentation:

"Cuckoo Filter on GoDoc"