
hypermodeinc / ristretto

A high performance memory-bound Go cache


Top Related Projects

  • llama.cpp (71,020 stars) — LLM inference in C/C++
  • llama2.c (17,879 stars) — Inference Llama 2 in one file of pure C
  • llama (57,265 stars) — Inference code for Llama models
  • DeepSpeed (37,573 stars) — A deep learning optimization library that makes distributed training and inference easy, efficient, and effective
  • transformers — 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX

Quick Overview

Ristretto is an open-source project by Hypermode Inc. (originally built for Dgraph) that provides a high-performance, concurrent, in-process cache library for Go. It is memory-bounded via cost-based eviction and designed for workloads where hit ratio and throughput matter, such as databases and high-traffic Go services. It is a library you embed in a single process, not a distributed caching service.

Pros

  • High hit ratios via its TinyLFU admission and SampledLFU eviction policies
  • High throughput with little degradation under heavy concurrent access
  • Cost-based eviction that bounds total memory use
  • Simple API with optional performance metrics

Cons

  • Go-only; not usable from other languages
  • Set calls for new items may be dropped (eventual consistency by design)
  • In-process only; no built-in distributed or persistence mode
  • Requires tuning Config values (NumCounters, MaxCost, BufferItems) for best results

Code Examples

// Initialize a new Ristretto cache
cache, err := ristretto.NewCache(&ristretto.Config{
    NumCounters: 1e7,     // number of keys to track frequency of (10M)
    MaxCost:     1 << 30, // maximum cost of cache (1GB)
    BufferItems: 64,      // number of keys per Get buffer
})
if err != nil {
    panic(err)
}
// Set a value in the cache with a cost of 1
cache.Set("key", "value", 1)

// Sets are buffered; wait for the value to pass through
cache.Wait()

// Get a value from the cache
value, found := cache.Get("key")
if found {
    fmt.Println(value)
}
// Delete a value from the cache
cache.Del("key")

// Clear the entire cache
cache.Clear()

Getting Started

To use Ristretto in your Go project, follow these steps:

  1. Install Ristretto:

    go get github.com/dgraph-io/ristretto
    
  2. Import Ristretto in your Go code:

    import "github.com/dgraph-io/ristretto"
    
  3. Create a new cache instance:

    cache, err := ristretto.NewCache(&ristretto.Config{
        NumCounters: 1e7,
        MaxCost:     1 << 30,
        BufferItems: 64,
    })
    if err != nil {
        panic(err)
    }
    
  4. Use the cache in your application:

cache.Set("key", "value", 1)
cache.Wait() // Sets are buffered; wait before reading back
value, found := cache.Get("key")
    

Competitor Comparisons

71,020

LLM inference in C/C++

Pros of llama.cpp

  • Highly optimized C++ implementation for efficient inference
  • Supports quantization techniques for reduced memory usage
  • Extensive documentation and active community support

Cons of llama.cpp

  • Limited to LLaMA-based models
  • Requires more technical expertise to set up and use
  • Less focus on model fine-tuning capabilities

Code Comparison

llama.cpp:

int main(int argc, char ** argv) {
    gpt_params params;
    if (gpt_params_parse(argc, argv, params) == false) {
        return 1;
    }
    llama_init_backend();
    // ... (implementation continues)
}

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
cache.Set("key", "value", 1)

Key Differences

  • Domain: llama.cpp is an LLM inference engine; Ristretto is an embedded cache library
  • Language: llama.cpp is written in C/C++, while Ristretto is written in Go
  • Scope: llama.cpp optimizes model execution; Ristretto optimizes hit ratios and throughput for in-process caching
  • Integration: Ristretto is imported as a Go module into a single process; llama.cpp ships as a standalone program and C API

Both projects are performance-focused systems software, but they solve unrelated problems: one runs large language models, the other caches data inside a Go process.

17,879

Inference Llama 2 in one file of pure C

Pros of llama2.c

  • Lightweight and efficient implementation in C
  • Focused on running Llama 2 models with minimal dependencies
  • Clear and well-documented code structure

Cons of llama2.c

  • Limited to Llama 2 models only
  • Minimal feature set by design
  • May require more manual setup for advanced use cases

Code Comparison

llama2.c:

int main(int argc, char* argv[]) {
    // initialize the model
    Transformer transformer;
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <checkpoint_file>\n", argv[0]);
        return 1;
    }
    int err = load_model(argv[1], &transformer);
    if (err) { return 1; }

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
defer cache.Close()
cache.Set("key", "value", 1)

Summary

llama2.c is a specialized, lightweight C implementation for running Llama 2 models, offering efficiency and simplicity. Ristretto is a concurrent, in-process cache library for Go. The two projects are not interchangeable: llama2.c belongs in an ML inference stack, while Ristretto belongs in a Go application that needs fast, memory-bounded caching.

57,265

Inference code for Llama models

Pros of Llama

  • Developed by Meta, benefiting from extensive resources and research
  • Supports multiple languages and has a large, active community
  • Offers pre-trained models with impressive performance on various NLP tasks

Cons of Llama

  • Requires significant computational resources for training and inference
  • Limited customization options for specific use cases
  • Stricter licensing terms compared to more open-source alternatives

Code Comparison

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
cache.Set("key", "value", 1)
value, found := cache.Get("key")

Llama (Python):

from llama import Llama

model = Llama.load("path/to/model")
output = model.generate("Hello, how are you?")
print(output)

Both repositories are performance-focused, but they operate in different domains: Llama provides large language models and inference code, while Ristretto provides an embedded cache for Go programs. There is no direct feature overlap to compare; the snippets above simply show each library's typical entry point.

37,573

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • More comprehensive optimization toolkit for deep learning
  • Supports a wider range of model architectures and training scenarios
  • Offers advanced features like ZeRO optimizer and pipeline parallelism

Cons of DeepSpeed

  • Steeper learning curve due to its extensive feature set
  • May introduce more complexity for simpler projects
  • Requires more configuration and setup compared to Ristretto

Code Comparison

DeepSpeed:

import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params
)

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
cache.Set("key", "value", 1)

Summary

DeepSpeed is a comprehensive optimization library for distributed deep learning training and inference. Ristretto is an in-process cache library for Go. They are unrelated tools that share only a focus on performance: DeepSpeed reduces the cost of training and serving large models, while Ristretto reduces the cost of repeated data access inside a single Go process. The choice between them is not a trade-off; a project would use one, the other, or both for entirely different reasons.

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of Transformers

  • Extensive library with support for a wide range of pre-trained models and architectures
  • Large and active community, frequent updates, and comprehensive documentation
  • Seamless integration with popular deep learning frameworks like PyTorch and TensorFlow

Cons of Transformers

  • Can be resource-intensive, especially for large models
  • Learning curve may be steeper for beginners due to the extensive feature set
  • May include unnecessary components for projects focused solely on specific tasks

Code Comparison

Transformers:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
cache.Set("hello", "world", 1)

Transformers is a machine learning library for working with pre-trained models; Ristretto is an embedded cache for Go. A Go service might use Ristretto to cache responses from a model served with Transformers, but the two libraries do not compete on any feature.


README

Ristretto


Ristretto is a fast, concurrent cache library built with a focus on performance and correctness.

The motivation to build Ristretto comes from the need for a contention-free cache in Dgraph.

Features

  • High Hit Ratios - with our unique admission/eviction policy pairing, Ristretto's performance is best in class.
    • Eviction: SampledLFU - on par with exact LRU and better performance on Search and Database traces.
    • Admission: TinyLFU - extra performance with little memory overhead (12 bits per counter).
  • Fast Throughput - we use a variety of techniques for managing contention and the result is excellent throughput.
  • Cost-Based Eviction - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
  • Fully Concurrent - you can use as many goroutines as you want with little throughput degradation.
  • Metrics - optional performance metrics for throughput, hit ratios, and other stats.
  • Simple API - just figure out your ideal Config values and you're off and running.

Status

Ristretto is production-ready. See Projects using Ristretto.

Getting Started

Installing

To start using Ristretto, install Go 1.21 or above. Ristretto requires Go modules. From your project, run the following command:

go get github.com/dgraph-io/ristretto/v2

This will retrieve the library.

Choosing a version

Two major versions are available:

  • v1.x.x is the original interface, still used by most existing programs with Ristretto dependencies.
  • v2.x.x adds support for generics and has a slightly different interface. It is published as a separate module path so that programs using the old version keep working; new programs should use v2.

Usage

package main

import (
  "fmt"

  "github.com/dgraph-io/ristretto/v2"
)

func main() {
  cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,     // number of keys to track frequency of (10M).
    MaxCost:     1 << 30, // maximum cost of cache (1GB).
    BufferItems: 64,      // number of keys per Get buffer.
  })
  if err != nil {
    panic(err)
  }
  defer cache.Close()

  // set a value with a cost of 1
  cache.Set("key", "value", 1)

  // wait for value to pass through buffers
  cache.Wait()

  // get value from cache
  value, found := cache.Get("key")
  if !found {
    panic("missing value")
  }
  fmt.Println(value)

  // del value from cache
  cache.Del("key")
}

Benchmarks

The benchmarks can be found at https://github.com/hypermodeinc/dgraph-benchmarks/tree/main/cachebench/ristretto.

Hit Ratios for Search

This trace is described as "disk read accesses initiated by a large commercial search engine in response to various web search requests."

Graph showing hit ratios comparison for search workload

Hit Ratio for Database

This trace is described as "a database server running at a commercial site running an ERP application on top of a commercial database."

Graph showing hit ratios comparison for database workload

Hit Ratio for Looping

This trace demonstrates a looping access pattern.

Graph showing hit ratios comparison for looping access pattern

Hit Ratio for CODASYL

This trace is described as "references to a CODASYL database for a one hour period."

Graph showing hit ratios comparison for CODASYL workload

Throughput for Mixed Workload

Graph showing throughput comparison for mixed workload

Throughput for Read Workload

Graph showing throughput comparison for read workload

Throughput for Write Workload

Graph showing throughput comparison for write workload

Projects Using Ristretto

Below is a list of known projects that use Ristretto:

  • Badger - Embeddable key-value DB in Go
  • Dgraph - Horizontally scalable and distributed GraphQL database with a graph backend

FAQ

How are you achieving this performance? What shortcuts are you taking?

We go into detail in the Ristretto blog post, but in short: our throughput performance can be attributed to a mix of batching and eventual consistency. Our hit ratio performance is mostly due to an excellent admission policy and SampledLFU eviction policy.

As for "shortcuts," the only thing Ristretto does that could be construed as one is dropping some Set calls. A Set call for a new item isn't guaranteed to make it into the cache (updates to existing items are guaranteed). A new item can be dropped at two points: when passing through the Set buffer or when passing through the admission policy. However, this barely affects hit ratios, because we expect the most popular items to be Set multiple times and eventually make it into the cache.

Is Ristretto distributed?

No, it's just like any other Go library that you can import into your project and use in a single process.