
hypermodeinc / ristretto

A high performance memory-bound Go cache


Top Related Projects

  • llama.cpp (71,020 stars) — LLM inference in C/C++
  • llama2.c (17,879 stars) — Inference Llama 2 in one file of pure C
  • llama (57,265 stars) — Inference code for Llama models
  • DeepSpeed (37,573 stars) — A deep learning optimization library that makes distributed training and inference easy, efficient, and effective
  • transformers — 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX

Quick Overview

Ristretto is an open-source project by Hypermode Inc. (originally built for Dgraph) that provides a high-performance, concurrent, in-process cache library for Go. It is memory-bounded via cost-based eviction and designed for workloads where hit ratio and throughput matter, such as databases and high-traffic Go services. It is a library you embed in a single process, not a distributed caching service.

Pros

  • High hit ratios via its TinyLFU admission and SampledLFU eviction policies
  • High throughput with little degradation under heavy concurrent access
  • Cost-based eviction that bounds total memory use
  • Simple API with optional performance metrics

Cons

  • Go-only; not usable from other languages
  • Set calls for new items may be dropped (eventual consistency by design)
  • In-process only; no built-in distributed or persistence mode
  • Requires tuning Config values (NumCounters, MaxCost, BufferItems) for best results

Code Examples

// Initialize a new Ristretto cache
cache, err := ristretto.NewCache(&ristretto.Config{
    NumCounters: 1e7,     // number of keys to track frequency of (10M)
    MaxCost:     1 << 30, // maximum cost of cache (1GB)
    BufferItems: 64,      // number of keys per Get buffer
})
if err != nil {
    panic(err)
}
// Set a value in the cache with a cost of 1
cache.Set("key", "value", 1)

// Sets are buffered; wait for the value to pass through
cache.Wait()

// Get a value from the cache
value, found := cache.Get("key")
if found {
    fmt.Println(value)
}
// Delete a value from the cache
cache.Del("key")

// Clear the entire cache
cache.Clear()

Getting Started

To use Ristretto in your Go project, follow these steps:

  1. Install Ristretto:

    go get github.com/dgraph-io/ristretto
    
  2. Import Ristretto in your Go code:

    import "github.com/dgraph-io/ristretto"
    
  3. Create a new cache instance:

    cache, err := ristretto.NewCache(&ristretto.Config{
        NumCounters: 1e7,
        MaxCost:     1 << 30,
        BufferItems: 64,
    })
    if err != nil {
        panic(err)
    }
    
  4. Use the cache in your application:

cache.Set("key", "value", 1)
cache.Wait() // Sets are buffered; wait before reading back
value, found := cache.Get("key")
    

Competitor Comparisons

71,020

LLM inference in C/C++

Pros of llama.cpp

  • Highly optimized C++ implementation for efficient inference
  • Supports quantization techniques for reduced memory usage
  • Extensive documentation and active community support

Cons of llama.cpp

  • Limited to LLaMA-based models
  • Requires more technical expertise to set up and use
  • Less focus on model fine-tuning capabilities

Code Comparison

llama.cpp:

int main(int argc, char ** argv) {
    gpt_params params;
    if (gpt_params_parse(argc, argv, params) == false) {
        return 1;
    }
    llama_init_backend();
    // ... (implementation continues)
}

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
cache.Set("key", "value", 1)

Key Differences

  • Domain: llama.cpp is an LLM inference engine; Ristretto is an embedded cache library
  • Language: llama.cpp is written in C/C++, while Ristretto is written in Go
  • Scope: llama.cpp optimizes model execution; Ristretto optimizes hit ratios and throughput for in-process caching
  • Integration: Ristretto is imported as a Go module into a single process; llama.cpp ships as a standalone program and C API

Both projects are performance-focused systems software, but they solve unrelated problems: one runs large language models, the other caches data inside a Go process.

17,879

Inference Llama 2 in one file of pure C

Pros of llama2.c

  • Lightweight and efficient implementation in C
  • Focused on running Llama 2 models with minimal dependencies
  • Clear and well-documented code structure

Cons of llama2.c

  • Limited to Llama 2 models only
  • Minimal feature set by design
  • May require more manual setup for advanced use cases

Code Comparison

llama2.c:

int main(int argc, char* argv[]) {
    // initialize the model
    Transformer transformer;
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <checkpoint_file>\n", argv[0]);
        return 1;
    }
    int err = load_model(argv[1], &transformer);
    if (err) { return 1; }

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
defer cache.Close()
cache.Set("key", "value", 1)

Summary

llama2.c is a specialized, lightweight C implementation for running Llama 2 models, offering efficiency and simplicity. Ristretto is a concurrent, in-process cache library for Go. The two projects are not interchangeable: llama2.c belongs in an ML inference stack, while Ristretto belongs in a Go application that needs fast, memory-bounded caching.

57,265

Inference code for Llama models

Pros of Llama

  • Developed by Meta, benefiting from extensive resources and research
  • Supports multiple languages and has a large, active community
  • Offers pre-trained models with impressive performance on various NLP tasks

Cons of Llama

  • Requires significant computational resources for training and inference
  • Limited customization options for specific use cases
  • Stricter licensing terms compared to more open-source alternatives

Code Comparison

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
cache.Set("key", "value", 1)
value, found := cache.Get("key")

Llama (Python):

from llama import Llama

model = Llama.load("path/to/model")
output = model.generate("Hello, how are you?")
print(output)

Both repositories are performance-focused, but they operate in different domains: Llama provides large language models and inference code, while Ristretto provides an embedded cache for Go programs. There is no direct feature overlap to compare; the snippets above simply show each library's typical entry point.

37,573

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • More comprehensive optimization toolkit for deep learning
  • Supports a wider range of model architectures and training scenarios
  • Offers advanced features like ZeRO optimizer and pipeline parallelism

Cons of DeepSpeed

  • Steeper learning curve due to its extensive feature set
  • May introduce more complexity for simpler projects
  • Requires more configuration and setup compared to Ristretto

Code Comparison

DeepSpeed:

import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params
)

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
cache.Set("key", "value", 1)

Summary

DeepSpeed is a comprehensive optimization library for distributed deep learning training and inference. Ristretto is an in-process cache library for Go. They are unrelated tools that share only a focus on performance: DeepSpeed reduces the cost of training and serving large models, while Ristretto reduces the cost of repeated data access inside a single Go process. The choice between them is not a trade-off; a project would use one, the other, or both for entirely different reasons.

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of Transformers

  • Extensive library with support for a wide range of pre-trained models and architectures
  • Large and active community, frequent updates, and comprehensive documentation
  • Seamless integration with popular deep learning frameworks like PyTorch and TensorFlow

Cons of Transformers

  • Can be resource-intensive, especially for large models
  • Learning curve may be steeper for beginners due to the extensive feature set
  • May include unnecessary components for projects focused solely on specific tasks

Code Comparison

Transformers:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)

Ristretto (Go):

cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,
    MaxCost:     1 << 30,
    BufferItems: 64,
})
if err != nil {
    panic(err)
}
cache.Set("hello", "world", 1)

Transformers is a machine learning library for working with pre-trained models; Ristretto is an embedded cache for Go. A Go service might use Ristretto to cache responses from a model served with Transformers, but the two libraries do not compete on any feature.


README

Ristretto


Ristretto is a fast, concurrent cache library built with a focus on performance and correctness.

The motivation to build Ristretto comes from the need for a contention-free cache in Dgraph.

Features

  • High Hit Ratios - with our unique admission/eviction policy pairing, Ristretto's performance is best in class.
    • Eviction: SampledLFU - on par with exact LRU and better performance on Search and Database traces.
    • Admission: TinyLFU - extra performance with little memory overhead (12 bits per counter).
  • Fast Throughput - we use a variety of techniques for managing contention and the result is excellent throughput.
  • Cost-Based Eviction - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
  • Fully Concurrent - you can use as many goroutines as you want with little throughput degradation.
  • Metrics - optional performance metrics for throughput, hit ratios, and other stats.
  • Simple API - just figure out your ideal Config values and you're off and running.

Status

Ristretto is production-ready. See Projects using Ristretto.

Getting Started

Installing

To start using Ristretto, install Go 1.21 or above. Ristretto requires Go modules. From your project, run the following command:

go get github.com/dgraph-io/ristretto/v2

This will retrieve the library.

Choosing a version

Two major versions are available:

  • v1.x.x is the original interface, still used by most existing programs with Ristretto dependencies.
  • v2.x.x adds support for generics and has a slightly different interface. It is published as a separate module path so that programs using the old version keep working; new programs should use v2.

Usage

package main

import (
  "fmt"

  "github.com/dgraph-io/ristretto/v2"
)

func main() {
  cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
    NumCounters: 1e7,     // number of keys to track frequency of (10M).
    MaxCost:     1 << 30, // maximum cost of cache (1GB).
    BufferItems: 64,      // number of keys per Get buffer.
  })
  if err != nil {
    panic(err)
  }
  defer cache.Close()

  // set a value with a cost of 1
  cache.Set("key", "value", 1)

  // wait for value to pass through buffers
  cache.Wait()

  // get value from cache
  value, found := cache.Get("key")
  if !found {
    panic("missing value")
  }
  fmt.Println(value)

  // del value from cache
  cache.Del("key")
}

Benchmarks

The benchmarks can be found at https://github.com/hypermodeinc/dgraph-benchmarks/tree/main/cachebench/ristretto.

Hit Ratios for Search

This trace is described as "disk read accesses initiated by a large commercial search engine in response to various web search requests."

Graph showing hit ratios comparison for search workload

Hit Ratio for Database

This trace is described as "a database server running at a commercial site running an ERP application on top of a commercial database."

Graph showing hit ratios comparison for database workload

Hit Ratio for Looping

This trace demonstrates a looping access pattern.

Graph showing hit ratios comparison for looping access pattern

Hit Ratio for CODASYL

This trace is described as "references to a CODASYL database for a one hour period."

Graph showing hit ratios comparison for CODASYL workload

Throughput for Mixed Workload

Graph showing throughput comparison for mixed workload

Throughput for Read Workload

Graph showing throughput comparison for read workload

Throughput for Write Workload

Graph showing throughput comparison for write workload

Projects Using Ristretto

Below is a list of known projects that use Ristretto:

  • Badger - Embeddable key-value DB in Go
  • Dgraph - Horizontally scalable and distributed GraphQL database with a graph backend

FAQ

How are you achieving this performance? What shortcuts are you taking?

We go into detail in the Ristretto blog post, but in short: our throughput performance can be attributed to a mix of batching and eventual consistency. Our hit ratio performance is mostly due to an excellent admission policy and SampledLFU eviction policy.

As for "shortcuts," the only thing Ristretto does that could be construed as one is dropping some Set calls. A Set call for a new item isn't guaranteed to make it into the cache (updates to existing items are guaranteed). A new item can be dropped at two points: when passing through the Set buffer or when passing through the admission policy. However, this barely affects hit ratios, because we expect the most popular items to be Set multiple times and eventually make it into the cache.

Is Ristretto distributed?

No, it's just like any other Go library that you can import into your project and use in a single process.