Top Related Projects
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
A cloud-native vector database, storage for next generation AI applications
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
Official Python client for Elasticsearch
Quick Overview
Chroma is an open-source embedding database designed for building AI applications with embeddings. It allows developers to store, search, and analyze vector embeddings efficiently, making it easier to create semantic search, recommendation systems, and other AI-powered features.
Pros
- Easy to use and integrate with existing AI/ML workflows
- Supports various embedding models and distance functions
- Offers both local and cloud-hosted options for flexibility
- Provides a simple API for querying and managing embeddings
Cons
- May have performance limitations for extremely large datasets
- Documentation could be more comprehensive for advanced use cases
- Limited built-in analytics and visualization tools
- Relatively new project, so the ecosystem is still developing
Code Examples
- Creating a collection and adding documents:
import chromadb
client = chromadb.Client()
collection = client.create_collection("my_collection")
collection.add(
documents=["This is a document", "This is another document"],
metadatas=[{"source": "my_source"}, {"source": "my_source"}],
ids=["id1", "id2"]
)
- Querying the collection:
results = collection.query(
query_texts=["This is a query document"],
n_results=2
)
print(results)
- Updating and deleting documents:
collection.update(
ids=["id1"],
documents=["This is an updated document"],
metadatas=[{"source": "updated_source"}]
)
collection.delete(ids=["id2"])
Getting Started
To get started with Chroma, follow these steps:
- Install Chroma:
pip install chromadb
- Create a simple script:
import chromadb
client = chromadb.Client()
collection = client.create_collection("quickstart")
collection.add(
documents=["Hello world", "Goodbye world"],
metadatas=[{"source": "greeting"}, {"source": "farewell"}],
ids=["1", "2"]
)
results = collection.query(
query_texts=["hello"],
n_results=1
)
print(results)
- Run the script and explore the results. You can now start building more complex applications using Chroma's embedding database capabilities.
Competitor Comparisons
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Pros of Qdrant
- Written in Rust, offering high performance and memory safety
- Supports complex vector search queries with filtering
- Provides a distributed architecture for scalability
Cons of Qdrant
- Steeper learning curve due to more advanced features
- Requires more system resources for optimal performance
Code Comparison
Chroma:
import chromadb
client = chromadb.Client()
collection = client.create_collection("my_collection")
collection.add(
documents=["This is a document", "This is another document"],
ids=["id1", "id2"]
)
Qdrant:
from qdrant_client import QdrantClient, models
client = QdrantClient("localhost", port=6333)
client.recreate_collection(
collection_name="my_collection",
vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE)
)
client.upsert(
collection_name="my_collection",
points=[
models.PointStruct(id=1, vector=[0.05, 0.61, 0.76], payload={"color": "red"}),
models.PointStruct(id=2, vector=[0.19, 0.81, 0.75], payload={"color": "blue"}),
]
)
A cloud-native vector database, storage for next generation AI applications
Pros of Milvus
- Highly scalable and distributed architecture for large-scale vector search
- Supports multiple index types and similarity metrics for diverse use cases
- Offers advanced features like data management and real-time search capabilities
Cons of Milvus
- More complex setup and configuration compared to Chroma
- Steeper learning curve due to its extensive feature set
- Requires more system resources for optimal performance
Code Comparison
Milvus (Python client):
from pymilvus import Collection, connections
connections.connect()
collection = Collection("example_collection")
results = collection.search(
data=[vector],
anns_field="embedding",
param={"metric_type": "L2", "params": {"nprobe": 10}},
limit=5
)
Chroma:
import chromadb
client = chromadb.Client()
collection = client.create_collection("example_collection")
results = collection.query(
query_embeddings=[vector],
n_results=5
)
Both libraries offer similar functionality for vector search, but Milvus provides more advanced configuration options, while Chroma focuses on simplicity and ease of use. Milvus is better suited for large-scale, production deployments, whereas Chroma is ideal for quick prototyping and smaller-scale applications.
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
Pros of Weaviate
- More mature and feature-rich, with a wider range of functionalities
- Better scalability for large-scale production environments
- Supports multiple vector index types (e.g., HNSW, flat)
Cons of Weaviate
- Steeper learning curve due to more complex architecture
- Requires more resources to run and maintain
- Less straightforward setup compared to Chroma
Code Comparison
Weaviate (Python client):
import weaviate
client = weaviate.Client("http://localhost:8080")
client.schema.create_class({
"class": "Article",
"vectorizer": "text2vec-transformers"
})
Chroma:
import chromadb
client = chromadb.Client()
collection = client.create_collection("articles")
collection.add(
documents=["content1", "content2"],
metadatas=[{"source": "wiki"}, {"source": "book"}],
ids=["id1", "id2"]
)
Both Weaviate and Chroma are vector databases, but they differ in complexity and use cases. Weaviate offers more advanced features and scalability, making it suitable for large-scale production environments. Chroma, on the other hand, provides a simpler interface and easier setup, which can be advantageous for smaller projects or quick prototyping. The code comparison shows that Weaviate requires more configuration, while Chroma offers a more straightforward API for basic operations.
Official Python client for Elasticsearch
Pros of elasticsearch-py
- Mature and widely adopted Elasticsearch client for Python
- Comprehensive API coverage for Elasticsearch operations
- Extensive documentation and community support
Cons of elasticsearch-py
- Focused on general-purpose search; vector similarity search is not its primary use case and requires additional Elasticsearch-side configuration
- Steeper learning curve for users new to Elasticsearch
Code Comparison
elasticsearch-py:
from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")
doc = {"title": "Test Document", "content": "This is a test"}
es.index(index="my_index", document=doc)
Chroma:
import chromadb
client = chromadb.Client()
collection = client.create_collection("my_collection")
collection.add(
documents=["This is a test"],
metadatas=[{"title": "Test Document"}],
ids=["1"]
)
Key Differences
- Chroma focuses on vector databases and similarity search, while elasticsearch-py is for general-purpose document indexing and search
- Chroma offers a simpler API for vector operations, making it easier for machine learning tasks
- elasticsearch-py provides more advanced querying capabilities and supports complex aggregations
Use Cases
- elasticsearch-py: Full-text search, log analysis, and complex data aggregations
- Chroma: Similarity search, recommendation systems, and AI-powered applications
README
Chroma - the open-source embedding database.
The fastest way to build Python or JavaScript LLM apps with memory!
pip install chromadb # python client
# for javascript, npm install chromadb!
# for client-server mode, chroma run --path /chroma_db_path
The core API is only 4 functions (run our 💡 Google Colab or Replit template):
import chromadb
# setup Chroma in-memory, for easy prototyping. Can add persistence easily!
client = chromadb.Client()
# Create collection. get_collection, get_or_create_collection, delete_collection also available!
collection = client.create_collection("all-my-documents")
# Add docs to the collection. Can also update and delete. Row-based API coming soon!
collection.add(
documents=["This is document1", "This is document2"], # we handle tokenization, embedding, and indexing automatically. You can skip that and add your own embeddings as well
metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on these!
ids=["doc1", "doc2"], # unique for each doc
)
# Query/search 2 most similar results. You can also .get by id
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
# where={"metadata_field": "is_equal_to_this"}, # optional filter
# where_document={"$contains":"search_string"} # optional filter
)
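The commented-out `where` and `where_document` arguments above narrow results by metadata equality and by document substring match. A minimal pure-Python sketch of those filter semantics (the `matches` helper and the sample records are illustrative, not part of Chroma's API):

```python
def matches(metadata, where, document="", where_document=None):
    """Illustrate Chroma-style filtering: metadata equality checks
    plus a $contains substring check on the document text."""
    for field, expected in (where or {}).items():
        if metadata.get(field) != expected:
            return False
    if where_document and "$contains" in where_document:
        if where_document["$contains"] not in document:
            return False
    return True

# Keep only records that pass both the metadata and document filters.
records = [
    {"id": "doc1", "text": "notes from notion", "meta": {"source": "notion"}},
    {"id": "doc2", "text": "exported report", "meta": {"source": "google-docs"}},
]
kept = [r["id"] for r in records
        if matches(r["meta"], {"source": "notion"}, r["text"], {"$contains": "notes"})]
print(kept)  # ['doc1']
```

In Chroma itself these filters are applied server-side before the nearest-neighbor ranking, so they cheaply restrict the candidate set.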
Features
- Simple: Fully-typed, fully-tested, fully-documented == happiness
- Integrations:
🦜🔗 LangChain (Python and JS), 🦙 LlamaIndex, and more soon
- Dev, Test, Prod: the same API that runs in your Python notebook scales to your cluster
- Feature-rich: Queries, filtering, density estimation and more
- Free & Open Source: Apache 2.0 Licensed
Use case: ChatGPT for ______
For example, the "Chat your data" use case:
- Add documents to your database. You can pass in your own embeddings, embedding function, or let Chroma embed them for you.
- Query relevant documents with natural language.
- Compose documents into the context window of an LLM like GPT3 for additional summarization or analysis.
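The three steps above can be sketched end to end. Here the retrieval result is stubbed out in place of a real `collection.query` call, and the prompt template is a hypothetical example rather than anything Chroma prescribes:

```python
def build_prompt(question, retrieved_docs):
    """Compose retrieved documents into an LLM context window (step 3)."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Stand-in for the documents returned by collection.query(...) (step 2).
docs = ["Chroma stores embeddings.", "Queries return nearest neighbors."]
prompt = build_prompt("What does Chroma store?", docs)
print(prompt)
```

The resulting string would then be sent to the LLM of your choice; only the retrieval step depends on Chroma.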
Embeddings?
What are embeddings?
- Read the guide from OpenAI
- Literal: Embedding something turns it from image/text/audio into a list of numbers: 🖼️ or 📄 => [1.2, 2.1, ....]. This process makes documents "understandable" to a machine learning model.
- By analogy: An embedding represents the essence of a document. This enables documents and queries with the same essence to be "near" each other and therefore easy to find.
- Technical: An embedding is the latent-space position of a document at a layer of a deep neural network. For models trained specifically to embed data, this is the last layer.
- A small example: If you search your photos for "famous bridge in San Francisco", embedding the query and comparing it to the embeddings of your photos and their metadata should return photos of the Golden Gate Bridge.
Embeddings databases (also known as vector databases) store embeddings and allow you to search by nearest neighbors rather than by substrings like a traditional database. By default, Chroma uses Sentence Transformers to embed for you but you can also use OpenAI embeddings, Cohere (multilingual) embeddings, or your own.
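Searching by nearest neighbors, as described above, boils down to comparing vectors. A toy cosine-similarity search over hand-made 3-dimensional "embeddings" (real models produce vectors with hundreds of dimensions, and databases like Chroma use approximate indexes rather than a linear scan):

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, lower when they diverge."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embedding store: id -> vector (values invented for illustration).
store = {
    "bridge_photo": [0.9, 0.1, 0.2],
    "cat_photo": [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.25]  # pretend embedding of "famous bridge in San Francisco"

# The nearest neighbor is the stored vector with the highest similarity.
best = max(store, key=lambda k: cosine(query, store[k]))
print(best)  # bridge_photo
```

Chroma performs this ranking for you (with a choice of distance functions) when you call `collection.query`.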
Get involved
Chroma is a rapidly developing project. We welcome PR contributors and ideas for how to improve the project.
- Join the conversation on Discord - #contributing channel
- Review the 🗣️ Roadmap and contribute your ideas
- Grab an issue with the Good first issue tag and open a PR
- Read our contributing guide
Release Cadence
We currently release new tagged versions of the pypi and npm packages on Mondays. Hotfixes go out at any time during the week.
License
Apache 2.0