
chroma-core/chroma

the AI-native open-source embedding database

Top Related Projects

  • Qdrant: High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
  • Milvus: A cloud-native vector database, storage for next generation AI applications
  • Weaviate: An open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering, with the fault tolerance and scalability of a cloud-native database.
  • elasticsearch-py: The official Python client for Elasticsearch

Quick Overview

Chroma is an open-source embedding database designed for building AI applications with embeddings. It allows developers to store, search, and analyze vector embeddings efficiently, making it easier to create semantic search, recommendation systems, and other AI-powered features.

Pros

  • Easy to use and integrate with existing AI/ML workflows
  • Supports various embedding models and distance functions
  • Offers both local and cloud-hosted options for flexibility
  • Provides a simple API for querying and managing embeddings

Cons

  • May have performance limitations for extremely large datasets
  • Documentation could be more comprehensive for advanced use cases
  • Limited built-in analytics and visualization tools
  • Relatively new project, so the ecosystem is still developing

Code Examples

  1. Creating a collection and adding documents:
import chromadb

client = chromadb.Client()
collection = client.create_collection("my_collection")

collection.add(
    documents=["This is a document", "This is another document"],
    metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    ids=["id1", "id2"]
)
  2. Querying the collection:
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2
)
print(results)
  3. Updating and deleting documents:
collection.update(
    ids=["id1"],
    documents=["This is an updated document"],
    metadatas=[{"source": "updated_source"}]
)

collection.delete(ids=["id2"])

Getting Started

To get started with Chroma, follow these steps:

  1. Install Chroma:
pip install chromadb
  2. Create a simple script:
import chromadb

client = chromadb.Client()
collection = client.create_collection("quickstart")

collection.add(
    documents=["Hello world", "Goodbye world"],
    metadatas=[{"source": "greeting"}, {"source": "farewell"}],
    ids=["1", "2"]
)

results = collection.query(
    query_texts=["hello"],
    n_results=1
)

print(results)
  3. Run the script and explore the results. You can now start building more complex applications using Chroma's embedding database capabilities.
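
The in-memory client above loses its data when the process exits. Recent chromadb releases (0.4 and later) also ship a persistent client; a minimal sketch, with an arbitrary storage path:

import chromadb

# store the database on disk so it survives restarts (chromadb 0.4+)
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("quickstart")

# upsert is idempotent, so re-running the script does not duplicate documents
collection.upsert(
    documents=["Hello world", "Goodbye world"],
    ids=["1", "2"],
)
print(collection.count())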

Competitor Comparisons

Qdrant: High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Pros of Qdrant

  • Written in Rust, offering high performance and memory safety
  • Supports complex vector search queries with filtering
  • Provides a distributed architecture for scalability

Cons of Qdrant

  • Steeper learning curve due to more advanced features
  • Requires more system resources for optimal performance

Code Comparison

Chroma:

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_collection")
collection.add(
    documents=["This is a document", "This is another document"],
    ids=["id1", "id2"]
)

Qdrant:

from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)
client.recreate_collection(
    collection_name="my_collection",
    vectors_config=models.VectorParams(size=3, distance=models.Distance.COSINE)  # size must match the vectors below
)
client.upsert(
    collection_name="my_collection",
    points=[
        models.PointStruct(id=1, vector=[0.05, 0.61, 0.76], payload={"color": "red"}),
        models.PointStruct(id=2, vector=[0.19, 0.81, 0.75], payload={"color": "blue"}),
    ]
)
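
The filtering support listed under Qdrant's pros can be sketched roughly like this, reusing the collection and payloads from the snippet above (the query vector is illustrative):

from qdrant_client import QdrantClient, models

client = QdrantClient("localhost", port=6333)

# nearest-neighbor search restricted to points whose payload matches a filter
hits = client.search(
    collection_name="my_collection",
    query_vector=[0.05, 0.61, 0.76],
    query_filter=models.Filter(
        must=[models.FieldCondition(key="color", match=models.MatchValue(value="red"))]
    ),
    limit=3,
)
print(hits)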

Milvus: A cloud-native vector database, storage for next generation AI applications

Pros of Milvus

  • Highly scalable and distributed architecture for large-scale vector search
  • Supports multiple index types and similarity metrics for diverse use cases
  • Offers advanced features like data management and real-time search capabilities

Cons of Milvus

  • More complex setup and configuration compared to Chroma
  • Steeper learning curve due to its extensive feature set
  • Requires more system resources for optimal performance

Code Comparison

Milvus (Python client):

from pymilvus import Collection, connections

connections.connect()
collection = Collection("example_collection")
results = collection.search(
    data=[vector],  # `vector` is a precomputed query embedding (a list of floats)
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=5
)

Chroma:

import chromadb

client = chromadb.Client()
collection = client.create_collection("example_collection")
results = collection.query(
    query_embeddings=[vector],  # `vector` is a precomputed query embedding
    n_results=5
)

Both libraries offer similar functionality for vector search, but Milvus provides more advanced configuration options, while Chroma focuses on simplicity and ease of use. Milvus is better suited for large-scale, production deployments, whereas Chroma is ideal for quick prototyping and smaller-scale applications.
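
As a rough illustration of the index and metric options mentioned above, building an IVF_FLAT index on an existing Milvus collection might look like this (the field name follows the earlier example; parameters are illustrative):

from pymilvus import Collection, connections

connections.connect()
collection = Collection("example_collection")

# pick one of Milvus's index types and similarity metrics for the vector field
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
)
collection.load()  # collections must be loaded into memory before searching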

Weaviate: An open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering, with the fault tolerance and scalability of a cloud-native database.

Pros of Weaviate

  • More mature and feature-rich, with a wider range of functionalities
  • Better scalability for large-scale production environments
  • Supports multiple vector index types (HNSW, LSH, FLAT)

Cons of Weaviate

  • Steeper learning curve due to more complex architecture
  • Requires more resources to run and maintain
  • Less straightforward setup compared to Chroma

Code Comparison

Weaviate (Python client):

import weaviate
client = weaviate.Client("http://localhost:8080")
client.schema.create_class({
    "class": "Article",
    "vectorizer": "text2vec-transformers"
})

Chroma:

import chromadb
client = chromadb.Client()
collection = client.create_collection("articles")
collection.add(
    documents=["content1", "content2"],
    metadatas=[{"source": "wiki"}, {"source": "book"}],
    ids=["id1", "id2"]
)

Both Weaviate and Chroma are vector databases, but they differ in complexity and use cases. Weaviate offers more advanced features and scalability, making it suitable for large-scale production environments. Chroma, on the other hand, provides a simpler interface and easier setup, which can be advantageous for smaller projects or quick prototyping. The code comparison shows that Weaviate requires more configuration, while Chroma offers a more straightforward API for basic operations.
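
To complete the picture, a semantic query against the Article class created above might look like this with the v3 Python client (the title property is assumed to exist on the class):

import weaviate

client = weaviate.Client("http://localhost:8080")

# GraphQL-style query: return the 3 Articles nearest to the given concept
response = (
    client.query
    .get("Article", ["title"])
    .with_near_text({"concepts": ["vector databases"]})
    .with_limit(3)
    .do()
)
print(response["data"]["Get"]["Article"])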

elasticsearch-py: The official Python client for Elasticsearch

Pros of elasticsearch-py

  • Mature and widely adopted Elasticsearch client for Python
  • Comprehensive API coverage for Elasticsearch operations
  • Extensive documentation and community support

Cons of elasticsearch-py

  • Focused solely on Elasticsearch, lacking vector search capabilities
  • Steeper learning curve for users new to Elasticsearch

Code Comparison

elasticsearch-py:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
doc = {"title": "Test Document", "content": "This is a test"}
es.index(index="my_index", body=doc)

Chroma:

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_collection")
collection.add(
    documents=["This is a test"],
    metadatas=[{"title": "Test Document"}],
    ids=["1"]
)

Key Differences

  • Chroma focuses on vector databases and similarity search, while elasticsearch-py is for general-purpose document indexing and search
  • Chroma offers a simpler API for vector operations, making it easier for machine learning tasks
  • elasticsearch-py provides more advanced querying capabilities and supports complex aggregations
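
For instance, a full-text query combined with an aggregation, which is where elasticsearch-py shines, might look like this (using 8.x-style keyword arguments; index and field names follow the indexing example above):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# match query plus a terms aggregation over the title keyword field
response = es.search(
    index="my_index",
    query={"match": {"content": "test"}},
    aggs={"by_title": {"terms": {"field": "title.keyword"}}},
)
print(response["hits"]["hits"])
print(response["aggregations"]["by_title"]["buckets"])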

Use Cases

  • elasticsearch-py: Full-text search, log analysis, and complex data aggregations
  • Chroma: Similarity search, recommendation systems, and AI-powered applications

README

Chroma logo

Chroma - the open-source embedding database.
The fastest way to build Python or JavaScript LLM apps with memory!

Discord | License | Docs | Homepage

pip install chromadb # python client
# for javascript, npm install chromadb!
# for client-server mode, chroma run --path /chroma_db_path

The core API is only 4 functions (run our 💡 Google Colab or Replit template):

import chromadb
# setup Chroma in-memory, for easy prototyping. Can add persistence easily!
client = chromadb.Client()

# Create collection. get_collection, get_or_create_collection, delete_collection also available!
collection = client.create_collection("all-my-documents")

# Add docs to the collection. Can also update and delete. Row-based API coming soon!
collection.add(
    documents=["This is document1", "This is document2"], # we handle tokenization, embedding, and indexing automatically. You can skip that and add your own embeddings as well
    metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on these!
    ids=["doc1", "doc2"], # unique for each doc
)

# Query/search 2 most similar results. You can also .get by id
results = collection.query(
    query_texts=["This is a query document"],
    n_results=2,
    # where={"metadata_field": "is_equal_to_this"}, # optional filter
    # where_document={"$contains":"search_string"}  # optional filter
)
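
Collections also support direct lookups without a similarity search; a small sketch continuing from the block above:

# fetch documents directly by id (as mentioned in the comment above)
docs = collection.get(ids=["doc1"])
print(docs["documents"])

# number of items currently stored in the collection
print(collection.count())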

Features

  • Simple: Fully-typed, fully-tested, fully-documented == happiness
  • Integrations: 🦜️🔗 LangChain (python and js), 🦙 LlamaIndex and more soon
  • Dev, Test, Prod: the same API that runs in your python notebook, scales to your cluster
  • Feature-rich: Queries, filtering, density estimation and more
  • Free & Open Source: Apache 2.0 Licensed

Use case: ChatGPT for ______

For example, the "Chat your data" use case:

  1. Add documents to your database. You can pass in your own embeddings, embedding function, or let Chroma embed them for you.
  2. Query relevant documents with natural language.
  3. Compose documents into the context window of an LLM like GPT3 for additional summarization or analysis.
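
A rough end-to-end sketch of that flow; the prompt assembly and the final LLM call are placeholders rather than part of Chroma's API:

import chromadb

client = chromadb.Client()
notes = client.create_collection("my-notes")

# 1. add documents; Chroma embeds them with its default embedding function
notes.add(
    documents=["The quarterly review meeting is on Tuesday.",
               "The travel budget was approved last week."],
    ids=["n1", "n2"],
)

# 2. query relevant documents with natural language
hits = notes.query(query_texts=["When is the review meeting?"], n_results=1)
context = "\n".join(hits["documents"][0])

# 3. compose the retrieved context into the LLM's context window
prompt = f"Answer using only this context:\n{context}\n\nQuestion: When is the review meeting?"
# ...send `prompt` to the LLM of your choice (e.g. GPT-3 via the OpenAI API)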

Embeddings?

What are embeddings?

  • Read the guide from OpenAI
  • Literal: Embedding something turns it from image/text/audio into a list of numbers. 🖼️ or 📄 => [1.2, 2.1, ....]. This process makes documents "understandable" to a machine learning model.
  • By analogy: An embedding represents the essence of a document. This enables documents and queries with the same essence to be "near" each other and therefore easy to find.
  • Technical: An embedding is the latent-space position of a document at a layer of a deep neural network. For models trained specifically to embed data, this is the last layer.
  • A small example: if you search your photos for "famous bridge in San Francisco", embedding that query and comparing it to the embeddings of your photos and their metadata should return photos of the Golden Gate Bridge.

Embeddings databases (also known as vector databases) store embeddings and allow you to search by nearest neighbors rather than by substrings like a traditional database. By default, Chroma uses Sentence Transformers to embed for you but you can also use OpenAI embeddings, Cohere (multilingual) embeddings, or your own.
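
Swapping the default embedder is a constructor argument; a sketch using the OpenAI embedding helper and cosine distance (helper and option names may differ slightly across chromadb versions):

import chromadb
from chromadb.utils import embedding_functions

# use OpenAI embeddings instead of the default Sentence Transformers model
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="YOUR_OPENAI_API_KEY",
    model_name="text-embedding-ada-002",
)

client = chromadb.Client()
collection = client.create_collection(
    "openai-docs",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},  # cosine distance for nearest-neighbor search
)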

Get involved

Chroma is a rapidly developing project. We welcome PR contributors and ideas for how to improve the project.

Release Cadence: We currently release new tagged versions of the pypi and npm packages on Mondays. Hotfixes go out at any time during the week.

License

Apache 2.0
