khoj
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
Top Related Projects
Examples and guides for using the OpenAI API
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
A library for efficient similarity search and clustering of dense vectors.
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
An open-source NLP research library, built on PyTorch.
Quick Overview
Khoj is an AI-powered personal search engine and chatbot. It allows users to search and chat with their personal knowledge base, including notes, documents, and images. Khoj aims to provide a privacy-focused, offline-first solution for personal information management and retrieval.
Pros
- Privacy-focused: Runs locally on your device, ensuring data privacy
- Versatile: Supports various file formats and integrations (e.g., Markdown, Org-mode, PDF, images)
- Customizable: Offers different search algorithms and embedding models
- Open-source: Allows for community contributions and transparency
Cons
- Resource-intensive: May require significant computational resources for large knowledge bases
- Setup complexity: Initial configuration and indexing process can be challenging for non-technical users
- Limited natural language understanding: May not always interpret complex queries accurately
- Ongoing maintenance: Requires regular updates and re-indexing to keep the knowledge base current
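The re-indexing concern above can be reduced by only reprocessing files whose content actually changed. A minimal sketch of that idea using content hashes (an illustration, not Khoj's actual implementation):

```python
import hashlib

def content_hash(text: str) -> str:
    """Return a stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def files_to_reindex(current: dict[str, str], previous_hashes: dict[str, str]) -> list[str]:
    """Compare current file contents against stored hashes; return changed or new paths."""
    return [
        path for path, text in current.items()
        if previous_hashes.get(path) != content_hash(text)
    ]

# First run: everything is new, so everything gets indexed.
docs = {"notes.md": "# Meditation\nBreathe.", "todo.org": "* Buy milk"}
stored = {}
print(files_to_reindex(docs, stored))
stored = {p: content_hash(t) for p, t in docs.items()}

# Second run: only the edited file needs re-indexing.
docs["todo.org"] = "* Buy milk\n* Call mom"
print(files_to_reindex(docs, stored))  # only 'todo.org'
```

Real indexers track this state on disk between runs, but the change-detection logic is the same.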
Code Examples
# Initialize Khoj client
from khoj.utils.khoj_client import KhojClient

client = KhojClient(api_url="http://localhost:8000")

# Perform a search query
results = client.search("What are the benefits of meditation?")
print(results)

# Chat with Khoj
conversation = client.chat("Tell me about the last book I read.")
for message in conversation:
    print(f"{message['role']}: {message['content']}")

# Add a new file to the knowledge base
client.index_file("/path/to/new_document.md")
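Besides a Python client, a self-hosted Khoj server can be queried over plain HTTP. A minimal sketch of constructing such a request, assuming the default port and a `/api/search` endpoint with `q`/`n` query parameters (verify against your Khoj version's API docs):

```python
from urllib.parse import urlencode

KHOJ_URL = "http://localhost:8000"  # default self-hosted address

def build_search_url(query: str, max_results: int = 5) -> str:
    """Build the search endpoint URL; the path and 'q'/'n' parameter names are assumptions."""
    params = urlencode({"q": query, "n": max_results})
    return f"{KHOJ_URL}/api/search?{params}"

url = build_search_url("benefits of meditation")
print(url)
# With a running server, you could then fetch it, e.g.:
#   import requests
#   results = requests.get(url).json()
```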
Getting Started
1. Install Khoj:
   pip install khoj-assistant
2. Start the Khoj server:
   khoj
3. Open a web browser and navigate to http://localhost:8000 to access the Khoj web interface.
4. Configure your knowledge base sources in the settings.
5. Begin searching and chatting with your personal knowledge base!
Competitor Comparisons
Examples and guides for using the OpenAI API
Pros of openai-cookbook
- Comprehensive collection of OpenAI API usage examples and best practices
- Regularly updated with new features and improvements from OpenAI
- Extensive documentation and explanations for various AI tasks
Cons of openai-cookbook
- Focused solely on OpenAI's products, limiting its applicability to other AI platforms
- Requires API keys and potentially significant costs for running examples
- Less emphasis on local, privacy-focused AI solutions
Code Comparison
openai-cookbook:
import openai

# Note: text-davinci-002 and the Completions endpoint are legacy;
# newer OpenAI SDKs use the chat completions endpoint instead.
response = openai.Completion.create(
    engine="text-davinci-002",
    prompt="Translate the following English text to French: '{}'",
    max_tokens=60,
)
khoj:
from khoj.utils.ai import get_ai_response

response = get_ai_response(
    "Translate the following English text to French: '{}'",
    model="gpt-3.5-turbo",
)
Summary
openai-cookbook provides a wealth of information and examples for working with OpenAI's APIs, making it an excellent resource for developers using their services. However, it's limited to OpenAI's ecosystem and may involve costs.
khoj offers a more privacy-focused, local-first approach to AI integration, with support for multiple models and platforms. It may have a steeper learning curve but provides greater flexibility and control over data privacy.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Pros of transformers
- Extensive library of pre-trained models for various NLP tasks
- Well-documented and widely adopted in the AI/ML community
- Regular updates and contributions from a large open-source community
Cons of transformers
- Steep learning curve for beginners due to its comprehensive nature
- Can be resource-intensive, especially for large models
- Primarily focused on NLP tasks, limiting its use in other domains
Code Comparison
Khoj (Python):
from khoj.utils.constants import EMBEDDING_MODEL_NAME
from khoj.utils.rawconfig import RawConfig
from khoj.processor.content.text_to_entries import process_text_files
config = RawConfig(embedding_model=EMBEDDING_MODEL_NAME)
entries = process_text_files(config)
transformers (Python):
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
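The transformers snippet above stops at per-token outputs; semantic search engines like Khoj typically mean-pool these into one vector per sentence before comparing. A toy illustration of that pooling step, using NumPy arrays as stand-ins for real model tensors:

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors into one sentence vector, ignoring padding (mask == 0)."""
    mask = attention_mask[:, :, None]            # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = mask.sum(axis=1)                    # number of real tokens per sentence
    return summed / counts

# Toy batch: 1 sentence, 4 tokens (last one is padding), hidden size 3.
tokens = np.array([[[1., 2., 3.], [3., 2., 1.], [2., 2., 2.], [9., 9., 9.]]])
mask = np.array([[1, 1, 1, 0]])
print(mean_pool(tokens, mask))  # [[2. 2. 2.]]
```

The padding token's values are excluded, so only the three real tokens contribute to the average.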
A library for efficient similarity search and clustering of dense vectors.
Pros of FAISS
- Highly optimized for large-scale similarity search and clustering of dense vectors
- Supports GPU acceleration for faster processing
- Extensive documentation and wide industry adoption
Cons of FAISS
- Steeper learning curve due to its focus on low-level operations
- Limited to vector similarity search, lacking broader AI assistant capabilities
- Requires more setup and integration work for end-user applications
Code Comparison
FAISS (vector indexing and search):
import faiss
import numpy as np

d = 64                                              # vector dimensionality
xb = np.random.random((1000, d)).astype("float32")  # database vectors to index
xq = np.random.random((5, d)).astype("float32")     # query vectors

index = faiss.IndexFlatL2(d)   # exact L2-distance index
index.add(xb)
D, I = index.search(xq, 4)     # distances and indices of the 4 nearest neighbors
Khoj (AI assistant interaction):
from khoj.interface.cli import cli
result = cli.query("What is the capital of France?")
print(result)
Key Differences
FAISS is a specialized library for efficient similarity search and clustering of dense vectors, ideal for large-scale machine learning applications. Khoj, on the other hand, is an AI-powered personal assistant focused on natural language processing and information retrieval from personal knowledge bases.
While FAISS excels in vector operations, Khoj provides a more user-friendly interface for AI-assisted tasks and personal knowledge management. FAISS requires more technical expertise but offers greater flexibility for custom vector search implementations, whereas Khoj aims to provide an out-of-the-box solution for AI-powered personal assistance.
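At its core, what FAISS accelerates is nearest-neighbor search. A brute-force NumPy equivalent of an exact L2 search (what `IndexFlatL2` computes) makes the operation concrete, though it would not scale the way FAISS does:

```python
import numpy as np

def knn_l2(xb: np.ndarray, xq: np.ndarray, k: int):
    """Exact k-nearest-neighbor search under squared L2 distance."""
    # (nq, nb) matrix of squared distances between every query and database vector
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    idx = np.argsort(d2, axis=1)[:, :k]          # indices of the k closest vectors
    dist = np.take_along_axis(d2, idx, axis=1)   # their distances
    return dist, idx

xb = np.array([[0., 0.], [1., 0.], [5., 5.]])
xq = np.array([[0.9, 0.1]])
D, I = knn_l2(xb, xq, k=2)
print(I)  # nearest two database vectors for the query
```

This O(nq x nb) scan is what FAISS replaces with optimized (and optionally approximate, GPU-accelerated) index structures.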
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Pros of Haystack
- More comprehensive and feature-rich, offering a wide range of NLP tasks and pipelines
- Larger community and ecosystem, with better documentation and support
- Designed for production-ready, scalable applications
Cons of Haystack
- Steeper learning curve due to its complexity and extensive features
- Heavier resource requirements, which may be overkill for simpler projects
- Less focus on personal knowledge management compared to Khoj
Code Comparison
Khoj (Python):
from khoj.processor.text.semantic_search import SemanticSearch
searcher = SemanticSearch()
results = searcher.search("query", ["file1.txt", "file2.txt"])
Haystack (Python):
from haystack import Pipeline
from haystack.nodes import EmbeddingRetriever, Ranker
pipeline = Pipeline()
pipeline.add_node(component=EmbeddingRetriever(), name="Retriever", inputs=["Query"])
pipeline.add_node(component=Ranker(), name="Ranker", inputs=["Retriever"])
results = pipeline.run(query="query", documents=["file1.txt", "file2.txt"])
The code comparison shows that Khoj offers a simpler, more straightforward API for semantic search, while Haystack provides a more flexible and customizable pipeline approach. Haystack's code demonstrates its ability to chain multiple components together, which can be beneficial for complex NLP tasks but may be unnecessary for basic search functionality.
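The retrieve-then-rank chaining described above can also be sketched framework-free, which clarifies what a pipeline actually wires together; the scoring functions below are toy stand-ins, not Haystack or Khoj internals:

```python
def retrieve(query: str, docs: list[str], top_k: int = 3) -> list[str]:
    """Toy retriever: score documents by how many query words they share."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:top_k]

def rank(query: str, docs: list[str]) -> list[str]:
    """Toy ranker: re-score candidates by the fraction of their words matching the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.split())) / len(d.split()), reverse=True)

def pipeline(query: str, docs: list[str]) -> list[str]:
    """Chain retriever and ranker, as an orchestration framework would."""
    return rank(query, retrieve(query, docs))

docs = [
    "paris is the capital of france",
    "france borders spain",
    "the eiffel tower is in paris france",
    "berlin is in germany",
]
print(pipeline("capital of france", docs))
```

In a real framework each component would be a neural model, but the data flow, a cheap broad retriever feeding a more precise ranker, is the same.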
An open-source NLP research library, built on PyTorch.
Pros of AllenNLP
- Comprehensive NLP toolkit with a wide range of pre-built models and components
- Extensive documentation and tutorials for ease of use
- Large and active community support
Cons of AllenNLP
- Steeper learning curve for beginners due to its extensive feature set
- Heavier resource requirements for some models and tasks
Code Comparison
AllenNLP:
from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
result = predictor.predict(sentence="Did Uriah honestly think he could beat the game in under three hours?")
Khoj:
from khoj.utils.rawconfig import RawConfig
from khoj.processor.content.markdown import MarkdownContent
config = RawConfig(content_type="markdown")
processor = MarkdownContent(config)
entries = processor.process_file("path/to/file.md")
AllenNLP offers a more comprehensive set of NLP tools and pre-trained models, making it suitable for a wide range of NLP tasks. It has extensive documentation and community support, but may have a steeper learning curve for beginners. Khoj, on the other hand, is more focused on personal knowledge management and information retrieval, with a simpler API for processing specific content types like Markdown.
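Splitting a Markdown file into indexable entries, as the `MarkdownContent` snippet above does, can be sketched in a few lines; this is a simplification for illustration, not Khoj's actual processor:

```python
def markdown_to_entries(text: str) -> list[dict]:
    """Split markdown into one entry per heading, keeping the body under each."""
    entries, current = [], None
    for line in text.splitlines():
        if line.startswith("#"):
            current = {"heading": line.lstrip("# ").strip(), "body": []}
            entries.append(current)
        elif current is not None:
            current["body"].append(line)
    # Join body lines and drop surrounding blank lines.
    for e in entries:
        e["body"] = "\n".join(e["body"]).strip()
    return entries

doc = "# Books\nRead Dune.\n\n# Ideas\nWrite more notes."
print(markdown_to_entries(doc))
```

Each heading/body pair then becomes one searchable unit, which is what makes heading-structured notes index well.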
README
Docs • Web • App • Discord • Blog
New
- Start any message with /research to try out the experimental research mode with Khoj.
- Anyone can now create custom agents with tunable personality, tools and knowledge bases.
- Read about Khoj's excellent performance on modern retrieval and reasoning benchmarks.
Overview
Khoj is a personal AI app to extend your capabilities. It smoothly scales up from an on-device personal AI to a cloud-scale enterprise AI.
- Chat with any local or online LLM (e.g. llama3, qwen, gemma, mistral, gpt, claude, gemini, deepseek).
- Get answers from the internet and your docs (including image, pdf, markdown, org-mode, word, notion files).
- Access it from your browser, Obsidian, Emacs, desktop, phone or WhatsApp.
- Create agents with custom knowledge, persona, chat model and tools to take on any role.
- Automate away repetitive research. Get personal newsletters and smart notifications delivered to your inbox.
- Find relevant docs quickly and easily using our advanced semantic search.
- Generate images, talk out loud, play your messages.
- Khoj is open-source, self-hostable. Always.
- Run it privately on your computer or try it on our cloud app.
See it in action
Go to https://app.khoj.dev to see Khoj live.
Full feature list
You can see the full feature list here.
Self-Host
To get started with self-hosting Khoj, read the docs.
Enterprise
Khoj is available as a cloud service, on-premises, or as a hybrid solution. To learn more about Khoj Enterprise, visit our website.
Frequently Asked Questions (FAQ)
Q: Can I use Khoj without self-hosting?
Yes! You can use Khoj right away at https://app.khoj.dev, no setup required.
Q: What kinds of documents can Khoj read?
Khoj supports a wide variety: PDFs, Markdown, Notion, Word docs, org-mode files, and more.
Q: How can I make my own agent?
Check out this blog post for a step-by-step guide to custom agents. For more questions, head over to our Discord!
Contributors
Cheers to our awesome contributors!
Made with contrib.rocks.
Interested in Contributing?
Khoj is open source. It is sustained by the community and we'd love for you to join it! Whether you're a coder, designer, writer, or enthusiast, there's a place for you.
Why Contribute?
- Make an Impact: Help build, test and improve a tool used by thousands to boost productivity.
- Learn & Grow: Work on cutting-edge AI, LLMs, and semantic search technologies.
You can help us build new features, improve the project documentation, report issues and fix bugs. If you're a developer, please see our Contributing Guidelines and check out good first issues to work on.