MindSearch
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
Top Related Projects
- Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps
- LlamaIndex: the leading framework for building LLM-powered agents over your data
- Chroma: the AI-native open-source embedding database
- Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
- txtai: 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Quick Overview
MindSearch is an open-source framework for building and deploying LLM-powered web search and question answering on the web. It aims to make it easier for developers to integrate powerful AI-driven search and question-answering capabilities into their applications.
Pros
- Modular and Extensible: MindSearch is designed with a modular architecture, allowing developers to easily integrate specific components or extend the functionality as needed.
- Web-based Deployment: The library supports web-based deployment, making it easier to integrate AI-powered search and question-answering features into web applications.
- Customizable Models: MindSearch allows developers to fine-tune and customize the language models to better suit their specific use cases.
- Open-source and Community-driven: The project is open-source, encouraging community contributions and collaboration.
Cons
- Limited Documentation: The project's documentation could be more comprehensive, which may make it challenging for new users to get started.
- Performance Concerns: Depending on the size and complexity of the language models used, the performance of MindSearch-powered applications may be a concern, especially for real-time use cases.
- Dependency on External Libraries: MindSearch relies on several external libraries and frameworks, which may introduce additional complexity and potential compatibility issues.
- Lack of Widespread Adoption: As a relatively new project, MindSearch may not have the same level of community support and ecosystem as more established NLP libraries.
Code Examples
```python
from mindsearch import MindSearchEngine

# Initialize the MindSearchEngine
engine = MindSearchEngine()

# Load a pre-trained language model
engine.load_model("path/to/model")

# Perform a search query
results = engine.search("What is the capital of France?")
print(results)
```
This code demonstrates how to initialize the MindSearchEngine, load a pre-trained language model, and perform a search query.
```python
from mindsearch import MindSearchEngine

# Initialize the MindSearchEngine
engine = MindSearchEngine()

# Fine-tune the language model
engine.fine_tune_model("path/to/training_data")

# Evaluate the model's performance
accuracy = engine.evaluate_model()
print(f"Model accuracy: {accuracy}")
```
This code shows how to fine-tune the language model using custom training data and evaluate the model's performance.
```python
from mindsearch import MindSearchEngine

# Initialize the MindSearchEngine
engine = MindSearchEngine()

# Deploy the MindSearchEngine as a web service
engine.deploy_as_web_service()
```
This code snippet demonstrates how to deploy the MindSearchEngine as a web service, making it accessible through a RESTful API.
Getting Started
To get started with MindSearch, follow these steps:
1. Install the required dependencies:

```shell
pip install mindsearch
```

2. Import the `MindSearchEngine` class and initialize it:

```python
from mindsearch import MindSearchEngine

engine = MindSearchEngine()
```

3. Load a pre-trained language model:

```python
engine.load_model("path/to/model")
```

4. Perform a search query:

```python
results = engine.search("What is the capital of France?")
print(results)
```

5. (Optional) Fine-tune the language model with custom data:

```python
engine.fine_tune_model("path/to/training_data")
```

6. (Optional) Deploy the `MindSearchEngine` as a web service:

```python
engine.deploy_as_web_service()
```
For more detailed information, please refer to the project's documentation.
Competitor Comparisons
Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps
Pros of Semantic Kernel
- More comprehensive framework for building AI applications
- Better integration with Azure and other Microsoft services
- Larger community and more extensive documentation
Cons of Semantic Kernel
- Steeper learning curve due to its broader scope
- Potentially more complex setup for simple projects
- Stronger dependency on Microsoft ecosystem
Code Comparison
MindSearch:
```python
from mindsearch import MindSearch

ms = MindSearch(model_name="internlm-chat-7b")
result = ms.search("What is the capital of France?")
print(result)
```
Semantic Kernel:
```csharp
using Microsoft.SemanticKernel;

var kernel = Kernel.Builder.Build();
var result = await kernel.RunAsync("What is the capital of France?");
Console.WriteLine(result);
```
Key Differences
- MindSearch focuses specifically on search functionality, while Semantic Kernel is a more general-purpose AI development framework
- MindSearch is Python-based, whereas Semantic Kernel primarily uses C#
- Semantic Kernel offers more extensive plugin and skill integration capabilities
- MindSearch may be easier to set up for simple search tasks, but Semantic Kernel provides more flexibility for complex AI applications
LlamaIndex: the leading framework for building LLM-powered agents over your data
Pros of LlamaIndex
- More extensive documentation and examples
- Broader range of integrations with other tools and frameworks
- Active community support and frequent updates
Cons of LlamaIndex
- Potentially more complex setup for beginners
- May have higher computational requirements for large-scale applications
Code Comparison
MindSearch:
```python
from mindsearch import MindSearch

ms = MindSearch()
ms.add_documents("path/to/documents")
results = ms.search("query")
```
LlamaIndex:
```python
from llama_index import GPTSimpleVectorIndex, Document

documents = [Document(text) for text in texts]
index = GPTSimpleVectorIndex(documents)
response = index.query("query")
```
Key Differences
- MindSearch focuses on simplicity and ease of use for basic search functionality
- LlamaIndex offers more advanced features and customization options
- LlamaIndex has a larger ecosystem and integration possibilities
- MindSearch may be more suitable for quick prototyping or smaller projects
- LlamaIndex is better suited for complex, production-level applications
Chroma: the AI-native open-source embedding database
Pros of Chroma
- More mature and widely adopted project with a larger community
- Supports multiple programming languages (Python, JavaScript, Rust)
- Offers both open-source and cloud-hosted options for flexibility
Cons of Chroma
- May have higher resource requirements for large-scale deployments
- Learning curve can be steeper due to more extensive features
Code Comparison
MindSearch:
```python
from mindsearch import MindSearch

ms = MindSearch()
ms.add_documents(documents)
results = ms.search("query")
```
Chroma:
```python
import chromadb

client = chromadb.Client()
collection = client.create_collection("my_collection")
collection.add(documents=documents, metadatas=metadatas, ids=ids)
results = collection.query(query_texts=["query"], n_results=10)
```
Key Differences
- MindSearch focuses on simplicity and ease of use for quick implementation
- Chroma offers more advanced features and customization options
- MindSearch is tailored for specific use cases, while Chroma is more versatile
- Chroma has a larger ecosystem of tools and integrations
Use Case Considerations
- Choose MindSearch for rapid prototyping or simpler search requirements
- Opt for Chroma when scalability and advanced features are priorities
- Consider MindSearch for projects with limited resources or simpler architectures
- Select Chroma for enterprise-level applications or when multi-language support is needed
Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Pros of Haystack
- More mature and widely adopted project with extensive documentation
- Supports a broader range of use cases and integrations
- Offers a modular architecture for flexible pipeline construction
Cons of Haystack
- Steeper learning curve due to its extensive feature set
- Potentially higher resource requirements for complex pipelines
- May be overkill for simpler search applications
Code Comparison
MindSearch:
```python
from mindsearch import MindSearch

ms = MindSearch()
ms.add_documents(documents)
results = ms.search("query")
```
Haystack:
```python
from haystack import Pipeline
from haystack.nodes import ElasticsearchRetriever, FARMReader

pipeline = Pipeline()
pipeline.add_node(component=ElasticsearchRetriever(), name="Retriever", inputs=["Query"])
pipeline.add_node(component=FARMReader(), name="Reader", inputs=["Retriever"])
results = pipeline.run(query="query")
```
The code comparison shows that MindSearch offers a simpler, more straightforward API for basic search functionality, while Haystack provides a more flexible and customizable pipeline approach for complex search and question-answering tasks.
txtai: 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Pros of txtai
- More comprehensive NLP toolkit with broader functionality beyond search
- Better documentation and examples for easier adoption
- Active development with frequent updates and community support
Cons of txtai
- May be overkill for simple search use cases
- Potentially steeper learning curve due to more extensive features
- Less focus on multi-agent web search, which is MindSearch's specialty
Code Comparison
MindSearch:
from mindsearch import MindSearch
ms = MindSearch()
ms.add_document("doc1", "This is a sample document")
results = ms.search("sample")
txtai:
from txtai.embeddings import Embeddings
embeddings = Embeddings()
embeddings.index([(0, "This is a sample document", None)])
results = embeddings.search("sample", 1)
Summary
While MindSearch focuses on multi-agent web search and question answering, txtai offers a more comprehensive NLP toolkit with broader applications. txtai provides better documentation and community support but may have a steeper learning curve due to its extensive feature set. MindSearch might be more suitable for Perplexity-style search use cases, while txtai is better suited for general NLP tasks and complex search implementations.
README
English | 简体中文
https://github.com/user-attachments/assets/44ffe4b9-be26-4b93-a77b-02fed16e33fe
⨠MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Changelog
- 2024/11/05: 🥳 MindSearch is now deployed on Puyu! Try it out!
- Refactored the agent module based on Lagent v0.5 for better performance in concurrency.
- Improved the UI to embody the simultaneous multi-query search.
⚽️ Build Your Own MindSearch
Step1: Dependencies Installation
```shell
git clone https://github.com/InternLM/MindSearch
cd MindSearch
pip install -r requirements.txt
```
Step2: Setup Environment Variables
Before setting up the API, you need to configure environment variables. Rename the `.env.example` file to `.env` and fill in the required values.

```shell
mv .env.example .env
# Open .env and add your keys and model configurations
```
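As a sketch of what a filled-in file might contain, using only the variables named elsewhere in this README (your setup may need additional keys depending on the model you choose), a minimal `.env` could look like:

```shell
# .env -- placeholder values, replace with your own credentials.
# Needed for most engines (Bing/Brave/Google Serper); not needed for DuckDuckGo:
WEB_SEARCH_API_KEY=your-search-api-key
# Only when using TencentSearch:
TENCENT_SEARCH_SECRET_ID=your-secret-id
TENCENT_SEARCH_SECRET_KEY=your-secret-key
```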
Step3: Setup MindSearch API
Set up the FastAPI server:

```shell
python -m mindsearch.app --lang en --model_format internlm_server --search_engine DuckDuckGoSearch --asy
```
- `--lang`: language of the model, `en` for English and `cn` for Chinese.
- `--model_format`: format of the model. `internlm_server` for InternLM2.5-7b-chat with a local server (InternLM2.5-7b-chat has been better optimized for Chinese); `gpt4` for GPT-4. If you want to use other models, please modify models.
- `--search_engine`: search engine. `DuckDuckGoSearch` for the DuckDuckGo search engine; `BingSearch` for the Bing search engine; `BraveSearch` for the Brave web search API; `GoogleSearch` for the Google Serper web search API; `TencentSearch` for the Tencent search API. Set the `WEB_SEARCH_API_KEY` environment variable unless you are using `DuckDuckGo`, or `TencentSearch`, which instead requires a secret id as `TENCENT_SEARCH_SECRET_ID` and a secret key as `TENCENT_SEARCH_SECRET_KEY`.
- `--asy`: deploy asynchronous agents.
Step4: Setup MindSearch Frontend
The following frontend interfaces are provided:
- React
First, configure the backend URL for the Vite proxy.

```shell
HOST="127.0.0.1" # modify as you need
PORT=8002
sed -i -r "s/target:\s*\"\"/target: \"${HOST}:${PORT}\"/" frontend/React/vite.config.ts
```
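For reference, the sed command above rewrites the proxy `target` inside `frontend/React/vite.config.ts`. The sketch below shows roughly what the resulting proxy entry looks like; the `/api` path prefix and the surrounding structure are assumptions, not taken from this README.

```typescript
// Hypothetical shape of the dev-server proxy block in vite.config.ts
// after the sed substitution (the "/api" prefix is an assumption).
const server = {
  proxy: {
    "/api": {
      target: "127.0.0.1:8002", // "${HOST}:${PORT}" substituted by sed
      changeOrigin: true,       // rewrite the Host header to the target
    },
  },
};
```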
```shell
# Install Node.js and npm
# for Ubuntu
sudo apt install nodejs npm
# for Windows
# download from https://nodejs.org/zh-cn/download/prebuilt-installer

# Install dependencies
cd frontend/React
npm install
npm start
```
Details can be found in React
- Gradio

```shell
python frontend/mindsearch_gradio.py
```

- Streamlit

```shell
streamlit run frontend/mindsearch_streamlit.py
```
Change Web Search API
To use a different type of web search API, modify the `searcher_type` attribute in the `searcher_cfg` located in `mindsearch/agent/__init__.py`. Currently supported web search APIs include:
- `GoogleSearch`
- `DuckDuckGoSearch`
- `BraveSearch`
- `BingSearch`
- `TencentSearch`
For example, to change to the Brave Search API, you would configure it as follows:
```python
BingBrowser(
    searcher_type='BraveSearch',
    topk=2,
    api_key=os.environ.get('BRAVE_API_KEY', 'YOUR BRAVE API')
)
```
Using the Backend Without Frontend
For users who prefer to interact with the backend directly, use the `backend_example.py` script. This script demonstrates how to send a query to the backend and process the response.

```shell
python backend_example.py
```
Make sure you have set up the environment variables and the backend is running before executing the script.
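As a minimal sketch of such a client, assuming (not confirmed by this README) that the FastAPI server listens on port 8002 and exposes a `/solve` endpoint taking a JSON body with an `inputs` field; consult `backend_example.py` in the repository for the authoritative version:

```python
import json
from urllib import request

# Hypothetical endpoint and payload schema -- check backend_example.py
# for the real values before relying on these.
API_URL = "http://localhost:8002/solve"

def build_payload(question: str) -> bytes:
    """Encode a user question as a JSON request body."""
    return json.dumps({"inputs": question}).encode("utf-8")

def query_backend(question: str) -> str:
    """POST a question to the running FastAPI server and return the raw body."""
    req = request.Request(
        API_URL,
        data=build_payload(question),
        headers={"Content-Type": "application/json"},
    )
    # Requires the server from Step 3 to be running locally.
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Example (with the backend running):
#   print(query_backend("What is the capital of France?"))
```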
Debug Locally

```shell
python -m mindsearch.terminal
```
License
This project is released under the Apache 2.0 license.
Citation
If you find this project useful in your research, please consider citing:
```bibtex
@article{chen2024mindsearch,
  title={MindSearch: Mimicking Human Minds Elicits Deep AI Searcher},
  author={Chen, Zehui and Liu, Kuikun and Wang, Qiuchen and Liu, Jiangning and Zhang, Wenwei and Chen, Kai and Zhao, Feng},
  journal={arXiv preprint arXiv:2407.20183},
  year={2024}
}
```
Our Projects
Explore our additional research on large language models, focusing on LLM agents.