MindSearch
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
Top Related Projects
- Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps
- LlamaIndex: the leading framework for building LLM-powered agents over your data
- Chroma: the AI-native open-source embedding database
- Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
- txtai: 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Quick Overview
MindSearch is an open-source framework for building and deploying LLM-powered web search and question answering on the web. It aims to make it easier for developers to integrate powerful AI-driven search and question-answering capabilities into their applications.
Pros
- Modular and Extensible: MindSearch is designed with a modular architecture, allowing developers to easily integrate specific components or extend the functionality as needed.
- Web-based Deployment: The library supports web-based deployment, making it easier to integrate AI-powered search and question-answering features into web applications.
- Customizable Models: MindSearch allows developers to fine-tune and customize the language models to better suit their specific use cases.
- Open-source and Community-driven: The project is open-source, encouraging community contributions and collaboration.
Cons
- Limited Documentation: The project's documentation could be more comprehensive, which may make it challenging for new users to get started.
- Performance Concerns: Depending on the size and complexity of the language models used, the performance of MindSearch-powered applications may be a concern, especially for real-time use cases.
- Dependency on External Libraries: MindSearch relies on several external libraries and frameworks, which may introduce additional complexity and potential compatibility issues.
- Lack of Widespread Adoption: As a relatively new project, MindSearch may not have the same level of community support and ecosystem as more established NLP libraries.
Code Examples
```python
from mindsearch import MindSearchEngine

# Initialize the MindSearchEngine
engine = MindSearchEngine()

# Load a pre-trained language model
engine.load_model("path/to/model")

# Perform a search query
results = engine.search("What is the capital of France?")
print(results)
```
This code demonstrates how to initialize the MindSearchEngine, load a pre-trained language model, and perform a search query.
```python
from mindsearch import MindSearchEngine

# Initialize the MindSearchEngine
engine = MindSearchEngine()

# Fine-tune the language model
engine.fine_tune_model("path/to/training_data")

# Evaluate the model's performance
accuracy = engine.evaluate_model()
print(f"Model accuracy: {accuracy}")
```
This code shows how to fine-tune the language model using custom training data and evaluate the model's performance.
```python
from mindsearch import MindSearchEngine

# Initialize the MindSearchEngine
engine = MindSearchEngine()

# Deploy the MindSearchEngine as a web service
engine.deploy_as_web_service()
```
This code snippet demonstrates how to deploy the MindSearchEngine as a web service, making it accessible through a RESTful API.
Getting Started
To get started with MindSearch, follow these steps:
1. Install the required dependencies:

```shell
pip install mindsearch
```

2. Import the `MindSearchEngine` class and initialize it:

```python
from mindsearch import MindSearchEngine

engine = MindSearchEngine()
```

3. Load a pre-trained language model:

```python
engine.load_model("path/to/model")
```

4. Perform a search query:

```python
results = engine.search("What is the capital of France?")
print(results)
```

5. (Optional) Fine-tune the language model with custom data:

```python
engine.fine_tune_model("path/to/training_data")
```

6. (Optional) Deploy the `MindSearchEngine` as a web service:

```python
engine.deploy_as_web_service()
```
For more detailed information, please refer to the project's documentation.
Competitor Comparisons
Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps
Pros of Semantic Kernel
- More comprehensive framework for building AI applications
- Better integration with Azure and other Microsoft services
- Larger community and more extensive documentation
Cons of Semantic Kernel
- Steeper learning curve due to its broader scope
- Potentially more complex setup for simple projects
- Stronger dependency on Microsoft ecosystem
Code Comparison
MindSearch:
```python
from mindsearch import MindSearch

ms = MindSearch(model_name="internlm-chat-7b")
result = ms.search("What is the capital of France?")
print(result)
```
Semantic Kernel:
```csharp
using Microsoft.SemanticKernel;

var kernel = Kernel.Builder.Build();
var result = await kernel.RunAsync("What is the capital of France?");
Console.WriteLine(result);
```
Key Differences
- MindSearch focuses specifically on search functionality, while Semantic Kernel is a more general-purpose AI development framework
- MindSearch is Python-based, whereas Semantic Kernel primarily uses C#
- Semantic Kernel offers more extensive plugin and skill integration capabilities
- MindSearch may be easier to set up for simple search tasks, but Semantic Kernel provides more flexibility for complex AI applications
LlamaIndex: the leading framework for building LLM-powered agents over your data
Pros of LlamaIndex
- More extensive documentation and examples
- Broader range of integrations with other tools and frameworks
- Active community support and frequent updates
Cons of LlamaIndex
- Potentially more complex setup for beginners
- May have higher computational requirements for large-scale applications
Code Comparison
MindSearch:
```python
from mindsearch import MindSearch

ms = MindSearch()
ms.add_documents("path/to/documents")
results = ms.search("query")
```
LlamaIndex:
```python
from llama_index import GPTSimpleVectorIndex, Document

documents = [Document(text) for text in texts]
index = GPTSimpleVectorIndex(documents)
response = index.query("query")
```
Key Differences
- MindSearch focuses on simplicity and ease of use for basic search functionality
- LlamaIndex offers more advanced features and customization options
- LlamaIndex has a larger ecosystem and integration possibilities
- MindSearch may be more suitable for quick prototyping or smaller projects
- LlamaIndex is better suited for complex, production-level applications
Chroma: the AI-native open-source embedding database
Pros of Chroma
- More mature and widely adopted project with a larger community
- Supports multiple programming languages (Python, JavaScript, Rust)
- Offers both open-source and cloud-hosted options for flexibility
Cons of Chroma
- May have higher resource requirements for large-scale deployments
- Learning curve can be steeper due to more extensive features
Code Comparison
MindSearch:
```python
from mindsearch import MindSearch

ms = MindSearch()
ms.add_documents(documents)
results = ms.search("query")
```
Chroma:
```python
import chromadb

client = chromadb.Client()
collection = client.create_collection("my_collection")
collection.add(documents=documents, metadatas=metadatas, ids=ids)
results = collection.query(query_texts=["query"], n_results=10)
```
Key Differences
- MindSearch focuses on simplicity and ease of use for quick implementation
- Chroma offers more advanced features and customization options
- MindSearch is tailored for specific use cases, while Chroma is more versatile
- Chroma has a larger ecosystem of tools and integrations
Use Case Considerations
- Choose MindSearch for rapid prototyping or simpler search requirements
- Opt for Chroma when scalability and advanced features are priorities
- Consider MindSearch for projects with limited resources or simpler architectures
- Select Chroma for enterprise-level applications or when multi-language support is needed
Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
Pros of Haystack
- More mature and widely adopted project with extensive documentation
- Supports a broader range of use cases and integrations
- Offers a modular architecture for flexible pipeline construction
Cons of Haystack
- Steeper learning curve due to its extensive feature set
- Potentially higher resource requirements for complex pipelines
- May be overkill for simpler search applications
Code Comparison
MindSearch:
```python
from mindsearch import MindSearch

ms = MindSearch()
ms.add_documents(documents)
results = ms.search("query")
```
Haystack:
```python
from haystack import Pipeline
from haystack.nodes import ElasticsearchRetriever, FARMReader

pipeline = Pipeline()
pipeline.add_node(component=ElasticsearchRetriever(), name="Retriever", inputs=["Query"])
pipeline.add_node(component=FARMReader(), name="Reader", inputs=["Retriever"])
results = pipeline.run(query="query")
```
The code comparison shows that MindSearch offers a simpler, more straightforward API for basic search functionality, while Haystack provides a more flexible and customizable pipeline approach for complex search and question-answering tasks.
txtai: 💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
Pros of txtai
- More comprehensive NLP toolkit with broader functionality beyond search
- Better documentation and examples for easier adoption
- Active development with frequent updates and community support
Cons of txtai
- May be overkill for simple search use cases
- Potentially steeper learning curve due to more extensive features
- Less focus on multi-agent web search, which is MindSearch's specialty
Code Comparison
MindSearch:
from mindsearch import MindSearch
ms = MindSearch()
ms.add_document("doc1", "This is a sample document")
results = ms.search("sample")
txtai:
from txtai.embeddings import Embeddings
embeddings = Embeddings()
embeddings.index([(0, "This is a sample document", None)])
results = embeddings.search("sample", 1)
Summary
While MindSearch focuses on multi-agent web search and question answering, txtai offers a more comprehensive NLP toolkit with broader applications. txtai provides better documentation and community support but may have a steeper learning curve due to its extensive feature set. MindSearch might be more suitable for Perplexity-style search use cases, while txtai is better suited for general NLP tasks and complex search implementations.
README
English | 简体中文
https://github.com/user-attachments/assets/44ffe4b9-be26-4b93-a77b-02fed16e33fe
⨠MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Changelog
- 2024/11/05: 🥳 MindSearch is now deployed on Puyu! Try it out!
- Refactored the agent module based on Lagent v0.5 for better performance in concurrency.
- Improved the UI to embody the simultaneous multi-query search.
⚽️ Build Your Own MindSearch
Step1: Dependencies Installation
```shell
git clone https://github.com/InternLM/MindSearch
cd MindSearch
pip install -r requirements.txt
```
Step2: Setup Environment Variables
Before setting up the API, you need to configure environment variables. Rename the `.env.example` file to `.env` and fill in the required values.

```shell
mv .env.example .env
# Open .env and add your keys and model configurations
```
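As a sketch of what a filled-in file might contain, using only the variables named elsewhere in this README (your setup may need additional keys depending on the model you choose), a minimal `.env` could look like:

```shell
# .env -- placeholder values, replace with your own credentials.
# Needed for most engines (Bing/Brave/Google Serper); not needed for DuckDuckGo:
WEB_SEARCH_API_KEY=your-search-api-key
# Only when using TencentSearch:
TENCENT_SEARCH_SECRET_ID=your-secret-id
TENCENT_SEARCH_SECRET_KEY=your-secret-key
```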
Step3: Setup MindSearch API
Set up the FastAPI server:

```shell
python -m mindsearch.app --lang en --model_format internlm_server --search_engine DuckDuckGoSearch --asy
```
- `--lang`: language of the model, `en` for English and `cn` for Chinese.
- `--model_format`: format of the model. `internlm_server` for InternLM2.5-7b-chat with a local server (InternLM2.5-7b-chat has been better optimized for Chinese); `gpt4` for GPT-4. If you want to use other models, please modify models.
- `--search_engine`: search engine. `DuckDuckGoSearch` for the DuckDuckGo search engine; `BingSearch` for the Bing search engine; `BraveSearch` for the Brave web search API; `GoogleSearch` for the Google Serper web search API; `TencentSearch` for the Tencent search API. Set the `WEB_SEARCH_API_KEY` environment variable unless you are using `DuckDuckGo`, or `TencentSearch`, which instead requires a secret id as `TENCENT_SEARCH_SECRET_ID` and a secret key as `TENCENT_SEARCH_SECRET_KEY`.
- `--asy`: deploy asynchronous agents.
Step4: Setup MindSearch Frontend
The following frontend interfaces are provided:
- React
First, configure the backend URL for the Vite proxy.

```shell
HOST="127.0.0.1" # modify as you need
PORT=8002
sed -i -r "s/target:\s*\"\"/target: \"${HOST}:${PORT}\"/" frontend/React/vite.config.ts
```
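For reference, the sed command above rewrites the proxy `target` inside `frontend/React/vite.config.ts`. The sketch below shows roughly what the resulting proxy entry looks like; the `/api` path prefix and the surrounding structure are assumptions, not taken from this README.

```typescript
// Hypothetical shape of the dev-server proxy block in vite.config.ts
// after the sed substitution (the "/api" prefix is an assumption).
const server = {
  proxy: {
    "/api": {
      target: "127.0.0.1:8002", // "${HOST}:${PORT}" substituted by sed
      changeOrigin: true,       // rewrite the Host header to the target
    },
  },
};
```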
```shell
# Install Node.js and npm
# for Ubuntu
sudo apt install nodejs npm
# for Windows
# download from https://nodejs.org/zh-cn/download/prebuilt-installer

# Install dependencies
cd frontend/React
npm install
npm start
```
Details can be found in React
- Gradio

```shell
python frontend/mindsearch_gradio.py
```

- Streamlit

```shell
streamlit run frontend/mindsearch_streamlit.py
```
Change Web Search API
To use a different type of web search API, modify the `searcher_type` attribute in the `searcher_cfg` located in `mindsearch/agent/__init__.py`. Currently supported web search APIs include:
- `GoogleSearch`
- `DuckDuckGoSearch`
- `BraveSearch`
- `BingSearch`
- `TencentSearch`
For example, to change to the Brave Search API, you would configure it as follows:
```python
BingBrowser(
    searcher_type='BraveSearch',
    topk=2,
    api_key=os.environ.get('BRAVE_API_KEY', 'YOUR BRAVE API')
)
```
Using the Backend Without Frontend
For users who prefer to interact with the backend directly, use the `backend_example.py` script. This script demonstrates how to send a query to the backend and process the response.

```shell
python backend_example.py
```
Make sure you have set up the environment variables and the backend is running before executing the script.
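As a minimal sketch of such a client, assuming (not confirmed by this README) that the FastAPI server listens on port 8002 and exposes a `/solve` endpoint taking a JSON body with an `inputs` field; consult `backend_example.py` in the repository for the authoritative version:

```python
import json
from urllib import request

# Hypothetical endpoint and payload schema -- check backend_example.py
# for the real values before relying on these.
API_URL = "http://localhost:8002/solve"

def build_payload(question: str) -> bytes:
    """Encode a user question as a JSON request body."""
    return json.dumps({"inputs": question}).encode("utf-8")

def query_backend(question: str) -> str:
    """POST a question to the running FastAPI server and return the raw body."""
    req = request.Request(
        API_URL,
        data=build_payload(question),
        headers={"Content-Type": "application/json"},
    )
    # Requires the server from Step 3 to be running locally.
    with request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

# Example (with the backend running):
#   print(query_backend("What is the capital of France?"))
```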
Debug Locally

```shell
python -m mindsearch.terminal
```
License
This project is released under the Apache 2.0 license.
Citation
If you find this project useful in your research, please consider citing:
```bibtex
@article{chen2024mindsearch,
  title={MindSearch: Mimicking Human Minds Elicits Deep AI Searcher},
  author={Chen, Zehui and Liu, Kuikun and Wang, Qiuchen and Liu, Jiangning and Zhang, Wenwei and Chen, Kai and Zhao, Feng},
  journal={arXiv preprint arXiv:2407.20183},
  year={2024}
}
```
Our Projects
Explore our additional research on large language models, focusing on LLM agents.