
InternLM / MindSearch

🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)


Top Related Projects

  • Semantic Kernel: integrate cutting-edge LLM technology quickly and easily into your apps
  • LlamaIndex: the leading framework for building LLM-powered agents over your data
  • Chroma: the AI-native open-source embedding database
  • Haystack: AI orchestration framework for building customizable, production-ready LLM applications, with advanced retrieval methods suited to RAG, question answering, semantic search, and conversational agents
  • txtai: 💡 all-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

Quick Overview

MindSearch is an open-source, LLM-based multi-agent framework for web search and question answering (comparable to Perplexity.ai Pro and SearchGPT). It aims to make it easier for developers to integrate powerful AI-powered search and question-answering capabilities into their applications.

Pros

  • Modular and Extensible: MindSearch is designed with a modular architecture, allowing developers to easily integrate specific components or extend the functionality as needed.
  • Web-based Deployment: The library supports web-based deployment, making it easier to integrate AI-powered search and question-answering features into web applications.
  • Customizable Models: MindSearch allows developers to fine-tune and customize the language models to better suit their specific use cases.
  • Open-source and Community-driven: The project is open-source, encouraging community contributions and collaboration.

Cons

  • Limited Documentation: The project's documentation could be more comprehensive, which may make it challenging for new users to get started.
  • Performance Concerns: Depending on the size and complexity of the language models used, the performance of MindSearch-powered applications may be a concern, especially for real-time use cases.
  • Dependency on External Libraries: MindSearch relies on several external libraries and frameworks, which may introduce additional complexity and potential compatibility issues.
  • Lack of Widespread Adoption: As a relatively new project, MindSearch may not have the same level of community support and ecosystem as more established NLP libraries.

Code Examples

Note: the MindSearchEngine API in the snippets below is illustrative and may not match the actual package; see the official README further down for the supported entry points (mindsearch.app, mindsearch.terminal).

from mindsearch import MindSearchEngine

# Initialize the MindSearchEngine
engine = MindSearchEngine()

# Load a pre-trained language model
engine.load_model("path/to/model")

# Perform a search query
results = engine.search("What is the capital of France?")
print(results)

This snippet sketches initializing the engine, loading a pre-trained language model, and performing a search query.

from mindsearch import MindSearchEngine

# Initialize the MindSearchEngine
engine = MindSearchEngine()

# Fine-tune the language model
engine.fine_tune_model("path/to/training_data")

# Evaluate the model's performance
accuracy = engine.evaluate_model()
print(f"Model accuracy: {accuracy}")

This snippet sketches fine-tuning the language model on custom training data and evaluating its performance.

from mindsearch import MindSearchEngine

# Initialize the MindSearchEngine
engine = MindSearchEngine()

# Deploy the MindSearchEngine as a web service
engine.deploy_as_web_service()

This snippet sketches deploying the engine as a web service, exposing it through a RESTful API.

Getting Started

To get started with MindSearch, follow these steps:

  1. Install the required dependencies:

    pip install mindsearch
    
  2. Import the MindSearchEngine class and initialize it:

    from mindsearch import MindSearchEngine
    engine = MindSearchEngine()
    
  3. Load a pre-trained language model:

    engine.load_model("path/to/model")
    
  4. Perform a search query:

    results = engine.search("What is the capital of France?")
    print(results)
    
  5. (Optional) Fine-tune the language model with custom data:

    engine.fine_tune_model("path/to/training_data")
    
  6. (Optional) Deploy the MindSearchEngine as a web service:

    engine.deploy_as_web_service()
    

For more detailed information, please refer to the project's documentation.

Competitor Comparisons

Integrate cutting-edge LLM technology quickly and easily into your apps

Pros of Semantic Kernel

  • More comprehensive framework for building AI applications
  • Better integration with Azure and other Microsoft services
  • Larger community and more extensive documentation

Cons of Semantic Kernel

  • Steeper learning curve due to its broader scope
  • Potentially more complex setup for simple projects
  • Stronger dependency on Microsoft ecosystem

Code Comparison

(The snippets in these comparisons are illustrative of each project's typical usage rather than exact, current APIs.)

MindSearch:

from mindsearch import MindSearch

ms = MindSearch(model_name="internlm-chat-7b")
result = ms.search("What is the capital of France?")
print(result)

Semantic Kernel:

using Microsoft.SemanticKernel;

var kernel = Kernel.Builder.Build();
var result = await kernel.RunAsync("What is the capital of France?");
Console.WriteLine(result);

Key Differences

  • MindSearch focuses specifically on search functionality, while Semantic Kernel is a more general-purpose AI development framework
  • MindSearch is Python-based, whereas Semantic Kernel primarily uses C#
  • Semantic Kernel offers more extensive plugin and skill integration capabilities
  • MindSearch may be easier to set up for simple search tasks, but Semantic Kernel provides more flexibility for complex AI applications

LlamaIndex is the leading framework for building LLM-powered agents over your data.

Pros of LlamaIndex

  • More extensive documentation and examples
  • Broader range of integrations with other tools and frameworks
  • Active community support and frequent updates

Cons of LlamaIndex

  • Potentially more complex setup for beginners
  • May have higher computational requirements for large-scale applications

Code Comparison

MindSearch:

from mindsearch import MindSearch

ms = MindSearch()
ms.add_documents("path/to/documents")
results = ms.search("query")

LlamaIndex:

from llama_index import GPTSimpleVectorIndex, Document

documents = [Document(text) for text in texts]
index = GPTSimpleVectorIndex(documents)
response = index.query("query")

Key Differences

  • MindSearch focuses on simplicity and ease of use for basic search functionality
  • LlamaIndex offers more advanced features and customization options
  • LlamaIndex has a larger ecosystem and integration possibilities
  • MindSearch may be more suitable for quick prototyping or smaller projects
  • LlamaIndex is better suited for complex, production-level applications

the AI-native open-source embedding database

Pros of Chroma

  • More mature and widely adopted project with a larger community
  • Supports multiple programming languages (Python, JavaScript, Rust)
  • Offers both open-source and cloud-hosted options for flexibility

Cons of Chroma

  • May have higher resource requirements for large-scale deployments
  • Learning curve can be steeper due to more extensive features

Code Comparison

MindSearch:

from mindsearch import MindSearch

ms = MindSearch()
ms.add_documents(documents)
results = ms.search("query")

Chroma:

import chromadb

client = chromadb.Client()
collection = client.create_collection("my_collection")
collection.add(documents=documents, metadatas=metadatas, ids=ids)
results = collection.query(query_texts=["query"], n_results=10)

Key Differences

  • MindSearch focuses on simplicity and ease of use for quick implementation
  • Chroma offers more advanced features and customization options
  • MindSearch is tailored for specific use cases, while Chroma is more versatile
  • Chroma has a larger ecosystem of tools and integrations

Use Case Considerations

  • Choose MindSearch for rapid prototyping or simpler search requirements
  • Opt for Chroma when scalability and advanced features are priorities
  • Consider MindSearch for projects with limited resources or simpler architectures
  • Select Chroma for enterprise-level applications or when multi-language support is needed

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Pros of Haystack

  • More mature and widely adopted project with extensive documentation
  • Supports a broader range of use cases and integrations
  • Offers a modular architecture for flexible pipeline construction

Cons of Haystack

  • Steeper learning curve due to its extensive feature set
  • Potentially higher resource requirements for complex pipelines
  • May be overkill for simpler search applications

Code Comparison

MindSearch:

from mindsearch import MindSearch

ms = MindSearch()
ms.add_documents(documents)
results = ms.search("query")

Haystack:

from haystack import Pipeline
from haystack.nodes import ElasticsearchRetriever, FARMReader

pipeline = Pipeline()
pipeline.add_node(component=ElasticsearchRetriever(), name="Retriever", inputs=["Query"])
pipeline.add_node(component=FARMReader(), name="Reader", inputs=["Retriever"])
results = pipeline.run(query="query")

The code comparison shows that MindSearch offers a simpler, more straightforward API for basic search functionality, while Haystack provides a more flexible and customizable pipeline approach for complex search and question-answering tasks.


💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

Pros of txtai

  • More comprehensive NLP toolkit with broader functionality beyond search
  • Better documentation and examples for easier adoption
  • Active development with frequent updates and community support

Cons of txtai

  • May be overkill for simple search use cases
  • Potentially steeper learning curve due to more extensive features
  • Less focus on agent-driven, multi-step web search, which is MindSearch's specialty

Code Comparison

MindSearch:

from mindsearch import MindSearch

ms = MindSearch()
ms.add_document("doc1", "This is a sample document")
results = ms.search("sample")

txtai:

from txtai.embeddings import Embeddings

embeddings = Embeddings()
embeddings.index([(0, "This is a sample document", None)])
results = embeddings.search("sample", 1)

Summary

While MindSearch focuses on agent-driven web search and question answering, txtai offers a more comprehensive NLP toolkit with broader applications. txtai provides better documentation and community support but may have a steeper learning curve due to its extensive feature set. MindSearch might be more suitable for search-centric use cases, while txtai is better suited for general NLP tasks and complex search implementations.


README

✨ MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

📅 Changelog

  • 2024/11/05: 🥳 MindSearch is now deployed on Puyu! 👉 Try it 👈
    • Refactored the agent module based on Lagent v0.5 for better concurrency performance.
    • Improved the UI to reflect simultaneous multi-query search.

⚽️ Build Your Own MindSearch

Step1: Dependencies Installation

git clone https://github.com/InternLM/MindSearch
cd MindSearch
pip install -r requirements.txt

Step2: Setup Environment Variables

Before setting up the API, you need to configure environment variables. Rename the .env.example file to .env and fill in the required values.

mv .env.example .env
# Open .env and add your keys and model configurations
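For reference, a minimal .env might contain entries like these (variable names are taken from the search-engine options described in Step 3; any model API keys required by your chosen --model_format go here as well):

```shell
# Web search key (not needed for DuckDuckGoSearch)
WEB_SEARCH_API_KEY=your-search-api-key

# Only needed for TencentSearch
TENCENT_SEARCH_SECRET_ID=your-secret-id
TENCENT_SEARCH_SECRET_KEY=your-secret-key
```

Consult .env.example for the authoritative list of variables.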

Step3: Setup MindSearch API

Set up the FastAPI server.

python -m mindsearch.app --lang en --model_format internlm_server --search_engine DuckDuckGoSearch --asy

  • --lang: language of the model; en for English, cn for Chinese.
  • --model_format: format of the model.
    • internlm_server for InternLM2.5-7b-chat served locally. (InternLM2.5-7b-chat is better optimized for Chinese.)
    • gpt4 for GPT-4. To use other models, modify models.
  • --search_engine: search engine.
    • DuckDuckGoSearch for the DuckDuckGo search engine.
    • BingSearch for the Bing search engine.
    • BraveSearch for the Brave web search API.
    • GoogleSearch for the Google Serper web search API.
    • TencentSearch for the Tencent search API.
    Set your web search API key as the WEB_SEARCH_API_KEY environment variable unless you are using DuckDuckGo; TencentSearch instead requires a secret id as TENCENT_SEARCH_SECRET_ID and a secret key as TENCENT_SEARCH_SECRET_KEY.
  • --asy: deploy asynchronous agents.
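For example, to run the same server against Bing instead of DuckDuckGo, export the key described above and change the --search_engine flag (the key value here is a placeholder):

```shell
# Bing requires a web search API key (placeholder value)
export WEB_SEARCH_API_KEY="your-bing-key"
python -m mindsearch.app --lang en --model_format internlm_server --search_engine BingSearch --asy
```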

Step4: Setup MindSearch Frontend

The following frontend interfaces are provided:

  • React

First, configure the backend URL for the Vite proxy.

HOST="127.0.0.1"  # modify as you need
PORT=8002
sed -i -r "s/target:\s*\"\"/target: \"${HOST}:${PORT}\"/" frontend/React/vite.config.ts
# Install Node.js and npm
# for Ubuntu
sudo apt install nodejs npm

# for Windows
# download from https://nodejs.org/zh-cn/download/prebuilt-installer

# Install dependencies

cd frontend/React
npm install
npm start

Details can be found in frontend/React.

  • Gradio

python frontend/mindsearch_gradio.py

  • Streamlit

streamlit run frontend/mindsearch_streamlit.py

🌐 Change Web Search API

To use a different type of web search API, modify the searcher_type attribute in the searcher_cfg located in mindsearch/agent/__init__.py. Currently supported web search APIs include:

  • GoogleSearch
  • DuckDuckGoSearch
  • BraveSearch
  • BingSearch
  • TencentSearch

For example, to change to the Brave Search API, you would configure it as follows:

import os
from lagent.actions import BingBrowser  # import path may vary by Lagent version

BingBrowser(
    searcher_type='BraveSearch',
    topk=2,
    api_key=os.environ.get('BRAVE_API_KEY', 'YOUR BRAVE API KEY')
)

🐞 Using the Backend Without Frontend

For users who prefer to interact with the backend directly, use the backend_example.py script. This script demonstrates how to send a query to the backend and process the response.

python backend_example.py

Make sure you have set up the environment variables and the backend is running before executing the script.
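As a rough sketch of what such a client does (the port matches the frontend proxy configuration above, but the /solve endpoint and the "inputs" field name are assumptions; treat backend_example.py as the authoritative reference):

```python
import json
import urllib.request


def build_payload(question: str) -> dict:
    """Build the JSON body for a backend query (the 'inputs' field name is an assumption)."""
    return {"inputs": question}


def ask_backend(question: str, url: str = "http://127.0.0.1:8002/solve") -> str:
    """POST a question to the MindSearch backend and return the raw response text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# With the server from Step 3 running:
# print(ask_backend("What is the capital of France?"))
```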

🐞 Debug Locally

python -m mindsearch.terminal

📝 License

This project is released under the Apache 2.0 license.

Citation

If you find this project useful in your research, please consider citing it:

@article{chen2024mindsearch,
  title={MindSearch: Mimicking Human Minds Elicits Deep AI Searcher},
  author={Chen, Zehui and Liu, Kuikun and Wang, Qiuchen and Liu, Jiangning and Zhang, Wenwei and Chen, Kai and Zhao, Feng},
  journal={arXiv preprint arXiv:2407.20183},
  year={2024}
}

Our Projects

Explore our additional research on large language models, focusing on LLM agents.

  • Lagent: A lightweight framework for building LLM-based agents
  • AgentFLAN: An innovative approach for constructing and training with high-quality agent datasets (ACL 2024 Findings)
  • T-Eval: A fine-grained evaluation benchmark for tool utilization (ACL 2024)