localGPT
Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.
Top Related Projects
- private-gpt: Interact with your documents using the power of GPT, 100% privately, no data leaks
- simpleaichat: Python package for easily interfacing with chat apps, with robust features and minimal code complexity.
- semantic-kernel: Integrate cutting-edge LLM technology quickly and easily into your apps
- langchain: 🦜🔗 Build context-aware reasoning applications
- chatgpt-retrieval-plugin: The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.
- chroma: the AI-native open-source embedding database
Quick Overview
LocalGPT is an open-source project that allows users to run GPT models locally on their own hardware. It provides a way to interact with large language models without relying on cloud services, ensuring privacy and control over data. The project aims to make AI more accessible and customizable for individual users and developers.
Pros
- Ensures data privacy by running models locally
- Provides full control over the model and its parameters
- Reduces dependency on cloud services and associated costs
- Allows for customization and fine-tuning of models for specific use cases
Cons
- Requires significant computational resources to run large models effectively
- May have lower performance compared to cloud-based solutions
- Limited to the capabilities of the local hardware
- Requires technical knowledge to set up and maintain
Getting Started
To get started with LocalGPT:
- Clone the repository:
  git clone https://github.com/PromtEngineer/localGPT.git
- Install dependencies:
  cd localGPT
  pip install -r requirements.txt
- Download a compatible model (e.g., GPT-J) and place it in the models directory.
- Run the application:
  python run_localGPT.py
- Access the web interface at http://localhost:7860 to interact with the model.
Note: Ensure you have sufficient GPU memory and CUDA support for optimal performance.
Competitor Comparisons
private-gpt: Interact with your documents using the power of GPT, 100% privately, no data leaks
Pros of private-gpt
- Supports multiple document types (PDF, TXT, CSV) for ingestion
- Offers a user-friendly web interface for easier interaction
- Provides more detailed documentation and setup instructions
Cons of private-gpt
- May have higher resource requirements due to additional features
- Potentially more complex setup process for beginners
- Less frequent updates compared to localGPT
Code Comparison
localGPT:
def load_single_document(file_path):
# Load and process a single document
loader = UnstructuredFileLoader(file_path)
return loader.load()[0]
private-gpt:
import os

def load_document(file_path):
    # Load document based on file extension (load_pdf / load_text are helpers elsewhere in private-gpt)
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".pdf":
        return load_pdf(file_path)
    elif ext in [".txt", ".csv"]:
        return load_text(file_path)
Both projects aim to provide local, privacy-focused alternatives to GPT-based chatbots. localGPT focuses on simplicity and ease of use, while private-gpt offers more features and document support. The choice between them depends on specific needs and technical expertise.
simpleaichat: Python package for easily interfacing with chat apps, with robust features and minimal code complexity.
Pros of simpleaichat
- Simpler setup and usage, ideal for quick prototyping
- Supports multiple AI models, including OpenAI and Anthropic
- Includes built-in conversation memory management
Cons of simpleaichat
- Relies on external API services, potentially incurring costs
- Less customizable for specific document processing tasks
- May have limitations in handling large volumes of data
Code Comparison
simpleaichat:
from simpleaichat import AIChat
ai = AIChat(api_key="YOUR_API_KEY")
response = ai.chat("Hello, how are you?")
print(response)
localGPT:
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
db = Chroma(persist_directory="db", embedding_function=embeddings)
llm = GPT4All(model="./models/gpt4all-model.bin", n_ctx=1000, backend='gptj')
localGPT focuses on local document processing and querying using embeddings and vector stores, while simpleaichat provides a straightforward interface for interacting with various AI models through APIs. The choice between them depends on specific use cases, data privacy requirements, and the need for customization in document processing tasks.
semantic-kernel: Integrate cutting-edge LLM technology quickly and easily into your apps
Pros of semantic-kernel
- More comprehensive framework for building AI-powered applications
- Supports multiple programming languages (C#, Python, Java)
- Extensive documentation and examples for easier integration
Cons of semantic-kernel
- Steeper learning curve due to its broader scope
- Requires more setup and configuration
- May be overkill for simple chatbot or Q&A applications
Code Comparison
localGPT:
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vectorstore.as_retriever(),
return_source_documents=True
)
semantic-kernel:
var kernel = Kernel.Builder.Build();
kernel.Config.AddOpenAITextCompletionService("davinci", "YOUR_API_KEY");
var promptTemplate = kernel.CreateSemanticFunction("{{$input}}");
localGPT focuses on creating a question-answering system using a retrieval-based approach, while semantic-kernel provides a more flexible framework for building various AI-powered applications. localGPT is simpler to set up for specific use cases, whereas semantic-kernel offers more extensive capabilities but requires more configuration.
langchain: 🦜🔗 Build context-aware reasoning applications
Pros of langchain
- More comprehensive framework for building LLM applications
- Larger community and ecosystem with extensive documentation
- Supports multiple LLMs and integrations with various tools
Cons of langchain
- Steeper learning curve due to its extensive features
- Potentially higher resource requirements for full functionality
- May be overkill for simple, local LLM applications
Code Comparison
localGPT:
qa = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff",
retriever=vectordb.as_retriever(search_kwargs={"k": 2}),
return_source_documents=True,
)
langchain:
from langchain import OpenAI, VectorDBQA
from langchain.vectorstores import Chroma
qa = VectorDBQA.from_chain_type(
llm=OpenAI(),
chain_type="stuff",
vectorstore=Chroma(embedding_function=embedding_function, persist_directory=persist_directory)
)
Both repositories aim to simplify working with LLMs, but langchain offers a more extensive toolkit for building complex LLM-powered applications. localGPT focuses on providing a straightforward solution for running LLMs locally, making it potentially easier to set up for simple use cases. The code snippets demonstrate similar approaches to creating question-answering systems, with langchain offering more flexibility in terms of LLM and vector store choices.
chatgpt-retrieval-plugin: The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.
Pros of chatgpt-retrieval-plugin
- Seamless integration with OpenAI's ChatGPT, leveraging its powerful language model
- Supports multiple vector database options, including Pinecone, Weaviate, and Zilliz
- Offers a flexible API for custom implementations and integrations
Cons of chatgpt-retrieval-plugin
- Requires an OpenAI API key and relies on external services
- May have higher latency due to API calls and external dependencies
- Limited control over the underlying language model and retrieval process
Code Comparison
localGPT:
embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME)
db = Chroma.from_documents(docs, embeddings, persist_directory=PERSIST_DIRECTORY)
chatgpt-retrieval-plugin:
from datastore.providers import pinecone
datastore = pinecone.PineconeDataStore()
datastore.upsert(documents)
Both projects aim to enhance document retrieval and question-answering capabilities, but they take different approaches. localGPT focuses on running everything locally, using open-source models and libraries, while chatgpt-retrieval-plugin is designed to work with OpenAI's ChatGPT and various vector databases. The code snippets show how localGPT uses HuggingFace embeddings and Chroma for document storage, whereas chatgpt-retrieval-plugin utilizes Pinecone as a vector database in this example.
chroma: the AI-native open-source embedding database
Pros of Chroma
- More comprehensive database solution for AI applications
- Actively maintained with frequent updates and contributions
- Broader feature set for vector search and embedding management
Cons of Chroma
- Steeper learning curve due to more complex architecture
- May be overkill for simpler local LLM applications
- Requires more setup and configuration compared to LocalGPT
Code Comparison
LocalGPT:
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
db = Chroma(persist_directory="db", embedding_function=embeddings)
Chroma:
import chromadb
from chromadb.config import Settings
client = chromadb.Client(Settings(
chroma_db_impl="duckdb+parquet",
persist_directory="db"
))
collection = client.create_collection("my_collection")
Both repositories utilize Chroma for vector storage, but Chroma offers more direct control over database implementation and collection management. LocalGPT simplifies the process by leveraging Langchain's integration with Chroma, making it easier to use for local LLM applications. Chroma provides greater flexibility and scalability for more complex AI systems, while LocalGPT focuses on a streamlined approach for running language models locally.
README
LocalGPT: Secure, Local Conversations with Your Documents
You can run localGPT on a pre-configured Virtual Machine. Make sure to use the code: PromptEngineering to get 50% off. I will get a small commission!
LocalGPT is an open-source initiative that allows you to converse with your documents without compromising your privacy. With everything running locally, you can be assured that no data ever leaves your computer. Dive into the world of secure, local document interactions with LocalGPT.
Features
- Utmost Privacy: Your data remains on your computer, ensuring 100% security.
- Versatile Model Support: Seamlessly integrate a variety of open-source models, including HF, GPTQ, GGML, and GGUF.
- Diverse Embeddings: Choose from a range of open-source embeddings.
- Reuse Your LLM: Once downloaded, reuse your LLM without the need for repeated downloads.
- Chat History: Remembers your previous conversations (in a session).
- API: LocalGPT has an API that you can use for building RAG Applications.
- Graphical Interface: LocalGPT comes with two GUIs, one uses the API and the other is standalone (based on streamlit).
- GPU, CPU & MPS Support: Supports multiple platforms out of the box. Chat with your data using CUDA, CPU, MPS, and more!
Dive Deeper with Our Videos
Technical Details
By selecting the right local models and the power of LangChain, you can run the entire RAG pipeline locally, without any data leaving your environment, and with reasonable performance.
- ingest.py uses LangChain tools to parse the document and create embeddings locally using InstructorEmbeddings. It then stores the result in a local vector database using the Chroma vector store.
- run_localGPT.py uses a local LLM to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs.
- You can replace this local LLM with any other LLM from HuggingFace. Make sure whatever LLM you select is in the HF format.
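For orientation, the sketch below strings the ingest and query stages together with plain LangChain calls. It is a simplified illustration, not the exact code in ingest.py or run_localGPT.py; the file name, model choices, and chunking parameters are placeholders.

from langchain.document_loaders import PDFMinerLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQA

# Ingest: parse a document, split it into chunks, embed locally, persist to Chroma.
docs = PDFMinerLoader("SOURCE_DOCUMENTS/constitution.pdf").load()  # placeholder file
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma.from_documents(chunks, embeddings, persist_directory="DB")

# Query: retrieve the most similar chunks and let a local LLM answer from that context.
llm = LlamaCpp(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 4}),
    return_source_documents=True,
)
print(qa("What does the document say about free speech?")["result"])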
This project was inspired by the original privateGPT.
Built Using
- LangChain
- HuggingFace LLMs
- InstructorEmbeddings
- LlamaCpp-Python
- Chroma
- Streamlit
Environment Setup
- Clone the repo using git:
git clone https://github.com/PromtEngineer/localGPT.git
- Install conda for virtual environment management. Create and activate a new virtual environment.
conda create -n localGPT python=3.10.0
conda activate localGPT
- Install the dependencies using pip.
To set up your environment to run the code, first install all requirements:
pip install -r requirements.txt
Installing LLAMA-CPP:
LocalGPT uses LlamaCpp-Python for GGML (you will need llama-cpp-python <=0.1.76) and GGUF (llama-cpp-python >=0.1.83) models.
If you want to use BLAS or Metal with llama-cpp you can set appropriate flags:
For NVIDIA GPU support, use cuBLAS:
# Example: cuBLAS
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
For Apple Metal (M1/M2) support, use:
# Example: METAL
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
For more details, please refer to the llama-cpp-python documentation.
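As a quick sanity check that GPU offloading works after the build, you can load a GGUF model through LangChain's LlamaCpp wrapper. This is only a sketch; the model path and layer count below are assumptions, not LocalGPT's internal loading code.

from langchain.llms import LlamaCpp

# Sketch: n_gpu_layers only takes effect if llama-cpp-python was compiled
# with cuBLAS or Metal support as shown above.
llm = LlamaCpp(
    model_path="./models/llama-2-7b-chat.Q4_K_M.gguf",  # assumed local path
    n_ctx=4096,
    n_gpu_layers=35,  # number of transformer layers to offload to the GPU
    n_batch=512,
    temperature=0.2,
)
print(llm("Say hello in one short sentence."))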
Docker
Installing the required packages for GPU inference on NVIDIA GPUs, like gcc 11 and CUDA 11, may cause conflicts with other packages in your system.
As an alternative to Conda, you can use Docker with the provided Dockerfile.
It includes CUDA, your system just needs Docker, BuildKit, your NVIDIA GPU driver and the NVIDIA container toolkit.
Build the image (requires BuildKit):
docker build -t localgpt .
Note: Docker BuildKit does not support GPU during docker build time right now, only during docker run.
Run the container:
docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all localgpt
Test dataset
For testing, this repository comes with the Constitution of the USA as an example file to use.
Ingesting your OWN Data.
Put your files in the SOURCE_DOCUMENTS folder. You can put multiple folders within the SOURCE_DOCUMENTS folder, and the code will recursively read your files.
Supported file formats:
LocalGPT currently supports the following file formats. LocalGPT uses LangChain for loading these file formats. The code in constants.py uses a DOCUMENT_MAP dictionary to map a file format to the corresponding loader. To add support for another file format, simply add an entry to this dictionary with the file format and the corresponding loader from LangChain.
DOCUMENT_MAP = {
".txt": TextLoader,
".md": TextLoader,
".py": TextLoader,
".pdf": PDFMinerLoader,
".csv": CSVLoader,
".xls": UnstructuredExcelLoader,
".xlsx": UnstructuredExcelLoader,
".docx": Docx2txtLoader,
".doc": Docx2txtLoader,
}
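For example, to add HTML support you might register LangChain's UnstructuredHTMLLoader. This is a sketch of editing constants.py; the exact import path depends on the LangChain version pinned by the project.

from langchain.document_loaders import UnstructuredHTMLLoader

DOCUMENT_MAP = {
    ".txt": TextLoader,
    # ... keep the other existing entries ...
    ".html": UnstructuredHTMLLoader,  # new: HTML files handled by UnstructuredHTMLLoader
}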
Ingest
Run the following command to ingest all the data.
If you have CUDA set up on your system:
python ingest.py
Use the device type argument to specify a given device.
To run on cpu:
python ingest.py --device_type cpu
To run on M1/M2:
python ingest.py --device_type mps
Use --help for a full list of supported devices:
python ingest.py --help
This will create a new folder called DB and use it for the newly created vector store. You can ingest as many documents as you want, and all will be accumulated in the local embeddings database. If you want to start from an empty database, delete the DB folder and reingest your documents.
Note: When you run this for the first time, it will need internet access to download the embedding model (default: Instructor Embedding). In subsequent runs, no data will leave your local environment, and you can ingest data without an internet connection.
Ask questions to your documents, locally!
In order to chat with your documents, run the following command (by default, it will run on cuda):
python run_localGPT.py
You can also specify the device type, just like with ingest.py:
python run_localGPT.py --device_type mps # to run on Apple silicon
This will load the ingested vector store and embedding model. You will be presented with a prompt:
> Enter a query:
After typing your question, hit Enter. LocalGPT will take some time to respond, depending on your hardware.
Once the answer is generated, you can ask another question without re-running the script; just wait for the prompt again.
Note: When you run this for the first time, it will need an internet connection to download the LLM (default: TheBloke/Llama-2-7b-Chat-GGUF). After that, you can turn off your internet connection and inference will still work. No data leaves your local environment.
Type exit to finish the script.
Extra Options with run_localGPT.py
You can use the --show_sources flag with run_localGPT.py to show which chunks were retrieved by the embedding model. By default, it will show 4 different sources/chunks. You can change the number of sources/chunks.
python run_localGPT.py --show_sources
Another option is to enable chat history. Note: This is disabled by default and can be enabled with the --use_history flag. The context window is limited, so keep in mind that enabling history will consume part of it and might overflow it.
python run_localGPT.py --use_history
You can store user questions and model responses in a CSV file (/local_chat_history/qa_log.csv) using the --save_qa flag. Every interaction will be stored.
python run_localGPT.py --save_qa
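If you want to inspect the saved log afterwards, a few lines of Python are enough. This is a sketch; check the CSV header for the exact column layout.

import csv

# Print every saved question/answer row from the QA log.
with open("local_chat_history/qa_log.csv", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        print(row)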
Run the Graphical User Interface
- Open constants.py in an editor of your choice and, depending on your choice, add the LLM you want to use. By default, the following model will be used:
  MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
  MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"
- Open up a terminal and activate your Python environment that contains the dependencies installed from requirements.txt.
- Navigate to the /LOCALGPT directory.
- Run the following command: python run_localGPT_API.py. The API should begin to run.
- Wait until everything has loaded in. You should see something like INFO:werkzeug:Press CTRL+C to quit.
- Open up a second terminal and activate the same Python environment.
- Navigate to the /LOCALGPT/localGPTUI directory.
- Run the command python localGPTUI.py.
- Open up a web browser and go to the address http://localhost:5111/.
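You can also talk to the API server directly instead of going through the bundled UI. The sketch below is illustrative only: the port, route, and form field name are assumptions, so check run_localGPT_API.py for the actual values.

import requests

# Sketch: send a question to a locally running LocalGPT API server.
# Port, route, and field name are assumptions; verify them in run_localGPT_API.py.
API_URL = "http://localhost:5110/api/prompt_route"

resp = requests.post(API_URL, data={"user_prompt": "What is the First Amendment?"})
resp.raise_for_status()
print(resp.json())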
How to select different LLM models?
To change the models, you will need to set both MODEL_ID and MODEL_BASENAME.
- Open up constants.py in the editor of your choice.
- Change the MODEL_ID and MODEL_BASENAME. If you are using a quantized model (GGML, GPTQ, GGUF), you will need to provide MODEL_BASENAME. For unquantized models, set MODEL_BASENAME to NONE.
- There are a number of example models from HuggingFace that have already been tested: original trained models (ending with HF or having a .bin file in their "Files and versions") and quantized models (ending with GPTQ or having .no-act-order or .safetensors files in their "Files and versions").
- For models that end with HF or have a .bin file in their "Files and versions" on their HuggingFace page:
  - Make sure you have a MODEL_ID selected. For example: MODEL_ID = "TheBloke/guanaco-7B-HF"
  - Go to the HuggingFace repo.
- For models that contain GPTQ in their name and/or have a .no-act-order or .safetensors extension in their "Files and versions" on their HuggingFace page:
  - Make sure you have a MODEL_ID selected. For example: MODEL_ID = "TheBloke/wizardLM-7B-GPTQ"
  - Go to the corresponding HuggingFace repo and select "Files and versions".
  - Pick one of the model names and set it as MODEL_BASENAME. For example: MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"
- Follow the same steps for GGUF and GGML models.
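For concreteness, the two cases look like this in constants.py (a sketch reusing model IDs already shown above; any other HF or quantized model follows the same pattern):

# Quantized model (GGUF): both values are required
MODEL_ID = "TheBloke/Llama-2-7b-Chat-GGUF"
MODEL_BASENAME = "llama-2-7b-chat.Q4_K_M.gguf"

# Unquantized HF model: no basename is needed
# MODEL_ID = "TheBloke/guanaco-7B-HF"
# MODEL_BASENAME = None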
GPU and VRAM Requirements
Below are the VRAM requirements for different models depending on their size (billions of parameters). The estimates in the table do not include the VRAM used by the embedding models, which require an additional 2GB-7GB of VRAM depending on the model.
| Model Size (B) | float32 | float16 | GPTQ 8bit | GPTQ 4bit |
|---|---|---|---|---|
| 7B | 28 GB | 14 GB | 7 GB - 9 GB | 3.5 GB - 5 GB |
| 13B | 52 GB | 26 GB | 13 GB - 15 GB | 6.5 GB - 8 GB |
| 32B | 130 GB | 65 GB | 32.5 GB - 35 GB | 16.25 GB - 19 GB |
| 65B | 260.8 GB | 130.4 GB | 65.2 GB - 67 GB | 32.6 GB - 35 GB |
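The float32/float16 columns follow directly from parameter count times bytes per parameter. A rough helper for the weights alone (ignoring activations, KV cache, and the embedding-model overhead mentioned above) looks like this:

def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed just to hold the model weights, in GB."""
    return params_billions * bytes_per_param

# 7B model: ~28 GB in float32 (4 bytes/param), ~14 GB in float16 (2 bytes/param),
# ~3.5 GB at 4-bit (0.5 bytes/param) before quantization bookkeeping overhead.
print(weight_vram_gb(7, 4), weight_vram_gb(7, 2), weight_vram_gb(7, 0.5))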
System Requirements
Python Version
To use this software, you must have Python 3.10 or later installed. Earlier versions of Python are not supported.
C++ Compiler
If you encounter an error while building a wheel during the pip install process, you may need to install a C++ compiler on your computer.
For Windows 10/11
To install a C++ compiler on Windows 10/11, follow these steps:
- Install Visual Studio 2022.
- Make sure the following components are selected:
- Universal Windows Platform development
- C++ CMake tools for Windows
- Download the MinGW installer from the MinGW website.
- Run the installer and select the "gcc" component.
NVIDIA Driver Issues:
Follow this page to install NVIDIA Drivers.
Disclaimer
This is a test project to validate the feasibility of a fully local solution for question answering using LLMs and vector embeddings. It is not production ready, and it is not meant to be used in production. Vicuna-7B is based on the Llama model, so it is subject to the original Llama license.
Common Errors
- Torch not compatible with CUDA enabled
  - Get your CUDA version:
    nvcc --version
    nvidia-smi
  - Try installing PyTorch for your CUDA version:
    conda install -c pytorch torchvision cudatoolkit=10.1 pytorch
  - If it doesn't work, try reinstalling:
    pip uninstall torch
    pip cache purge
    pip install torch -f https://download.pytorch.org/whl/torch_stable.html
- Missing build dependencies:
    pip install h5py
    pip install typing-extensions
    pip install wheel
- Try re-installing tokenizers and transformers:
    conda uninstall tokenizers transformers
    pip install transformers