nomic-ai/gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.


Top Related Projects

  • llama.cpp: LLM inference in C/C++
  • llama2.c: Inference Llama 2 in one file of pure C
  • whisper.cpp: Port of OpenAI's Whisper model in C/C++
  • llama-cpp-python: Python bindings for llama.cpp
  • mlc-llm: Universal LLM Deployment Engine with ML Compilation
  • text-generation-webui: A Gradio web UI for Large Language Models

Quick Overview

GPT4All is an open-source ecosystem of large language models (LLMs) that run locally on consumer-grade hardware. It aims to democratize AI by making capable language models usable without expensive cloud infrastructure or specialized hardware.

Pros

  • Runs locally on consumer hardware, ensuring privacy and reducing costs
  • Supports multiple platforms (Windows, macOS, Linux)
  • Offers a variety of pre-trained models with different capabilities
  • Provides both command-line and GUI interfaces for ease of use

Cons

  • Performance may be slower compared to cloud-based solutions
  • Limited by the capabilities of local hardware
  • May require significant disk space for storing models
  • Some advanced features of larger models may not be available in local versions

Code Examples

  1. Basic usage of GPT4All in Python:
from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")
output = model.generate("The capital of France is", max_tokens=3)
print(output)
  2. Using GPT4All with a custom prompt:
from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")
prompt = "Translate the following English text to French: 'Hello, how are you?'"
output = model.generate(prompt, max_tokens=20)
print(output)
  3. Streaming output from GPT4All:
from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")
prompt = "Write a short story about a robot learning to paint:"

for token in model.generate(prompt, max_tokens=200, streaming=True):
    print(token, end='', flush=True)
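  4. Multi-turn chat with a session. A minimal sketch built around the chat_session context manager that also appears in the README further down this page; details may vary by gpt4all version:
from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")

# chat_session keeps earlier turns in context, so follow-up prompts can refer back
with model.chat_session():
    print(model.generate("Name three uses of local LLMs.", max_tokens=100))
    print(model.generate("Expand on the first one.", max_tokens=100))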

Getting Started

To get started with GPT4All, follow these steps:

  1. Install the library:
pip install gpt4all
  2. Download a model (e.g., ggml-gpt4all-j-v1.3-groovy) from the GPT4All website.

  3. Use the following code to initialize and generate text:

from gpt4all import GPT4All

model = GPT4All("path/to/your/model.bin")
output = model.generate("Your prompt here", max_tokens=50)
print(output)

Replace "path/to/your/model.bin" with the actual path to your downloaded model file.

Competitor Comparisons

llama.cpp: LLM inference in C/C++

Pros of llama.cpp

  • Highly optimized C++ implementation for efficient inference
  • Supports quantization for reduced memory usage and faster execution
  • Offers cross-platform compatibility (Windows, macOS, Linux, iOS, Android)

Cons of llama.cpp

  • Limited to LLaMA model architecture
  • Requires more technical expertise to set up and use
  • Fewer built-in features for chat-like interactions

Code Comparison

llama.cpp:

int main(int argc, char ** argv) {
    gpt_params params;
    if (gpt_params_parse(argc, argv, params) == false) {
        return 1;
    }
    llama_init_backend();
    // ... (implementation continues)
}

GPT4All:

from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")
output = model.generate("Once upon a time, ", max_tokens=200)
print(output)

Summary

llama.cpp focuses on efficient C++ implementation of the LLaMA model, offering optimizations and cross-platform support. It's ideal for users who need fine-grained control and performance. GPT4All, on the other hand, provides a more user-friendly Python interface with support for multiple models, making it easier to integrate into existing projects. While llama.cpp excels in performance, GPT4All offers greater flexibility and ease of use for a wider range of applications.

llama2.c: Inference Llama 2 in one file of pure C

Pros of llama2.c

  • Lightweight and minimalistic implementation
  • Focused on single-file C code for simplicity
  • Easier to understand and modify for educational purposes

Cons of llama2.c

  • Limited features compared to GPT4All
  • Less support for different model architectures
  • Fewer pre-trained models available out-of-the-box

Code Comparison

llama2.c:

float* forward(Transformer* transformer, int token, int pos) {
    float* x = transformer->token_embedding_table + token * transformer->dim;
    for (int l = 0; l < transformer->n_layers; l++) {
        // ... (attention and feedforward operations)
    }
    return x;
}

GPT4All:

void LLModel::prompt(const std::string &prompt, std::function<bool(int32_t)> promptCallback,
                     std::function<bool(int32_t, const std::string&)> responseCallback,
                     std::function<bool(bool)> recalculateCallback,
                     PromptContext &promptCtx) {
    // ... (tokenization and generation logic)
}

The code snippets highlight the difference in complexity and abstraction level between the two projects. llama2.c focuses on a simple, low-level implementation, while GPT4All provides a more feature-rich and abstracted interface for language model interactions.

whisper.cpp: Port of OpenAI's Whisper model in C/C++

Pros of whisper.cpp

  • Focused on speech recognition, providing efficient transcription capabilities
  • Lightweight C++ implementation, suitable for embedded systems and low-resource environments
  • Supports multiple languages and can run offline

Cons of whisper.cpp

  • Limited to speech-to-text functionality, lacking general language understanding capabilities
  • Requires audio input, not suitable for text-based interactions or general-purpose language tasks
  • May have lower accuracy compared to larger, more complex models

Code Comparison

whisper.cpp:

#include "whisper.h"

int main() {
    struct whisper_context * ctx = whisper_init_from_file("ggml-base.en.bin");
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    // ... load 16 kHz mono float samples into pcm / pcm_len ...
    whisper_full(ctx, params, pcm, pcm_len);
    whisper_print_timings(ctx);
    whisper_free(ctx);
    return 0;
}

gpt4all:

from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")
output = model.generate("Once upon a time", max_tokens=50)
print(output)

The code snippets demonstrate the different focus areas of the two projects, with whisper.cpp handling audio transcription and gpt4all providing text generation capabilities.

llama-cpp-python: Python bindings for llama.cpp

Pros of llama-cpp-python

  • Focused on providing Python bindings for the llama.cpp library, offering a more specialized and potentially efficient implementation
  • Supports GPU acceleration out of the box, which can significantly improve performance
  • Provides a simpler API, making it easier to integrate into existing Python projects

Cons of llama-cpp-python

  • Limited to LLaMA-based models, whereas gpt4all supports a wider range of models
  • Less extensive documentation and community support compared to gpt4all
  • Fewer built-in features and tools for model fine-tuning and customization

Code Comparison

llama-cpp-python:

from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin")
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output["choices"][0]["text"])

gpt4all:

from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")
output = model.generate("Name the planets in the solar system.", max_tokens=32)
print(output)

mlc-llm: Universal LLM Deployment Engine with ML Compilation

Pros of mlc-llm

  • Focuses on efficient deployment of large language models across various hardware platforms
  • Provides a unified framework for optimizing LLMs on different devices (CPUs, GPUs, mobile)
  • Supports multiple model architectures and quantization techniques

Cons of mlc-llm

  • May have a steeper learning curve due to its focus on low-level optimizations
  • Less emphasis on providing a ready-to-use chatbot interface compared to gpt4all
  • Requires more technical knowledge to implement and customize

Code Comparison

mlc-llm:

from mlc_llm import MLCEngine

# Sketch of the MLCEngine API; model ID and packaging vary by mlc-llm version
engine = MLCEngine("HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC")
response = engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response.choices[0].message.content)
engine.terminate()

gpt4all:

from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")
output = model.generate("Hello, how are you?")
print(output)

Both repositories aim to make large language models more accessible, but they approach this goal differently. mlc-llm focuses on optimizing LLMs for various hardware platforms, while gpt4all provides a more user-friendly interface for running chatbots locally. The code comparison shows that mlc-llm requires more setup and configuration, while gpt4all offers a simpler API for generating text.

text-generation-webui: A Gradio web UI for Large Language Models

Pros of text-generation-webui

  • More extensive model support, including various architectures and quantization methods
  • Rich web-based interface with multiple chat modes and extensions
  • Active development and community contributions

Cons of text-generation-webui

  • Higher system requirements and more complex setup process
  • Steeper learning curve for beginners
  • Less focus on mobile and edge device deployment

Code comparison

text-generation-webui:

def generate_reply(
    question, state, stopping_strings=None, is_chat=False, escape_html=False
):
    # Complex generation logic with multiple parameters and options
    # ...

gpt4all:

def generate(self, prompt, max_tokens=200, temp=0.7):
    # Simpler generation function with fewer parameters
    # ...

The code comparison shows that text-generation-webui offers more advanced and customizable generation options, while gpt4all provides a simpler, more straightforward approach. This reflects the overall design philosophy of each project, with text-generation-webui catering to power users and gpt4all focusing on ease of use and accessibility.


README

GPT4All

Website • Documentation • Discord

GPT4All runs large language models (LLMs) privately on everyday desktops & laptops.

No API calls or GPUs required - you can just download the application and get started.

Read about what's new in our blog.

Subscribe to the newsletter

https://github.com/nomic-ai/gpt4all/assets/70534565/513a0f15-4964-4109-89e4-4f9a9011f311

GPT4All is made possible by our compute partner Paperspace.


Download Links

Windows Installer

macOS Installer

Ubuntu Installer

Windows and Linux require Intel Core i3 2nd Gen / AMD Bulldozer, or better. x86-64 only, no ARM.

macOS requires Monterey 12.6 or newer. Best results with Apple Silicon M-series processors.



Get it on Flathub (community maintained)

Install GPT4All Python

The gpt4all package gives you access to LLMs through our Python client, built on llama.cpp implementations.

Nomic contributes to open source software like llama.cpp to make LLMs accessible and efficient for all.

pip install gpt4all
from gpt4all import GPT4All
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf") # downloads / loads a 4.66GB LLM
with model.chat_session():
    print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))
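
The client can also stream tokens as they are generated, reusing the streaming=True flag shown in the examples earlier on this page. A minimal sketch with the same model as above:

from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
with model.chat_session():
    # streaming=True yields tokens incrementally instead of one final string
    for token in model.generate("How can I run LLMs efficiently on my laptop?",
                                max_tokens=256, streaming=True):
        print(token, end="", flush=True)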

Integrations

  • LangChain
  • Weaviate Vector Database (module docs)
  • OpenLIT (OTel-native Monitoring) (Docs)
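
As an example of the LangChain integration, a minimal sketch assuming the community wrapper langchain_community.llms.GPT4All and a hypothetical path to a model file you have already downloaded:

from langchain_community.llms import GPT4All

# Hypothetical local path to a downloaded .gguf model file
llm = GPT4All(model="./models/Meta-Llama-3-8B-Instruct.Q4_0.gguf")
print(llm.invoke("What is a vector database?"))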

Release History

  • July 2nd, 2024: V3.0.0 Release
    • Fresh redesign of the chat application UI
    • Improved user workflow for LocalDocs
    • Expanded access to more model architectures
  • October 19th, 2023: GGUF support launches, including:
    • Mistral 7b base model, an updated model gallery on our website, several new local code models including Rift Coder v1.5
    • Nomic Vulkan support for Q4_0 and Q4_1 quantizations in GGUF.
    • Offline build support for running old versions of the GPT4All Local LLM Chat Client.
  • September 18th, 2023: Nomic Vulkan launches supporting local LLM inference on NVIDIA and AMD GPUs.
  • July 2023: Stable support for LocalDocs, a feature that allows you to privately and locally chat with your data.
  • June 28th, 2023: Docker-based API server launches, allowing inference of local LLMs from an OpenAI-compatible HTTP endpoint (see the sketch below).
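
A minimal sketch of calling such an OpenAI-compatible endpoint from Python; the port and model name below are assumptions that depend on your server configuration:

import requests

# Port 4891 and the model name are assumptions; adjust to your server settings
resp = requests.post(
    "http://localhost:4891/v1/chat/completions",
    json={
        "model": "Meta-Llama-3-8B-Instruct.Q4_0.gguf",
        "messages": [{"role": "user", "content": "Hello from a local client"}],
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])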

Contributing

GPT4All welcomes contributions, involvement, and discussion from the open source community! Please see CONTRIBUTING.md and follow the issues, bug reports, and PR markdown templates.

Check the project Discord, talk with project owners, or search existing issues/PRs to avoid duplicate work. Please tag all of the above with relevant project identifiers, or your contribution could get lost. Example tags: backend, bindings, python-bindings, documentation, etc.

Citation

If you use this repository, its models, or its data in a downstream project, please consider citing it with:

@misc{gpt4all,
  author = {Yuvanesh Anand and Zach Nussbaum and Brandon Duderstadt and Benjamin Schmidt and Andriy Mulyar},
  title = {GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nomic-ai/gpt4all}},
}