LocalAI
:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
Top Related Projects
- text-generation-webui: A Gradio web UI for Large Language Models.
- llama-cpp-python: Python bindings for llama.cpp
- llama.cpp: LLM inference in C/C++
- GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
- FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- stable-diffusion-webui: Stable Diffusion web UI
Quick Overview
LocalAI is an open-source project that provides a drop-in replacement for OpenAI API running locally on consumer-grade hardware. It supports various AI models and allows users to run language models, generate images, and transcribe audio without relying on cloud services.
Pros
- Runs locally, ensuring privacy and reducing dependency on cloud services
- Supports multiple AI models and tasks (language, image generation, audio transcription)
- Compatible with OpenAI API, making it easy to integrate with existing applications
- Customizable and extensible, allowing users to add their own models
Cons
- May require significant computational resources for larger models
- Performance might not match cloud-based solutions for some tasks
- Limited to the models and capabilities implemented in the project
- Requires some technical knowledge to set up and configure
Getting Started
- Install Docker on your system
- Pull the LocalAI Docker image:
docker pull quay.io/go-skynet/local-ai:latest
- Run LocalAI with a basic configuration:
docker run -p 8080:8080 -v $PWD/models:/models quay.io/go-skynet/local-ai:latest
- Test the API using curl:
curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{ "model": "gpt-3.5-turbo", "prompt": "Hello, how are you?", "max_tokens": 50 }'
Note: You'll need to download and place appropriate AI models in the `models` directory before using LocalAI. Refer to the project's documentation for specific model setup instructions.
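The curl test above can also be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_completion_request` and `complete` are hypothetical, and the model name simply mirrors the curl example, so it assumes a model is registered under that name on your instance.

```python
import json
import urllib.request


def build_completion_request(prompt, model="gpt-3.5-turbo", max_tokens=50,
                             base_url="http://localhost:8080"):
    """Build (but do not send) a POST request for /v1/completions."""
    # Same JSON body as the curl example: model, prompt, max_tokens.
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the text under choices[0].text.
    return body["choices"][0]["text"]
```

With the container from the previous step running, `print(complete("Hello, how are you?"))` should print the generated text. Building the request separately from sending it keeps the payload easy to inspect without a running server.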
Competitor Comparisons
A Gradio web UI for Large Language Models.
Pros of text-generation-webui
- More user-friendly interface with a web-based UI for easier interaction
- Supports a wider range of models and architectures
- Offers more advanced features like character creation and chat modes
Cons of text-generation-webui
- Requires more setup and configuration compared to LocalAI
- May have higher resource requirements for running the web interface
- Less focused on API-first approach, which can limit integration options
Code Comparison
text-generation-webui:
def generate_reply(
    question, state, stopping_strings=None, is_chat=False, for_ui=False
):
    # Complex generation logic with multiple parameters
    ...
LocalAI:
func (l *LLM) Predict(ctx context.Context, text string, options ...PredictOption) (string, error) {
// Simpler prediction function with context and options
// ...
}
The code comparison shows that text-generation-webui has a more complex generation function with multiple parameters, while LocalAI offers a simpler prediction function with context and options. This reflects the different approaches of the two projects, with text-generation-webui providing more advanced features and LocalAI focusing on a streamlined API-first approach.
Python bindings for llama.cpp
Pros of llama-cpp-python
- Focused specifically on Python bindings for llama.cpp, making it easier to integrate into Python projects
- Provides a more Pythonic interface for working with LLaMA models
- Offers direct access to low-level LLaMA functionality
Cons of llama-cpp-python
- Limited to models supported by llama.cpp, while LocalAI supports multiple backends and model types
- Ships its OpenAI-compatible API server only as an optional extra (`llama-cpp-python[server]`), so web-based interactions need additional setup
- May require more manual configuration and code writing for advanced use cases
Code Comparison
llama-cpp-python:
from llama_cpp import Llama
llm = Llama(model_path="./models/7B/ggml-model.bin")
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)
LocalAI:
import "github.com/mudler/LocalAI/pkg/model"
llm, _ := model.NewLLM("./models/7B/ggml-model.bin")
output, _ := llm.Predict("Q: Name the planets in the solar system? A: ", 32)
fmt.Println(output)
LLM inference in C/C++
Pros of llama.cpp
- Highly optimized C++ implementation for efficient inference on consumer hardware
- Supports quantization techniques for reduced memory usage and faster inference
- Provides a simple command-line interface for easy interaction with the model
Cons of llama.cpp
- Limited to models available in its own GGUF format, so models in other formats must be converted first
- Requires manual setup and configuration for different use cases
- Less user-friendly for those unfamiliar with command-line tools
Code Comparison
LocalAI:
func (l *LLM) Predict(ctx context.Context, input string, opts ...PredictOption) (string, error) {
// Implementation details
}
llama.cpp:
int llama_eval(
struct llama_context * ctx,
const llama_token * tokens,
int n_tokens,
int n_past,
int n_threads
);
LocalAI provides a higher-level abstraction with a more user-friendly API, while llama.cpp offers lower-level control for advanced users and developers. LocalAI supports multiple model architectures and provides additional features like API compatibility with OpenAI, making it more versatile for various applications. However, llama.cpp's narrow focus on a single inference engine allows for highly optimized performance on supported hardware.
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Pros of GPT4All
- Offers a wider range of pre-trained models, including GPT-J and GPT-NeoX
- Provides a user-friendly chat interface for easy interaction with models
- Supports multiple languages and has a larger community contributing to its development
Cons of GPT4All
- Generally requires more computational resources due to larger model sizes
- Less flexible in terms of customization and fine-tuning options
- May have slower inference times compared to LocalAI for similar tasks
Code Comparison
GPT4All:
from gpt4all import GPT4All
model = GPT4All("ggml-gpt4all-j-v1.3-groovy")
output = model.generate("Tell me a joke", max_tokens=100)
LocalAI:
import requests
response = requests.post("http://localhost:8080/v1/completions",
json={"prompt": "Tell me a joke", "max_tokens": 100})
output = response.json()["choices"][0]["text"]
Both projects aim to provide local AI capabilities, but GPT4All focuses on offering a range of pre-trained models with an easy-to-use interface, while LocalAI emphasizes flexibility and customization for various AI tasks. The choice between them depends on specific use cases and resource availability.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Pros of FastChat
- More comprehensive suite of tools for training, serving, and evaluating LLMs
- Better support for multi-GPU and distributed training
- More active development and larger community support
Cons of FastChat
- Higher resource requirements and complexity
- Less focus on local, lightweight deployment
- Steeper learning curve for beginners
Code Comparison
FastChat example (model serving):
from fastchat.serve.controller import Controller
from fastchat.serve.model_worker import ModelWorker
from fastchat.serve.openai_api_server import OpenAIAPIServer
controller = Controller()
worker = ModelWorker(controller.controller_addr, worker_addr)
api_server = OpenAIAPIServer(controller)
LocalAI example (model serving):
./local-ai --models-path ./models \
--context-size 512 \
--threads 4 \
--api-key sk-xxx
FastChat offers a more programmatic approach with separate components for control, worker, and API serving. LocalAI provides a simpler command-line interface for quick deployment of local models. FastChat's structure allows for more flexibility and scalability, while LocalAI focuses on ease of use and lightweight deployment.
Stable Diffusion web UI
Pros of stable-diffusion-webui
- More comprehensive UI with advanced features for image generation and editing
- Extensive support for various models and extensions
- Active community with frequent updates and improvements
Cons of stable-diffusion-webui
- Requires more computational resources and setup time
- Limited to image generation tasks, not a general-purpose AI solution
- May have a steeper learning curve for beginners
Code Comparison
stable-diffusion-webui:
import modules.scripts
from modules import images
from modules.processing import process_images, Processed
from modules.shared import opts, cmd_opts, state
LocalAI:
import (
"github.com/mudler/LocalAI/api"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/grpc"
)
Summary
stable-diffusion-webui is a specialized tool for image generation with a rich UI and extensive features, while LocalAI is a more general-purpose AI solution that can be run locally. stable-diffusion-webui offers more advanced image manipulation capabilities but requires more resources, whereas LocalAI provides a broader range of AI functionalities with potentially easier setup and lower resource requirements.
README
LocalAI
:bulb: Get help - ❓ FAQ 💭 Discussions :speech_balloon: Discord :book: Documentation website
💻 Quickstart 🖼️ Models 🚀 Roadmap 🥽 Demo 🌍 Explorer 🛫 Examples
LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic...) API specifications for local AI inferencing. It allows you to run LLMs and generate images, audio, and more, locally or on-prem with consumer-grade hardware, supporting multiple model families. It does not require a GPU. It is created and maintained by Ettore Di Giacinto.
Run the installer script:
curl https://localai.io/install.sh | sh
Or run with docker:
# CPU only image:
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-cpu
# Nvidia GPU:
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12
# CPU and GPU image (bigger size):
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest
# AIO images (it will pre-download a set of models ready for use, see https://localai.io/basics/container/)
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
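Once a container is up, one way to see which models it serves is the OpenAI-style model listing. The sketch below assumes the instance answers `GET /v1/models` on port 8080 with the usual OpenAI list shape (`{"data": [{"id": ...}]}`); the helper names `model_ids` and `list_models` are hypothetical.

```python
import json
import urllib.request


def model_ids(models_response):
    """Extract model names from an OpenAI-style model list payload."""
    # Assumed shape: {"object": "list", "data": [{"id": "...", ...}, ...]}
    return [entry["id"] for entry in models_response.get("data", [])]


def list_models(base_url="http://localhost:8080"):
    """Fetch the model list from a running instance and return the names."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as response:
        return model_ids(json.load(response))
```

Calling `list_models()` against an AIO image should return the names of the pre-downloaded models; against a plain image it may return an empty list until a model is loaded.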
To load models:
# From the model gallery (see available models with `local-ai models list`, in the WebUI from the model tab, or visiting https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# Start LocalAI with the phi-2 model directly from huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# Install and run a model from the Ollama OCI registry
local-ai run ollama://gemma:2b
# Run a model from a configuration file
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# Install and run a model from a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest
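The references above differ mainly in their URI scheme. As an illustration only (this is not LocalAI's own resolution logic, and the labels and function name are hypothetical), a script that wraps `local-ai run` could classify a reference like this:

```python
# Labels are illustrative, keyed by the URI schemes shown in the commands above.
SOURCES = {
    "huggingface": "Hugging Face repository file",
    "ollama": "Ollama OCI registry",
    "oci": "standard OCI registry",
    "http": "remote configuration file",
    "https": "remote configuration file",
}


def describe_model_reference(reference):
    """Return a short label for where a model reference points."""
    # Split on "://" rather than ":" because gallery names such as
    # "llama-3.2-1b-instruct:q4_k_m" contain a colon but no scheme.
    scheme, separator, _ = reference.partition("://")
    if not separator:
        return "model gallery name"
    return SOURCES.get(scheme.lower(), "unknown scheme")
```

For example, `describe_model_reference("ollama://gemma:2b")` gives `"Ollama OCI registry"`, while a bare gallery name falls through to `"model gallery name"`.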
📰 Latest project news
- Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
- Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
- Nov 2024: Voice activity detection models (VAD) added to the API: https://github.com/mudler/LocalAI/pull/4204
- Oct 2024: examples moved to LocalAI-examples
- Aug 2024: 🆕 FLUX-1, P2P Explorer
- July 2024: 🔥🔥 🆕 P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723
- June 2024: 🆕 You can now browse the model gallery without LocalAI! Check out https://models.localai.io
- June 2024: Support for models from OCI registries: https://github.com/mudler/LocalAI/pull/2628
- May 2024: 🔥🔥 Decentralized P2P llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) 👉 Docs https://localai.io/features/distribute/
- May 2024: 🔥🔥 Openvoice: https://github.com/mudler/LocalAI/pull/2334
- May 2024: 🆕 Function calls without grammars and mixed mode: https://github.com/mudler/LocalAI/pull/2328
- May 2024: 🔥🔥 Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324
- May 2024: Chat, TTS, and Image generation in the WebUI: https://github.com/mudler/LocalAI/pull/2222
- April 2024: Reranker API: https://github.com/mudler/LocalAI/pull/2121
Roadmap items: List of issues
🔥🔥 Hot topics (looking for help):
- Multimodal with vLLM and Video understanding: https://github.com/mudler/LocalAI/pull/3729
- Realtime API https://github.com/mudler/LocalAI/issues/3714
- 🔥🔥 Distributed, P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113
- WebUI improvements: https://github.com/mudler/LocalAI/issues/2156
- Backends v2: https://github.com/mudler/LocalAI/issues/1126
- Improving UX v2: https://github.com/mudler/LocalAI/issues/1373
- Assistant API: https://github.com/mudler/LocalAI/issues/1273
- Moderation endpoint: https://github.com/mudler/LocalAI/issues/999
- Vulkan: https://github.com/mudler/LocalAI/issues/1647
- Anthropic API: https://github.com/mudler/LocalAI/issues/1808
If you want to help and contribute, issues up for grabs: https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3A%22up+for+grabs%22
🚀 Features
- 📖 Text generation with GPTs (`llama.cpp`, `gpt4all.cpp`, ... :book: and more)
- 🗣 Text to Audio
- 🔈 Audio to Text (Audio transcription with `whisper.cpp`)
- 🎨 Image generation with stable diffusion
- 🔥 OpenAI-alike tools API
- 🧠 Embeddings generation for vector databases
- ✍️ Constrained grammars
- 🖼️ Download Models directly from Huggingface
- 🥽 Vision API
- 📈 Reranker API
- 🆕🖧 P2P Inferencing
- 🌍 Integrated WebUI!
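To make the embeddings feature concrete: a vector database typically ranks stored vectors by similarity to a query vector. The sketch below assumes an OpenAI-style embeddings response (`{"data": [{"embedding": [...]}]}`), as returned by a `/v1/embeddings` endpoint; both helper names are hypothetical, and the similarity measure is ordinary cosine similarity rather than anything specific to LocalAI.

```python
import math


def embedding_vectors(embeddings_response):
    """Pull the raw vectors out of an OpenAI-style embeddings payload."""
    return [item["embedding"] for item in embeddings_response["data"]]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat its similarity as 0.0.
    return dot / norm if norm else 0.0
```

A score near 1 means two texts were embedded in nearly the same direction; a vector store would normally do this comparison for you, so the helper is only meant to show what is being computed.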
💻 Usage
Check out the Getting started section in our documentation.
🔗 Community and integrations
Build and deploy custom containers:
WebUIs:
- https://github.com/Jirubizu/localai-admin
- https://github.com/go-skynet/LocalAI-frontend
- QA-Pilot (an interactive chat project that leverages LocalAI LLMs for rapid understanding and navigation of GitHub code repositories) https://github.com/reid41/QA-Pilot
Model galleries
Other:
- Helm chart https://github.com/go-skynet/helm-charts
- VSCode extension https://github.com/badgooooor/localai-vscode-plugin
- Terminal utility https://github.com/djcopley/ShellOracle
- Local Smart assistant https://github.com/mudler/LocalAGI
- Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation / https://github.com/valentinfrlch/ha-gpt4vision
- Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord
- Slack bot https://github.com/mudler/LocalAGI/tree/main/examples/slack
- Shell-Pilot (interact with LLMs using LocalAI models via pure shell scripts on your Linux or macOS system) https://github.com/reid41/shell-pilot
- Telegram bot https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot
- Another Telegram Bot https://github.com/JackBekket/Hellper
- Auto-documentation https://github.com/JackBekket/Reflexia
- GitHub bot that answers issues, with code and documentation as context https://github.com/JackBekket/GitHelper
- Github Actions: https://github.com/marketplace/actions/start-localai
- Examples: https://github.com/mudler/LocalAI/tree/master/examples/
🔗 Resources
- LLM finetuning guide
- How to build locally
- How to install in Kubernetes
- Projects integrating LocalAI
- How tos section (curated by our community)
:book: 🎥 Media, Blogs, Social
- Run Visual studio code with LocalAI (SUSE)
- 🆕 Run LocalAI on Jetson Nano Devkit
- Run LocalAI on AWS EKS with Pulumi
- Run LocalAI on AWS
- Create a slackbot for teams and OSS projects that answer to documentation
- LocalAI meets k8sgpt
- Question Answering on Documents locally with LangChain, LocalAI, Chroma, and GPT4All
- Tutorial to use k8sgpt with LocalAI
Citation
If you use this repository or its data in a downstream project, please consider citing it with:
@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},
}
❤️ Sponsors
Do you find LocalAI useful?
Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.
A huge thank you to our generous sponsors who support this project covering CI expenses, and our Sponsor list:
🌟 Star history
📖 License
LocalAI is a community-driven project created by Ettore Di Giacinto.
MIT - Author Ettore Di Giacinto mudler@localai.io
🙇 Acknowledgements
LocalAI couldn't have been built without the help of great software already available from the community. Thank you!
- llama.cpp
- https://github.com/tatsu-lab/stanford_alpaca
- https://github.com/cornelk/llama-go for the initial ideas
- https://github.com/antimatter15/alpaca.cpp
- https://github.com/EdVince/Stable-Diffusion-NCNN
- https://github.com/ggerganov/whisper.cpp
- https://github.com/rhasspy/piper
🤗 Contributors
This is a community project, a special thanks to our contributors! 🤗