LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference

33,574

2,594

33,574

469

View on GitHub

Top Related Projects

text-generation-webui

43,368

A Gradio web UI for Large Language Models with support for multiple inference backends.

llama-cpp-python

9,009

Python bindings for llama.cpp

llama.cpp

78,890

LLM inference in C/C++

gpt4all

73,208

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

FastChat

38,431

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

stable-diffusion-webui

153,957

Stable Diffusion web UI

Quick Overview

LocalAI is an open-source project that provides a drop-in replacement for OpenAI API running locally on consumer-grade hardware. It supports various AI models and allows users to run language models, generate images, and transcribe audio without relying on cloud services.

Pros

Runs locally, ensuring privacy and reducing dependency on cloud services
Supports multiple AI models and tasks (language, image generation, audio transcription)
Compatible with OpenAI API, making it easy to integrate with existing applications
Customizable and extensible, allowing users to add their own models

Cons

May require significant computational resources for larger models
Performance might not match cloud-based solutions for some tasks
Limited to the models and capabilities implemented in the project
Requires some technical knowledge to set up and configure

Getting Started

Install Docker on your system

Pull the LocalAI Docker image:

docker pull quay.io/go-skynet/local-ai:latest

Run LocalAI with a basic configuration:

docker run -p 8080:8080 -v $PWD/models:/models quay.io/go-skynet/local-ai:latest

Test the API using curl:

curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo",
  "prompt": "Hello, how are you?",
  "max_tokens": 50
}'

Note: You'll need to download and place appropriate AI models in the models directory before using LocalAI. Refer to the project's documentation for specific model setup instructions.

Competitor Comparisons

text-generation-webui

43,368

A Gradio web UI for Large Language Models with support for multiple inference backends.

Pros of text-generation-webui

More user-friendly interface with a web-based UI for easier interaction
Supports a wider range of models and architectures
Offers more advanced features like character creation and chat modes

Cons of text-generation-webui

Requires more setup and configuration compared to LocalAI
May have higher resource requirements for running the web interface
Less focused on API-first approach, which can limit integration options

Code Comparison

text-generation-webui:

def generate_reply(
    question, state, stopping_strings=None, is_chat=False, for_ui=False
):
    # Complex generation logic with multiple parameters
    # ...

LocalAI:

func (l *LLM) Predict(ctx context.Context, text string, options ...PredictOption) (string, error) {
    // Simpler prediction function with context and options
    // ...
}

The code comparison shows that text-generation-webui has a more complex generation function with multiple parameters, while LocalAI offers a simpler prediction function with context and options. This reflects the different approaches of the two projects, with text-generation-webui providing more advanced features and LocalAI focusing on a streamlined API-first approach.

llama-cpp-python

9,009

Python bindings for llama.cpp

Pros of llama-cpp-python

Focused specifically on Python bindings for llama.cpp, making it easier to integrate into Python projects
Provides a more Pythonic interface for working with LLaMA models
Offers direct access to low-level LLaMA functionality

Cons of llama-cpp-python

Limited to LLaMA models, while LocalAI supports multiple model types
Lacks built-in API server functionality, requiring additional setup for web-based interactions
May require more manual configuration and code writing for advanced use cases

Code Comparison

llama-cpp-python:

from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin")
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

LocalAI:

import "github.com/mudler/LocalAI/pkg/model"

llm, _ := model.NewLLM("./models/7B/ggml-model.bin")
output, _ := llm.Predict("Q: Name the planets in the solar system? A: ", 32)
fmt.Println(output)

llama.cpp

78,890

LLM inference in C/C++

Pros of llama.cpp

Highly optimized C++ implementation for efficient inference on CPUs
Supports quantization techniques for reduced memory usage and faster inference
Provides a simple command-line interface for easy usage

Cons of llama.cpp

Limited to LLaMA-based models and doesn't support other architectures
Lacks built-in API or server functionality for integration into applications
Requires more technical knowledge to use and customize

Code Comparison

llama.cpp:

int main(int argc, char ** argv) {
    gpt_params params;
    if (gpt_params_parse(argc, argv, params) == false) {
        return 1;
    }
    llama_init_backend();
    ...
}

LocalAI:

func main() {
    app := &cli.App{
        Name:  "LocalAI",
        Usage: "Run LLMs locally",
        Flags: []cli.Flag{
            ...
        },
    }
    ...
}

LocalAI provides a more user-friendly approach with a CLI app structure, while llama.cpp focuses on low-level implementation and parameter parsing. LocalAI offers broader model support and easier integration, but llama.cpp excels in performance optimization for LLaMA models on CPUs.

gpt4all

73,208

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

Pros of GPT4All

Offers a wider range of pre-trained models, including GPT-J and GPT-NeoX
Provides a user-friendly chat interface for easy interaction with models
Supports multiple languages and has a larger community contributing to its development

Cons of GPT4All

Generally requires more computational resources due to larger model sizes
Less flexible in terms of customization and fine-tuning options
May have slower inference times compared to LocalAI for similar tasks

Code Comparison

GPT4All:

from gpt4all import GPT4All
model = GPT4All("ggml-gpt4all-j-v1.3-groovy")
output = model.generate("Tell me a joke", max_tokens=100)

LocalAI:

import requests
response = requests.post("http://localhost:8080/v1/completions", 
    json={"prompt": "Tell me a joke", "max_tokens": 100})
output = response.json()["choices"][0]["text"]

Both projects aim to provide local AI capabilities, but GPT4All focuses on offering a range of pre-trained models with an easy-to-use interface, while LocalAI emphasizes flexibility and customization for various AI tasks. The choice between them depends on specific use cases and resource availability.

FastChat

38,431

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Pros of FastChat

More comprehensive suite of tools for training, serving, and evaluating LLMs
Better support for multi-GPU and distributed training
More active development and larger community support

Cons of FastChat

Higher resource requirements and complexity
Less focus on local, lightweight deployment
Steeper learning curve for beginners

Code Comparison

FastChat example (model serving):

from fastchat.serve.controller import Controller
from fastchat.serve.model_worker import ModelWorker
from fastchat.serve.openai_api_server import OpenAIAPIServer

controller = Controller()
worker = ModelWorker(controller.controller_addr, worker_addr)
api_server = OpenAIAPIServer(controller)

LocalAI example (model serving):

./local-ai --models-path ./models \
           --context-size 512 \
           --threads 4 \
           --api-key sk-xxx

FastChat offers a more programmatic approach with separate components for control, worker, and API serving. LocalAI provides a simpler command-line interface for quick deployment of local models. FastChat's structure allows for more flexibility and scalability, while LocalAI focuses on ease of use and lightweight deployment.

stable-diffusion-webui

153,957

Stable Diffusion web UI

Pros of stable-diffusion-webui

More comprehensive UI with advanced features for image generation and editing
Extensive support for various models and extensions
Active community with frequent updates and improvements

Cons of stable-diffusion-webui

Requires more computational resources and setup time
Limited to image generation tasks, not a general-purpose AI solution
May have a steeper learning curve for beginners

Code Comparison

stable-diffusion-webui:

import modules.scripts
from modules import images
from modules.processing import process_images, Processed
from modules.shared import opts, cmd_opts, state

LocalAI:

import (
    "github.com/mudler/LocalAI/api"
    "github.com/mudler/LocalAI/pkg/model"
    "github.com/mudler/LocalAI/pkg/grpc"
)

Summary

stable-diffusion-webui is a specialized tool for image generation with a rich UI and extensive features, while LocalAI is a more general-purpose AI solution that can be run locally. stable-diffusion-webui offers more advanced image manipulation capabilities but requires more resources, whereas LocalAI provides a broader range of AI functionalities with potentially easier setup and lower resource requirements.

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

:bulb: Get help - âFAQ ðDiscussions :speech_balloon: Discord :book: Documentation website

ð» Quickstart ð¼ï¸ Models ð Roadmap ð¥½ Demo ð Explorer ð« Examples Try on

LocalAI is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by Ettore Di Giacinto.

ðð Local Stack Family

ð LocalAI is now part of a comprehensive suite of AI tools designed to work together:

LocalAGI

A powerful Local AI agent management platform that serves as a drop-in replacement for OpenAI's Responses API, enhanced with advanced agentic capabilities.

LocalRecall

A REST-ful API and knowledge base management system that provides persistent memory and storage capabilities for AI agents.

Screenshots

Talk Interface	Generate Audio

Models Overview	Generate Images

Chat Interface	Home

Login	Swarm

ð» Quickstart

Run the installer script:

# Basic installation
curl https://localai.io/install.sh | sh

For more installation options, see Installer Options.

Or run with docker:

CPU only image:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU Images:

# CUDA 12.0
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# CUDA 11.7
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-11

# NVIDIA Jetson (L4T) ARM64
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

AMD GPU Images (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU Images (oneAPI):

# Intel GPU with FP16 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f16

# Intel GPU with FP32 support
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-intel-f32

Vulkan GPU Images:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

AIO Images (pre-downloaded models):

# CPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu

# NVIDIA CUDA 12 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12

# NVIDIA CUDA 11 version
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-11

# Intel GPU version
docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel-f16

# AMD GPU version
docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas

For more information about the AIO images and pre-downloaded models, see Container Documentation.

To load models:

# From the model gallery (see available models with `local-ai models list`, in the WebUI from the model tab, or visiting https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# Start LocalAI with the phi-2 model directly from huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# Install and run a model from the Ollama OCI registry
local-ai run ollama://gemma:2b
# Run a model from a configuration file
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# Install and run a model from a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest

For more information, see ð» Getting started

ð° Latest project news

June 2025: Backend management has been added. Attention: extras images are going to be deprecated from the next release! Read the backend management PR.
May 2025: Audio input and Reranking in llama.cpp backend, Realtime API, Support to Gemma, SmollVLM, and more multimodal models (available in the gallery).
May 2025: Important: image name changes See release
Apr 2025: Rebrand, WebUI enhancements
Apr 2025: LocalAGI and LocalRecall join the LocalAI family stack.
Apr 2025: WebUI overhaul, AIO images updates
Feb 2025: Backend cleanup, Breaking changes, new backends (kokoro, OutelTTS, faster-whisper), Nvidia L4T images
Jan 2025: LocalAI model release: https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3, SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603
Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )
Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )
Nov 2024: Voice activity detection models (VAD) added to the API: https://github.com/mudler/LocalAI/pull/4204
Oct 2024: examples moved to LocalAI-examples
Aug 2024: ð FLUX-1, P2P Explorer
July 2024: ð¥ð¥ ð P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723. P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113
May 2024: ð¥ð¥ Decentralized P2P llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) ð Docs https://localai.io/features/distribute/
May 2024: ð¥ð¥ Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324
April 2024: Reranker API: https://github.com/mudler/LocalAI/pull/2121

Roadmap items: List of issues

ð Features

ð Text generation with GPTs (llama.cpp, transformers, vllm ... :book: and more)
ð£ Text to Audio
ð Audio to Text (Audio transcription with whisper.cpp)
ð¨ Image generation
ð¥ OpenAI-alike tools API
ð§ Embeddings generation for vector databases
âï¸ Constrained grammars
ð¼ï¸ Download Models directly from Huggingface
ð¥½ Vision API
ð Reranker API
ðð§ P2P Inferencing
Agentic capabilities
ð Voice activity detection (Silero-VAD support)
ð Integrated WebUI!

ð Community and integrations

Build and deploy custom containers:

https://github.com/sozercan/aikit

WebUIs:

https://github.com/Jirubizu/localai-admin
https://github.com/go-skynet/LocalAI-frontend
QA-Pilot(An interactive chat project that leverages LocalAI LLMs for rapid understanding and navigation of GitHub code repository) https://github.com/reid41/QA-Pilot

Model galleries

https://github.com/go-skynet/model-gallery

Other:

Helm chart https://github.com/go-skynet/helm-charts
VSCode extension https://github.com/badgooooor/localai-vscode-plugin
Langchain: https://python.langchain.com/docs/integrations/providers/localai/
Terminal utility https://github.com/djcopley/ShellOracle
Local Smart assistant https://github.com/mudler/LocalAGI
Home Assistant https://github.com/sammcj/homeassistant-localai / https://github.com/drndos/hass-openai-custom-conversation / https://github.com/valentinfrlch/ha-gpt4vision
Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord
Slack bot https://github.com/mudler/LocalAGI/tree/main/examples/slack
Shell-Pilot(Interact with LLM using LocalAI models via pure shell scripts on your Linux or MacOS system) https://github.com/reid41/shell-pilot
Telegram bot https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot
Another Telegram Bot https://github.com/JackBekket/Hellper
Auto-documentation https://github.com/JackBekket/Reflexia
Github bot which answer on issues, with code and documentation as context https://github.com/JackBekket/GitHelper
Github Actions: https://github.com/marketplace/actions/start-localai
Examples: https://github.com/mudler/LocalAI/tree/master/examples/

ð Resources

LLM finetuning guide
How to build locally
How to install in Kubernetes
Projects integrating LocalAI
How tos section (curated by our community)

:book: ð¥ Media, Blogs, Social

Citation

If you utilize this repository, data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},

â¤ï¸ Sponsors

Do you find LocalAI useful?

Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.

A huge thank you to our generous sponsors who support this project covering CI expenses, and our Sponsor list:

ð Star history

ð License

LocalAI is a community-driven project created by Ettore Di Giacinto.

MIT - Author Ettore Di Giacinto mudler@localai.io

ð Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

ð¤ Contributors

This is a community project, a special thanks to our contributors! ð¤

Top Related Projects

Convert designs to code with AI

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

Top Related Projects

Quick Overview

Pros

Cons

Getting Started

Competitor Comparisons

Pros of text-generation-webui

Cons of text-generation-webui

Code Comparison

Pros of llama-cpp-python

Cons of llama-cpp-python

Code Comparison

Pros of llama.cpp

Cons of llama.cpp

Code Comparison

Pros of GPT4All

Cons of GPT4All

Code Comparison

Pros of FastChat

Cons of FastChat

Code Comparison

Pros of stable-diffusion-webui

Cons of stable-diffusion-webui

Code Comparison

Summary

Convert designs to code with AI

README

ðð Local Stack Family

Screenshots

ð» Quickstart

CPU only image:

NVIDIA GPU Images:

AMD GPU Images (ROCm):

Intel GPU Images (oneAPI):

Vulkan GPU Images:

AIO Images (pre-downloaded models):

ð° Latest project news

ð Features

ð Community and integrations

ð Resources

:book: ð¥ Media, Blogs, Social

Citation

â¤ï¸ Sponsors

ð Star history

ð License

ð Acknowledgements

ð¤ Contributors

Top Related Projects

Convert designs to code with AI

ðð Local Stack Family

ð» Quickstart

ð° Latest project news

ð Features

ð Community and integrations

ð Resources

:book: ð¥ Media, Blogs, Social

â¤ï¸ Sponsors

ð Star history

ð License

ð Acknowledgements

ð¤ Contributors