serge
A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy-to-use API.
Top Related Projects
A Gradio web UI for Large Language Models.
Stable Diffusion web UI
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
The official gpt4free repository | various collection of powerful language models
Interact with your documents using the power of GPT, 100% privately, no data leaks
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Quick Overview
Serge is an open-source, local chat assistant that can be run on your own hardware. It provides a ChatGPT-like experience without relying on external APIs, ensuring privacy and control over your data. Serge supports various language models and offers a user-friendly web interface.
Pros
- Privacy-focused: Runs locally, ensuring data stays on your own hardware
- Customizable: Supports multiple language models and can be tailored to specific needs
- Cost-effective: No subscription fees or API costs
- User-friendly: Offers a clean web interface for easy interaction
Cons
- Resource-intensive: Requires significant computational power to run large language models
- Limited compared to cloud-based alternatives: May not have access to the latest models or features
- Setup complexity: Requires some technical knowledge to install and configure
- Potential for lower performance: Depending on hardware, may not match the speed of cloud-based solutions
Getting Started
- Clone the repository:
  git clone https://github.com/serge-chat/serge.git
  cd serge
- Install dependencies:
  pip install -r requirements.txt
- Download a language model (e.g., GPT-J 6B):
  python3 download-model.py GPT-J-6B
- Start the Serge server:
  python3 server.py
- Access the web interface at http://localhost:8008 in your browser.
Competitor Comparisons
A Gradio web UI for Large Language Models.
Pros of text-generation-webui
- More extensive model support, including popular models like GPT-J, LLaMA, and OPT
- Advanced features such as character creation, chat modes, and instruct mode
- Highly customizable interface with various extensions and plugins
Cons of text-generation-webui
- Steeper learning curve due to more complex setup and configuration options
- Higher system requirements, especially for running larger language models
- May be overwhelming for users seeking a simple, out-of-the-box chat experience
Code Comparison
text-generation-webui:
def generate_reply(
    question, chatbot, state, stopping_strings=None, is_chat=False, **kwargs
):
    # Complex generation logic with multiple parameters and options
    ...
serge:
def generate_response(self, prompt: str) -> str:
    # Simpler generation function focused on basic chat functionality
    return self.model.generate(prompt)
The code comparison highlights the difference in complexity between the two projects. text-generation-webui offers more advanced features and customization options, while serge focuses on providing a straightforward chat experience with simpler code structure.
Stable Diffusion web UI
Pros of stable-diffusion-webui
- More extensive features for image generation and manipulation
- Larger community and more frequent updates
- Better documentation and user guides
Cons of stable-diffusion-webui
- Higher system requirements and more complex setup
- Steeper learning curve for new users
- Less focused on chat-based interactions
Code Comparison
stable-diffusion-webui:
def create_infotext(p, all_prompts, all_seeds, all_subseeds, comments=None, iteration=0, position_in_batch=0):
    index = position_in_batch + iteration * p.batch_size
    clip_skip = getattr(p, 'clip_skip', opts.CLIP_stop_at_last_layers)
    token_merging_ratio = getattr(p, 'token_merging_ratio', 0)
    token_merging_ratio_hr = getattr(p, 'token_merging_ratio_hr', 0)
serge:
def get_model_path(model_name: str) -> str:
    model_path = os.path.join(MODELS_PATH, model_name)
    if not os.path.exists(model_path):
        raise ValueError(f"Model {model_name} not found in {MODELS_PATH}")
    return model_path
The code snippets show different focuses: stable-diffusion-webui deals with image generation parameters, while serge handles model path management for chat-based interactions.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Pros of FastChat
- More comprehensive and feature-rich, offering a wider range of functionalities
- Better documentation and examples, making it easier for developers to integrate and use
- Actively maintained with frequent updates and improvements
Cons of FastChat
- More complex setup and configuration process
- Higher system requirements due to its extensive features
- Steeper learning curve for beginners
Code Comparison
Serge (Python):
from serge import Serge
chat = Serge()
response = chat.chat("Hello, how are you?")
print(response)
FastChat (Python):
from fastchat.model import load_model, get_conversation_template
from fastchat.serve.inference import chat_loop
model, tokenizer = load_model("vicuna-7b")
conv = get_conversation_template("vicuna")
chat_loop(model, tokenizer, conv)
FastChat offers more flexibility and control over the conversation flow, while Serge provides a simpler, more straightforward interface for basic chatbot functionality. FastChat's code demonstrates its ability to load custom models and conversation templates, making it more versatile for advanced use cases.
The official gpt4free repository | various collection of powerful language models
Pros of gpt4free
- Offers access to multiple AI models and providers
- Includes a web interface for easy interaction
- Provides more frequent updates and active development
Cons of gpt4free
- Less focus on privacy and self-hosting
- May have potential legal and ethical concerns
- Lacks some advanced features present in Serge
Code Comparison
gpt4free:
from g4f import ChatCompletion
response = ChatCompletion.create(model='gpt-3.5-turbo', messages=[
    {'role': 'user', 'content': 'Hello, how are you?'}
])
print(response)
Serge:
from serge import ChatBot
bot = ChatBot()
response = bot.chat("Hello, how are you?")
print(response)
Summary
gpt4free offers a wider range of AI models and providers, along with a web interface, making it more versatile for users seeking various AI interactions. However, Serge focuses more on privacy and self-hosting, which may be preferable for users concerned about data security. gpt4free's code appears more complex, allowing for model selection, while Serge's implementation is simpler and more straightforward. Both projects have their strengths, and the choice between them depends on the user's specific needs and priorities.
Interact with your documents using the power of GPT, 100% privately, no data leaks
Pros of private-gpt
- Focuses on privacy and local data processing
- Supports multiple document types (PDF, TXT, etc.)
- Utilizes LangChain for improved language model interactions
Cons of private-gpt
- Less emphasis on multi-user collaboration
- May require more setup and configuration
- Potentially higher resource requirements for local processing
Code Comparison
Serge (Python):
@app.route('/api/chat', methods=['POST'])
def chat():
    data = request.json
    conversation_id = data.get('conversation_id')
    message = data.get('message')
    # ... (processing logic)
    return jsonify(response)
private-gpt (Python):
@app.route("/chat", methods=["POST"])
def chat_endpoint():
    request_data = request.json
    question = request_data["question"]
    history = request_data.get("history", [])
    # ... (processing logic)
    return jsonify({"answer": answer, "history": updated_history})
Both snippets sketch a POST chat endpoint in Flask style, although Serge's own README describes its API as FastAPI + LangChain. private-gpt's endpoint centers on question answering with history, while Serge's centers on conversation management.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Highly optimized for large-scale distributed training of deep learning models
- Supports a wide range of AI models and architectures
- Extensive documentation and active community support
Cons of DeepSpeed
- Steeper learning curve for beginners
- Primarily focused on training, less emphasis on inference
- Requires more setup and configuration for optimal performance
Code Comparison
DeepSpeed:
import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(args=args,
                                                     model=model,
                                                     model_parameters=params)
Serge:
from serge import Serge
serge = Serge()
response = serge.chat("Hello, how are you?")
Summary
DeepSpeed is a powerful library for optimizing large-scale AI model training, offering advanced features and broad compatibility. However, it may be more complex for beginners and requires more setup. Serge, on the other hand, appears to be a simpler chat-based interface, potentially easier to use but with fewer advanced optimization features. The choice between them depends on the specific use case and level of expertise required.
README
Serge - LLaMA made easy 🦙
Serge is a chat interface crafted with llama.cpp for running GGUF models. No API keys, entirely self-hosted!
- 🌐 SvelteKit frontend
- 💾 Redis for storing chat history & parameters
- ⚙️ FastAPI + LangChain for the API, wrapping calls to llama.cpp using the python bindings
⚡️ Quick start
🐳 Docker:
docker run -d \
    --name serge \
    -v weights:/usr/src/app/weights \
    -v datadb:/data/db/ \
    -p 8008:8008 \
    ghcr.io/serge-chat/serge:latest
🐙 Docker Compose:
services:
  serge:
    image: ghcr.io/serge-chat/serge:latest
    container_name: serge
    restart: unless-stopped
    ports:
      - 8008:8008
    volumes:
      - weights:/usr/src/app/weights
      - datadb:/data/db/
volumes:
  weights:
  datadb:
Then visit http://localhost:8008. The API documentation is available at http://localhost:8008/api/docs.
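To confirm that the container is actually serving both addresses, a small check like the one below can help. It is only a sketch: the helper names `serge_urls` and `is_reachable` are made up for this example, and it assumes the default host and port from the quick start.

```python
from urllib.error import URLError
from urllib.request import urlopen


def serge_urls(host: str = "localhost", port: int = 8008) -> dict[str, str]:
    """Build the web UI and API docs URLs for a given host and port."""
    base = f"http://{host}:{port}"
    return {"ui": base, "api_docs": f"{base}/api/docs"}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    for name, url in serge_urls().items():
        print(f"{name}: {url} reachable={is_reachable(url)}")
```

If either line reports `reachable=False`, checking `docker ps` and the port mapping is a reasonable next step.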
🌍 Environment Variables
The following environment variables are available:
Variable Name | Description | Default Value |
---|---|---|
SERGE_DATABASE_URL | Database connection string | sqlite:////data/db/sql_app.db |
SERGE_JWT_SECRET | Key for auth token encryption. Use a random string | uF7FGN5uzfGdFiPzR |
SERGE_SESSION_EXPIRY | Duration in minutes before a user must reauthenticate | 60 |
NODE_ENV | Node.js running environment | production |
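As an illustration of overriding these defaults, the sketch below builds the quick-start `docker run` command with a freshly generated `SERGE_JWT_SECRET` and a custom `SERGE_SESSION_EXPIRY`. It assumes the variables are passed with Docker's standard `-e` flag; the function name `docker_run_command` is hypothetical and not part of Serge.

```python
import secrets
import shlex


def docker_run_command(jwt_secret: str, session_expiry_minutes: int = 60) -> list[str]:
    """Return the quick-start `docker run` command with two variables overridden."""
    return [
        "docker", "run", "-d",
        "--name", "serge",
        "-v", "weights:/usr/src/app/weights",
        "-v", "datadb:/data/db/",
        "-p", "8008:8008",
        # Assumption: Serge reads these from the container environment.
        "-e", f"SERGE_JWT_SECRET={jwt_secret}",
        "-e", f"SERGE_SESSION_EXPIRY={session_expiry_minutes}",
        "ghcr.io/serge-chat/serge:latest",
    ]


if __name__ == "__main__":
    secret = secrets.token_urlsafe(32)  # random value instead of the default
    print(shlex.join(docker_run_command(secret, session_expiry_minutes=120)))
```

The script only prints the command, so it can be reviewed before it is run.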
🖥️ Windows
Ensure you have Docker Desktop installed, WSL2 configured, and enough free RAM to run models.
☁️ Kubernetes
Instructions for setting up Serge on Kubernetes can be found in the wiki.
🧠 Supported Models
Category | Models |
---|---|
Alfred | 40B-1023 |
BioMistral | 7B |
Code | 13B, 33B |
CodeLLaMA | 7B, 7B-Instruct, 7B-Python, 13B, 13B-Instruct, 13B-Python, 34B, 34B-Instruct, 34B-Python |
Codestral | 22B v0.1 |
Gemma | 2B, 1.1-2B-Instruct, 7B, 1.1-7B-Instruct, 2-9B, 2-9B-Instruct, 2-27B, 2-27B-Instruct |
Gorilla | Falcon-7B-HF-v0, 7B-HF-v1, Openfunctions-v1, Openfunctions-v2 |
Falcon | 7B, 7B-Instruct, 11B, 40B, 40B-Instruct |
LLaMA 2 | 7B, 7B-Chat, 7B-Coder, 13B, 13B-Chat, 70B, 70B-Chat, 70B-OASST |
LLaMA 3 | 11B-Instruct, 13B-Instruct, 16B-Instruct |
LLaMA Pro | 8B, 8B-Instruct |
Mathstral | 7B |
Med42 | 70B, v2-8B, v2-70B |
Medalpaca | 13B |
Medicine | Chat, LLM |
Meditron | 7B, 7B-Chat, 70B, 3-8B |
Meta-LlaMA-3 | 3-8B, 3.1-8B, 3.2-1B-Instruct, 3-8B-Instruct, 3.1-8B-Instruct, 3.2-3B-Instruct, 3-70B, 3.1-70B, 3-70B-Instruct, 3.1-70B-Instruct |
Mistral | 7B-V0.1, 7B-Instruct-v0.2, 7B-OpenOrca, Nemo-Instruct |
MistralLite | 7B |
Mixtral | 8x7B-v0.1, 8x7B-Dolphin-2.7, 8x7B-Instruct-v0.1 |
Neural-Chat | 7B-v3.3 |
Notus | 7B-v1 |
Notux | 8x7b-v1 |
Nous-Hermes 2 | Mistral-7B-DPO, Mixtral-8x7B-DPO, Mixtral-8x7B-SFT |
OpenChat | 7B-v3.5-1210, 8B-v3.6-20240522 |
OpenCodeInterpreter | DS-6.7B, DS-33B, CL-7B, CL-13B, CL-70B |
OpenLLaMA | 3B-v2, 7B-v2, 13B-v2 |
Orca 2 | 7B, 13B |
Phi | 2-2.7B, 3-mini-4k-instruct, 3.1-mini-4k-instruct, 3.1-mini-128k-instruct, 3.5-mini-instruct, 3-medium-4k-instruct, 3-medium-128k-instruct |
Python Code | 13B, 33B |
PsyMedRP | 13B-v1, 20B-v1 |
Starling LM | 7B-Alpha |
SOLAR | 10.7B-v1.0, 10.7B-instruct-v1.0 |
TinyLlama | 1.1B |
Vicuna | 7B-v1.5, 13B-v1.5, 33B-v1.3, 33B-Coder |
WizardLM | 2-7B, 13B-v1.2, 70B-v1.0 |
Zephyr | 3B, 7B-Alpha, 7B-Beta |
Additional models can be requested by opening a GitHub issue. Other models are also available at Serge Models.
⚠️ Memory Usage
LLaMA will crash if you don't have enough available memory for the model.
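A rough pre-flight check can catch the most obvious cases before a model is loaded. The sketch below assumes, as a heuristic only, that a model needs about its file size plus 20% headroom in free memory; real usage depends on context size and other settings. The path `weights/model.gguf` and both function names are hypothetical, and the `sysconf` call works on Linux but may not on other platforms.

```python
import os
from pathlib import Path


def fits_in_memory(model_bytes: int, available_bytes: int, headroom: float = 1.2) -> bool:
    """Rough check: the model size times a safety factor must fit in free memory."""
    return model_bytes * headroom <= available_bytes


def available_memory_bytes() -> int:
    """Free physical memory via sysconf (Linux; other platforms may raise)."""
    return os.sysconf("SC_AVPHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__":
    # Hypothetical path: point this at a GGUF file in your weights directory.
    model = Path("weights/model.gguf")
    if model.exists():
        ok = fits_in_memory(model.stat().st_size, available_memory_bytes())
        print(f"{model.name}: {'should fit' if ok else 'likely too large'}")
    else:
        print(f"{model} not found; adjust the path")
```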
💬 Support
Need help? Join our Discord.
🧾 License
Nathan Sarrazin and Contributors. Serge is free and open-source software licensed under the MIT License and Apache-2.0.
🤝 Contributing
If you discover a bug or have a feature idea, feel free to open an issue or PR.
To run Serge in development mode:
git clone https://github.com/serge-chat/serge.git
cd serge/
docker compose -f docker-compose.dev.yml up --build
The development setup accepts a Python debugger session on port 5678. Example launch.json for VS Code:
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Remote Debug",
      "type": "python",
      "request": "attach",
      "connect": {
        "host": "localhost",
        "port": 5678
      },
      "pathMappings": [
        {
          "localRoot": "${workspaceFolder}/api",
          "remoteRoot": "/usr/src/app/api/"
        }
      ],
      "justMyCode": false
    }
  ]
}
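Before attaching, it can be useful to verify that the debugger port is actually exposed by the dev container. The sketch below does a plain TCP connection test against port 5678; the helper name `port_is_open` is made up for this example, and a successful connection only shows that something is listening, not that it is the debugger.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 5678, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections and timeouts both end up here.
        return False


if __name__ == "__main__":
    state = "accepting connections" if port_is_open() else "not reachable"
    print(f"Debugger port 5678 is {state}")
```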