Top Related Projects
- openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- ggerganov/whisper.cpp: Port of OpenAI's Whisper model in C/C++
- Const-me/Whisper: High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
- m-bain/whisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
- SYSTRAN/faster-whisper: Faster Whisper transcription with CTranslate2
Quick Overview
Oumi is an open-source AI assistant framework designed to create customizable AI agents. It provides a flexible architecture for building conversational AI systems with various capabilities, including task planning, memory management, and integration with external tools and APIs.
Pros
- Highly customizable and extensible architecture
- Supports multiple language models and external tools
- Includes built-in memory and task planning capabilities
- Active development and community support
Cons
- Relatively new project, may have some stability issues
- Documentation could be more comprehensive
- Steeper learning curve compared to some simpler chatbot frameworks
- Limited pre-built integrations compared to more established frameworks
Code Examples
- Creating a simple Oumi agent:
from oumi import Oumi
agent = Oumi()
response = agent.chat("Hello, how are you?")
print(response)
- Using a custom language model:
from oumi import Oumi
from oumi.llm import OpenAILLM
custom_llm = OpenAILLM(model="gpt-4")
agent = Oumi(llm=custom_llm)
response = agent.chat("Explain quantum computing.")
print(response)
- Adding a custom tool to the agent:
from oumi import Oumi
from oumi.tool import Tool
def weather_tool(location):
    # Implement weather lookup logic here
    return f"The weather in {location} is sunny."
custom_tool = Tool("weather", weather_tool)
agent = Oumi(tools=[custom_tool])
response = agent.chat("What's the weather like in New York?")
print(response)
Getting Started
To get started with Oumi, follow these steps:
- Install Oumi:
pip install oumi
- Create a simple agent:
from oumi import Oumi
agent = Oumi()
response = agent.chat("Hello, Oumi!")
print(response)
- Customize the agent with tools and a specific language model:
from oumi import Oumi
from oumi.llm import OpenAILLM
from oumi.tool import Tool
custom_llm = OpenAILLM(model="gpt-3.5-turbo")
custom_tool = Tool("example", lambda x: f"Example: {x}")
agent = Oumi(llm=custom_llm, tools=[custom_tool])
response = agent.chat("Use the example tool.")
print(response)
Competitor Comparisons
Robust Speech Recognition via Large-Scale Weak Supervision
Pros of Whisper
- Highly accurate speech recognition across multiple languages
- Extensive documentation and community support
- Backed by OpenAI's research and resources
Cons of Whisper
- Larger model size, requiring more computational resources
- Limited to speech recognition tasks only
- Potentially slower inference time for real-time applications
Code Comparison
Whisper:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
Oumi:
from oumi import Oumi
oumi = Oumi()
result = oumi.transcribe("audio.mp3")
print(result)
While both repositories offer speech recognition capabilities, Whisper focuses solely on this task with high accuracy across multiple languages. Oumi, on the other hand, aims to provide a more comprehensive AI assistant framework, potentially sacrificing some specialization for broader functionality.
Whisper's extensive documentation and community support make it easier for developers to implement and troubleshoot. However, its larger model size may require more computational resources, which could be a drawback for some applications.
Oumi's simpler API and potentially lighter resource requirements might make it more suitable for certain use cases, especially if additional AI assistant features are needed beyond speech recognition.
Port of OpenAI's Whisper model in C/C++
Pros of whisper.cpp
- Highly optimized C++ implementation for efficient speech recognition
- Supports various platforms including mobile and embedded devices
- Offers real-time audio processing capabilities
Cons of whisper.cpp
- Limited to speech recognition tasks only
- Requires more technical expertise to integrate and use effectively
- Less flexibility for customization compared to Oumi
Code Comparison
whisper.cpp:
#include "whisper.h"

int main() {
    struct whisper_context * ctx = whisper_init_from_file("model.bin");
    // pcm / pcm_len stand in for a buffer of 16 kHz mono float samples loaded by the caller.
    const float * pcm = nullptr;
    int pcm_len = 0;
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    whisper_full(ctx, params, pcm, pcm_len);
    whisper_free(ctx);
}
Oumi:
from oumi import Oumi
oumi = Oumi()
response = oumi.chat("Hello, how are you?")
print(response)
Summary
whisper.cpp is a specialized, high-performance speech recognition library, while Oumi is a more versatile AI assistant framework. whisper.cpp excels in efficiency and platform support for audio processing, but Oumi offers broader functionality and easier integration for general AI tasks. The choice between them depends on the specific requirements of your project, whether it's focused solely on speech recognition or needs more comprehensive AI capabilities.
High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Pros of Whisper
- Optimized for Windows with DirectCompute, offering faster performance on compatible hardware
- Supports both CPU and GPU processing, providing flexibility for different system configurations
- Implements advanced features like voice activity detection and automatic language detection
Cons of Whisper
- Limited to Windows platform, reducing cross-platform compatibility
- Requires specific hardware and drivers for optimal performance
- Less focus on real-time transcription compared to Oumi
Code Comparison
Whisper (C++):
void CTranscribeTask::ProcessAudio( const float* pAudio, size_t cbAudio )
{
    const size_t samples = cbAudio / sizeof( float );
    m_audioBuffer.insert( m_audioBuffer.end(), pAudio, pAudio + samples );
    while( m_audioBuffer.size() >= m_samplesPerSegment )
        processSegment();
}
Oumi (Python):
def transcribe_stream(self, audio_stream, **kwargs):
    for chunk in audio_stream:
        if self.vad.is_speech(chunk):
            self.buffer.extend(chunk)
        elif len(self.buffer) > 0:
            text = self.transcribe(self.buffer, **kwargs)
            yield text
            self.buffer.clear()
The code snippets demonstrate different approaches to audio processing and transcription. Whisper focuses on efficient buffer management and segment processing, while Oumi emphasizes real-time streaming and voice activity detection.
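For readers unfamiliar with the pattern, here is a minimal, dependency-free Python sketch of the accumulate-then-process segmentation loop used by the C++ snippet; the names are illustrative and belong to neither project:
def process_stream(chunks, samples_per_segment, process_segment):
    # Accumulate incoming audio chunks and hand off fixed-size segments,
    # mirroring the buffer-management loop in ProcessAudio above.
    buffer = []
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= samples_per_segment:
            process_segment(buffer[:samples_per_segment])
            buffer = buffer[samples_per_segment:]
    return buffer  # leftover samples shorter than one full segment
Calling process_stream(chunks, 16000, handle_segment) would emit one segment per second of 16 kHz mono audio.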
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Pros of WhisperX
- Specialized in audio transcription and alignment
- Offers word-level timestamps for accurate synchronization
- Supports multiple languages and provides language detection
Cons of WhisperX
- Limited to audio processing tasks
- Requires specific dependencies and may have higher system requirements
- Less versatile compared to Oumi's broader AI capabilities
Code Comparison
WhisperX:
import whisperx
model = whisperx.load_model("large-v2")
result = model.transcribe("audio.mp3")
print(result["text"])
Oumi:
from oumi import Oumi
oumi = Oumi()
response = oumi.chat("Transcribe this audio file: audio.mp3")
print(response)
Key Differences
WhisperX focuses on advanced audio transcription and alignment, providing detailed timestamps and multi-language support. It's ideal for projects requiring precise audio processing.
Oumi is a more general-purpose AI assistant, capable of handling various tasks beyond audio transcription. It offers a simpler interface for interacting with AI capabilities.
WhisperX may be more suitable for specialized audio projects, while Oumi provides a broader range of AI functionalities for diverse applications.
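To make the word-level timestamps concrete, here is a sketch of WhisperX's transcribe-then-align workflow, following the steps described in the WhisperX README; exact function signatures vary between releases, so treat this as an outline rather than the definitive API:
import whisperx

device = "cuda"
audio = whisperx.load_audio("audio.mp3")

# 1. Transcribe with the batched Whisper backend.
model = whisperx.load_model("large-v2", device)
result = model.transcribe(audio, batch_size=16)

# 2. Align the transcript to obtain word-level timestamps.
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

for word in aligned["segments"][0]["words"]:
    print(word)  # each entry carries the word plus its start and end times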
Faster Whisper transcription with CTranslate2
Pros of faster-whisper
- Specialized for speech recognition and transcription tasks
- Optimized for faster performance using CTranslate2
- Supports multiple languages and provides confidence scores
Cons of faster-whisper
- Limited to audio processing and transcription
- Requires additional dependencies like FFmpeg for audio file handling
Code comparison
faster-whisper:
from faster_whisper import WhisperModel
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
oumi:
from oumi import Oumi
oumi = Oumi()
response = oumi.chat("Transcribe the audio file 'audio.mp3'")
print(response)
Key differences
faster-whisper is a specialized tool for speech recognition and transcription, offering optimized performance for audio processing tasks. It provides detailed control over transcription parameters and supports multiple languages.
oumi, on the other hand, is a more general-purpose AI assistant that can handle a variety of tasks, including audio transcription, through natural language instructions. While it may not offer the same level of specialization for audio processing, it provides a more versatile and user-friendly interface for various AI-related tasks.
The choice between the two depends on the specific requirements of the project, with faster-whisper being more suitable for dedicated audio transcription tasks and oumi offering broader AI capabilities.
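The language detection and confidence scores mentioned above are exposed on the objects the snippet already creates; the following sketch assumes the attribute names documented in the faster-whisper README (info.language, info.language_probability, segment.avg_logprob):
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    # avg_logprob and no_speech_prob are the per-segment confidence signals.
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text} (avg_logprob={segment.avg_logprob:.2f})")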
insanely-fast-whisper
Pros of insanely-fast-whisper
- Focuses specifically on optimizing Whisper for speed, potentially offering faster transcription
- Provides detailed benchmarks and comparisons with other Whisper implementations
- Offers a simple command-line interface for easy use
Cons of insanely-fast-whisper
- Limited to Whisper functionality, while oumi offers a broader range of AI capabilities
- Less extensive documentation and community support compared to oumi
- May require more technical expertise to set up and use effectively
Code Comparison
oumi:
from oumi import Oumi
oumi = Oumi()
response = oumi.chat("Tell me a joke")
print(response)
insanely-fast-whisper (built on the Hugging Face Transformers ASR pipeline, shown here because the project itself ships primarily as a CLI):
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3", device=0)
outputs = pipe("audio.mp3", chunk_length_s=30, return_timestamps=True)
print(outputs["text"])
The code snippets demonstrate the different focus areas of each project. oumi provides a more general-purpose AI interface, while insanely-fast-whisper is specifically tailored for audio transcription using an optimized Whisper model.
README
Everything you need to build state-of-the-art foundation models, end-to-end.
Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from data preparation and training to evaluation and deployment. Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.
With Oumi, you can:
- Train and fine-tune models from 10M to 405B parameters using state-of-the-art techniques (SFT, LoRA, QLoRA, DPO, and more)
- Work with both text and multimodal models (Llama, DeepSeek, Qwen, Phi, and others)
- Synthesize and curate training data with LLM judges
- Deploy models efficiently with popular inference engines (vLLM, SGLang); a plain-vLLM sketch follows this list
- Evaluate models comprehensively across standard benchmarks
- Run anywhere - from laptops to clusters to clouds (AWS, Azure, GCP, Lambda, and more)
- Integrate with both open models and commercial APIs (OpenAI, Anthropic, Vertex AI, Together, Parasail, ...)
All with one consistent API, production-grade reliability, and all the flexibility you need for research.
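As a concrete illustration of the deployment bullet above, here is a minimal sketch of serving a checkpoint with vLLM's offline Python API; this is plain vLLM usage rather than an Oumi-specific interface, and the model path is a placeholder for whatever checkpoint you trained or downloaded:
from vllm import LLM, SamplingParams

# Placeholder path: point this at a Hugging Face model id or a local fine-tuned checkpoint.
llm = LLM(model="path/to/your-finetuned-model")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what a foundation model is."], params)
print(outputs[0].outputs[0].text)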
Learn more at oumi.ai, or jump right in with the quickstart guide.
Getting Started
Usage
Installation
Installing oumi in your environment is straightforward:
# Install the package (CPU & NPU only)
pip install oumi # For local development & testing
# OR, with GPU support (Requires Nvidia or AMD GPU)
pip install oumi[gpu] # For GPU training
# To get the latest version, install from the source
pip install git+https://github.com/oumi-ai/oumi.git
For more advanced installation options, see the installation guide.
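To confirm the package is installed, you can query its version with the standard library; nothing Oumi-specific is assumed here beyond the package name used by pip install oumi:
import importlib.metadata

# Prints the installed Oumi version, or raises PackageNotFoundError if the install failed.
print(importlib.metadata.version("oumi"))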
Oumi CLI
You can quickly use the oumi command to train, evaluate, and infer models using one of the existing recipes:
# Training
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml
# Evaluation
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml
# Inference
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml --interactive
For more advanced options, see the training, evaluation, inference, and llm-as-a-judge guides.
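If you prefer to drive the same recipes from a Python script (for example, as part of a sweep), one option is to shell out to the CLI shown above; this sketch only reuses the documented oumi train command:
import subprocess

# Runs the quickstart SFT recipe and raises CalledProcessError if training fails.
subprocess.run(
    ["oumi", "train", "-c", "configs/recipes/smollm/sft/135m/quickstart_train.yaml"],
    check=True,
)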
Running Jobs Remotely
You can run jobs remotely on cloud platforms (AWS, Azure, GCP, Lambda, etc.) using the oumi launch command:
# GCP
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml
# AWS
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud aws
# Azure
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud azure
# Lambda
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud lambda
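To launch the same job on several clouds from a script, you can loop over the documented --resources.cloud values; this sketch reuses only the command and flags shown in this section:
import subprocess

CONFIG = "configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml"
for cloud in ["aws", "azure", "lambda"]:
    # Mirrors the oumi launch up commands above, overriding only the target cloud.
    subprocess.run(
        ["oumi", "launch", "up", "-c", CONFIG, "--resources.cloud", cloud],
        check=True,
    )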
Note: Oumi is in beta and under active development. The core features are stable, but some advanced features might change as the platform improves.
Why use Oumi?
If you need a comprehensive platform for training, evaluating, or deploying models, Oumi is a great choice.
Here are some of the key features that make Oumi stand out:
- Zero Boilerplate: Get started in minutes with ready-to-use recipes for popular models and workflows. No need to write training loops or data pipelines.
- Enterprise-Grade: Built and validated by teams training models at scale.
- Research Ready: Perfect for ML research with easily reproducible experiments, and flexible interfaces for customizing each component.
- Broad Model Support: Works with most popular model architectures - from tiny models to the largest ones, text-only to multimodal.
- SOTA Performance: Native support for distributed training techniques (FSDP, DDP) and optimized inference engines (vLLM, SGLang).
- Community First: 100% open source with an active community. No vendor lock-in, no strings attached.
Examples & Recipes
Explore the growing collection of ready-to-use configurations for state-of-the-art models and training workflows:
Note: These configurations are not an exhaustive list of what's supported, but simply examples to get you started. You can find a more exhaustive list of supported models and datasets (supervised fine-tuning, pre-training, preference tuning, and vision-language fine-tuning) in the oumi documentation.
DeepSeek R1 Family
Model | Example Configurations |
---|---|
DeepSeek R1 671B | Inference (Together AI) |
Distilled Llama 8B | FFT • LoRA • QLoRA • Inference • Evaluation |
Distilled Llama 70B | FFT • LoRA • QLoRA • Inference • Evaluation |
Distilled Qwen 1.5B | FFT • LoRA • Inference • Evaluation |
Distilled Qwen 32B | LoRA • Inference • Evaluation |
Llama Family
Model | Example Configurations |
---|---|
Llama 3.1 8B | FFT • LoRA • QLoRA • Pre-training • Inference (vLLM) • Inference • Evaluation |
Llama 3.1 70B | FFT • LoRA • QLoRA • Inference • Evaluation |
Llama 3.1 405B | FFT • LoRA • QLoRA |
Llama 3.2 1B | FFT • LoRA • QLoRA • Inference (vLLM) • Inference (SGLang) • Inference • Evaluation |
Llama 3.2 3B | FFT • LoRA • QLoRA • Inference (vLLM) • Inference (SGLang) • Inference • Evaluation |
Llama 3.3 70B | FFT • LoRA • QLoRA • Inference (vLLM) • Inference • Evaluation |
Llama 3.2 Vision 11B | SFT • Inference (vLLM) • Inference (SGLang) • Evaluation |
Vision Models
Model | Example Configurations |
---|---|
Llama 3.2 Vision 11B | SFT • LoRA • Inference (vLLM) • Inference (SGLang) • Evaluation |
LLaVA 7B | SFT • Inference (vLLM) • Inference |
Phi3 Vision 4.2B | SFT • Inference (vLLM) |
Qwen2-VL 2B | SFT • Inference (vLLM) • Inference (SGLang) • Inference • Evaluation |
SmolVLM-Instruct 2B | SFT |
Even more options
This section lists all the language models that can be used with Oumi. Thanks to the integration with the Hugging Face Transformers library, you can easily use any of these models for training, evaluation, or inference.
Models prefixed with a checkmark (✅) have been thoroughly tested and validated by the Oumi community, with ready-to-use recipes available in the configs/recipes directory.
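As an example of that Transformers integration, any model from the tables below can be loaded directly with the standard Transformers API; this is plain Hugging Face usage rather than an Oumi-specific interface, and the hub id below is an assumption chosen because the model is small enough to run on a laptop:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id assumed for illustration; any model listed below loads the same way.
model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Oumi is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))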
Click to see more supported models
Instruct Models
Model | Size | Paper | HF Hub | License | Open 1 | Recommended Parameters |
---|---|---|---|---|---|---|
✅ SmolLM-Instruct | 135M/360M/1.7B | Blog | Hub | Apache 2.0 | â | |
✅ DeepSeek R1 Family | 1.5B/8B/32B/70B/671B | Blog | Hub | MIT | â | |
✅ Llama 3.1 Instruct | 8B/70B/405B | Paper | Hub | License | â | |
✅ Llama 3.2 Instruct | 1B/3B | Paper | Hub | License | â | |
✅ Llama 3.3 Instruct | 70B | Paper | Hub | License | â | |
✅ Phi-3.5-Instruct | 4B/14B | Paper | Hub | License | â | |
Qwen2.5-Instruct | 0.5B-70B | Paper | Hub | License | â | |
OLMo 2 Instruct | 7B | Paper | Hub | Apache 2.0 | â | |
MPT-Instruct | 7B | Blog | Hub | Apache 2.0 | â | |
Command R | 35B/104B | Blog | Hub | License | â | |
Granite-3.1-Instruct | 2B/8B | Paper | Hub | Apache 2.0 | â | |
Gemma 2 Instruct | 2B/9B | Blog | Hub | License | â | |
DBRX-Instruct | 130B MoE | Blog | Hub | Apache 2.0 | â | |
Falcon-Instruct | 7B/40B | Paper | Hub | Apache 2.0 | â |
Vision-Language Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
✅ Llama 3.2 Vision | 11B | Paper | Hub | License | â | |
✅ LLaVA-1.5 | 7B | Paper | Hub | License | â | |
✅ Phi-3 Vision | 4.2B | Paper | Hub | License | â | |
✅ BLIP-2 | 3.6B | Paper | Hub | MIT | â | |
✅ Qwen2-VL | 2B | Blog | Hub | License | â | |
✅ SmolVLM-Instruct | 2B | Blog | Hub | Apache 2.0 | â |
Base Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
✅ SmolLM2 | 135M/360M/1.7B | Blog | Hub | Apache 2.0 | â | |
✅ Llama 3.2 | 1B/3B | Paper | Hub | License | â | |
✅ Llama 3.1 | 8B/70B/405B | Paper | Hub | License | â | |
✅ GPT-2 | 124M-1.5B | Paper | Hub | MIT | â | |
DeepSeek V2 | 7B/13B | Blog | Hub | License | â | |
Gemma2 | 2B/9B | Blog | Hub | License | â | |
GPT-J | 6B | Blog | Hub | Apache 2.0 | â | |
GPT-NeoX | 20B | Paper | Hub | Apache 2.0 | â | |
Mistral | 7B | Paper | Hub | Apache 2.0 | â | |
Mixtral | 8x7B/8x22B | Blog | Hub | Apache 2.0 | â | |
MPT | 7B | Blog | Hub | Apache 2.0 | â | |
OLMo | 1B/7B | Paper | Hub | Apache 2.0 | â |
Reasoning Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
Qwen QwQ | 32B | Blog | Hub | License | â |
Code Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
✅ Qwen2.5 Coder | 0.5B-32B | Blog | Hub | License | â | |
DeepSeek Coder | 1.3B-33B | Paper | Hub | License | â | |
StarCoder 2 | 3B/7B/15B | Paper | Hub | License | â |
Math Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
DeepSeek Math | 7B | Paper | Hub | License | â |
Documentation
To learn more about all the platform's capabilities, see the Oumi documentation.
Join the Community!
Oumi is a community-first effort. Whether you are a developer, a researcher, or a non-technical user, all contributions are very welcome!
- To contribute to the oumi repository, please check CONTRIBUTING.md for guidance on how to send your first Pull Request.
- Make sure to join our Discord community to get help, share your experiences, and contribute to the project!
- If you are interested in joining one of the community's open-science efforts, check out our open collaboration page.
Acknowledgements
Oumi makes use of several libraries and tools from the open-source community. We would like to acknowledge and deeply thank the contributors of these projects!
Citation
If you find Oumi useful in your research, please consider citing it:
@software{oumi2025,
  author = {Oumi Community},
  title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
  month = {January},
  year = {2025},
  url = {https://github.com/oumi-ai/oumi}
}
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Footnotes
1. Open models are defined as models with fully open weights, training code, and data, and a permissive license. See Open Source Definitions for more information.