Top Related Projects
- openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision
- ggerganov/whisper.cpp: Port of OpenAI's Whisper model in C/C++
- Const-me/Whisper: High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
- m-bain/whisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
- SYSTRAN/faster-whisper: Faster Whisper transcription with CTranslate2
Quick Overview
Oumi is an open-source AI assistant framework designed to create customizable AI agents. It provides a flexible architecture for building conversational AI systems with various capabilities, including task planning, memory management, and integration with external tools and APIs.
Pros
- Highly customizable and extensible architecture
- Supports multiple language models and external tools
- Includes built-in memory and task planning capabilities
- Active development and community support
Cons
- Relatively new project, may have some stability issues
- Documentation could be more comprehensive
- Steeper learning curve compared to some simpler chatbot frameworks
- Limited pre-built integrations compared to more established frameworks
Code Examples
- Creating a simple Oumi agent:
from oumi import Oumi
agent = Oumi()
response = agent.chat("Hello, how are you?")
print(response)
- Using a custom language model:
from oumi import Oumi
from oumi.llm import OpenAILLM
custom_llm = OpenAILLM(model="gpt-4")
agent = Oumi(llm=custom_llm)
response = agent.chat("Explain quantum computing.")
print(response)
- Adding a custom tool to the agent:
from oumi import Oumi
from oumi.tool import Tool
def weather_tool(location):
    # Implement weather lookup logic here
    return f"The weather in {location} is sunny."
custom_tool = Tool("weather", weather_tool)
agent = Oumi(tools=[custom_tool])
response = agent.chat("What's the weather like in New York?")
print(response)
Getting Started
To get started with Oumi, follow these steps:
- Install Oumi:
pip install oumi
- Create a simple agent:
from oumi import Oumi
agent = Oumi()
response = agent.chat("Hello, Oumi!")
print(response)
- Customize the agent with tools and a specific language model:
from oumi import Oumi
from oumi.llm import OpenAILLM
from oumi.tool import Tool
custom_llm = OpenAILLM(model="gpt-3.5-turbo")
custom_tool = Tool("example", lambda x: f"Example: {x}")
agent = Oumi(llm=custom_llm, tools=[custom_tool])
response = agent.chat("Use the example tool.")
print(response)
Competitor Comparisons
Robust Speech Recognition via Large-Scale Weak Supervision
Pros of Whisper
- Highly accurate speech recognition across multiple languages
- Extensive documentation and community support
- Backed by OpenAI's research and resources
Cons of Whisper
- Larger model size, requiring more computational resources
- Limited to speech recognition tasks only
- Potentially slower inference time for real-time applications
Code Comparison
Whisper:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
Oumi:
from oumi import Oumi
oumi = Oumi()
result = oumi.transcribe("audio.mp3")
print(result)
While both repositories offer speech recognition capabilities, Whisper focuses solely on this task with high accuracy across multiple languages. Oumi, on the other hand, aims to provide a more comprehensive AI assistant framework, potentially sacrificing some specialization for broader functionality.
Whisper's extensive documentation and community support make it easier for developers to implement and troubleshoot. However, its larger model size may require more computational resources, which could be a drawback for some applications.
Oumi's simpler API and potentially lighter resource requirements might make it more suitable for certain use cases, especially if additional AI assistant features are needed beyond speech recognition.
Port of OpenAI's Whisper model in C/C++
Pros of whisper.cpp
- Highly optimized C++ implementation for efficient speech recognition
- Supports various platforms including mobile and embedded devices
- Offers real-time audio processing capabilities
Cons of whisper.cpp
- Limited to speech recognition tasks only
- Requires more technical expertise to integrate and use effectively
- Less flexibility for customization compared to Oumi
Code Comparison
whisper.cpp:
#include "whisper.h"

int main() {
    struct whisper_context * ctx = whisper_init_from_file("model.bin");
    // pcm / pcm_len stand in for a buffer of 16 kHz mono float samples loaded by the caller.
    const float * pcm = nullptr;
    int pcm_len = 0;
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    whisper_full(ctx, params, pcm, pcm_len);
    whisper_free(ctx);
}
Oumi:
from oumi import Oumi
oumi = Oumi()
response = oumi.chat("Hello, how are you?")
print(response)
Summary
whisper.cpp is a specialized, high-performance speech recognition library, while Oumi is a more versatile AI assistant framework. whisper.cpp excels in efficiency and platform support for audio processing, but Oumi offers broader functionality and easier integration for general AI tasks. The choice between them depends on the specific requirements of your project, whether it's focused solely on speech recognition or needs more comprehensive AI capabilities.
High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Pros of Whisper
- Optimized for Windows with DirectCompute, offering faster performance on compatible hardware
- Supports both CPU and GPU processing, providing flexibility for different system configurations
- Implements advanced features like voice activity detection and automatic language detection
Cons of Whisper
- Limited to Windows platform, reducing cross-platform compatibility
- Requires specific hardware and drivers for optimal performance
- Less focus on real-time transcription compared to Oumi
Code Comparison
Whisper (C++):
void CTranscribeTask::ProcessAudio( const float* pAudio, size_t cbAudio )
{
    const size_t samples = cbAudio / sizeof( float );
    m_audioBuffer.insert( m_audioBuffer.end(), pAudio, pAudio + samples );
    while( m_audioBuffer.size() >= m_samplesPerSegment )
        processSegment();
}
Oumi (Python):
def transcribe_stream(self, audio_stream, **kwargs):
    for chunk in audio_stream:
        if self.vad.is_speech(chunk):
            self.buffer.extend(chunk)
        elif len(self.buffer) > 0:
            text = self.transcribe(self.buffer, **kwargs)
            yield text
            self.buffer.clear()
The code snippets demonstrate different approaches to audio processing and transcription. Whisper focuses on efficient buffer management and segment processing, while Oumi emphasizes real-time streaming and voice activity detection.
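For readers unfamiliar with the pattern, here is a minimal, dependency-free Python sketch of the accumulate-then-process segmentation loop used by the C++ snippet; the names are illustrative and belong to neither project:
def process_stream(chunks, samples_per_segment, process_segment):
    # Accumulate incoming audio chunks and hand off fixed-size segments,
    # mirroring the buffer-management loop in ProcessAudio above.
    buffer = []
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= samples_per_segment:
            process_segment(buffer[:samples_per_segment])
            buffer = buffer[samples_per_segment:]
    return buffer  # leftover samples shorter than one full segment
Calling process_stream(chunks, 16000, handle_segment) would emit one segment per second of 16 kHz mono audio.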
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Pros of WhisperX
- Specialized in audio transcription and alignment
- Offers word-level timestamps for accurate synchronization
- Supports multiple languages and provides language detection
Cons of WhisperX
- Limited to audio processing tasks
- Requires specific dependencies and may have higher system requirements
- Less versatile compared to Oumi's broader AI capabilities
Code Comparison
WhisperX:
import whisperx
model = whisperx.load_model("large-v2")
result = model.transcribe("audio.mp3")
print(result["text"])
Oumi:
from oumi import Oumi
oumi = Oumi()
response = oumi.chat("Transcribe this audio file: audio.mp3")
print(response)
Key Differences
WhisperX focuses on advanced audio transcription and alignment, providing detailed timestamps and multi-language support. It's ideal for projects requiring precise audio processing.
Oumi is a more general-purpose AI assistant, capable of handling various tasks beyond audio transcription. It offers a simpler interface for interacting with AI capabilities.
WhisperX may be more suitable for specialized audio projects, while Oumi provides a broader range of AI functionalities for diverse applications.
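To make the word-level timestamps concrete, here is a sketch of WhisperX's transcribe-then-align workflow, following the steps described in the WhisperX README; exact function signatures vary between releases, so treat this as an outline rather than the definitive API:
import whisperx

device = "cuda"
audio = whisperx.load_audio("audio.mp3")

# 1. Transcribe with the batched Whisper backend.
model = whisperx.load_model("large-v2", device)
result = model.transcribe(audio, batch_size=16)

# 2. Align the transcript to obtain word-level timestamps.
align_model, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
aligned = whisperx.align(result["segments"], align_model, metadata, audio, device)

for word in aligned["segments"][0]["words"]:
    print(word)  # each entry carries the word plus its start and end times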
Faster Whisper transcription with CTranslate2
Pros of faster-whisper
- Specialized for speech recognition and transcription tasks
- Optimized for faster performance using CTranslate2
- Supports multiple languages and provides confidence scores
Cons of faster-whisper
- Limited to audio processing and transcription
- Requires additional dependencies like FFmpeg for audio file handling
Code comparison
faster-whisper:
from faster_whisper import WhisperModel
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
oumi:
from oumi import Oumi
oumi = Oumi()
response = oumi.chat("Transcribe the audio file 'audio.mp3'")
print(response)
Key differences
faster-whisper is a specialized tool for speech recognition and transcription, offering optimized performance for audio processing tasks. It provides detailed control over transcription parameters and supports multiple languages.
oumi, on the other hand, is a more general-purpose AI assistant that can handle a variety of tasks, including audio transcription, through natural language instructions. While it may not offer the same level of specialization for audio processing, it provides a more versatile and user-friendly interface for various AI-related tasks.
The choice between the two depends on the specific requirements of the project, with faster-whisper being more suitable for dedicated audio transcription tasks and oumi offering broader AI capabilities.
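The language detection and confidence scores mentioned above are exposed on the objects the snippet already creates; the following sketch assumes the attribute names documented in the faster-whisper README (info.language, info.language_probability, segment.avg_logprob):
from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    # avg_logprob and no_speech_prob are the per-segment confidence signals.
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text} (avg_logprob={segment.avg_logprob:.2f})")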
insanely-fast-whisper
Pros of insanely-fast-whisper
- Focuses specifically on optimizing Whisper for speed, potentially offering faster transcription
- Provides detailed benchmarks and comparisons with other Whisper implementations
- Offers a simple command-line interface for easy use
Cons of insanely-fast-whisper
- Limited to Whisper functionality, while oumi offers a broader range of AI capabilities
- Less extensive documentation and community support compared to oumi
- May require more technical expertise to set up and use effectively
Code Comparison
oumi:
from oumi import Oumi
oumi = Oumi()
response = oumi.chat("Tell me a joke")
print(response)
insanely-fast-whisper (built on the Hugging Face Transformers ASR pipeline, shown here because the project itself ships primarily as a CLI):
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3", device=0)
outputs = pipe("audio.mp3", chunk_length_s=30, return_timestamps=True)
print(outputs["text"])
The code snippets demonstrate the different focus areas of each project. oumi provides a more general-purpose AI interface, while insanely-fast-whisper is specifically tailored for audio transcription using an optimized Whisper model.
README
Everything you need to build state-of-the-art foundation models, end-to-end.
Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from data preparation and training to evaluation and deployment. Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.
With Oumi, you can:
- Train and fine-tune models from 10M to 405B parameters using state-of-the-art techniques (SFT, LoRA, QLoRA, DPO, and more)
- Work with both text and multimodal models (Llama, DeepSeek, Qwen, Phi, and others)
- Synthesize and curate training data with LLM judges
- Deploy models efficiently with popular inference engines (vLLM, SGLang); a plain-vLLM sketch follows this list
- Evaluate models comprehensively across standard benchmarks
- Run anywhere - from laptops to clusters to clouds (AWS, Azure, GCP, Lambda, and more)
- Integrate with both open models and commercial APIs (OpenAI, Anthropic, Vertex AI, Together, Parasail, ...)
All with one consistent API, production-grade reliability, and all the flexibility you need for research.
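As a concrete illustration of the deployment bullet above, here is a minimal sketch of serving a checkpoint with vLLM's offline Python API; this is plain vLLM usage rather than an Oumi-specific interface, and the model path is a placeholder for whatever checkpoint you trained or downloaded:
from vllm import LLM, SamplingParams

# Placeholder path: point this at a Hugging Face model id or a local fine-tuned checkpoint.
llm = LLM(model="path/to/your-finetuned-model")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain what a foundation model is."], params)
print(outputs[0].outputs[0].text)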
Learn more at oumi.ai, or jump right in with the quickstart guide.
Getting Started
Usage
Installation
Installing oumi in your environment is straightforward:
# Install the package (CPU & NPU only)
pip install oumi # For local development & testing
# OR, with GPU support (Requires Nvidia or AMD GPU)
pip install oumi[gpu] # For GPU training
# To get the latest version, install from the source
pip install git+https://github.com/oumi-ai/oumi.git
For more advanced installation options, see the installation guide.
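To confirm the package is installed, you can query its version with the standard library; nothing Oumi-specific is assumed here beyond the package name used by pip install oumi:
import importlib.metadata

# Prints the installed Oumi version, or raises PackageNotFoundError if the install failed.
print(importlib.metadata.version("oumi"))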
Oumi CLI
You can quickly use the oumi command to train, evaluate, and infer models using one of the existing recipes:
# Training
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml
# Evaluation
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml
# Inference
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml --interactive
For more advanced options, see the training, evaluation, inference, and llm-as-a-judge guides.
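If you prefer to drive the same recipes from a Python script (for example, as part of a sweep), one option is to shell out to the CLI shown above; this sketch only reuses the documented oumi train command:
import subprocess

# Runs the quickstart SFT recipe and raises CalledProcessError if training fails.
subprocess.run(
    ["oumi", "train", "-c", "configs/recipes/smollm/sft/135m/quickstart_train.yaml"],
    check=True,
)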
Running Jobs Remotely
You can run jobs remotely on cloud platforms (AWS, Azure, GCP, Lambda, etc.) using the oumi launch command:
# GCP
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml
# AWS
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud aws
# Azure
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud azure
# Lambda
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud lambda
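To launch the same job on several clouds from a script, you can loop over the documented --resources.cloud values; this sketch reuses only the command and flags shown in this section:
import subprocess

CONFIG = "configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml"
for cloud in ["aws", "azure", "lambda"]:
    # Mirrors the oumi launch up commands above, overriding only the target cloud.
    subprocess.run(
        ["oumi", "launch", "up", "-c", CONFIG, "--resources.cloud", cloud],
        check=True,
    )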
Note: Oumi is in beta and under active development. The core features are stable, but some advanced features might change as the platform improves.
Why use Oumi?
If you need a comprehensive platform for training, evaluating, or deploying models, Oumi is a great choice.
Here are some of the key features that make Oumi stand out:
- Zero Boilerplate: Get started in minutes with ready-to-use recipes for popular models and workflows. No need to write training loops or data pipelines.
- Enterprise-Grade: Built and validated by teams training models at scale.
- Research Ready: Perfect for ML research with easily reproducible experiments, and flexible interfaces for customizing each component.
- Broad Model Support: Works with most popular model architectures - from tiny models to the largest ones, text-only to multimodal.
- SOTA Performance: Native support for distributed training techniques (FSDP, DDP) and optimized inference engines (vLLM, SGLang).
- Community First: 100% open source with an active community. No vendor lock-in, no strings attached.
Examples & Recipes
Explore the growing collection of ready-to-use configurations for state-of-the-art models and training workflows:
Note: These configurations are not an exhaustive list of what's supported, but simply examples to get you started. You can find a more exhaustive list of supported models and datasets (supervised fine-tuning, pre-training, preference tuning, and vision-language fine-tuning) in the oumi documentation.
DeepSeek R1 Family
Model | Example Configurations |
---|---|
DeepSeek R1 671B | Inference (Together AI) |
Distilled Llama 8B | FFT • LoRA • QLoRA • Inference • Evaluation |
Distilled Llama 70B | FFT • LoRA • QLoRA • Inference • Evaluation |
Distilled Qwen 1.5B | FFT • LoRA • Inference • Evaluation |
Distilled Qwen 32B | LoRA • Inference • Evaluation |
Llama Family
Model | Example Configurations |
---|---|
Llama 3.1 8B | FFT • LoRA • QLoRA • Pre-training • Inference (vLLM) • Inference • Evaluation |
Llama 3.1 70B | FFT • LoRA • QLoRA • Inference • Evaluation |
Llama 3.1 405B | FFT • LoRA • QLoRA |
Llama 3.2 1B | FFT • LoRA • QLoRA • Inference (vLLM) • Inference (SGLang) • Inference • Evaluation |
Llama 3.2 3B | FFT • LoRA • QLoRA • Inference (vLLM) • Inference (SGLang) • Inference • Evaluation |
Llama 3.3 70B | FFT • LoRA • QLoRA • Inference (vLLM) • Inference • Evaluation |
Llama 3.2 Vision 11B | SFT • Inference (vLLM) • Inference (SGLang) • Evaluation |
Vision Models
Model | Example Configurations |
---|---|
Llama 3.2 Vision 11B | SFT • LoRA • Inference (vLLM) • Inference (SGLang) • Evaluation |
LLaVA 7B | SFT • Inference (vLLM) • Inference |
Phi3 Vision 4.2B | SFT • Inference (vLLM) |
Qwen2-VL 2B | SFT • Inference (vLLM) • Inference (SGLang) • Inference • Evaluation |
SmolVLM-Instruct 2B | SFT |
Even more options
This section lists all the language models that can be used with Oumi. Thanks to the integration with the Hugging Face Transformers library, you can easily use any of these models for training, evaluation, or inference.
Models prefixed with a checkmark (✅) have been thoroughly tested and validated by the Oumi community, with ready-to-use recipes available in the configs/recipes directory.
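As an example of that Transformers integration, any model from the tables below can be loaded directly with the standard Transformers API; this is plain Hugging Face usage rather than an Oumi-specific interface, and the hub id below is an assumption chosen because the model is small enough to run on a laptop:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub id assumed for illustration; any model listed below loads the same way.
model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Oumi is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))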
Click to see more supported models
Instruct Models
Model | Size | Paper | HF Hub | License | Open 1 | Recommended Parameters |
---|---|---|---|---|---|---|
✅ SmolLM-Instruct | 135M/360M/1.7B | Blog | Hub | Apache 2.0 | â | |
✅ DeepSeek R1 Family | 1.5B/8B/32B/70B/671B | Blog | Hub | MIT | â | |
✅ Llama 3.1 Instruct | 8B/70B/405B | Paper | Hub | License | â | |
✅ Llama 3.2 Instruct | 1B/3B | Paper | Hub | License | â | |
✅ Llama 3.3 Instruct | 70B | Paper | Hub | License | â | |
✅ Phi-3.5-Instruct | 4B/14B | Paper | Hub | License | â | |
Qwen2.5-Instruct | 0.5B-70B | Paper | Hub | License | â | |
OLMo 2 Instruct | 7B | Paper | Hub | Apache 2.0 | â | |
MPT-Instruct | 7B | Blog | Hub | Apache 2.0 | â | |
Command R | 35B/104B | Blog | Hub | License | â | |
Granite-3.1-Instruct | 2B/8B | Paper | Hub | Apache 2.0 | â | |
Gemma 2 Instruct | 2B/9B | Blog | Hub | License | â | |
DBRX-Instruct | 130B MoE | Blog | Hub | Apache 2.0 | â | |
Falcon-Instruct | 7B/40B | Paper | Hub | Apache 2.0 | â |
Vision-Language Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
✅ Llama 3.2 Vision | 11B | Paper | Hub | License | â | |
✅ LLaVA-1.5 | 7B | Paper | Hub | License | â | |
✅ Phi-3 Vision | 4.2B | Paper | Hub | License | â | |
✅ BLIP-2 | 3.6B | Paper | Hub | MIT | â | |
✅ Qwen2-VL | 2B | Blog | Hub | License | â | |
✅ SmolVLM-Instruct | 2B | Blog | Hub | Apache 2.0 | â |
Base Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
✅ SmolLM2 | 135M/360M/1.7B | Blog | Hub | Apache 2.0 | â | |
✅ Llama 3.2 | 1B/3B | Paper | Hub | License | â | |
✅ Llama 3.1 | 8B/70B/405B | Paper | Hub | License | â | |
✅ GPT-2 | 124M-1.5B | Paper | Hub | MIT | â | |
DeepSeek V2 | 7B/13B | Blog | Hub | License | â | |
Gemma2 | 2B/9B | Blog | Hub | License | â | |
GPT-J | 6B | Blog | Hub | Apache 2.0 | â | |
GPT-NeoX | 20B | Paper | Hub | Apache 2.0 | â | |
Mistral | 7B | Paper | Hub | Apache 2.0 | â | |
Mixtral | 8x7B/8x22B | Blog | Hub | Apache 2.0 | â | |
MPT | 7B | Blog | Hub | Apache 2.0 | â | |
OLMo | 1B/7B | Paper | Hub | Apache 2.0 | â |
Reasoning Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
Qwen QwQ | 32B | Blog | Hub | License | â |
Code Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
✅ Qwen2.5 Coder | 0.5B-32B | Blog | Hub | License | â | |
DeepSeek Coder | 1.3B-33B | Paper | Hub | License | â | |
StarCoder 2 | 3B/7B/15B | Paper | Hub | License | â |
Math Models
Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
---|---|---|---|---|---|---|
DeepSeek Math | 7B | Paper | Hub | License | â |
Documentation
To learn more about all the platform's capabilities, see the Oumi documentation.
Join the Community!
Oumi is a community-first effort. Whether you are a developer, a researcher, or a non-technical user, all contributions are very welcome!
- To contribute to the oumi repository, please check CONTRIBUTING.md for guidance on how to send your first Pull Request.
- Make sure to join our Discord community to get help, share your experiences, and contribute to the project!
- If you are interested in joining one of the community's open-science efforts, check out our open collaboration page.
Acknowledgements
Oumi makes use of several libraries and tools from the open-source community. We would like to acknowledge and deeply thank the contributors of these projects!
Citation
If you find Oumi useful in your research, please consider citing it:
@software{oumi2025,
  author = {Oumi Community},
  title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
  month = {January},
  year = {2025},
  url = {https://github.com/oumi-ai/oumi}
}
License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
Footnotes
1. Open models are defined as models with fully open weights, training code, and data, and a permissive license. See Open Source Definitions for more information.