
oumi-ai/oumi

Everything you need to build state-of-the-art foundation models, end-to-end.


Top Related Projects

  • Robust Speech Recognition via Large-Scale Weak Supervision (74,778 ⭐)
  • Port of OpenAI's Whisper model in C/C++
  • High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model (9,079 ⭐)
  • WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization) (14,609 ⭐)
  • Faster Whisper transcription with CTranslate2

Quick Overview

Oumi is an open-source AI assistant framework designed to create customizable AI agents. It provides a flexible architecture for building conversational AI systems with various capabilities, including task planning, memory management, and integration with external tools and APIs.

Pros

  • Highly customizable and extensible architecture
  • Supports multiple language models and external tools
  • Includes built-in memory and task planning capabilities
  • Active development and community support

Cons

  • Relatively new project, may have some stability issues
  • Documentation could be more comprehensive
  • Steeper learning curve compared to some simpler chatbot frameworks
  • Limited pre-built integrations compared to more established frameworks

Code Examples

  1. Creating a simple Oumi agent:
from oumi import Oumi

agent = Oumi()
response = agent.chat("Hello, how are you?")
print(response)

  2. Using a custom language model:
from oumi import Oumi
from oumi.llm import OpenAILLM

custom_llm = OpenAILLM(model="gpt-4")
agent = Oumi(llm=custom_llm)
response = agent.chat("Explain quantum computing.")
print(response)

  3. Adding a custom tool to the agent:
from oumi import Oumi
from oumi.tool import Tool

def weather_tool(location):
    # Implement weather lookup logic here
    return f"The weather in {location} is sunny."

custom_tool = Tool("weather", weather_tool)
agent = Oumi(tools=[custom_tool])
response = agent.chat("What's the weather like in New York?")
print(response)

Getting Started

To get started with Oumi, follow these steps:

  1. Install Oumi:
pip install oumi

  2. Create a simple agent:
from oumi import Oumi

agent = Oumi()
response = agent.chat("Hello, Oumi!")
print(response)

  3. Customize the agent with tools and a specific language model:
from oumi import Oumi
from oumi.llm import OpenAILLM
from oumi.tool import Tool

custom_llm = OpenAILLM(model="gpt-3.5-turbo")
custom_tool = Tool("example", lambda x: f"Example: {x}")

agent = Oumi(llm=custom_llm, tools=[custom_tool])
response = agent.chat("Use the example tool.")
print(response)

Competitor Comparisons


Robust Speech Recognition via Large-Scale Weak Supervision

Pros of Whisper

  • Highly accurate speech recognition across multiple languages
  • Extensive documentation and community support
  • Backed by OpenAI's research and resources

Cons of Whisper

  • Larger model size, requiring more computational resources
  • Limited to speech recognition tasks only
  • Potentially slower inference time for real-time applications

Code Comparison

Whisper:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])

Oumi:

from oumi import Oumi

oumi = Oumi()
result = oumi.transcribe("audio.mp3")
print(result)

While both repositories offer speech recognition capabilities, Whisper focuses solely on this task with high accuracy across multiple languages. Oumi, on the other hand, aims to provide a more comprehensive AI assistant framework, potentially sacrificing some specialization for broader functionality.

Whisper's extensive documentation and community support make it easier for developers to implement and troubleshoot. However, its larger model size may require more computational resources, which could be a drawback for some applications.

Oumi's simpler API and potentially lighter resource requirements might make it more suitable for certain use cases, especially if additional AI assistant features are needed beyond speech recognition.

Port of OpenAI's Whisper model in C/C++

Pros of whisper.cpp

  • Highly optimized C++ implementation for efficient speech recognition
  • Supports various platforms including mobile and embedded devices
  • Offers real-time audio processing capabilities

Cons of whisper.cpp

  • Limited to speech recognition tasks only
  • Requires more technical expertise to integrate and use effectively
  • Less flexibility for customization compared to Oumi

Code Comparison

whisper.cpp:

#include "whisper.h"

int main() {
    // Load the model, then run transcription with default (greedy) parameters.
    // pcm / pcm_len are assumed to hold 16 kHz float audio samples and their count.
    struct whisper_context * ctx = whisper_init_from_file("model.bin");
    struct whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    whisper_full(ctx, params, pcm, pcm_len);
    whisper_free(ctx);
}

Oumi:

from oumi import Oumi

oumi = Oumi()
response = oumi.chat("Hello, how are you?")
print(response)

Summary

whisper.cpp is a specialized, high-performance speech recognition library, while Oumi is a more versatile AI assistant framework. whisper.cpp excels in efficiency and platform support for audio processing, but Oumi offers broader functionality and easier integration for general AI tasks. The choice between them depends on the specific requirements of your project, whether it's focused solely on speech recognition or needs more comprehensive AI capabilities.


High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

Pros of Whisper

  • Optimized for Windows with DirectCompute, offering faster performance on compatible hardware
  • Supports both CPU and GPU processing, providing flexibility for different system configurations
  • Implements advanced features like voice activity detection and automatic language detection

Cons of Whisper

  • Limited to Windows platform, reducing cross-platform compatibility
  • Requires specific hardware and drivers for optimal performance
  • Less focus on real-time transcription compared to Oumi

Code Comparison

Whisper (C++):

void CTranscribeTask::ProcessAudio( const float* pAudio, size_t cbAudio )
{
    const size_t samples = cbAudio / sizeof( float );
    m_audioBuffer.insert( m_audioBuffer.end(), pAudio, pAudio + samples );
    while( m_audioBuffer.size() >= m_samplesPerSegment )
        processSegment();
}

Oumi (Python):

def transcribe_stream(self, audio_stream, **kwargs):
    for chunk in audio_stream:
        if self.vad.is_speech(chunk):
            self.buffer.extend(chunk)
        elif len(self.buffer) > 0:
            text = self.transcribe(self.buffer, **kwargs)
            yield text
            self.buffer.clear()

The code snippets demonstrate different approaches to audio processing and transcription. Whisper focuses on efficient buffer management and segment processing, while Oumi emphasizes real-time streaming and voice activity detection.


WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Pros of WhisperX

  • Specialized in audio transcription and alignment
  • Offers word-level timestamps for accurate synchronization
  • Supports multiple languages and provides language detection

Cons of WhisperX

  • Limited to audio processing tasks
  • Requires specific dependencies and may have higher system requirements
  • Less versatile compared to Oumi's broader AI capabilities

Code Comparison

WhisperX:

import whisperx

model = whisperx.load_model("large-v2")
result = model.transcribe("audio.mp3")
print(result["text"])

Oumi:

from oumi import Oumi

oumi = Oumi()
response = oumi.chat("Transcribe this audio file: audio.mp3")
print(response)

Key Differences

WhisperX focuses on advanced audio transcription and alignment, providing detailed timestamps and multi-language support. It's ideal for projects requiring precise audio processing.

Oumi is a more general-purpose AI assistant, capable of handling various tasks beyond audio transcription. It offers a simpler interface for interacting with AI capabilities.

WhisperX may be more suitable for specialized audio projects, while Oumi provides a broader range of AI functionalities for diverse applications.

Faster Whisper transcription with CTranslate2

Pros of faster-whisper

  • Specialized for speech recognition and transcription tasks
  • Optimized for faster performance using CTranslate2
  • Supports multiple languages and provides confidence scores

Cons of faster-whisper

  • Limited to audio processing and transcription
  • Requires additional dependencies like FFmpeg for audio file handling

Code comparison

faster-whisper:

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

oumi:

from oumi import Oumi

oumi = Oumi()
response = oumi.chat("Transcribe the audio file 'audio.mp3'")
print(response)

Key differences

faster-whisper is a specialized tool for speech recognition and transcription, offering optimized performance for audio processing tasks. It provides detailed control over transcription parameters and supports multiple languages.

Oumi, on the other hand, is a more general-purpose AI assistant that can handle a variety of tasks, including audio transcription, through natural language instructions. While it may not offer the same level of specialization for audio processing, it provides a more versatile and user-friendly interface for various AI-related tasks.

The choice between the two depends on the specific requirements of the project, with faster-whisper being more suitable for dedicated audio transcription tasks and Oumi offering broader AI capabilities.

Pros of insanely-fast-whisper

  • Focuses specifically on optimizing Whisper for speed, potentially offering faster transcription
  • Provides detailed benchmarks and comparisons with other Whisper implementations
  • Offers a simple command-line interface for easy use

Cons of insanely-fast-whisper

  • Limited to Whisper functionality, while Oumi offers a broader range of AI capabilities
  • Less extensive documentation and community support compared to Oumi
  • May require more technical expertise to set up and use effectively

Code Comparison

oumi:

from oumi import Oumi

oumi = Oumi()
response = oumi.chat("Tell me a joke")
print(response)

insanely-fast-whisper:

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5)

The code snippets demonstrate the different focus areas of each project. Oumi provides a more general-purpose AI interface, while insanely-fast-whisper is specifically tailored for audio transcription using an optimized Whisper model.


README

# Oumi: Open Universal Machine Intelligence


Everything you need to build state-of-the-art foundation models, end-to-end.


Oumi is a fully open-source platform that streamlines the entire lifecycle of foundation models - from data preparation and training to evaluation and deployment. Whether you're developing on a laptop, launching large scale experiments on a cluster, or deploying models in production, Oumi provides the tools and workflows you need.

With Oumi, you can:

  • 🚀 Train and fine-tune models from 10M to 405B parameters using state-of-the-art techniques (SFT, LoRA, QLoRA, DPO, and more)
  • 🤖 Work with both text and multimodal models (Llama, DeepSeek, Qwen, Phi, and others)
  • 🔄 Synthesize and curate training data with LLM judges
  • ⚡️ Deploy models efficiently with popular inference engines (vLLM, SGLang)
  • 📊 Evaluate models comprehensively across standard benchmarks
  • 🌎 Run anywhere - from laptops to clusters to clouds (AWS, Azure, GCP, Lambda, and more)
  • 🔌 Integrate with both open models and commercial APIs (OpenAI, Anthropic, Vertex AI, Together, Parasail, ...)

All with one consistent API, production-grade reliability, and all the flexibility you need for research.

Learn more at oumi.ai, or jump right in with the quickstart guide.

🚀 Getting Started

| Notebook | Try in Colab | Goal |
|----------|--------------|------|
| 🎯 Getting Started: A Tour | Open In Colab | Quick tour of core features: training, evaluation, inference, and job management |
| 🔧 Model Finetuning Guide | Open In Colab | End-to-end guide to LoRA tuning with data prep, training, and evaluation |
| 📚 Model Distillation | Open In Colab | Guide to distilling large models into smaller, efficient ones |
| 📋 Model Evaluation | Open In Colab | Comprehensive model evaluation using Oumi's evaluation framework |
| ☁️ Remote Training | Open In Colab | Launch and monitor training jobs on cloud (AWS, Azure, GCP, Lambda, etc.) platforms |
| 📈 LLM-as-a-Judge | Open In Colab | Filter and curate training data with built-in judges |

🔧 Usage

Installation

Installing oumi in your environment is straightforward:

# Install the package (CPU & NPU only)
pip install oumi  # For local development & testing

# OR, with GPU support (Requires Nvidia or AMD GPU)
pip install oumi[gpu]  # For GPU training

# To get the latest version, install from the source
pip install git+https://github.com/oumi-ai/oumi.git
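
After installing, you can sanity-check the setup with two standard commands (a minimal sketch; it assumes the oumi CLI entry point is on your PATH after installation):

# Verify that the package is installed and that the CLI is available
pip show oumi
oumi --help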

For more advanced installation options, see the installation guide.

Oumi CLI

You can quickly use the oumi command to train, evaluate, and infer models using one of the existing recipes:

# Training
oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml

# Evaluation
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml

# Inference
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml --interactive
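
These commands chain naturally into a small end-to-end script. The sketch below simply strings together the quickstart recipes shown above and assumes it is run from a checkout of the repository, so that the relative configs/recipes paths resolve:

#!/usr/bin/env bash
# Fine-tune, evaluate, then chat interactively with the resulting model,
# reusing the quickstart recipes from the commands above.
set -euo pipefail

oumi train -c configs/recipes/smollm/sft/135m/quickstart_train.yaml
oumi evaluate -c configs/recipes/smollm/evaluation/135m/quickstart_eval.yaml
oumi infer -c configs/recipes/smollm/inference/135m_infer.yaml --interactive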

For more advanced options, see the training, evaluation, inference, and llm-as-a-judge guides.

Running Jobs Remotely

You can run jobs remotely on cloud platforms (AWS, Azure, GCP, Lambda, etc.) using the oumi launch command:

# GCP
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml

# AWS
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud aws

# Azure
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud azure

# Lambda
oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud lambda
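
Because the target cloud is just a config override, the same recipe can be submitted to several providers in one pass; a rough sketch that only reuses the command pattern shown above:

# Submit the same quickstart job to multiple clouds by overriding resources.cloud
for cloud in gcp aws azure lambda; do
  oumi launch up -c configs/recipes/smollm/sft/135m/quickstart_gcp_job.yaml --resources.cloud "$cloud"
done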

Note: Oumi is in beta and under active development. The core features are stable, but some advanced features might change as the platform improves.

💻 Why use Oumi?

If you need a comprehensive platform for training, evaluating, or deploying models, Oumi is a great choice.

Here are some of the key features that make Oumi stand out:

  • 🔧 Zero Boilerplate: Get started in minutes with ready-to-use recipes for popular models and workflows. No need to write training loops or data pipelines.
  • 🏢 Enterprise-Grade: Built and validated by teams training models at scale
  • 🎯 Research Ready: Perfect for ML research with easily reproducible experiments, and flexible interfaces for customizing each component.
  • 🌐 Broad Model Support: Works with most popular model architectures - from tiny models to the largest ones, text-only to multimodal.
  • 🚀 SOTA Performance: Native support for distributed training techniques (FSDP, DDP) and optimized inference engines (vLLM, SGLang).
  • 🤝 Community First: 100% open source with an active community. No vendor lock-in, no strings attached.

📚 Examples & Recipes

Explore the growing collection of ready-to-use configurations for state-of-the-art models and training workflows:

Note: These configurations are not an exhaustive list of what's supported; they are simply examples to get you started. You can find a more exhaustive list of supported models and datasets (supervised fine-tuning, pre-training, preference tuning, and vision-language finetuning) in the oumi documentation.

🐋 DeepSeek R1 Family

| Model | Example Configurations |
|-------|------------------------|
| DeepSeek R1 671B | Inference (Together AI) |
| Distilled Llama 8B | FFT • LoRA • QLoRA • Inference • Evaluation |
| Distilled Llama 70B | FFT • LoRA • QLoRA • Inference • Evaluation |
| Distilled Qwen 1.5B | FFT • LoRA • Inference • Evaluation |
| Distilled Qwen 32B | LoRA • Inference • Evaluation |

🦙 Llama Family

| Model | Example Configurations |
|-------|------------------------|
| Llama 3.1 8B | FFT • LoRA • QLoRA • Pre-training • Inference (vLLM) • Inference • Evaluation |
| Llama 3.1 70B | FFT • LoRA • QLoRA • Inference • Evaluation |
| Llama 3.1 405B | FFT • LoRA • QLoRA |
| Llama 3.2 1B | FFT • LoRA • QLoRA • Inference (vLLM) • Inference (SGLang) • Inference • Evaluation |
| Llama 3.2 3B | FFT • LoRA • QLoRA • Inference (vLLM) • Inference (SGLang) • Inference • Evaluation |
| Llama 3.3 70B | FFT • LoRA • QLoRA • Inference (vLLM) • Inference • Evaluation |
| Llama 3.2 Vision 11B | SFT • Inference (vLLM) • Inference (SGLang) • Evaluation |

🎨 Vision Models

| Model | Example Configurations |
|-------|------------------------|
| Llama 3.2 Vision 11B | SFT • LoRA • Inference (vLLM) • Inference (SGLang) • Evaluation |
| LLaVA 7B | SFT • Inference (vLLM) • Inference |
| Phi3 Vision 4.2B | SFT • Inference (vLLM) |
| Qwen2-VL 2B | SFT • Inference (vLLM) • Inference (SGLang) • Inference • Evaluation |
| SmolVLM-Instruct 2B | SFT |

🔍 Even more options

This section lists all the language models that can be used with Oumi. Thanks to the integration with the 🤗 Transformers library, you can easily use any of these models for training, evaluation, or inference.

Models prefixed with a checkmark (✅) have been thoroughly tested and validated by the Oumi community, with ready-to-use recipes available in the configs/recipes directory.
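
If you have the repository checked out, you can browse those ready-to-use recipes directly from the shell; for example (assuming a standard checkout with the configs/recipes directory at the repo root):

# List the recipe configs bundled with the repository
find configs/recipes -name "*.yaml" | sort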

📋 Click to see more supported models

Instruct Models

| Model | Size | Paper | HF Hub | License | Open ¹ | Recommended Parameters |
|-------|------|-------|--------|---------|--------|------------------------|
| ✅ SmolLM-Instruct | 135M/360M/1.7B | Blog | Hub | Apache 2.0 | ✅ | |
| ✅ DeepSeek R1 Family | 1.5B/8B/32B/70B/671B | Blog | Hub | MIT | ❌ | |
| ✅ Llama 3.1 Instruct | 8B/70B/405B | Paper | Hub | License | ❌ | |
| ✅ Llama 3.2 Instruct | 1B/3B | Paper | Hub | License | ❌ | |
| ✅ Llama 3.3 Instruct | 70B | Paper | Hub | License | ❌ | |
| ✅ Phi-3.5-Instruct | 4B/14B | Paper | Hub | License | ❌ | |
| Qwen2.5-Instruct | 0.5B-70B | Paper | Hub | License | ❌ | |
| OLMo 2 Instruct | 7B | Paper | Hub | Apache 2.0 | ✅ | |
| MPT-Instruct | 7B | Blog | Hub | Apache 2.0 | ✅ | |
| Command R | 35B/104B | Blog | Hub | License | ❌ | |
| Granite-3.1-Instruct | 2B/8B | Paper | Hub | Apache 2.0 | ❌ | |
| Gemma 2 Instruct | 2B/9B | Blog | Hub | License | ❌ | |
| DBRX-Instruct | 130B MoE | Blog | Hub | Apache 2.0 | ❌ | |
| Falcon-Instruct | 7B/40B | Paper | Hub | Apache 2.0 | ❌ | |

Vision-Language Models

| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|--------|---------|------|------------------------|
| ✅ Llama 3.2 Vision | 11B | Paper | Hub | License | ❌ | |
| ✅ LLaVA-1.5 | 7B | Paper | Hub | License | ❌ | |
| ✅ Phi-3 Vision | 4.2B | Paper | Hub | License | ❌ | |
| ✅ BLIP-2 | 3.6B | Paper | Hub | MIT | ❌ | |
| ✅ Qwen2-VL | 2B | Blog | Hub | License | ❌ | |
| ✅ SmolVLM-Instruct | 2B | Blog | Hub | Apache 2.0 | ✅ | |

Base Models

| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|--------|---------|------|------------------------|
| ✅ SmolLM2 | 135M/360M/1.7B | Blog | Hub | Apache 2.0 | ✅ | |
| ✅ Llama 3.2 | 1B/3B | Paper | Hub | License | ❌ | |
| ✅ Llama 3.1 | 8B/70B/405B | Paper | Hub | License | ❌ | |
| ✅ GPT-2 | 124M-1.5B | Paper | Hub | MIT | ✅ | |
| DeepSeek V2 | 7B/13B | Blog | Hub | License | ❌ | |
| Gemma2 | 2B/9B | Blog | Hub | License | ❌ | |
| GPT-J | 6B | Blog | Hub | Apache 2.0 | ✅ | |
| GPT-NeoX | 20B | Paper | Hub | Apache 2.0 | ✅ | |
| Mistral | 7B | Paper | Hub | Apache 2.0 | ❌ | |
| Mixtral | 8x7B/8x22B | Blog | Hub | Apache 2.0 | ❌ | |
| MPT | 7B | Blog | Hub | Apache 2.0 | ✅ | |
| OLMo | 1B/7B | Paper | Hub | Apache 2.0 | ✅ | |

Reasoning Models

| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|--------|---------|------|------------------------|
| Qwen QwQ | 32B | Blog | Hub | License | ✅ | |

Code Models

| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|--------|---------|------|------------------------|
| ✅ Qwen2.5 Coder | 0.5B-32B | Blog | Hub | License | ❌ | |
| DeepSeek Coder | 1.3B-33B | Paper | Hub | License | ❌ | |
| StarCoder 2 | 3B/7B/15B | Paper | Hub | License | ✅ | |

Math Models

| Model | Size | Paper | HF Hub | License | Open | Recommended Parameters |
|-------|------|-------|--------|---------|------|------------------------|
| DeepSeek Math | 7B | Paper | Hub | License | ❌ | |

📖 Documentation

To learn more about all the platform's capabilities, see the Oumi documentation.

🤝 Join the Community!

Oumi is a community-first effort. Whether you are a developer, a researcher, or a non-technical user, all contributions are very welcome!

  • To contribute to the oumi repository, please check CONTRIBUTING.md for guidance on how to contribute and send your first Pull Request.
  • Make sure to join our Discord community to get help, share your experiences, and contribute to the project!
  • If you are interested in joining one of the community's open-science efforts, check out our open collaboration page.

🙏 Acknowledgements

Oumi makes use of several libraries and tools from the open-source community. We would like to acknowledge and deeply thank the contributors of these projects! ✨ 🌟 💫

📝 Citation

If you find Oumi useful in your research, please consider citing it:

@software{oumi2025,
  author = {Oumi Community},
  title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
  month = {January},
  year = {2025},
  url = {https://github.com/oumi-ai/oumi}
}

📜 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Footnotes

  1. Open models are defined as models with fully open weights, training code, and data, and a permissive license. See Open Source Definitions for more information.