onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Top Related Projects
- TensorFlow: An Open Source Machine Learning Framework for Everyone
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- Core ML Tools: supporting tools for Core ML model conversion, editing, and validation
- JAX: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
- TVM: Open deep learning compiler stack for CPU, GPU, and specialized accelerators
- OpenBLAS: an optimized BLAS library based on GotoBLAS2 1.13 BSD version
Quick Overview
ONNX Runtime is a cross-platform, high-performance machine learning inference and training accelerator. It's designed to optimize and accelerate machine learning models across various hardware platforms and operating systems, supporting models from popular frameworks like PyTorch, TensorFlow, and scikit-learn.
Pros
- Improved performance and reduced inference time for machine learning models
- Wide compatibility with various ML frameworks and hardware platforms
- Automatic optimization of models for specific hardware
- Supports both CPU and GPU acceleration (see the configuration sketch below)
Cons
- Learning curve for integration into existing ML pipelines
- Limited support for some specialized or custom operations
- Potential compatibility issues with older model versions
- May require model conversion for some frameworks
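The hardware-related points above come down to two knobs: execution providers, which select the backend in priority order, and the graph optimization level. A minimal configuration sketch ("model.onnx" is a placeholder path; the provider names are the standard identifiers):

import onnxruntime as ort

# Graph optimizations (constant folding, node fusion, etc.) are applied
# automatically; SessionOptions exposes the optimization level.
opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# Execution providers pick the hardware backend, tried in priority order.
session = ort.InferenceSession(
    "model.onnx",
    sess_options=opts,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)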
Code Examples
- Loading and running an ONNX model:
import onnxruntime as ort
import numpy as np
# Load the ONNX model
session = ort.InferenceSession("model.onnx")
# Prepare input data
input_name = session.get_inputs()[0].name
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
# Run inference (session.run returns a list of output arrays)
outputs = session.run(None, {input_name: input_data})
- Converting a PyTorch model to ONNX:
import torch
import torch.nn as nn
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 5)

    def forward(self, x):
        return self.fc(x)
model = SimpleModel()
dummy_input = torch.randn(1, 10)
torch.onnx.export(model, dummy_input, "simple_model.onnx")
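In practice it helps to name the graph inputs and outputs and to mark the batch dimension as dynamic. A hedged variant of the export call above (the names "input"/"output" and the dynamic batch axis are illustrative choices, not requirements):

torch.onnx.export(
    model,
    dummy_input,
    "simple_model.onnx",
    input_names=["input"],    # illustrative names
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}},  # allow variable batch size
)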
- Quantizing an ONNX model:
from onnxruntime.quantization import quantize_dynamic, QuantType

# quantize_dynamic reads the model from disk and writes the quantized
# model directly, so no separate onnx.load/onnx.save step is needed
quantize_dynamic(
    "model.onnx",
    "quantized_model.onnx",
    weight_type=QuantType.QUInt8,
)
Getting Started
To get started with ONNX Runtime, follow these steps:
- Install ONNX Runtime:
pip install onnxruntime
- Load and run an ONNX model:
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession("path/to/your/model.onnx")
input_name = session.get_inputs()[0].name
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)
output = session.run(None, {input_name: input_data})
- For GPU acceleration, install the GPU version:
pip install onnxruntime-gpu
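After installing, you can check which execution providers your build actually exposes:

import onnxruntime as ort

# Lists the providers compiled into this build; with onnxruntime-gpu
# installed this typically includes 'CUDAExecutionProvider'
print(ort.get_available_providers())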
Competitor Comparisons
TensorFlow: An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Larger ecosystem with more tools, libraries, and community support
- Better support for distributed and large-scale machine learning
- More comprehensive documentation and tutorials
Cons of TensorFlow
- Steeper learning curve, especially for beginners
- Slower execution speed for some operations compared to ONNX Runtime
- Larger file size and memory footprint
Code Comparison
TensorFlow:
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
ONNX Runtime:
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})
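The two snippets meet through a converter such as the separate tf2onnx package. A minimal sketch, assuming tf2onnx is installed and that the Keras model above has a known input shape (the (None, 32) signature here is a hypothetical example):

import tensorflow as tf
import tf2onnx

# Convert the Keras model to ONNX; opset 13 is an arbitrary, widely supported choice
spec = (tf.TensorSpec((None, 32), tf.float32, name="input"),)  # hypothetical shape
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="model.onnx")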
Both repositories provide powerful frameworks for machine learning and deep learning. TensorFlow offers a more comprehensive ecosystem with extensive tools and libraries, making it suitable for complex projects and research. However, it comes with a steeper learning curve and potentially slower execution for some operations.
ONNX Runtime, on the other hand, focuses on providing a lightweight and efficient inference engine for various machine learning models. It offers faster execution speed for certain operations and easier deployment across different platforms, but may have a smaller ecosystem compared to TensorFlow.
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- More flexible and dynamic computational graph, allowing for easier debugging and experimentation
- Extensive ecosystem with a wide range of pre-trained models and libraries
- Strong community support and frequent updates
Cons of PyTorch
- Generally slower inference speed compared to ONNX Runtime
- Larger model file sizes, which can be a concern for deployment on edge devices
- Steeper learning curve for beginners due to its dynamic nature
Code Comparison
PyTorch:
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.add(x, y)
ONNX Runtime:
import onnxruntime as ort
import numpy as np

# Input dtypes must match the model's declared input types (float32 here)
x = np.array([1, 2, 3], dtype=np.float32)
y = np.array([4, 5, 6], dtype=np.float32)
sess = ort.InferenceSession("model.onnx")
z = sess.run(None, {"input1": x, "input2": y})[0]
The code examples demonstrate the difference in approach between PyTorch's dynamic computation and ONNX Runtime's static graph execution. PyTorch allows for more intuitive tensor operations, while ONNX Runtime requires a pre-defined model and explicit input/output handling.
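For completeness, the "model.onnx" assumed above could itself be produced from PyTorch. A minimal sketch that exports an addition graph with the input names used in the snippet:

import torch

class Add(torch.nn.Module):
    def forward(self, a, b):
        return a + b

torch.onnx.export(
    Add(),
    (torch.randn(3), torch.randn(3)),
    "model.onnx",
    input_names=["input1", "input2"],
    output_names=["sum"],
)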
Core ML Tools: supporting tools for Core ML model conversion, editing, and validation
Pros of Core ML Tools
- Specifically designed for Apple platforms, offering seamless integration with iOS, macOS, and other Apple devices
- Provides tools for converting models from various frameworks (TensorFlow, Keras, scikit-learn) to Core ML format
- Supports on-device machine learning, optimizing for performance and privacy on Apple hardware
Cons of Core ML Tools
- Limited to Apple ecosystem, lacking cross-platform support
- Fewer supported model types and operations compared to ONNX Runtime
- Smaller community and ecosystem compared to the more widely-used ONNX format
Code Comparison
Core ML Tools conversion example:
import coremltools as ct
keras_model = ... # Your Keras model
coreml_model = ct.convert(keras_model)
coreml_model.save("model.mlmodel")
ONNX Runtime inference example:
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})
Both libraries serve different purposes: Core ML Tools focuses on model conversion for Apple platforms, while ONNX Runtime is a cross-platform inference engine. Choose based on your target platform and specific requirements.
JAX: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Pros of JAX
- Offers automatic differentiation and GPU/TPU acceleration
- Provides a more flexible and customizable framework for machine learning research
- Supports functional programming paradigms, enabling easier composition of operations
Cons of JAX
- Steeper learning curve compared to ONNX Runtime
- Less optimized for production deployment and inference
- Smaller ecosystem and fewer pre-built models available
Code Comparison
ONNX Runtime example:
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})
JAX example:
import jax.numpy as jnp
from jax import grad, jit
def loss_fn(x):
    return jnp.sum(x**2)
grad_fn = jit(grad(loss_fn))
result = grad_fn(jnp.array([1.0, 2.0, 3.0]))
TVM: Open deep learning compiler stack for CPU, GPU, and specialized accelerators
Pros of TVM
- More flexible and customizable for different hardware targets
- Supports a wider range of deep learning frameworks
- Offers advanced graph-level optimizations
Cons of TVM
- Steeper learning curve and more complex to use
- Less mature and stable compared to ONNX Runtime
- Smaller community and ecosystem
Code Comparison
TVM example:
import tvm
from tvm import relay
# Define a simple network
data = relay.var("data", relay.TensorType((1, 3, 224, 224), "float32"))
# A concrete weight type lets type inference and compilation succeed
weight = relay.var("weight", relay.TensorType((16, 3, 3, 3), "float32"))
conv2d = relay.nn.conv2d(data, weight)
func = relay.Function([data, weight], conv2d)
mod = tvm.IRModule.from_expr(func)
# Compile the network for CPU
target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target)
ONNX Runtime example:
import onnxruntime as ort
import numpy as np
# Load pre-trained ONNX model
session = ort.InferenceSession("model.onnx")
# Prepare input data
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
# Run inference ("input" must match the model's declared input name)
output = session.run(None, {"input": input_data})
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
Pros of OpenBLAS
- Highly optimized linear algebra operations for various CPU architectures
- Lightweight and focused on BLAS (Basic Linear Algebra Subprograms) functionality
- Open-source with a strong community and long-standing reputation in scientific computing
Cons of OpenBLAS
- Limited to CPU operations, lacking GPU support unlike ONNX Runtime
- Narrower scope, focusing primarily on linear algebra operations rather than a full machine learning inference framework
- May require more manual integration and optimization for complex ML workflows
Code Comparison
OpenBLAS (C):
#include <stdio.h>
#include <cblas.h>

int main(void) {
    double x[] = {1, 2, 3, 4};
    double y[] = {5, 6, 7, 8};
    cblas_daxpy(4, 2.0, x, 1, y, 1);  /* y = 2.0 * x + y */
    printf("%.1f %.1f %.1f %.1f\n", y[0], y[1], y[2], y[3]);
}
ONNX Runtime (Python):
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: np.array([1, 2, 3, 4]).astype(np.float32)})
While OpenBLAS excels in optimized linear algebra operations, ONNX Runtime provides a more comprehensive solution for machine learning inference across various hardware platforms. OpenBLAS is ideal for projects requiring high-performance linear algebra, while ONNX Runtime is better suited for end-to-end ML deployment and inference tasks.
README
ONNX Runtime is a cross-platform inference and training machine-learning accelerator.
ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. Learn more →
ONNX Runtime training can accelerate the model training time on multi-node NVIDIA GPUs for transformer models with a one-line addition for existing PyTorch training scripts. Learn more →
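The "one-line addition" refers to wrapping an existing model with ORTModule. A minimal sketch, assuming the onnxruntime-training package is installed:

from onnxruntime.training import ORTModule

model = ORTModule(model)  # the one-line change; the training loop stays the same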
Get Started & Resources
- General Information: onnxruntime.ai
- Usage documentation and tutorials: onnxruntime.ai/docs
- YouTube video tutorials: youtube.com/@ONNXRuntime
- Companion sample repositories:
  - ONNX Runtime Inferencing: microsoft/onnxruntime-inference-examples
  - ONNX Runtime Training: microsoft/onnxruntime-training-examples
Builtin Pipeline Status
Inference and training build pipelines cover Windows, Linux, Mac, Android, iOS, Web, and other platforms.
Third-party Pipeline Status
Third-party inference and training pipelines cover Linux.
Data/Telemetry
Windows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.
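On the Python side, telemetry can also be turned off programmatically; a short sketch using the opt-out call exposed by the package:

import onnxruntime as ort

# Opt out of telemetry collection for this process
ort.disable_telemetry_events()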
Contributions and Feedback
We welcome contributions! Please see the contribution guidelines.
For feature requests or bug reports, please file a GitHub Issue.
For general discussion or questions, please use GitHub Discussions.
Code of Conduct
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
License
This project is licensed under the MIT License.