optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

2,960

553

2,960

311

Top Related Projects

DeepSpeed

39,112

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

fairseq

31,682

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

allennlp

11,862

An open-source NLP research library, built on PyTorch.

fairscale

3,351

PyTorch extensions for high performance and large scale training.

apex

8,693

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch

horovod

14,559

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Quick Overview

Optimum is an extension of the Hugging Face Transformers library, designed to provide hardware-specific optimizations for training and inference of transformer models. It offers a unified API for various hardware accelerators and optimization techniques, enabling users to easily deploy and optimize their models across different platforms.

Pros

Seamless integration with Hugging Face Transformers ecosystem
Support for multiple hardware accelerators (e.g., NVIDIA GPUs, Intel CPUs, Apple Silicon)
Easy-to-use API for model optimization and quantization
Improved performance and efficiency for transformer models

Cons

Focused on models from the Hugging Face ecosystem (Transformers, Diffusers, TIMM, Sentence Transformers)
May require additional hardware-specific dependencies
Learning curve for users unfamiliar with hardware optimization techniques
Some optimizations may not be available for all model architectures

Code Examples

Loading and optimizing a model for Intel CPUs:

from optimum.intel import IPEXModel
from transformers import AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = IPEXModel.from_pretrained(model_name)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)

Quantizing a model for faster inference:

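from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# ORTQuantizer works on an ONNX model, so export the checkpoint first
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
onnx_model = ORTModelForSequenceClassification.from_pretrained(model_name, export=True)
quantizer = ORTQuantizer.from_pretrained(onnx_model)

# Dynamic quantization targeting CPUs with AVX512-VNNI
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
# quantize() writes the quantized model to save_dir (placeholder path)
quantizer.quantize(save_dir="quantized_model", quantization_config=qconfig)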

Using BetterTransformer for faster PyTorch inference (not specific to Apple Silicon):

from optimum.bettertransformer import BetterTransformer
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(model_name)
bt_model = BetterTransformer.transform(model)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = bt_model(**inputs)

Getting Started

To get started with Optimum, follow these steps:

Install Optimum:

pip install optimum

Install hardware-specific dependencies (e.g., for Intel CPUs):

pip install --upgrade --upgrade-strategy eager optimum[ipex]

Use Optimum in your code:

from optimum.intel import IPEXModel
from transformers import AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = IPEXModel.from_pretrained(model_name)

# Your code for inference or fine-tuning

For more detailed information and advanced usage, refer to the Optimum documentation on the Hugging Face website.

Competitor Comparisons

DeepSpeed

39,112

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

More comprehensive optimization techniques, including ZeRO-Infinity for extreme model sizes
Offers advanced pipeline parallelism and 3D parallelism for distributed training
Provides more fine-grained control over optimization strategies

Cons of DeepSpeed

Steeper learning curve and more complex setup compared to Optimum
Less integrated with Hugging Face ecosystem and transformers library
May require more manual configuration for optimal performance

Code Comparison

DeepSpeed:

import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(args=args,
                                                     model=model,
                                                     model_parameters=params)

Optimum:

from optimum.onnxruntime import ORTModelForSequenceClassification
# export=True converts the PyTorch checkpoint to ONNX on the fly
model = ORTModelForSequenceClassification.from_pretrained("bert-base-uncased", export=True)

Summary

DeepSpeed offers more advanced optimization techniques and fine-grained control, making it suitable for large-scale distributed training and extreme model sizes. However, it has a steeper learning curve and requires more manual configuration. Optimum, on the other hand, provides easier integration with the Hugging Face ecosystem and a simpler setup process, but may offer fewer advanced optimization options for extreme scenarios.

fairseq

31,682

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

More comprehensive toolkit for sequence modeling tasks
Supports a wider range of architectures and models
Offers more advanced features for research and experimentation

Cons of fairseq

Steeper learning curve and more complex setup
Less focus on optimization and deployment
May require more manual configuration for specific tasks

Code Comparison

fairseq:

from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model')
translations = model.translate(['Hello world!'])

optimum:

from optimum.pipelines import pipeline
# The language pair is encoded in the task name; ONNX Runtime is the backend
translator = pipeline("translation_en_to_fr", model="t5-small", accelerator="ort")
result = translator("Hello world!")

Key Differences

fairseq provides more low-level control and customization options
optimum focuses on ease of use and optimization for various hardware
fairseq is better suited for research and advanced NLP tasks
optimum integrates seamlessly with Hugging Face's ecosystem
fairseq offers more flexibility in model architecture design
optimum provides better out-of-the-box performance optimization

allennlp

11,862

An open-source NLP research library, built on PyTorch.

Pros of AllenNLP

More focused on research and experimentation in NLP
Provides a rich set of tools for building and evaluating complex NLP models
Offers a configuration-based approach for easy model definition and experimentation

Cons of AllenNLP

Steeper learning curve compared to Optimum
Less emphasis on optimization and deployment across different hardware
Smaller ecosystem and community compared to the Hugging Face ecosystem

Code Comparison

AllenNLP:

from typing import Iterable

from allennlp.data import DatasetReader, Instance
from allennlp.data.fields import TextField
from allennlp.data.token_indexers import SingleIdTokenIndexer

class MyDatasetReader(DatasetReader):
    def _read(self, file_path: str) -> Iterable[Instance]:
        # Yield one Instance per line of the input file
        with open(file_path, "r") as f:
            for line in f:
                # text_to_instance must be implemented by this reader (omitted here)
                yield self.text_to_instance(line.strip())

Optimum:

from datasets import load_dataset
from optimum.onnxruntime import ORTModelForSequenceClassification

dataset = load_dataset("glue", "mrpc", split="train")
model = ORTModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", export=True)

This comparison highlights the different focus areas of AllenNLP and Optimum. AllenNLP provides more flexibility for custom dataset creation and model architecture, while Optimum emphasizes ease of use and optimization for various hardware platforms within the Hugging Face ecosystem.

fairscale

3,351

PyTorch extensions for high performance and large scale training.

Pros of FairScale

More focused on large-scale distributed training and optimization techniques
Offers advanced sharding strategies for model and optimizer states
Provides implementation of cutting-edge techniques like ZeRO and Fully Sharded Data Parallel

Cons of FairScale

Less integration with popular model architectures and frameworks
Steeper learning curve for users not familiar with distributed training concepts
More limited in scope compared to Optimum's broader optimization features

Code Comparison

FairScale example (Fully Sharded Data Parallel):

from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(model)

Optimum example (loading a model through the IPEX integration):

from optimum.intel import IPEXModel

# Loads the checkpoint with Intel Extension for PyTorch optimizations applied
model = IPEXModel.from_pretrained("bert-base-uncased")

Both libraries aim to improve model training and inference efficiency, but they approach it differently. FairScale focuses on distributed training and memory optimization, while Optimum provides a broader set of tools for various optimization techniques across different hardware platforms.

apex

8,693

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch

Pros of Apex

Highly optimized for NVIDIA GPUs, offering better performance on supported hardware
Provides more low-level control over mixed precision training
Includes additional optimization techniques like LAMB optimizer and fused CUDA kernels

Cons of Apex

Limited to NVIDIA GPUs, reducing portability across different hardware
Requires manual installation and setup, which can be complex
Less frequently updated compared to Optimum

Code Comparison

Apex:

from apex import amp
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()

Optimum:

from optimum.bettertransformer import BetterTransformer
model = BetterTransformer.transform(model)
model.half()  # Cast weights to FP16 (pure half precision, not automatic mixed precision)
loss.backward()

Both libraries aim to improve performance and efficiency in deep learning training, but they approach it differently. Apex focuses on NVIDIA-specific optimizations and mixed precision training, while Optimum provides a more hardware-agnostic solution with a focus on ease of use and integration with Hugging Face's ecosystem. Optimum offers a higher-level API that simplifies the process of applying optimizations, making it more accessible to users who may not need fine-grained control over the optimization process.

horovod

14,559

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Pros of Horovod

Specializes in distributed deep learning training across multiple GPUs and machines
Supports multiple deep learning frameworks (TensorFlow, PyTorch, MXNet)
Highly scalable and efficient for large-scale training tasks

Cons of Horovod

Steeper learning curve and more complex setup compared to Optimum
Limited focus on model optimization and deployment
Less integration with pre-trained models and datasets

Code Comparison

Horovod:

import tensorflow as tf
import horovod.tensorflow as hvd
hvd.init()
optimizer = tf.optimizers.Adam(0.001 * hvd.size())
optimizer = hvd.DistributedOptimizer(optimizer)

Optimum:

from optimum.onnxruntime import ORTTrainer
trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)

Horovod focuses on distributed training across multiple GPUs, while Optimum emphasizes easy integration with Hugging Face's ecosystem and optimization for various hardware. Horovod provides more flexibility for large-scale distributed training, but Optimum offers a more user-friendly approach for optimizing and deploying models, especially those from the Transformers library.

README

🤗 Optimum

Optimum is an extension of 🤗 Transformers, Diffusers, TIMM and Sentence-Transformers, providing a set of optimization tools and enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.

Installation

Optimum can be installed using pip as follows:

python -m pip install optimum

If you'd like to use the accelerator-specific features of Optimum, you can check the documentation and install the required dependencies according to the table below:

Accelerator	Installation
ONNX Runtime	`pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]`
Intel Neural Compressor	`pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]`
OpenVINO	`pip install --upgrade --upgrade-strategy eager optimum[openvino]`
IPEX	`pip install --upgrade --upgrade-strategy eager optimum[ipex]`
NVIDIA TensorRT-LLM	`docker run -it --gpus all --ipc host huggingface/optimum-nvidia`
AMD Instinct GPUs and Ryzen AI NPU	`pip install --upgrade --upgrade-strategy eager optimum[amd]`
AWS Trainium & Inferentia	`pip install --upgrade --upgrade-strategy eager optimum[neuronx]`
Intel Gaudi Accelerators (HPU)	`pip install --upgrade --upgrade-strategy eager optimum[habana]`
FuriosaAI	`pip install --upgrade --upgrade-strategy eager optimum[furiosa]`

The --upgrade --upgrade-strategy eager option is needed to ensure the different packages are upgraded to the latest possible version.
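As a rough illustration of the table above, the install command for a given extra can be assembled in a few lines of Python. The `KNOWN_EXTRAS` set and the `pip_install_command` helper are hypothetical names introduced for this sketch; only the extras themselves come from the table.

```python
import sys

# Extras copied from the installation table above (pip-installable rows only).
KNOWN_EXTRAS = {
    "onnxruntime", "neural-compressor", "openvino", "ipex",
    "amd", "neuronx", "habana", "furiosa",
}


def pip_install_command(extra: str) -> list[str]:
    """Return the pip invocation for one accelerator extra as an argument list."""
    if extra not in KNOWN_EXTRAS:
        raise ValueError(f"unknown extra: {extra!r}")
    return [
        sys.executable, "-m", "pip", "install",
        "--upgrade", "--upgrade-strategy", "eager",
        f"optimum[{extra}]",
    ]


# Show the command without the interpreter path.
print(" ".join(pip_install_command("onnxruntime")[1:]))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell-quoting issues with the square brackets.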

To install from source:

python -m pip install git+https://github.com/huggingface/optimum.git

For the accelerator-specific features, append optimum[accelerator_type] to the above command:

python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git

Accelerated Inference

Optimum provides multiple tools to export and run optimized models on various ecosystems:

ONNX / ONNX Runtime, one of the most popular open formats for model export, and a high-performance inference engine for deployment.
OpenVINO, a toolkit for optimizing, quantizing and deploying deep learning models on Intel hardware.
ExecuTorch, PyTorch's native solution for on-device inference across mobile and edge devices.
TensorFlow Lite, a lightweight solution for running TensorFlow models on mobile and edge.
Intel Gaudi Accelerators enabling optimal performance on first-gen Gaudi, Gaudi2 and Gaudi3.
AWS Inferentia for accelerated inference on Inf2 and Inf1 instances.
NVIDIA TensorRT-LLM.

The export and optimizations can be done both programmatically and with a command line.
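A minimal sketch of both routes for the ONNX case, assuming `optimum[exporters]` is installed. It assumes `optimum-cli export onnx` and `optimum.exporters.onnx.main_export` are the relevant entry points in your installed version; the helper names, model ID, and output directory are placeholders chosen for illustration.

```python
def export_cli_command(model_id: str, output_dir: str) -> list[str]:
    """Build the command-line form of an ONNX export."""
    return ["optimum-cli", "export", "onnx", "--model", model_id, output_dir]


def export_programmatically(model_id: str, output_dir: str) -> None:
    """Run the same export from Python (needs optimum and network access)."""
    # Imported lazily so this file can be loaded without optimum installed.
    from optimum.exporters.onnx import main_export

    main_export(model_id, output=output_dir)


# Print the CLI form; nothing is downloaded or exported here.
print(" ".join(export_cli_command(
    "distilbert-base-uncased-finetuned-sst-2-english", "distilbert_onnx/")))
```

Calling `export_programmatically` with the same two arguments should produce the same output directory as running the printed command.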

ONNX + ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters,onnxruntime]

It is possible to export Transformers and Diffusers models to the ONNX format and perform graph optimization as well as quantization easily.

For more information on the ONNX export, please check the documentation.

Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using ONNX Runtime in the backend.

More details on how to run ONNX models with the ORTModelForXXX classes are available in the documentation.
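The following sketch shows one such class, `ORTModelForSequenceClassification`, used as a drop-in model inside a Transformers pipeline. It assumes `optimum[onnxruntime]` and `transformers` are installed; the `classify_with_onnxruntime` helper and the default checkpoint are illustrative choices.

```python
def classify_with_onnxruntime(
    text: str,
    model_id: str = "distilbert-base-uncased-finetuned-sst-2-english",
):
    """Classify `text` with an ONNX Runtime backed model (downloads the checkpoint)."""
    # Imported lazily so this file can be loaded without optimum installed.
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # export=True converts the PyTorch checkpoint to ONNX while loading.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
    return classifier(text)
```

A call such as `classify_with_onnxruntime("I love this library")` returns the usual pipeline output, a list of label and score dictionaries.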

Intel (OpenVINO + Neural Compressor + IPEX)

Before you begin, make sure you have all the necessary libraries installed.

You can find more information on the different integrations in our documentation and in the examples of optimum-intel.

ExecuTorch

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum-executorch@git+https://github.com/huggingface/optimum-executorch.git

Users can export Transformers models to ExecuTorch and run inference on edge devices within PyTorch's ecosystem.

For more information about exporting Transformers models to ExecuTorch, please check the Optimum-ExecuTorch documentation.

TensorFlow Lite

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters-tf]

Just as for ONNX, it is possible to export models to TensorFlow Lite and quantize them. You can find more information in our documentation.

Quanto

Quanto is a PyTorch quantization backend which allows you to quantize a model using either the Python API or the optimum-cli.

You can see more details and examples in the Quanto repository.
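A minimal sketch of the Python API route, assuming the `optimum-quanto` package and PyTorch are installed. `quantize`, `freeze`, and `qint8` are imported from `optimum.quanto`; the `quantize_weights_to_int8` wrapper is a hypothetical helper added here for illustration.

```python
def quantize_weights_to_int8(model):
    """Quantize the weights of a torch.nn.Module to int8 in place and return it."""
    # Imported lazily so this file can be loaded without optimum-quanto installed.
    from optimum.quanto import freeze, qint8, quantize

    quantize(model, weights=qint8)  # swap supported layers for quantized versions
    freeze(model)                   # replace float weights with quantized weights
    return model
```

For example, `quantize_weights_to_int8(torch.nn.Sequential(torch.nn.Linear(16, 4)))` returns the same module with its linear layer quantized.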

Accelerated training

Optimum provides wrappers around the original Transformers Trainer to enable training on powerful hardware easily. We support many providers:

Intel Gaudi Accelerators (HPU) enabling optimal performance on first-gen Gaudi, Gaudi2 and Gaudi3.
AWS Trainium for accelerated training on Trn1 and Trn1n instances.
ONNX Runtime (optimized for GPUs).

Intel Gaudi Accelerators

Before you begin, make sure you have all the necessary libraries installed:

pip install --upgrade --upgrade-strategy eager optimum[habana]

You can find examples in the documentation and in the examples.
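A minimal sketch of what such a Trainer wrapper looks like for Gaudi, assuming `optimum[habana]` is installed and an HPU is available. `GaudiTrainer` and `GaudiTrainingArguments` mirror the Transformers `Trainer` and `TrainingArguments`; the `build_gaudi_trainer` helper, the output directory, and the `Habana/bert-base-uncased` Gaudi configuration name are illustrative assumptions, not requirements.

```python
def build_gaudi_trainer(model, train_dataset, output_dir="gaudi_output"):
    """Wrap a Transformers model in a GaudiTrainer (needs optimum-habana and an HPU)."""
    # Imported lazily so this file can be loaded on machines without optimum-habana.
    from optimum.habana import GaudiTrainer, GaudiTrainingArguments

    args = GaudiTrainingArguments(
        output_dir=output_dir,
        use_habana=True,     # run on HPU instead of CPU/GPU
        use_lazy_mode=True,  # lazy-mode graph execution
        # Assumed Hub configuration; pick one that matches your model.
        gaudi_config_name="Habana/bert-base-uncased",
    )
    return GaudiTrainer(model=model, args=args, train_dataset=train_dataset)
```

The returned object is then used like a regular Transformers trainer, for example by calling `train()` on it.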

AWS Trainium

Before you begin, make sure you have all the necessary libraries installed:

pip install --upgrade --upgrade-strategy eager optimum[neuronx]

You can find examples in the documentation and in the tutorials.

ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[onnxruntime-training]

You can find examples in the documentation and in the examples.
