huggingface/optimum

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

Top Related Projects

  • DeepSpeed: a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
  • AllenNLP: an open-source NLP research library, built on PyTorch.
  • FairScale: PyTorch extensions for high performance and large scale training.
  • Apex: a PyTorch extension with tools for easy mixed precision and distributed training in PyTorch.
  • Horovod: a distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Quick Overview

Optimum is an extension of the Hugging Face Transformers library, designed to provide hardware-specific optimizations for training and inference of transformer models. It offers a unified API for various hardware accelerators and optimization techniques, enabling users to easily deploy and optimize their models across different platforms.

Pros

  • Seamless integration with Hugging Face Transformers ecosystem
  • Support for multiple hardware accelerators (e.g., Intel CPUs, NVIDIA and AMD GPUs, AWS Trainium/Inferentia, Habana Gaudi)
  • Easy-to-use API for model optimization and quantization
  • Improved performance and efficiency for transformer models

Cons

  • Focused on models from the Hugging Face ecosystem (Transformers, Diffusers, TIMM, Sentence Transformers)
  • May require additional hardware-specific dependencies
  • Learning curve for users unfamiliar with hardware optimization techniques
  • Some optimizations may not be available for all model architectures

Code Examples

  1. Loading and optimizing a model for Intel CPUs:
from optimum.intel import IPEXModelForSequenceClassification
from transformers import AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Loads the checkpoint with Intel Extension for PyTorch (IPEX) optimizations applied
model = IPEXModelForSequenceClassification.from_pretrained(model_name)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
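  2. Quantizing a model for faster inference:
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
# ORTQuantizer operates on an ONNX model, so export the checkpoint first
onnx_model = ORTModelForSequenceClassification.from_pretrained(model_name, export=True)
quantizer = ORTQuantizer.from_pretrained(onnx_model)
# Dynamic (is_static=False) int8 quantization targeting CPUs with AVX-512 VNNI
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="distilbert_quantized", quantization_config=qconfig)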
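  3. Converting a model with BetterTransformer:
from optimum.bettertransformer import BetterTransformer
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Replace supported layers with their BetterTransformer (PyTorch-native fastpath) versions
bt_model = BetterTransformer.transform(model)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = bt_model(**inputs)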
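The quantization example above passes `per_channel=False`. As a rough, framework-free illustration of what that flag controls, the sketch below quantizes a small weight matrix to signed 8-bit integers symmetrically, once with a single scale for the whole tensor and once with one scale per row (per channel). It is a pure-Python sketch of the general technique under those assumptions, not ONNX Runtime's implementation, and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_row(row: List[float], scale: float) -> List[int]:
    # Symmetric int8: zero point 0, values clamped to [-127, 127]
    return [max(-127, min(127, round(v / scale))) for v in row]


def quantize_per_tensor(w: Matrix) -> Tuple[List[float], List[List[int]]]:
    # A single scale for the whole tensor, derived from its largest magnitude
    max_abs = max(abs(v) for row in w for v in row)
    scale = max_abs / 127 if max_abs else 1.0
    return [scale] * len(w), [quantize_row(row, scale) for row in w]


def quantize_per_channel(w: Matrix) -> Tuple[List[float], List[List[int]]]:
    # One scale per row (output channel), so rows with small values keep more precision
    scales = [max(abs(v) for v in row) / 127 or 1.0 for row in w]
    return scales, [quantize_row(row, s) for row, s in zip(w, scales)]


def row_errors(w: Matrix, q: List[List[int]], scales: List[float]) -> List[float]:
    # Dequantize (q * scale) and report the worst absolute error in each row
    return [
        max(abs(v - qi * s) for v, qi in zip(row, qrow))
        for row, qrow, s in zip(w, q, scales)
    ]


weights = [[0.5, -1.0, 0.25], [0.01, -0.02, 0.015]]
t_scales, t_q = quantize_per_tensor(weights)
c_scales, c_q = quantize_per_channel(weights)
print("per-tensor  int8:", t_q, "errors:", row_errors(weights, t_q, t_scales))
print("per-channel int8:", c_q, "errors:", row_errors(weights, c_q, c_scales))
```

With one shared scale, the small-valued second row collapses onto a handful of integer levels; per-channel scales spend the full int8 range on each row, at the cost of storing one scale per channel.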

Getting Started

To get started with Optimum, follow these steps:

  1. Install Optimum:
pip install optimum
  2. Install hardware-specific dependencies (e.g., for Intel CPUs):
pip install optimum[intel]
  3. Use Optimum in your code:
from optimum.intel import IPEXModelForSequenceClassification
from transformers import AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = IPEXModelForSequenceClassification.from_pretrained(model_name)

# Your code for inference

For more detailed information and advanced usage, refer to the Optimum documentation on the Hugging Face website.

Competitor Comparisons

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • More comprehensive optimization techniques, including ZeRO-Infinity for extreme model sizes
  • Offers advanced pipeline parallelism and 3D parallelism for distributed training
  • Provides more fine-grained control over optimization strategies

Cons of DeepSpeed

  • Steeper learning curve and more complex setup compared to Optimum
  • Less integrated with Hugging Face ecosystem and transformers library
  • May require more manual configuration for optimal performance

Code Comparison

DeepSpeed:

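import deepspeed
import torch
model = torch.nn.Linear(10, 2)  # any torch.nn.Module works here
params = [p for p in model.parameters() if p.requires_grad]
# ds_config.json (assumed to exist) holds batch size, ZeRO stage, fp16 settings, etc.
model_engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                                     model_parameters=params,
                                                     config="ds_config.json")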

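Optimum / Hugging Face Trainer (there is no optimum.deepspeed module; DeepSpeed is configured through the Transformers Trainer, which Optimum's ORTTrainer extends):

from transformers import AutoModelForSequenceClassification, TrainingArguments
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# Passing a DeepSpeed config file lets the Trainer call deepspeed.initialize internally
training_args = TrainingArguments(output_dir="out", deepspeed="ds_config.json")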

Summary

DeepSpeed offers more advanced optimization techniques and fine-grained control, making it suitable for large-scale distributed training and extreme model sizes. However, it has a steeper learning curve and requires more manual configuration. Optimum, on the other hand, provides easier integration with the Hugging Face ecosystem and a simpler setup process, but may offer fewer advanced optimization options for extreme scenarios.
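To make the "extreme model sizes" point concrete: DeepSpeed's ZeRO (which ZeRO-Infinity extends with CPU and NVMe offload) partitions training state across data-parallel GPUs. The sketch below reproduces the per-GPU memory arithmetic from the ZeRO paper for mixed-precision Adam, assuming 16 bytes of model state per parameter (fp16 weights and gradients plus 12 bytes of fp32 optimizer state). It ignores activations, buffers, and fragmentation, and the function name is hypothetical.

```python
def zero_memory_per_gpu(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Bytes of model-state memory per GPU for mixed-precision Adam under ZeRO.

    Assumes 2 bytes/param for fp16 weights, 2 for fp16 gradients, and k=12 for the
    fp32 optimizer states (master weights, momentum, variance).
    """
    params, grads, optim = 2 * num_params, 2 * num_params, k * num_params
    if stage >= 1:
        optim /= num_gpus   # stage 1: partition optimizer states
    if stage >= 2:
        grads /= num_gpus   # stage 2: also partition gradients
    if stage >= 3:
        params /= num_gpus  # stage 3: also partition the parameters themselves
    return params + grads + optim


for stage in range(4):
    gb = zero_memory_per_gpu(7.5e9, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.1f} GB per GPU")
```

For a 7.5B-parameter model on 64 GPUs this drops from 120 GB per GPU without partitioning to under 2 GB at stage 3, which is why sharding is the lever for models that no single device can hold.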

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • More comprehensive toolkit for sequence modeling tasks
  • Supports a wider range of architectures and models
  • Offers more advanced features for research and experimentation

Cons of fairseq

  • Steeper learning curve and more complex setup
  • Less focus on optimization and deployment
  • May require more manual configuration for specific tasks

Code Comparison

fairseq:

from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model')
translations = model.translate(['Hello world!'])

optimum:

from optimum.pipelines import pipeline
# accelerator="ort" runs the pipeline's model with ONNX Runtime
translator = pipeline("translation_en_to_fr", model="t5-small", accelerator="ort")
result = translator("Hello world!")

Key Differences

  • fairseq provides more low-level control and customization options
  • optimum focuses on ease of use and optimization for various hardware
  • fairseq is better suited for research and advanced NLP tasks
  • optimum integrates seamlessly with Hugging Face's ecosystem
  • fairseq offers more flexibility in model architecture design
  • optimum provides better out-of-the-box performance optimization

AllenNLP

An open-source NLP research library, built on PyTorch.

Pros of AllenNLP

  • More focused on research and experimentation in NLP
  • Provides a rich set of tools for building and evaluating complex NLP models
  • Offers a configuration-based approach for easy model definition and experimentation

Cons of AllenNLP

  • Steeper learning curve compared to Optimum
  • Less emphasis on optimization and deployment across different hardware
  • Smaller ecosystem and community compared to the Hugging Face ecosystem

Code Comparison

AllenNLP:

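from typing import Iterable
from allennlp.data import DatasetReader, Instance
from allennlp.data.fields import TextField
from allennlp.data.token_indexers import SingleIdTokenIndexer
from allennlp.data.tokenizers import WhitespaceTokenizer

class MyDatasetReader(DatasetReader):
    def text_to_instance(self, text: str) -> Instance:
        # Split on whitespace and index each token by its id in the vocabulary
        tokens = WhitespaceTokenizer().tokenize(text)
        return Instance({"text": TextField(tokens, {"tokens": SingleIdTokenIndexer()})})

    def _read(self, file_path: str) -> Iterable[Instance]:
        with open(file_path, "r") as f:
            for line in f:
                yield self.text_to_instance(line.strip())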

Optimum:

from datasets import load_dataset
from optimum.onnxruntime import ORTModelForSequenceClassification

dataset = load_dataset("glue", "mrpc", split="train")
model = ORTModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english", export=True)

This comparison highlights the different focus areas of AllenNLP and Optimum. AllenNLP provides more flexibility for custom dataset creation and model architecture, while Optimum emphasizes ease of use and optimization for various hardware platforms within the Hugging Face ecosystem.

FairScale

PyTorch extensions for high performance and large scale training.

Pros of FairScale

  • More focused on large-scale distributed training and optimization techniques
  • Offers advanced sharding strategies for model and optimizer states
  • Provides implementations of cutting-edge techniques like ZeRO and Fully Sharded Data Parallel

Cons of FairScale

  • Less integration with popular model architectures and frameworks
  • Steeper learning curve for users not familiar with distributed training concepts
  • More limited in scope compared to Optimum's broader optimization features

Code Comparison

FairScale example (Fully Sharded Data Parallel):

from fairscale.nn.data_parallel import FullyShardedDataParallel as FSDP

model = FSDP(model)

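Optimum example (quantization-aware training with Intel Neural Compressor):

from neural_compressor import QuantizationAwareTrainingConfig
from optimum.intel import INCTrainer
from transformers import AutoModelForSequenceClassification, TrainingArguments
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# train_dataset is assumed to be a tokenized dataset defined elsewhere
trainer = INCTrainer(model=model, args=TrainingArguments("qat_output"), train_dataset=train_dataset,
                     quantization_config=QuantizationAwareTrainingConfig())
trainer.train()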

Both libraries aim to improve model training and inference efficiency, but they approach it differently. FairScale focuses on distributed training and memory optimization, while Optimum provides a broader set of tools for various optimization techniques across different hardware platforms.
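FairScale's Fully Sharded Data Parallel saves memory by having each rank keep only a slice of the flattened parameters and gathering the full set when a layer needs it. The single-process sketch below mimics that bookkeeping with plain lists: it flattens parameters, pads them so they divide evenly across an assumed world size, shards them, and reconstructs them with a simulated all-gather. It only illustrates the idea; real FSDP does this with tensors and collective communication, and the helper names are hypothetical.

```python
from typing import List


def flatten(params: List[List[float]]) -> List[float]:
    # Concatenate every parameter tensor into one flat buffer
    return [v for p in params for v in p]


def shard(flat: List[float], world_size: int) -> List[List[float]]:
    # Pad with zeros so the buffer splits into equal-sized shards, one per rank
    pad = (-len(flat)) % world_size
    padded = flat + [0.0] * pad
    size = len(padded) // world_size
    return [padded[r * size:(r + 1) * size] for r in range(world_size)]


def all_gather(shards: List[List[float]], numel: int) -> List[float]:
    # Simulated all-gather: concatenate every rank's shard, then drop the padding
    return [v for s in shards for v in s][:numel]


def unflatten(flat: List[float], sizes: List[int]) -> List[List[float]]:
    # Split the flat buffer back into the original parameter sizes
    out, start = [], 0
    for n in sizes:
        out.append(flat[start:start + n])
        start += n
    return out


params = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]]
world_size = 4
flat = flatten(params)
shards = shard(flat, world_size)
print("elements per rank:", [len(s) for s in shards])  # each rank holds ~1/world_size
restored = unflatten(all_gather(shards, len(flat)), [len(p) for p in params])
print("round trip ok:", restored == params)
```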

Apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Pros of Apex

  • Highly optimized for NVIDIA GPUs, offering better performance on supported hardware
  • Provides more low-level control over mixed precision training
  • Includes additional optimization techniques like LAMB optimizer and fused CUDA kernels

Cons of Apex

  • Limited to NVIDIA GPUs, reducing portability across different hardware
  • Requires manual installation and setup, which can be complex
  • Less frequently updated compared to Optimum

Code Comparison

Apex:

from apex import amp
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()

Optimum:

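from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
# fp16=True enables mixed precision training; model and train_dataset are assumed defined
training_args = ORTTrainingArguments(output_dir="out", fp16=True)
trainer = ORTTrainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()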

Both libraries aim to improve performance and efficiency in deep learning training, but they approach it differently. Apex focuses on NVIDIA-specific optimizations and mixed precision training, while Optimum provides a more hardware-agnostic solution with a focus on ease of use and integration with Hugging Face's ecosystem. Optimum offers a higher-level API that simplifies the process of applying optimizations, making it more accessible to users who may not need fine-grained control over the optimization process.
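Part of what Apex's `amp.scale_loss` context manager handles is loss scaling: multiplying the loss before backward so small fp16 gradients do not underflow, then unscaling before the optimizer step and skipping steps whose gradients overflowed. The sketch below shows dynamic loss scaling as a small state machine in plain Python. Its constants mirror the defaults of PyTorch's `GradScaler` (initial scale 2**16, halve on overflow, double after 2000 clean steps) rather than anything confirmed for Apex, and the class name is hypothetical.

```python
import math
from typing import List, Optional


class DynamicLossScaler:
    """Hypothetical sketch of dynamic loss scaling for fp16 mixed-precision training."""

    def __init__(self, init_scale: float = 2.0 ** 16, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self.good_steps = 0

    def unscale(self, scaled_grads: List[float]) -> Optional[List[float]]:
        """Return gradients divided by the scale, or None if this step must be skipped."""
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads):
            # Overflow in the scaled backward pass: shrink the scale and skip the update
            self.scale *= self.backoff_factor
            self.good_steps = 0
            return None
        grads = [g / self.scale for g in scaled_grads]
        self.good_steps += 1
        if self.good_steps == self.growth_interval:
            # A run of clean steps: try a larger scale to use more of fp16's range
            self.scale *= self.growth_factor
            self.good_steps = 0
        return grads


scaler = DynamicLossScaler(growth_interval=3)  # short interval so growth shows up quickly
true_grad = 1e-6
results = []
for overflow in [False, True, False, False, False]:
    # Simulated backward on loss * scale: an overflow shows up as inf in the gradients
    scaled = [math.inf] if overflow else [true_grad * scaler.scale]
    results.append(scaler.unscale(scaled))
    print("skipped" if results[-1] is None else results[-1], "next scale:", scaler.scale)
```

Unscaling uses the scale that was active during the backward pass, which is why the division happens before the scale is grown.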

Horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Pros of Horovod

  • Specializes in distributed deep learning training across multiple GPUs and machines
  • Supports multiple deep learning frameworks (TensorFlow, PyTorch, MXNet)
  • Highly scalable and efficient for large-scale training tasks

Cons of Horovod

  • Steeper learning curve and more complex setup compared to Optimum
  • Limited focus on model optimization and deployment
  • Less integration with pre-trained models and datasets

Code Comparison

Horovod:

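import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()
# Scale the learning rate by the number of workers (the effective batch grows with hvd.size())
optimizer = tf.optimizers.Adam(0.001 * hvd.size())
# Wrap the optimizer so gradients are averaged across workers with allreduce
optimizer = hvd.DistributedOptimizer(optimizer)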

Optimum:

from optimum.onnxruntime import ORTTrainer
trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)

Horovod focuses on distributed training across multiple GPUs, while Optimum emphasizes easy integration with Hugging Face's ecosystem and optimization for various hardware. Horovod provides more flexibility for large-scale distributed training, but Optimum offers a more user-friendly approach for optimizing and deploying models, especially those from the Transformers library.
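Horovod's `DistributedOptimizer` averages gradients across workers with an allreduce before each update, and the Horovod snippet scales the learning rate by `hvd.size()` because the effective batch grows with the number of workers. The pure-Python sketch below checks the core property on a toy least-squares model: when each worker's mean-loss gradient is computed on an equal-sized shard, the average of those gradients equals the full-batch gradient. The allreduce is simulated with a plain average, and the data and function names are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pairs for the toy model y ~ w * x


def grad_mse(w: float, batch: List[Sample]) -> float:
    # d/dw of mean((w*x - y)^2) over the batch
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def allreduce_mean(values: List[float]) -> float:
    # Stand-in for an allreduce that averages one value contributed by each worker
    return sum(values) / len(values)


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8), (6.0, 12.1)]
num_workers = 3
w = 0.5

# Each worker gets an equal, disjoint shard of the global batch
shards = [data[i::num_workers] for i in range(num_workers)]
local_grads = [grad_mse(w, s) for s in shards]
averaged = allreduce_mean(local_grads)
full_batch = grad_mse(w, data)
print(f"averaged={averaged:.6f} full_batch={full_batch:.6f}")
```

The equality relies on equal shard sizes; with uneven shards, a plain average of per-worker means would weight the smaller shards too heavily.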

README

Hugging Face Optimum

🤗 Optimum is an extension of 🤗 Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.

Installation

🤗 Optimum can be installed using pip as follows:

python -m pip install optimum

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

Accelerator | Installation
ONNX Runtime | pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]
Intel Neural Compressor | pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
OpenVINO | pip install --upgrade --upgrade-strategy eager optimum[openvino]
NVIDIA TensorRT-LLM | docker run -it --gpus all --ipc host huggingface/optimum-nvidia
AMD Instinct GPUs and Ryzen AI NPU | pip install --upgrade --upgrade-strategy eager optimum[amd]
AWS Trainium & Inferentia | pip install --upgrade --upgrade-strategy eager optimum[neuronx]
Habana Gaudi Processor (HPU) | pip install --upgrade --upgrade-strategy eager optimum[habana]
FuriosaAI | pip install --upgrade --upgrade-strategy eager optimum[furiosa]

The --upgrade --upgrade-strategy eager option is needed to ensure the different packages are upgraded to the latest possible version.

To install from source:

python -m pip install git+https://github.com/huggingface/optimum.git

For the accelerator-specific features, prefix the Git URL with optimum[accelerator_type]@:

python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git

Accelerated Inference

🤗 Optimum provides multiple tools to export and run optimized models on various ecosystems:

  • ONNX / ONNX Runtime
  • TensorFlow Lite
  • OpenVINO
  • Habana first-gen Gaudi / Gaudi2
  • AWS Inferentia 2 / Inferentia 1
  • NVIDIA TensorRT-LLM

Export and optimization can be done both programmatically and from the command line.

ONNX + ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters,onnxruntime]

It is possible to export 🤗 Transformers and Diffusers models to the ONNX format and perform graph optimization as well as quantization easily.

For more information on the ONNX export, please check the documentation.

Once the model is exported to the ONNX format, we provide Python classes that let you run the exported ONNX model seamlessly, with ONNX Runtime as the backend.

More details on how to run ONNX models with the ORTModelForXXX classes are available in the documentation.
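Graph optimization means rewriting the exported graph before it runs; ONNX Runtime's basic optimization level includes passes such as constant folding, which precomputes nodes whose inputs are all constants. The toy sketch below shows that one idea on a hand-rolled graph representation. The node format and function name are hypothetical and unrelated to ONNX's protobuf schema.

```python
from typing import Callable, Dict, List, Tuple

# A node is (output_name, op, input_names); names not produced by any node are graph inputs
Node = Tuple[str, str, List[str]]
OPS: Dict[str, Callable[[float, float], float]] = {
    "Add": lambda a, b: a + b,
    "Mul": lambda a, b: a * b,
}


def fold_constants(nodes: List[Node], constants: Dict[str, float]) -> Tuple[List[Node], Dict[str, float]]:
    """Evaluate every node whose inputs are all known constants; keep the rest."""
    consts = dict(constants)
    remaining: List[Node] = []
    for out, op, inputs in nodes:  # nodes are assumed to be in topological order
        if all(name in consts for name in inputs):
            consts[out] = OPS[op](*(consts[name] for name in inputs))
        else:
            remaining.append((out, op, inputs))
    return remaining, consts


# y = x * (2 + 3): the Add node depends only on constants, so it can be precomputed
graph = [("c", "Add", ["two", "three"]), ("y", "Mul", ["x", "c"])]
folded, consts = fold_constants(graph, {"two": 2.0, "three": 3.0})
print(folded)       # [('y', 'Mul', ['x', 'c'])]
print(consts["c"])  # 5.0
```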

TensorFlow Lite

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[exporters-tf]

Just as for ONNX, it is possible to export models to TensorFlow Lite and quantize them. You can find more information in our documentation.

Intel (OpenVINO + Neural Compressor + IPEX)

Before you begin, make sure you have all the necessary libraries installed.

You can find more information on the different integrations in our documentation and in the examples of optimum-intel.

Quanto

Quanto is a PyTorch quantization backend that allows you to quantize a model either with the Python API or with the optimum-cli.

You can see more details and examples in the Quanto repository.

Accelerated training

🤗 Optimum provides wrappers around the original 🤗 Transformers Trainer to enable training on powerful hardware easily. We support many providers:

  • Habana's Gaudi processors
  • AWS Trainium instances
  • ONNX Runtime (optimized for GPUs)

Habana

Before you begin, make sure you have all the necessary libraries installed:

pip install --upgrade --upgrade-strategy eager optimum[habana]

You can find examples in the documentation and in the examples.

ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

pip install optimum[onnxruntime-training]

You can find examples in the documentation and in the examples.