
mlcommons/inference

Reference implementations of MLPerf™ inference benchmarks


Top Related Projects

  • tensorflow/models: Models and examples built with TensorFlow
  • pytorch/examples: A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
  • NVIDIA/DeepLearningExamples: State-of-the-Art Deep Learning scripts organized by models, easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure
  • microsoft/onnxruntime: ONNX Runtime, a cross-platform, high-performance ML inferencing and training accelerator
  • onnx/onnx: Open standard for machine learning interoperability
  • huggingface/transformers: 🤗 Transformers, state-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX

Quick Overview

MLCommons/inference is an open-source repository for machine learning inference benchmarking. It provides a suite of benchmarks and tools to measure the performance of ML inference across various hardware platforms and software frameworks. The project aims to establish industry-standard metrics for evaluating ML inference systems.

Pros

  • Comprehensive benchmarking suite covering diverse ML tasks and models
  • Supports multiple hardware platforms and software frameworks
  • Regularly updated with new benchmarks and improvements
  • Promotes transparency and standardization in ML inference evaluation

Cons

  • Complex setup and configuration process for some benchmarks
  • Requires significant computational resources for running full benchmark suites
  • May not cover all emerging ML models or specialized use cases
  • Learning curve for understanding and interpreting benchmark results

Code Examples

# Example 1: Loading and running a ResNet50 model
from mlperf_inference_impl import ResNet50Benchmark

benchmark = ResNet50Benchmark()
results = benchmark.run()
print(results)

# Example 2: Configuring a custom benchmark scenario
from mlperf_inference_impl import BenchmarkConfiguration

config = BenchmarkConfiguration(
    model_name="bert-large",
    scenario="Server",
    batch_size=64,
    target_qps=1000
)

# Example 3: Submitting benchmark results
from mlperf_inference_impl import SubmissionChecker

checker = SubmissionChecker()
is_valid = checker.validate_submission("path/to/results")
print(f"Submission is valid: {is_valid}")

Getting Started

To get started with MLCommons/inference:

  1. Clone the repository:

    git clone https://github.com/mlcommons/inference.git
    cd inference
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Choose a benchmark and run it:

    python3 run.py --benchmark=resnet50 --scenario=SingleStream
    
  4. Review the results in the generated output directory.

For more detailed instructions and advanced usage, refer to the documentation in the repository.
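
Under the hood, each reference app drives its model through MLPerf LoadGen, which generates queries for the chosen scenario and measures latency and throughput. The following is a minimal, hypothetical harness sketch, assuming the mlperf_loadgen Python bindings and the three-callback ConstructSUT form used elsewhere on this page; the sample loading and model execution are placeholders rather than the actual reference-app code.

import mlperf_loadgen as lg

def issue_queries(query_samples):
    # Run the model on each referenced sample (placeholder) and report completion.
    responses = [lg.QuerySampleResponse(qs.id, 0, 0) for qs in query_samples]
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass  # Nothing is buffered in this sketch.

def process_latencies(latencies_ns):
    pass  # LoadGen passes measured latencies here.

def load_samples(sample_indices):
    pass  # Load the referenced dataset samples into memory (placeholder).

def unload_samples(sample_indices):
    pass  # Release the referenced dataset samples (placeholder).

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)

In a real harness, issue_queries would run the benchmark model on the referenced samples before completing the responses; LoadGen then writes its summary and detail logs for review.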

Competitor Comparisons

tensorflow/models

Models and examples built with TensorFlow

Pros of models

  • Extensive collection of pre-trained models for various tasks
  • Well-documented and maintained by Google's TensorFlow team
  • Includes tutorials and examples for easy implementation

Cons of models

  • Primarily focused on TensorFlow, limiting flexibility for other frameworks
  • May have a steeper learning curve for beginners due to its comprehensive nature

Code Comparison

models:

import tensorflow as tf
from official.nlp import bert
model = bert.BertModel(config=bert_config)
outputs = model(input_ids, attention_mask=input_mask)

inference:

import mlperf_loadgen as lg

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(total_sample_count, perf_sample_count, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)

Key Differences

  • models focuses on providing pre-trained models and implementations
  • inference emphasizes benchmarking and performance evaluation
  • models is TensorFlow-centric, while inference is framework-agnostic
  • inference is designed for standardized ML performance testing
  • models offers a wider range of model architectures and applications

Both repositories serve different purposes in the machine learning ecosystem, with models being more suitable for model development and implementation, while inference is tailored for standardized performance benchmarking across different hardware and software configurations.

pytorch/examples

A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.

Pros of examples

  • Broader range of examples covering various PyTorch use cases
  • More beginner-friendly with simpler implementations
  • Regularly updated with new PyTorch features and best practices

Cons of examples

  • Less focused on benchmarking and performance optimization
  • May not include industry-standard inference scenarios
  • Limited emphasis on cross-platform compatibility

Code Comparison

examples:

import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)
input_tensor = torch.randn(1, 3, 224, 224)
output = model(input_tensor)

inference:

import mlperf_loadgen as lg
import numpy as np

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)

The examples code showcases a simple model inference, while inference focuses on benchmarking setup using MLPerf LoadGen.
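
To make the contrast concrete, here is a hedged sketch of how a model like the resnet18 above could sit behind LoadGen's issue_queries callback; the random input stands in for a real sample lookup, and the surrounding QSL and test setup from the snippet above are assumed.

import torch
import torchvision.models as models
import mlperf_loadgen as lg

model = models.resnet18(pretrained=True).eval()

def issue_queries(query_samples):
    # For each query LoadGen issues, run the model and signal completion.
    for qs in query_samples:
        input_tensor = torch.randn(1, 3, 224, 224)  # placeholder for a real sample lookup
        with torch.no_grad():
            _ = model(input_tensor)
        lg.QuerySamplesComplete([lg.QuerySampleResponse(qs.id, 0, 0)])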

NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Pros of DeepLearningExamples

  • Broader range of deep learning models and applications
  • More detailed documentation and tutorials for each example
  • Optimized for NVIDIA hardware, potentially offering better performance

Cons of DeepLearningExamples

  • Limited to NVIDIA-specific implementations and optimizations
  • May not provide standardized benchmarking across different hardware platforms
  • Less focus on inference-specific optimizations and techniques

Code Comparison

DeepLearningExamples (PyTorch ResNet50 inference):

model = torchvision.models.resnet50(pretrained=True).cuda().eval()
input_tensor = torch.randn(1, 3, 224, 224).cuda()
with torch.no_grad():
    output = model(input_tensor)

inference (MLPerf ResNet50 v1.5 inference):

model = tf.keras.applications.resnet50.ResNet50(weights='imagenet')
input_tensor = tf.random.normal([1, 224, 224, 3])
warmup_iterations = 10
for _ in range(warmup_iterations):
    _ = model(input_tensor)

Both repositories provide examples for deep learning inference, but DeepLearningExamples offers a wider range of models and applications, while inference focuses on standardized benchmarking and cross-platform compatibility. DeepLearningExamples is more tailored to NVIDIA hardware, potentially offering better performance on those platforms, but may be less suitable for cross-vendor comparisons.

microsoft/onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Pros of ONNX Runtime

  • Broader scope: Supports a wide range of ML frameworks and hardware accelerators
  • Production-ready: Optimized for performance and deployment in various environments
  • Active development: Frequent updates and extensive documentation

Cons of ONNX Runtime

  • Steeper learning curve: More complex API due to its broader feature set
  • Larger footprint: Heavier library with more dependencies

Code Comparison

ONNX Runtime:

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})

MLCommons Inference:

import mlperf_loadgen as lg

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(total_sample_count, perf_sample_count, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)

Summary

ONNX Runtime offers a more comprehensive solution for ML model deployment across various frameworks and hardware, while MLCommons Inference focuses on benchmarking and standardizing ML inference performance. ONNX Runtime is better suited for production environments, while MLCommons Inference is ideal for performance testing and comparisons across different ML systems.

onnx/onnx

Open standard for machine learning interoperability

Pros of ONNX

  • Broader ecosystem support and wider adoption across various frameworks and tools
  • More comprehensive model representation, supporting a wider range of operations and architectures
  • Active development with frequent updates and improvements

Cons of ONNX

  • Steeper learning curve due to its more complex architecture and extensive feature set
  • May introduce overhead for simpler use cases or when working with specific frameworks

Code Comparison

ONNX example:

import onnx

# Create an ONNX model
model = onnx.ModelProto()
# Add nodes, inputs, outputs, etc.
onnx.save(model, "model.onnx")

MLCommons Inference example:

import mlperf_loadgen as lg

# Configure and run an inference test
settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(total_sample_count, perf_sample_count, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)

While ONNX focuses on model representation and interoperability, MLCommons Inference emphasizes benchmarking and standardized testing for inference workloads. ONNX provides a more versatile format for exchanging models between different frameworks, while MLCommons Inference offers a structured approach to measuring and comparing inference performance across various hardware and software configurations.
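
As a concrete, hedged illustration of that interoperability (the file name, model, and input shape here are arbitrary): a PyTorch model exported to ONNX can be run unchanged by ONNX Runtime, which is also how the ONNX-based MLPerf reference backends consume models.

import torch
import torchvision.models as models
import onnxruntime as ort

# Export a PyTorch model to the ONNX interchange format.
model = models.resnet50(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet50.onnx")

# Load and run the exported model with ONNX Runtime.
session = ort.InferenceSession("resnet50.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: dummy_input.numpy()})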

huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of transformers

  • Extensive library of pre-trained models for various NLP tasks
  • User-friendly API with easy-to-use abstractions for fine-tuning and inference
  • Active community and frequent updates with state-of-the-art models

Cons of transformers

  • Primarily focused on NLP tasks, less versatile for other domains
  • Can be resource-intensive for large models, requiring significant computational power

Code comparison

transformers:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)

inference:

import mlperf_loadgen as lg

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(total_sample_count, perf_sample_count, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)

The transformers library provides a high-level API for working with pre-trained models, while inference focuses on benchmarking and performance testing for machine learning models across various scenarios.


README

MLPerf™ Inference Benchmark Suite

MLPerf Inference is a benchmark suite for measuring how fast systems can run models in a variety of deployment scenarios.

Please see the MLPerf Inference benchmark paper for a detailed description of the benchmarks along with the motivation and guiding principles behind the benchmark suite. If you use any part of this benchmark (e.g., reference implementations, submissions, etc.), please cite the following:

@misc{reddi2019mlperf,
    title={MLPerf Inference Benchmark},
    author={Vijay Janapa Reddi and Christine Cheng and David Kanter and Peter Mattson and Guenther Schmuelling and Carole-Jean Wu and Brian Anderson and Maximilien Breughe and Mark Charlebois and William Chou and Ramesh Chukka and Cody Coleman and Sam Davis and Pan Deng and Greg Diamos and Jared Duke and Dave Fick and J. Scott Gardner and Itay Hubara and Sachin Idgunji and Thomas B. Jablin and Jeff Jiao and Tom St. John and Pankaj Kanwar and David Lee and Jeffery Liao and Anton Lokhmotov and Francisco Massa and Peng Meng and Paulius Micikevicius and Colin Osborne and Gennady Pekhimenko and Arun Tejusve Raghunath Rajan and Dilip Sequeira and Ashish Sirasao and Fei Sun and Hanlin Tang and Michael Thomson and Frank Wei and Ephrem Wu and Lingjie Xu and Koichi Yamada and Bing Yu and George Yuan and Aaron Zhong and Peizhao Zhang and Yuchen Zhou},
    year={2019},
    eprint={1911.02549},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Please see the MLPerf inference documentation website, which includes automated commands to run MLPerf inference benchmarks using different implementations.

MLPerf Inference v5.0 (submission deadline February 28, 2025)

For submissions, please use the master branch and any commit since the 5.0 seed release, although it is best to use the latest commit on the master branch.

For power submissions, please use SPEC PTD 1.11.1 (needs special access) and any commit of the power-dev repository after the code freeze.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm, ncnn | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge
dlrm-v2 | recommendation/dlrm_v2 | pytorch | Multihot Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
gpt-j | language/gpt-j | pytorch | CNN-Daily Mail | edge, datacenter
stable-diffusion-xl | text_to_image | pytorch | COCO 2014 | edge, datacenter
llama2-70b | language/llama2-70b | pytorch | OpenOrca | datacenter
llama3.1-405b | language/llama3-405b | pytorch | LongBench, LongDataCollections, Ruler, GovReport | datacenter
mixtral-8x7b | language/mixtral-8x7b | pytorch | OpenOrca, MBXP, GSM8K | datacenter
rgat | graph/rgat | pytorch | IGBH | datacenter
pointpainting | automotive/3d-object-detection | pytorch, onnx | Waymo Open Dataset | edge
  • Framework here is given for the reference implementation. Submitters are free to use their own frameworks to run the benchmark.

MLPerf Inference v4.1 (submission deadline July 26, 2024)

For submissions, please use the master branch and any commit since the 4.1 seed release, although it is best to use the latest commit. The v4.1 tag will be created from the master branch after the result publication.

For power submissions, please use SPEC PTD 1.10 (needs special access) and any commit of the power-dev repository after the code freeze.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm, ncnn | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm-v2 | recommendation/dlrm_v2 | pytorch | Multihot Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
gpt-j | language/gpt-j | pytorch | CNN-Daily Mail | edge, datacenter
stable-diffusion-xl | text_to_image | pytorch | COCO 2014 | edge, datacenter
llama2-70b | language/llama2-70b | pytorch | OpenOrca | datacenter
mixtral-8x7b | language/mixtral-8x7b | pytorch | OpenOrca, MBXP, GSM8K | datacenter
  • Framework here is given for the reference implementation. Submitters are free to use their own frameworks to run the benchmark.

MLPerf Inference v4.0 (submission February 23, 2024)

There is an extra one-week extension allowed only for the llama2-70b submissions. For submissions, please use the master branch and any commit since the 4.0 seed release, although it is best to use the latest commit. The v4.0 tag will be created from the master branch after the result publication.

For power submissions, please use SPEC PTD 1.10 (needs special access) and any commit of the power-dev repository after the code freeze.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm, ncnn | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm-v2 | recommendation/dlrm_v2 | pytorch | Multihot Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter
gpt-j | language/gpt-j | pytorch | CNN-Daily Mail | edge, datacenter
stable-diffusion-xl | text_to_image | pytorch | COCO 2014 | edge, datacenter
llama2-70b | language/llama2-70b | pytorch | OpenOrca | datacenter
  • Framework here is given for the reference implementation. Submitters are free to use their own frameworks to run the benchmark.

MLPerf Inference v3.1 (submission August 18, 2023)

Please use the v3.1 tag (git checkout v3.1) if you would like to reproduce the v3.1 results.

For reproducing power submissions, please use the master branch of the MLCommons power-dev repository and check out commit e9e16b1299ef61a2a5d8b9abf5d759309293c440.

You can see the individual README files in the benchmark task folders for more details regarding the benchmarks. For reproducing the submitted results please see the README files under the respective submitter folders in the inference v3.1 results repository.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm, ncnn | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm-v2 | recommendation/dlrm_v2 | pytorch | Multihot Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter
gpt-j | language/gpt-j | pytorch | CNN-Daily Mail | edge, datacenter

MLPerf Inference v3.0 (submission 03/03/2023)

Please use the v3.0 tag (git checkout v3.0) if you would like to reproduce v3.0 results.

You can see the individual Readme files in the reference app for more details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v2.1 (submission 08/05/2022)

Use the r2.1 branch (git checkout r2.1) if you want to submit or reproduce v2.1 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v2.0 (submission 02/25/2022)

Use the r2.0 branch (git checkout r2.0) if you want to submit or reproduce v2.0 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx | imagenet2012 | edge, datacenter
ssd-mobilenet 300x300 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300 | edge
ssd-resnet34 1200x1200 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v1.1 (submission 08/13/2021)

Use the r1.1 branch (git checkout r1.1) if you want to submit or reproduce v1.1 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx | imagenet2012 | edge, datacenter
ssd-mobilenet 300x300 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300 | edge
ssd-resnet34 1200x1200 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet | pytorch, tensorflow(?), onnx(?) | BraTS 2019 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v1.0 (submission 03/19/2021)

Use the r1.0 branch (git checkout r1.0) if you want to submit or reproduce v1.0 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx | imagenet2012 | edge, datacenter
ssd-mobilenet 300x300 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300 | edge
ssd-resnet34 1200x1200 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow(?) | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet | pytorch, tensorflow(?), onnx(?) | BraTS 2019 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v0.7 (submission 9/18/2020)

Use the r0.7 branch (git checkout r0.7) if you want to submit or reproduce v0.7 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset
------|---------------|-----------|--------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, pytorch, onnx | imagenet2012
ssd-mobilenet 300x300 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300
ssd-resnet34 1200x1200 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1
dlrm | recommendation/dlrm | pytorch, tensorflow(?), onnx(?) | Criteo Terabyte
3d-unet | vision/medical_imaging/3d-unet | pytorch, tensorflow(?), onnx(?) | BraTS 2019
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus

MLPerf Inference v0.5

Use the r0.5 branch (git checkout r0.5) if you want to reproduce v0.5 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset
------|---------------|-----------|--------
resnet50-v1.5 | v0.5/classification_and_detection | tensorflow, pytorch, onnx | imagenet2012
mobilenet-v1 | v0.5/classification_and_detection | tensorflow, pytorch, onnx | imagenet2012
ssd-mobilenet 300x300 | v0.5/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300
ssd-resnet34 1200x1200 | v0.5/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200
gnmt | v0.5/translation/gnmt/ | tensorflow, pytorch | See Readme