
mlcommons/inference

Reference implementations of MLPerf™ inference benchmarks


Top Related Projects

  • tensorflow/models: Models and examples built with TensorFlow
  • pytorch/examples: A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
  • NVIDIA/DeepLearningExamples: State-of-the-Art Deep Learning scripts organized by models, easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure
  • microsoft/onnxruntime: ONNX Runtime, a cross-platform, high-performance ML inferencing and training accelerator
  • onnx/onnx: Open standard for machine learning interoperability
  • huggingface/transformers: 🤗 Transformers, state-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX

Quick Overview

MLCommons/inference is an open-source repository for machine learning inference benchmarking. It provides a suite of benchmarks and tools to measure the performance of ML inference across various hardware platforms and software frameworks. The project aims to establish industry-standard metrics for evaluating ML inference systems.

Pros

  • Comprehensive benchmarking suite covering diverse ML tasks and models
  • Supports multiple hardware platforms and software frameworks
  • Regularly updated with new benchmarks and improvements
  • Promotes transparency and standardization in ML inference evaluation

Cons

  • Complex setup and configuration process for some benchmarks
  • Requires significant computational resources for running full benchmark suites
  • May not cover all emerging ML models or specialized use cases
  • Learning curve for understanding and interpreting benchmark results

Code Examples

# Example 1: Loading and running a ResNet50 model
from mlperf_inference_impl import ResNet50Benchmark

benchmark = ResNet50Benchmark()
results = benchmark.run()
print(results)

# Example 2: Configuring a custom benchmark scenario
from mlperf_inference_impl import BenchmarkConfiguration

config = BenchmarkConfiguration(
    model_name="bert-large",
    scenario="Server",
    batch_size=64,
    target_qps=1000
)

# Example 3: Submitting benchmark results
from mlperf_inference_impl import SubmissionChecker

checker = SubmissionChecker()
is_valid = checker.validate_submission("path/to/results")
print(f"Submission is valid: {is_valid}")

Getting Started

To get started with MLCommons/inference:

  1. Clone the repository:

    git clone https://github.com/mlcommons/inference.git
    cd inference
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Choose a benchmark and run it:

    python3 run.py --benchmark=resnet50 --scenario=SingleStream
    
  4. Review the results in the generated output directory.

For more detailed instructions and advanced usage, refer to the documentation in the repository.
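
Under the hood, each reference app drives its model through MLPerf LoadGen, which generates queries for the chosen scenario and measures latency and throughput. The following is a minimal, hypothetical harness sketch, assuming the mlperf_loadgen Python bindings and the three-callback ConstructSUT form used elsewhere on this page; the sample loading and model execution are placeholders rather than the actual reference-app code.

import mlperf_loadgen as lg

def issue_queries(query_samples):
    # Run the model on each referenced sample (placeholder) and report completion.
    responses = [lg.QuerySampleResponse(qs.id, 0, 0) for qs in query_samples]
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass  # Nothing is buffered in this sketch.

def process_latencies(latencies_ns):
    pass  # LoadGen passes measured latencies here.

def load_samples(sample_indices):
    pass  # Load the referenced dataset samples into memory (placeholder).

def unload_samples(sample_indices):
    pass  # Release the referenced dataset samples (placeholder).

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)

In a real harness, issue_queries would run the benchmark model on the referenced samples before completing the responses; LoadGen then writes its summary and detail logs for review.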

Competitor Comparisons

tensorflow/models

Models and examples built with TensorFlow

Pros of models

  • Extensive collection of pre-trained models for various tasks
  • Well-documented and maintained by Google's TensorFlow team
  • Includes tutorials and examples for easy implementation

Cons of models

  • Primarily focused on TensorFlow, limiting flexibility for other frameworks
  • May have a steeper learning curve for beginners due to its comprehensive nature

Code Comparison

models:

import tensorflow as tf
from official.nlp import bert
model = bert.BertModel(config=bert_config)
outputs = model(input_ids, attention_mask=input_mask)

inference:

import mlperf_loadgen as lg

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(total_sample_count, perf_sample_count, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)

Key Differences

  • models focuses on providing pre-trained models and implementations
  • inference emphasizes benchmarking and performance evaluation
  • models is TensorFlow-centric, while inference is framework-agnostic
  • inference is designed for standardized ML performance testing
  • models offers a wider range of model architectures and applications

Both repositories serve different purposes in the machine learning ecosystem, with models being more suitable for model development and implementation, while inference is tailored for standardized performance benchmarking across different hardware and software configurations.

pytorch/examples

A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.

Pros of examples

  • Broader range of examples covering various PyTorch use cases
  • More beginner-friendly with simpler implementations
  • Regularly updated with new PyTorch features and best practices

Cons of examples

  • Less focused on benchmarking and performance optimization
  • May not include industry-standard inference scenarios
  • Limited emphasis on cross-platform compatibility

Code Comparison

examples:

import torch
import torchvision.models as models

model = models.resnet18(pretrained=True)
input_tensor = torch.randn(1, 3, 224, 224)
output = model(input_tensor)

inference:

import mlperf_loadgen as lg
import numpy as np

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)

The examples code showcases a simple model inference, while inference focuses on benchmarking setup using MLPerf LoadGen.
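
To make the contrast concrete, here is a hedged sketch of how a model like the resnet18 above could sit behind LoadGen's issue_queries callback; the random input stands in for a real sample lookup, and the surrounding QSL and test setup from the snippet above are assumed.

import torch
import torchvision.models as models
import mlperf_loadgen as lg

model = models.resnet18(pretrained=True).eval()

def issue_queries(query_samples):
    # For each query LoadGen issues, run the model and signal completion.
    for qs in query_samples:
        input_tensor = torch.randn(1, 3, 224, 224)  # placeholder for a real sample lookup
        with torch.no_grad():
            _ = model(input_tensor)
        lg.QuerySamplesComplete([lg.QuerySampleResponse(qs.id, 0, 0)])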

NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Pros of DeepLearningExamples

  • Broader range of deep learning models and applications
  • More detailed documentation and tutorials for each example
  • Optimized for NVIDIA hardware, potentially offering better performance

Cons of DeepLearningExamples

  • Limited to NVIDIA-specific implementations and optimizations
  • May not provide standardized benchmarking across different hardware platforms
  • Less focus on inference-specific optimizations and techniques

Code Comparison

DeepLearningExamples (PyTorch ResNet50 inference):

model = torchvision.models.resnet50(pretrained=True).cuda().eval()
input_tensor = torch.randn(1, 3, 224, 224).cuda()
with torch.no_grad():
    output = model(input_tensor)

inference (MLPerf ResNet50 v1.5 inference):

model = tf.keras.applications.resnet50.ResNet50(weights='imagenet')
input_tensor = tf.random.normal([1, 224, 224, 3])
warmup_iterations = 10
for _ in range(warmup_iterations):
    _ = model(input_tensor)

Both repositories provide examples for deep learning inference, but DeepLearningExamples offers a wider range of models and applications, while inference focuses on standardized benchmarking and cross-platform compatibility. DeepLearningExamples is more tailored to NVIDIA hardware, potentially offering better performance on those platforms, but may be less suitable for cross-vendor comparisons.

microsoft/onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Pros of ONNX Runtime

  • Broader scope: Supports a wide range of ML frameworks and hardware accelerators
  • Production-ready: Optimized for performance and deployment in various environments
  • Active development: Frequent updates and extensive documentation

Cons of ONNX Runtime

  • Steeper learning curve: More complex API due to its broader feature set
  • Larger footprint: Heavier library with more dependencies

Code Comparison

ONNX Runtime:

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})

MLCommons Inference:

import mlperf_loadgen as lg

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(total_sample_count, perf_sample_count, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)

Summary

ONNX Runtime offers a more comprehensive solution for ML model deployment across various frameworks and hardware, while MLCommons Inference focuses on benchmarking and standardizing ML inference performance. ONNX Runtime is better suited for production environments, while MLCommons Inference is ideal for performance testing and comparisons across different ML systems.

onnx/onnx

Open standard for machine learning interoperability

Pros of ONNX

  • Broader ecosystem support and wider adoption across various frameworks and tools
  • More comprehensive model representation, supporting a wider range of operations and architectures
  • Active development with frequent updates and improvements

Cons of ONNX

  • Steeper learning curve due to its more complex architecture and extensive feature set
  • May introduce overhead for simpler use cases or when working with specific frameworks

Code Comparison

ONNX example:

import onnx

# Create an ONNX model
model = onnx.ModelProto()
# Add nodes, inputs, outputs, etc.
onnx.save(model, "model.onnx")

MLCommons Inference example:

import mlperf_loadgen as lg

# Configure and run an inference test
settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(total_sample_count, perf_sample_count, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)

While ONNX focuses on model representation and interoperability, MLCommons Inference emphasizes benchmarking and standardized testing for inference workloads. ONNX provides a more versatile format for exchanging models between different frameworks, while MLCommons Inference offers a structured approach to measuring and comparing inference performance across various hardware and software configurations.
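
As a concrete, hedged illustration of that interoperability (the file name, model, and input shape here are arbitrary): a PyTorch model exported to ONNX can be run unchanged by ONNX Runtime, which is also how the ONNX-based MLPerf reference backends consume models.

import torch
import torchvision.models as models
import onnxruntime as ort

# Export a PyTorch model to the ONNX interchange format.
model = models.resnet50(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet50.onnx")

# Load and run the exported model with ONNX Runtime.
session = ort.InferenceSession("resnet50.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: dummy_input.numpy()})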

huggingface/transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of transformers

  • Extensive library of pre-trained models for various NLP tasks
  • User-friendly API with easy-to-use abstractions for fine-tuning and inference
  • Active community and frequent updates with state-of-the-art models

Cons of transformers

  • Primarily focused on NLP tasks, less versatile for other domains
  • Can be resource-intensive for large models, requiring significant computational power

Code comparison

transformers:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)

inference:

import mlperf_loadgen as lg

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly
sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(total_sample_count, perf_sample_count, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)

The transformers library provides a high-level API for working with pre-trained models, while inference focuses on benchmarking and performance testing for machine learning models across various scenarios.


README

MLPerf™ Inference Benchmark Suite

MLPerf Inference is a benchmark suite for measuring how fast systems can run models in a variety of deployment scenarios.

Please see the MLPerf Inference benchmark paper for a detailed description of the benchmarks along with the motivation and guiding principles behind the benchmark suite. If you use any part of this benchmark (e.g., reference implementations, submissions, etc.), please cite the following:

@misc{reddi2019mlperf,
    title={MLPerf Inference Benchmark},
    author={Vijay Janapa Reddi and Christine Cheng and David Kanter and Peter Mattson and Guenther Schmuelling and Carole-Jean Wu and Brian Anderson and Maximilien Breughe and Mark Charlebois and William Chou and Ramesh Chukka and Cody Coleman and Sam Davis and Pan Deng and Greg Diamos and Jared Duke and Dave Fick and J. Scott Gardner and Itay Hubara and Sachin Idgunji and Thomas B. Jablin and Jeff Jiao and Tom St. John and Pankaj Kanwar and David Lee and Jeffery Liao and Anton Lokhmotov and Francisco Massa and Peng Meng and Paulius Micikevicius and Colin Osborne and Gennady Pekhimenko and Arun Tejusve Raghunath Rajan and Dilip Sequeira and Ashish Sirasao and Fei Sun and Hanlin Tang and Michael Thomson and Frank Wei and Ephrem Wu and Lingjie Xu and Koichi Yamada and Bing Yu and George Yuan and Aaron Zhong and Peizhao Zhang and Yuchen Zhou},
    year={2019},
    eprint={1911.02549},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Please see the MLPerf inference documentation website, which includes automated commands to run MLPerf inference benchmarks using different implementations.

MLPerf Inference v5.0 (submission deadline February 28, 2025)

For submissions, please use the master branch and any commit since the 5.0 seed release, although it is best to use the latest commit on the master branch.

For power submissions, please use SPEC PTD 1.11.1 (needs special access) and any commit of the power-dev repository after the code freeze.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm, ncnn | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge
dlrm-v2 | recommendation/dlrm_v2 | pytorch | Multihot Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
gpt-j | language/gpt-j | pytorch | CNN-Daily Mail | edge, datacenter
stable-diffusion-xl | text_to_image | pytorch | COCO 2014 | edge, datacenter
llama2-70b | language/llama2-70b | pytorch | OpenOrca | datacenter
llama3.1-405b | language/llama3-405b | pytorch | LongBench, LongDataCollections, Ruler, GovReport | datacenter
mixtral-8x7b | language/mixtral-8x7b | pytorch | OpenOrca, MBXP, GSM8K | datacenter
rgat | graph/rgat | pytorch | IGBH | datacenter
pointpainting | automotive/3d-object-detection | pytorch, onnx | Waymo Open Dataset | edge
  • Framework here is given for the reference implementation. Submitters are free to use their own frameworks to run the benchmark.

MLPerf Inference v4.1 (submission deadline July 26, 2024)

For submissions, please use the master branch and any commit since the 4.1 seed release, although it is best to use the latest commit. The v4.1 tag will be created from the master branch after the result publication.

For power submissions, please use SPEC PTD 1.10 (needs special access) and any commit of the power-dev repository after the code freeze.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm, ncnn | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm-v2 | recommendation/dlrm_v2 | pytorch | Multihot Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
gpt-j | language/gpt-j | pytorch | CNN-Daily Mail | edge, datacenter
stable-diffusion-xl | text_to_image | pytorch | COCO 2014 | edge, datacenter
llama2-70b | language/llama2-70b | pytorch | OpenOrca | datacenter
mixtral-8x7b | language/mixtral-8x7b | pytorch | OpenOrca, MBXP, GSM8K | datacenter
  • Framework here is given for the reference implementation. Submitters are free to use their own frameworks to run the benchmark.

MLPerf Inference v4.0 (submission February 23, 2024)

There is an extra one-week extension allowed only for the llama2-70b submissions. For submissions, please use the master branch and any commit since the 4.0 seed release, although it is best to use the latest commit. The v4.0 tag will be created from the master branch after the result publication.

For power submissions, please use SPEC PTD 1.10 (needs special access) and any commit of the power-dev repository after the code freeze.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm, ncnn | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm-v2 | recommendation/dlrm_v2 | pytorch | Multihot Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter
gpt-j | language/gpt-j | pytorch | CNN-Daily Mail | edge, datacenter
stable-diffusion-xl | text_to_image | pytorch | COCO 2014 | edge, datacenter
llama2-70b | language/llama2-70b | pytorch | OpenOrca | datacenter
  • Framework here is given for the reference implementation. Submitters are free to use their own frameworks to run the benchmark.

MLPerf Inference v3.1 (submission August 18, 2023)

Please use the v3.1 tag (git checkout v3.1) if you would like to reproduce the v3.1 results.

For reproducing power submissions, please use the master branch of the MLCommons power-dev repository and check out commit e9e16b1299ef61a2a5d8b9abf5d759309293c440.

You can see the individual README files in the benchmark task folders for more details regarding the benchmarks. For reproducing the submitted results please see the README files under the respective submitter folders in the inference v3.1 results repository.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm, ncnn | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm-v2 | recommendation/dlrm_v2 | pytorch | Multihot Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter
gpt-j | language/gpt-j | pytorch | CNN-Daily Mail | edge, datacenter

MLPerf Inference v3.0 (submission 03/03/2023)

Please use the v3.0 tag (git checkout v3.0) if you would like to reproduce v3.0 results.

You can see the individual Readme files in the reference app for more details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx, tvm | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v2.1 (submission 08/05/2022)

Use the r2.1 branch (git checkout r2.1) if you want to submit or reproduce v2.1 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx | imagenet2012 | edge, datacenter
retinanet 800x800 | vision/classification_and_detection | pytorch, onnx | openimages resized to 800x800 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v2.0 (submission 02/25/2022)

Use the r2.0 branch (git checkout r2.0) if you want to submit or reproduce v2.0 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx | imagenet2012 | edge, datacenter
ssd-mobilenet 300x300 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300 | edge
ssd-resnet34 1200x1200 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet-kits19 | pytorch, tensorflow, onnx | KiTS19 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v1.1 (submission 08/13/2021)

Use the r1.1 branch (git checkout r1.1) if you want to submit or reproduce v1.1 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx | imagenet2012 | edge, datacenter
ssd-mobilenet 300x300 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300 | edge
ssd-resnet34 1200x1200 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet | pytorch, tensorflow(?), onnx(?) | BraTS 2019 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v1.0 (submission 03/19/2021)

Use the r1.0 branch (git checkout r1.0) if you want to submit or reproduce v1.0 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset | category
------|---------------|-----------|---------|---------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, onnx | imagenet2012 | edge, datacenter
ssd-mobilenet 300x300 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300 | edge
ssd-resnet34 1200x1200 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200 | edge, datacenter
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1 | edge, datacenter
dlrm | recommendation/dlrm | pytorch, tensorflow(?) | Criteo Terabyte | datacenter
3d-unet | vision/medical_imaging/3d-unet | pytorch, tensorflow(?), onnx(?) | BraTS 2019 | edge, datacenter
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus | edge, datacenter

MLPerf Inference v0.7 (submission 9/18/2020)

Use the r0.7 branch (git checkout r0.7) if you want to submit or reproduce v0.7 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset
------|---------------|-----------|--------
resnet50-v1.5 | vision/classification_and_detection | tensorflow, pytorch, onnx | imagenet2012
ssd-mobilenet 300x300 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300
ssd-resnet34 1200x1200 | vision/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200
bert | language/bert | tensorflow, pytorch, onnx | squad-1.1
dlrm | recommendation/dlrm | pytorch, tensorflow(?), onnx(?) | Criteo Terabyte
3d-unet | vision/medical_imaging/3d-unet | pytorch, tensorflow(?), onnx(?) | BraTS 2019
rnnt | speech_recognition/rnnt | pytorch | OpenSLR LibriSpeech Corpus

MLPerf Inference v0.5

Use the r0.5 branch (git checkout r0.5) if you want to reproduce v0.5 results.

See the individual Readme files in the reference app for details.

model | reference app | framework | dataset
------|---------------|-----------|--------
resnet50-v1.5 | v0.5/classification_and_detection | tensorflow, pytorch, onnx | imagenet2012
mobilenet-v1 | v0.5/classification_and_detection | tensorflow, pytorch, onnx | imagenet2012
ssd-mobilenet 300x300 | v0.5/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 300x300
ssd-resnet34 1200x1200 | v0.5/classification_and_detection | tensorflow, pytorch, onnx | coco resized to 1200x1200
gnmt | v0.5/translation/gnmt/ | tensorflow, pytorch | See Readme