tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

12,482

3,622

12,482

143

Top Related Projects

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

onnx

19,372

Open standard for machine learning interoperability

onnxruntime

17,390

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

jax

32,985

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

mlpack

5,377

mlpack: a fast, header-only C++ machine learning library

Quick Overview

Apache TVM is an open-source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on various hardware backends, including mobile devices, embedded systems, and cloud platforms.

Pros

Supports multiple hardware targets and deep learning frameworks
Provides automatic optimization and tuning capabilities
Offers a flexible and extensible architecture for custom optimizations
Enables efficient deployment of machine learning models on diverse platforms

Cons

Steep learning curve for beginners
Documentation can be complex and sometimes outdated
Limited support for certain specialized hardware accelerators
Requires expertise in both machine learning and hardware optimization

Code Examples

Defining and compiling a simple computation:

import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute(A.shape, lambda i: A[i] * 2, name="B")
s = te.create_schedule(B.op)
f = tvm.build(s, [A, B], "llvm", name="double")
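
To make the semantics of the snippet above concrete, here is a plain-Python sketch (no TVM required) of what the compiled "double" function computes. The helpers reference_compute and split_compute are hypothetical names for illustration, not TVM APIs. The second one mimics the loop structure that splitting a loop in a schedule would produce, to show that a schedule changes how the loop runs, not what it computes.

```python
# Plain-Python sketch of what the TVM snippet above computes.
# reference_compute and split_compute are illustrative helpers, not TVM APIs.

def reference_compute(n, fcompute):
    """Evaluate fcompute at every index 0..n-1, like a 1-D te.compute."""
    return [fcompute(i) for i in range(n)]

def split_compute(n, fcompute, factor):
    """Same computation with the loop split into outer and inner parts."""
    out = [None] * n
    outer_extent = (n + factor - 1) // factor  # ceil(n / factor)
    for outer in range(outer_extent):
        for inner in range(factor):
            i = outer * factor + inner
            if i < n:  # guard the tail when factor does not divide n
                out[i] = fcompute(i)
    return out

A = [0.0, 1.0, 2.5, 4.0, 7.0]
B = reference_compute(len(A), lambda i: A[i] * 2)
B_split = split_compute(len(A), lambda i: A[i] * 2, factor=2)
print(B)
print(B == B_split)
```

Both loops produce the same values; only the iteration order and structure differ, which is the property that lets a compiler reorder loops safely.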

Optimizing a convolution operation:

import tvm
from tvm import te, auto_scheduler, topi  # import topi explicitly

@auto_scheduler.register_workload
def conv2d(N, H, W, CO, CI, KH, KW, stride, padding):
    # NCHW input and OIHW kernel placeholders
    data = te.placeholder((N, CI, H, W), name="data")
    kernel = te.placeholder((CO, CI, KH, KW), name="kernel")
    conv = topi.nn.conv2d_nchw(data, kernel, stride, padding, dilation=1, out_dtype="float32")
    return [data, kernel, conv]

target = tvm.target.Target("cuda")
task = auto_scheduler.SearchTask(func=conv2d, args=(1, 224, 224, 64, 3, 7, 7, 2, 3), target=target)

tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,
    measure_callbacks=[auto_scheduler.RecordToFile("conv2d.json")],
    verbose=2,
)

# Run the search, then load the best schedule found from the log file
task.tune(tune_option)
sch, args = task.apply_best("conv2d.json")
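
As a sanity check on the workload arguments (1, 224, 224, 64, 3, 7, 7, 2, 3), the following plain-Python sketch works out the output shape and operation count of that convolution. The helper names are hypothetical and the formulas are the standard ones for an NCHW convolution with symmetric padding. Nothing here calls TVM.

```python
# Worked arithmetic for the workload args (1, 224, 224, 64, 3, 7, 7, 2, 3).
# conv2d_output_shape and conv2d_flops are illustrative helpers, not TVM APIs.

def conv2d_output_shape(N, H, W, CO, CI, KH, KW, stride, padding, dilation=1):
    """Output shape of an NCHW convolution with symmetric padding."""
    eff_kh = dilation * (KH - 1) + 1  # effective kernel height
    eff_kw = dilation * (KW - 1) + 1  # effective kernel width
    OH = (H + 2 * padding - eff_kh) // stride + 1
    OW = (W + 2 * padding - eff_kw) // stride + 1
    return (N, CO, OH, OW)

def conv2d_flops(N, H, W, CO, CI, KH, KW, stride, padding):
    """Count 2 FLOPs (multiply and add) per multiply-accumulate."""
    _, _, OH, OW = conv2d_output_shape(N, H, W, CO, CI, KH, KW, stride, padding)
    return 2 * N * CO * OH * OW * CI * KH * KW

args = (1, 224, 224, 64, 3, 7, 7, 2, 3)
out_shape = conv2d_output_shape(*args)
flops = conv2d_flops(*args)
print(out_shape)  # (1, 64, 112, 112)
print(flops)      # 236027904
```

Dividing the operation count by a measured run time gives an achieved-throughput figure, which is a convenient way to compare tuned schedules.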

Deploying a pre-trained model:

import tvm
from tvm import relay
from tvm.contrib import graph_executor
import tflite

tflite_model_file = "mobilenet_v1_1.0_224_quant.tflite"
with open(tflite_model_file, "rb") as f:
    tflite_model_buf = f.read()
# Depending on the tflite package version, this may instead be
# tflite.Model.Model.GetRootAsModel
tflite_model = tflite.Model.GetRootAsModel(tflite_model_buf, 0)

input_tensor = "input"
input_shape = (1, 224, 224, 3)
input_dtype = "uint8"

# Convert the TFLite model into a Relay module plus its parameters
mod, params = relay.frontend.from_tflite(tflite_model,
                                         shape_dict={input_tensor: input_shape},
                                         dtype_dict={input_tensor: input_dtype})

target = "llvm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Wrap the compiled library in a graph executor bound to the target device
dev = tvm.device(str(target), 0)
module = graph_executor.GraphModule(lib["default"](dev))

Getting Started

To get started with Apache TVM:

Install TVM:

git clone --recursive https://github.com/apache/tvm tvm
cd tvm
mkdir build
cp cmake/config.cmake build
cd build
cmake ..
make -j4

Set up Python environment:

export TVM_HOME=/path/to/tvm
export PYTHONPATH=$TVM_HOME/python:${PYTHONPATH}
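
Before running any example, it can help to confirm that Python can actually find the package after the environment variables are set. The sketch below uses only the standard library. The function name tvm_is_importable is a hypothetical helper, and the check only locates the package without importing it.

```python
# Check whether the tvm package is importable from the current environment.
import importlib.util

def tvm_is_importable():
    """Return True if Python can locate a top-level 'tvm' package."""
    return importlib.util.find_spec("tvm") is not None

if tvm_is_importable():
    print("tvm found on the Python path")
else:
    print("tvm not found; check TVM_HOME and PYTHONPATH")
```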

Run a simple example:

import tvm
from tvm import te

A = te.placeholder((10,), name="A")
B = te.compute(A.

Competitor Comparisons

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

More user-friendly and intuitive API for deep learning tasks
Extensive ecosystem with pre-trained models and libraries
Dynamic computational graphs for flexible model development

Cons of PyTorch

Less optimized for deployment on edge devices and mobile platforms
Limited support for specialized hardware accelerators compared to TVM
Steeper learning curve for low-level optimizations and custom operators

Code Comparison

PyTorch:

import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.matmul(x, y)

TVM:

import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.placeholder((n,), name="B")
C = te.compute(A.shape, lambda i: A[i] * B[i])

PyTorch focuses on high-level tensor operations and automatic differentiation, making it easier for researchers and developers to build and train neural networks. TVM, on the other hand, provides a lower-level approach with more control over hardware-specific optimizations and compilation for various targets.
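
Note that the two snippets above do not compute the same thing. The plain-Python sketch below, which needs neither PyTorch nor TVM, reproduces both results on the same inputs so the difference is visible.

```python
# Plain-Python sketch contrasting the two snippets above; no PyTorch or TVM needed.
x = [1, 2, 3]
y = [4, 5, 6]

# torch.matmul on two 1-D tensors is a dot product, giving a single scalar.
dot = sum(a * b for a, b in zip(x, y))

# The te.compute expression A[i] * B[i] is an element-wise product,
# giving one value per index.
elementwise = [a * b for a, b in zip(x, y)]

print(dot)          # 32
print(elementwise)  # [4, 10, 18]
```

Summing the element-wise result recovers the dot product, so the TVM expression would need an added reduction to match the PyTorch line exactly.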

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

Larger ecosystem and community support
More comprehensive documentation and tutorials
Wider range of pre-trained models and tools

Cons of TensorFlow

Steeper learning curve for beginners
Less flexibility for low-level optimizations
Heavier resource requirements

Code Comparison

TensorFlow:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

TVM:

import tvm
from tvm import relay

def simple_net(data):
    dense1 = relay.nn.dense(data, relay.var("dense1_weight"))
    relu1 = relay.nn.relu(dense1)
    dense2 = relay.nn.dense(relu1, relay.var("dense2_weight"))
    return relay.nn.softmax(dense2)

TensorFlow provides a higher-level API for model creation, while TVM offers more low-level control for optimization. TVM focuses on optimizing and deploying models across various hardware platforms, whereas TensorFlow is a more comprehensive framework for building and training machine learning models.
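
To show what the two network definitions compute, here is a plain-Python sketch of the dense, relu, dense, softmax pipeline. It follows the relay.nn.dense convention (output = data x weight^T, with no bias term), whereas a Keras Dense layer also adds a bias by default. The weights and input are made-up values for illustration only.

```python
# Plain-Python sketch of dense -> relu -> dense -> softmax.
# Weight layout follows relay.nn.dense: weight has shape (units, in_dim).
import math

def dense(data, weight):
    """data: (batch, in_dim) as nested lists; returns (batch, units)."""
    return [[sum(d * w for d, w in zip(row, w_row)) for w_row in weight]
            for row in data]

def relu(x):
    return [[max(v, 0.0) for v in row] for row in x]

def softmax(x):
    out = []
    for row in x:
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def simple_net(data, w1, w2):
    return softmax(dense(relu(dense(data, w1)), w2))

data = [[1.0, -2.0]]                        # batch of 1, 2 features
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 units, 2 inputs
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]     # 2 units, 3 inputs
probs = simple_net(data, w1, w2)
print(probs)
```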

onnx

19,372

Open standard for machine learning interoperability

Pros of ONNX

Widely adopted standard for neural network exchange
Supports a broader range of frameworks and tools
Simpler model representation and easier to understand

Cons of ONNX

Limited runtime optimization capabilities
Less focus on end-to-end deployment and hardware-specific optimizations
Narrower scope, primarily for model exchange rather than compilation

Code Comparison

ONNX model definition:

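import onnx
from onnx import TensorProto, helper

# Declare the graph's input and output tensors (shape chosen for illustration)
X = helper.make_tensor_value_info('X', TensorProto.FLOAT, [1, 10])
Y = helper.make_tensor_value_info('Y', TensorProto.FLOAT, [1, 10])
node = helper.make_node('Relu', inputs=['X'], outputs=['Y'])
graph = helper.make_graph([node], 'test', [X], [Y])
model = helper.make_model(graph)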

TVM model definition and compilation:

import tvm
from tvm import relay

x = relay.var('x', shape=(1, 10))
y = relay.nn.relu(x)
func = relay.Function([x], y)
mod = tvm.IRModule.from_expr(func)
target = tvm.target.Target('llvm')
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target)

ONNX focuses on model representation and interoperability, while TVM provides a more comprehensive approach to model optimization and deployment across various hardware targets. TVM offers more advanced compilation techniques and runtime optimizations, making it better suited for performance-critical applications and specialized hardware.
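
The distinction between describing a model and compiling it can be illustrated with a toy sketch. The graph format and helper names below are invented for illustration and are neither the ONNX format nor the TVM API. The graph is plain data, and compile_graph resolves its operators once and returns a callable.

```python
# Toy sketch: a model graph stored as data, then "compiled" into one callable.
# The node format and helper names are made up for illustration.

OPS = {"Relu": lambda v: [max(e, 0.0) for e in v]}

graph = {
    "inputs": ["X"],
    "outputs": ["Y"],
    "nodes": [{"op": "Relu", "inputs": ["X"], "outputs": ["Y"]}],
}

def compile_graph(graph):
    """Resolve every op once, ahead of time, and return a plain function."""
    steps = [(OPS[n["op"]], n["inputs"][0], n["outputs"][0])
             for n in graph["nodes"]]

    def run(**feeds):
        env = dict(feeds)  # tensor name -> value
        for fn, src, dst in steps:
            env[dst] = fn(env[src])
        return [env[name] for name in graph["outputs"]]

    return run

run = compile_graph(graph)
result = run(X=[-1.0, 0.0, 2.5])
print(result)  # [[0.0, 0.0, 2.5]]
```

A real compiler does far more at the compile step (operator fusion, layout changes, code generation for the target), but the split between a portable description and a target-specific executable is the same.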

onnxruntime

17,390

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Pros of ONNX Runtime

Broader hardware support and optimizations for various devices
Easier integration with existing ML frameworks and tools
More extensive documentation and community support

Cons of ONNX Runtime

Less flexibility for custom operators and optimizations
Limited support for certain advanced deep learning models
Potentially higher memory usage for some workloads

Code Comparison

ONNX Runtime example:

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
# input_data is assumed to be a NumPy array matching the model's input shape and dtype
output = session.run(None, {input_name: input_data})

TVM example:

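import onnx
import tvm
from tvm import relay

onnx_model = onnx.load("model.onnx")  # same model file as the ONNX Runtime example
target = "llvm"  # CPU target, chosen for illustration
mod, params = relay.frontend.from_onnx(onnx_model)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target, params=params)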

Both ONNX Runtime and TVM are powerful frameworks for optimizing and deploying machine learning models. ONNX Runtime excels in ease of use and broad hardware support, while TVM offers more flexibility for advanced optimizations and custom operators. The choice between the two depends on specific project requirements, target hardware, and the level of customization needed.

jax

32,985

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Pros of JAX

Seamless integration with NumPy and automatic differentiation
Efficient compilation to XLA for GPU and TPU acceleration
Strong support for functional programming paradigms

Cons of JAX

Steeper learning curve for users not familiar with functional programming
Limited support for dynamic shapes and control flow compared to TVM
Smaller ecosystem and fewer pre-built models than PyTorch or TensorFlow

Code Comparison

JAX example:

import jax.numpy as jnp
from jax import grad, jit

def f(x):
    return jnp.sum(jnp.sin(x))

grad_f = jit(grad(f))

TVM example:

import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute(A.shape, lambda i: tvm.tir.sin(A[i]), name="B")
s = te.create_schedule(B.op)

Both frameworks offer powerful capabilities for optimizing and accelerating numerical computations, but they approach the problem from different angles. JAX focuses on providing a NumPy-like interface with automatic differentiation and XLA compilation, while TVM offers a more flexible approach to tensor expressions and scheduling optimizations across various hardware targets.
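
For reference, the gradient that the JAX snippet computes can be written by hand: the derivative of sum(sin(x)) with respect to each element is cos of that element. The plain-Python sketch below, with no JAX or TVM, checks the analytic form against central finite differences. The helper numerical_grad is a hypothetical name used only for the check.

```python
# Plain-Python sketch: the gradient of f(x) = sum(sin(x)) is cos(x) element-wise.
import math

def f(x):
    return sum(math.sin(v) for v in x)

def grad_f(x):
    """Analytic gradient: d/dx_i sum(sin(x)) = cos(x_i)."""
    return [math.cos(v) for v in x]

def numerical_grad(func, x, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    grads = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grads.append((func(hi) - func(lo)) / (2 * eps))
    return grads

x = [0.0, 1.0, 2.0]
analytic = grad_f(x)
numeric = numerical_grad(f, x)
print(analytic)
print(max(abs(a - n) for a, n in zip(analytic, numeric)))
```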

mlpack

5,377

mlpack: a fast, header-only C++ machine learning library

Pros of mlpack

Focuses on scalable machine learning algorithms, offering a wide range of ML techniques
Provides bindings for multiple languages, including Python, Julia, and R
Emphasizes ease of use and fast prototyping for ML applications

Cons of mlpack

Less suitable for deep learning and neural network optimization compared to TVM
Smaller community and ecosystem than TVM, which is backed by the Apache Software Foundation
Limited support for hardware-specific optimizations and cross-platform deployment

Code Comparison

mlpack (C++):

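#include <mlpack/core.hpp>
#include <mlpack/methods/linear_regression/linear_regression.hpp>

// mlpack 3.x namespace shown; mlpack 4 dropped the nested namespaces.
// X and X_test hold one data point per column; y holds one response per point.
arma::mat X, X_test;
arma::rowvec y;
mlpack::regression::LinearRegression lr(X, y);
arma::rowvec predictions;  // Predict() writes into a row vector
lr.Predict(X_test, predictions);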

TVM (Python):

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 3, 224, 224))
weight = relay.var("weight")
conv2d = relay.nn.conv2d(data, weight)
func = relay.Function([data, weight], conv2d)

Both libraries offer different approaches to machine learning tasks. mlpack focuses on traditional ML algorithms with a C++ core, while TVM specializes in deep learning optimizations and cross-platform deployment.

README

Open Deep Learning Compiler Stack

Apache TVM is a compiler stack for deep learning systems. It is designed to close the gap between the productivity-focused deep learning frameworks and the performance- and efficiency-focused hardware backends. TVM works with deep learning frameworks to provide end-to-end compilation for different backends.

License

TVM is licensed under the Apache-2.0 license.

Getting Started

Check out the TVM Documentation site for installation instructions, tutorials, examples, and more. The Getting Started with TVM tutorial is a great place to start.

Contribute to TVM

TVM adopts the Apache committer model. We aim to create an open-source project maintained and owned by the community. Check out the Contributor Guide.

History and Acknowledgement

TVM started as a research project for deep learning compilation. The first version of the project benefited a lot from the following projects:

Halide: Part of TVM's TIR and arithmetic simplification module originates from Halide. We also learned and adapted some parts of the lowering pipeline from Halide.
Loopy: use of integer set analysis and its loop transformation primitives.
Theano: the design inspiration of symbolic scan operator for recurrence.

Since then, the project has gone through several rounds of redesigns. The current design is also drastically different from the initial design, following the development trend of the ML compiler community.

The most recent version focuses on a cross-level design with TensorIR as the tensor-level representation and Relax as the graph-level representation and Python-first transformations. The project's current design goal is to make the ML compiler accessible by enabling most transformations to be customizable in Python and bringing a cross-level representation that can jointly optimize computational graphs, tensor programs, and libraries. The project is also a foundation infra for building Python-first vertical compilers for domains, such as LLMs.
