Top Related Projects
- TensorComprehensions: A domain-specific language to express machine learning workloads.
- MLIR: "Multi-Level Intermediate Representation" compiler infrastructure.
- IREE: A retargetable MLIR-based machine learning compiler and runtime toolkit.
- PlaidML: A framework for making deep learning work everywhere.
- TVM: An open deep learning compiler stack for CPUs, GPUs, and specialized accelerators.
Quick Overview
Glow is a machine learning compiler and execution engine for hardware accelerators, developed by Facebook (now Meta). It's designed to optimize and run neural networks on various hardware platforms, focusing on low-latency inference and efficient memory usage.
Pros
- Supports multiple hardware targets, including CPUs, GPUs, and specialized AI accelerators
- Optimizes neural network models for improved performance and reduced memory footprint
- Integrates well with PyTorch, allowing seamless conversion of PyTorch models
- Provides a flexible graph transformation framework for custom optimizations
Cons
- Relatively complex setup and learning curve for new users
- Limited documentation and examples compared to more mainstream frameworks
- Primarily focused on inference, with less emphasis on training capabilities
- May require frequent updates to keep up with rapidly evolving hardware accelerators
Code Examples
- Loading and running a PyTorch model with Glow (the torch_glow calls below are illustrative; exact names may differ between versions):

import torch
from torch_glow import enable_glow_fusion  # assumed API; may vary by version

@enable_glow_fusion()
def run_model(model, inputs):
    return model(*inputs)

# Load your TorchScript model
model = torch.jit.load("path/to/your/model.pt")

# Prepare example inputs matching the model's expected shapes
example_inputs = (torch.randn(1, 3, 224, 224),)

# Run the model using Glow
output = run_model(model, example_inputs)
- Compiling a model for a specific backend (again, treat the class and function names as illustrative):

import torch
import torch_glow
from torch_glow import CompilationSpec, InputSpec  # assumed API; may vary by version

# Define input specifications (name, shape, and dtype of each model input)
input_specs = [
    InputSpec("input", [1, 3, 224, 224], torch.float32),
]

# Create a compilation specification
comp_spec = CompilationSpec()
comp_spec.set_input_specs(input_specs)

# Compile the model for a specific backend (e.g., "CPU");
# model is a loaded TorchScript module (see the previous example)
compiled_model = torch_glow.compile(model, comp_spec, backend="CPU")
- Performing quantization-aware training with Glow (a sketch only: Glow is primarily an inference engine, so training support is limited):

import torch
import torch.nn as nn
from torch_glow import enable_glow_fusion  # assumed API; may vary by version

# Loss function used by the training step below
criterion = nn.CrossEntropyLoss()

@enable_glow_fusion()
def quantized_training_step(model, inputs, labels, optimizer):
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    return loss

# Your quantization-aware training loop; model, optimizer, dataloader,
# and num_epochs are assumed to be defined elsewhere
for epoch in range(num_epochs):
    for inputs, labels in dataloader:
        loss = quantized_training_step(model, inputs, labels, optimizer)
Getting Started
To get started with Glow:
- Install PyTorch and Glow (note that a prebuilt torch-glow package may not be available for every platform; if pip cannot find one, torch_glow must be built from the Glow source tree):
pip install torch torchvision
pip install torch-glow
- Import the necessary modules:
import torch
import torch_glow
- Enable Glow fusion for your PyTorch model:
from torch_glow import enable_glow_fusion

@enable_glow_fusion()
def run_model(model, inputs):
    return model(*inputs)

# Use run_model() to execute your PyTorch model with Glow optimizations
Competitor Comparisons
TensorComprehensions: A domain-specific language to express machine learning workloads.
Pros of TensorComprehensions
- Focuses on automatic code generation for specific tensor operations
- Provides a domain-specific language for expressing computations
- Integrates with existing deep learning frameworks like PyTorch
Cons of TensorComprehensions
- More specialized and narrower in scope compared to Glow
- Less mature and potentially less stable
- May require more manual intervention for complex operations
Code Comparison
TensorComprehensions:
def matmul(float(M,K) A, float(K,N) B) -> (C) {
    C(m,n) +=! A(m,k) * B(k,n)
}
Glow:
// F is a glow::Function, and A and B are input nodes created elsewhere;
// a self-contained version follows the comparison below.
Node *matmul = F->createMatMul("matmul", A, B);
SaveNode *result = F->createSave("result", matmul);
TensorComprehensions uses a custom DSL to define tensor operations, while Glow employs a more traditional C++ API for creating computational graphs. TensorComprehensions focuses on generating optimized code for specific operations, whereas Glow provides a broader compiler infrastructure for neural network models.
Both projects aim to improve performance in deep learning applications, but they approach the problem from different angles. TensorComprehensions is more suited for developers who need fine-grained control over specific tensor operations, while Glow offers a more comprehensive solution for compiling and optimizing entire neural network models.
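For reference, here is a self-contained version of the Glow snippet above. This is a minimal sketch using Glow's C++ graph API; the matrix shapes are chosen arbitrarily, and API details may differ slightly between Glow versions.

#include "glow/ExecutionEngine/ExecutionEngine.h"
#include "glow/Graph/Graph.h"

using namespace glow;

int main() {
  ExecutionEngine EE; // defaults to the interpreter backend
  auto &mod = EE.getModule();
  Function *F = mod.createFunction("matmul");

  // Declare the two matrix inputs: A is M x K, B is K x N.
  auto *A = mod.createPlaceholder(ElemKind::FloatTy, {8, 16}, "A", false);
  auto *B = mod.createPlaceholder(ElemKind::FloatTy, {16, 4}, "B", false);

  // Build C = A * B and mark the result as an output of the graph.
  auto *matmul = F->createMatMul("matmul", A, B);
  F->createSave("result", matmul);

  // Compile the graph for inference.
  EE.compile(CompilationMode::Infer);
  return 0;
}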
"Multi-Level Intermediate Representation" Compiler Infrastructure
Pros of MLIR
- More comprehensive and flexible intermediate representation (IR) system
- Broader scope, supporting multiple frontends and backends beyond just machine learning
- Active development and support from Google and the wider community
Cons of MLIR
- Steeper learning curve due to its more complex architecture
- Less mature ecosystem compared to Glow's focus on PyTorch integration
- May be overkill for projects solely focused on machine learning optimization
Code Comparison
MLIR example:
func @simple_mul(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32> {
  %0 = "tf.Mul"(%arg0, %arg1) : (tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
  return %0 : tensor<4xf32>
}
Glow example:
// mod is a glow::Module created elsewhere (e.g. from an ExecutionEngine)
Function *F = mod.createFunction("simple_mul");
auto *input1 = mod.createPlaceholder(ElemKind::FloatTy, {4}, "input1", false);
auto *input2 = mod.createPlaceholder(ElemKind::FloatTy, {4}, "input2", false);
auto *mul = F->createMul("mul", input1, input2);
F->createSave("save", mul);
Summary
MLIR offers a more versatile and powerful IR system with broader applications, while Glow provides a more focused solution for PyTorch-based machine learning optimization. MLIR's flexibility comes at the cost of increased complexity, whereas Glow offers a simpler approach for specific use cases.
IREE: A retargetable MLIR-based machine learning compiler and runtime toolkit.
Pros of IREE
- Broader target support, including mobile and embedded devices
- More active development and community engagement
- Flexible multi-backend architecture for various hardware targets
Cons of IREE
- Steeper learning curve due to its complexity
- Less mature ecosystem compared to Glow
- Potentially higher overhead for simpler deployment scenarios
Code Comparison
IREE example:
import iree.compiler as ireec
import numpy as np
module = ireec.compile_str("""
func @add(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
  %0 = mhlo.add %a, %b : tensor<4xf32>
  return %0 : tensor<4xf32>
}
""", target_backends=["vulkan-spirv"])
Glow example:
#include "glow/ExecutionEngine/ExecutionEngine.h"
#include "glow/Graph/Graph.h"
#include "glow/Support/Support.h"
glow::PlaceholderBindings bindings;
glow::ExecutionEngine EE;
auto &mod = EE.getModule();
auto *F = mod.createFunction("main");
Both IREE and Glow aim to optimize and deploy machine learning models, but they differ in their approach and target platforms. IREE offers more flexibility for various hardware targets, while Glow is more tightly integrated with the PyTorch ecosystem. The choice between them depends on specific project requirements and deployment scenarios.
PlaidML: A framework for making deep learning work everywhere.
Pros of PlaidML
- Supports a wider range of hardware, including GPUs from NVIDIA, AMD, and Intel
- Offers automatic kernel generation for various backends
- Provides a more flexible approach to defining custom operations
Cons of PlaidML
- Smaller community and ecosystem compared to Glow
- Less optimized for specific hardware targets like mobile devices
- Fewer pre-trained models and examples available
Code Comparison
PlaidML example:
import plaidml.keras
plaidml.keras.install_backend()
from keras.models import Sequential
from keras.layers import Dense
model = Sequential([
    Dense(32, input_shape=(16,), activation='relu'),
    Dense(10, activation='softmax')
])
Glow example:
#include "glow/ExecutionEngine/ExecutionEngine.h"
#include "glow/Graph/Graph.h"
#include "glow/IR/IR.h"
glow::ExecutionEngine EE;
auto &mod = EE.getModule();
auto *F = mod.createFunction("main");
Both PlaidML and Glow aim to provide efficient deep learning frameworks, but they have different focuses and strengths. PlaidML offers broader hardware support and flexibility, while Glow is more optimized for specific targets and has a larger ecosystem due to its association with PyTorch.
TVM: An open deep learning compiler stack for CPUs, GPUs, and specialized accelerators.
Pros of TVM
- Broader hardware support, including CPUs, GPUs, and specialized AI accelerators
- More flexible and customizable compilation pipeline
- Active community and frequent updates
Cons of TVM
- Steeper learning curve due to its complexity
- Potentially slower compilation times for simpler models
Code Comparison
TVM example:
import tvm
from tvm import relay

def example_network():
    data = relay.var("data", shape=(1, 3, 224, 224))
    weight = relay.var("weight")
    conv = relay.nn.conv2d(data, weight)
    return relay.Function([data, weight], conv)
Glow example:
#include "glow/Graph/Graph.h"
void exampleNetwork(glow::Module &mod) {
auto *F = mod.createFunction("main");
auto *input = mod.createPlaceholder(ElemKind::FloatTy, {1, 3, 224, 224}, "input", false);
auto *filter = mod.createConstant(ElemKind::FloatTy, {16, 3, 3, 3}, "filter");
auto *conv = F->createConv("conv", input, filter, 16, 3, 1, 1, 1);
}
README
Glow is a machine learning compiler and execution engine for hardware accelerators. It is designed to be used as a backend for high-level machine learning frameworks. The compiler is designed to allow state-of-the-art compiler optimizations and code generation of neural network graphs. This library is in active development. The project plan is described in the GitHub issues section and in the Roadmap wiki page.
Partners
Contributions to Glow are welcomed and encouraged! Glow is developed in collaboration with a number of industry partners.
How does it work?
Glow lowers a traditional neural network dataflow graph into a two-phase strongly-typed intermediate representation (IR). The high-level IR allows the optimizer to perform domain-specific optimizations. The lower-level instruction-based address-only IR allows the compiler to perform memory-related optimizations, such as instruction scheduling, static memory allocation and copy elimination. At the lowest level, the optimizer performs machine-specific code generation to take advantage of specialized hardware features. Glow features a lowering phase which enables the compiler to support a high number of input operators as well as a large number of hardware targets by eliminating the need to implement all operators on all targets. The lowering phase is designed to reduce the input space and allow new hardware backends to focus on a small number of linear algebra primitives. The design philosophy is described in an arXiv paper.
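A minimal sketch of this flow using Glow's C++ API (exact API details may vary between Glow versions): the graph built below is what the high-level IR optimizes, and the compile step performs lowering, memory allocation, and code generation.

#include "glow/ExecutionEngine/ExecutionEngine.h"
#include "glow/Graph/Graph.h"
#include "glow/Graph/PlaceholderBindings.h"

using namespace glow;

int main() {
  ExecutionEngine EE; // defaults to the interpreter backend
  auto &mod = EE.getModule();
  Function *F = mod.createFunction("main");

  // Build a tiny dataflow graph: out = relu(input).
  auto *input = mod.createPlaceholder(ElemKind::FloatTy, {4}, "input", false);
  auto *relu = F->createRELU("relu", input);
  auto *save = F->createSave("save", relu);

  // Graph optimization, lowering to the instruction-based IR, memory
  // allocation, and code generation all happen in compile().
  EE.compile(CompilationMode::Infer);

  // Bind concrete tensors to the placeholders and run.
  PlaceholderBindings bindings;
  bindings.allocate(input)->getHandle() = {-1.0f, 2.0f, -3.0f, 4.0f};
  bindings.allocate(save->getPlaceholder());
  EE.run(bindings);
  return 0;
}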
Getting Started
System Requirements
Glow builds and runs on macOS and Linux. The software depends on a modern C++ compiler that supports C++11, on CMake, LLVM (>=7.0), glog, protocol buffers, and libpng.
Get Glow!
git clone git@github.com:pytorch/glow.git # or: git clone https://github.com/pytorch/glow.git
cd glow
Submodules
Glow depends on a few submodules: googletest, onnx, and a library for FP16 conversions.
To get them, from the glow directory, run:
git submodule update --init --recursive
Source dependencies
Glow depends on fmt, which must be built from source:
git clone https://github.com/fmtlib/fmt
mkdir fmt/build
cd fmt/build
cmake ..
make
sudo make install
macOS
Install the required dependencies using either Homebrew or MacPorts. If using Homebrew, run:
brew install cmake graphviz libpng ninja protobuf wget glog autopep8 llvm \
boost double-conversion gflags jemalloc libevent lz4 openssl pkg-config \
snappy xz
If using MacPorts, run:
port install cmake graphviz libpng ninja protobuf-cpp wget google-glog \
boost double-conversion gflags jemalloc libevent lz4 openssl snappy xz
# Choose version >= 7
export LLVM_VERSION=7
port install llvm-$LLVM_VERSION.0
Note that LLVM is installed in a non-default location to avoid conflicts with the system's LLVM: Homebrew usually installs LLVM in /usr/local/opt/llvm/, whereas MacPorts installs it in /opt/local/libexec/llvm-$LLVM_VERSION.0/. This means that CMake will need to be told where to find LLVM when building; instructions on that can be found here.
Finally, create a symbolic link to the Homebrew- or MacPorts-installed clang-* tools so that the utils/format.sh script is able to find them later on. For a Homebrew-managed installation, run:
ln -s "/usr/local/opt/llvm/bin/clang-format" "/usr/local/bin/clang-format"
ln -s "/usr/local/opt/llvm/bin/clang-tidy" "/usr/local/bin/clang-tidy"
For MacPorts, run:
ln -s "/opt/local/libexec/llvm-$LLVM_VERSION.0/bin/clang-format" "/usr/local/bin/clang-format"
ln -s "/opt/local/libexec/llvm-$LLVM_VERSION.0/bin/clang-tidy" "/usr/local/bin/clang-tidy"
Note: Starting with macOS Mojave, Xcode's command line tools changed the header layout. In order for Glow to build on Mojave, you might need to install macOS_SDK_headers_for_macOS_10.14.pkg, located in /Library/Developer/CommandLineTools/Packages/. For macOS Catalina you might need to explicitly specify SDKROOT: export SDKROOT="/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk"
Ubuntu
[The following instructions have been tested on Ubuntu 16.04 and 18.04]
In order to build Glow on Ubuntu it is necessary to install a few packages. The following command should install the required dependencies:
sudo apt-get install clang clang-8 cmake graphviz libpng-dev \
libprotobuf-dev llvm-8 llvm-8-dev ninja-build protobuf-compiler wget \
opencl-headers libgoogle-glog-dev libboost-all-dev \
libdouble-conversion-dev libevent-dev libssl-dev libgflags-dev \
libjemalloc-dev libpthread-stubs0-dev liblz4-dev libzstd-dev libbz2-dev \
libsodium-dev libfmt-dev
[Note: Ubuntu 16.04 and 18.04 ship with llvm-6 and need to be upgraded before building Glow. Building Glow on Ubuntu 16.04 with llvm-7 fails because the llvm-7 package for Xenial uses an older C++ ABI; building Glow on Ubuntu 18.04 with llvm-7 has been tested and is successful.]
It may be desirable to use update-alternatives to manage the version of clang/clang++:
sudo update-alternatives --install /usr/bin/clang clang \
/usr/lib/llvm-8/bin/clang 50
sudo update-alternatives --install /usr/bin/clang++ clang++ \
/usr/lib/llvm-8/bin/clang++ 50
Glow uses the system default C/C++ compiler (/usr/bin/c++), and so you may also want to switch your default C/C++ compiler to clang:
sudo update-alternatives --config cc
# Select the option corresponding to /usr/bin/clang ...
sudo update-alternatives --config c++
# Select the option corresponding to /usr/bin/clang++ ...
Glow should build just fine with gcc (e.g. gcc 5.4), but we mostly use clang and are more attentive to compatibility with clang.
Finally, in order to support the ONNX net serialization format, Glow requires protobuf >= 2.6.1, but the above command may install an older version on older Ubuntu releases (e.g. 14.04). If this is the case, we suggest looking at utils/install_protobuf.sh to install a newer version from source.
For details on installing OpenCL on Ubuntu please see these instructions.
Configure and Build
To build the compiler, create a build directory and run cmake on the source directory. It's a good idea to build two configurations (Release and Debug) because some programs take a really long time to run in Debug mode. It's also a good idea to build the project outside of the source directory.
mkdir build_Debug
cd build_Debug
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ../glow
ninja all
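The matching Release configuration, which you will want for benchmarks and the example programs below, is identical apart from the build type:
# from the directory that contains the glow checkout
mkdir build_Release
cd build_Release
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ../glow
ninja all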
It's possible to configure and build the compiler with any CMake generator, such as GNU Makefiles, Ninja, and Xcode.
For platform-specific build instructions and advanced options, such as building with AddressSanitizer, refer to this guide: Building the Compiler.
If you're running macOS v10.14 (Mojave) and ninja all fails because it can't find headers (e.g. string.h), run the following command to fix it, and try again. More information is available here under "Command Line Tools".
open /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg
For macOS v10.15 (Catalina) you might need to explicitly specify SDKROOT:
export SDKROOT="/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk"
Building with dependencies (LLVM)
By default, Glow will use a system-provided LLVM. Note that Glow requires LLVM 7.0 or later. If you have LLVM installed in a non-default location (for example, if you installed it using Homebrew on macOS), you need to tell CMake where to find it using -DLLVM_DIR. For example, if LLVM were installed in /usr/local/opt:
cmake -G Ninja ../glow \
-DCMAKE_BUILD_TYPE=Debug \
-DLLVM_DIR=/usr/local/opt/llvm/lib/cmake/llvm
If LLVM is not available on your system you'll need to build it manually. Run the script utils/build_llvm.sh to clone, build, and install LLVM in a local directory. You will then need to configure Glow with -DLLVM_DIR to tell the build system where to find LLVM in that local directory (e.g. -DLLVM_DIR=/path/to/llvm_install/lib/cmake/llvm if using build_llvm.sh).
Testing and Running
Unit tests
The project has a few unit tests in the tests/unittests subdirectory. To run all of them, simply run ninja test.
C++ API examples
A few test programs that use Glow's C++ API are found under the examples/ subdirectory. The mnist, cifar10, fr2en and ptb programs train and run digit recognition, image classification, French-to-English translation, and language modeling benchmarks, respectively.
To run these programs, build Glow in Release mode, then run the following command to download the cifar10, mnist and ptb datasets.
python ../glow/utils/download_datasets_and_models.py --all-datasets
Now run the examples. Note that the databases should be in the current working directory.
./bin/mnist
./bin/cifar10
./bin/fr2en
./bin/ptb
./bin/char-rnn
If everything goes well you should see:
- mnist: pictures from the MNIST digits database
- cifar10: image classifications that steadily improve
- fr2en: an interactive French-to-English translator
- ptb: decreasing perplexity on the dataset as the network trains
- char-rnn: random text generated based on some document
Note that the default build mode is Debug, which means that the compiler itself is easy to debug because the binary contains debug info and lots of assertions, and optimizations are disabled. It also means that the compiler and runtime are very slow, and execution can be hundreds of times slower than in a Release build. If you wish to benchmark the compiler, run long benchmarks, or release the product, you should compile the compiler in Release mode. Check the main CMake file for more details.
More details on testing and running Glow can be found in: Testing the Glow Compiler.
Ahead-of-time Compilation
Glow can be used to compile neural networks into object files containing native
code. We provide resnet50 (both quantized and non-quantized versions) as an
example of this capability in examples/bundles/resnet50
. See Creating
Standalone Executable Bundles for more detail.
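A typical invocation of Glow's model-compiler tool for producing a bundle looks like the following; the flags shown are the commonly documented ones, and the exact options depend on your model format and Glow version:
./bin/model-compiler -backend=CPU \
    -model=path/to/resnet50.onnx \
    -emit-bundle=output_bundle_dir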
Contributing
To get started contributing, please refer to the project's contributing guides (see CONTRIBUTING.md in the repository).
Communication
- Forums: discuss implementations, research, etc.: https://discuss.pytorch.org/c/glow. Make sure to label topics with the "glow" category.
- GitHub issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
License
Glow is licensed under the Apache 2.0 License.