glow
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
Top Related Projects
tensor2tensor: Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
JAX: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Horovod: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
DeepSpeed: A deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Quick Overview
Glow is OpenAI's official implementation of the paper "Glow: Generative Flow with Invertible 1x1 Convolutions". It provides TensorFlow code for training flow-based generative models that give exact log-likelihoods and an invertible mapping between images and latent variables, together with scripts for reproducing the paper's quantitative and qualitative results, pretrained models, and an interactive CelebA-HQ demo.
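As a rough illustration of the paper's central building block (not code from this repository), an invertible 1x1 convolution mixes channels with a square weight matrix, and the log-determinant of that matrix enters the exact log-likelihood:
import numpy as np
# Sketch of an invertible 1x1 convolution on an H x W x C activation.
# A 1x1 convolution with a square C x C matrix is a per-pixel matrix multiply
# along the channel axis; it contributes H * W * log|det(weight)| to the
# model's log-likelihood and is exactly invertible.
H, W, C = 8, 8, 4
x = np.random.randn(H, W, C)
weight = np.linalg.qr(np.random.randn(C, C))[0]  # random rotation, as in the paper's initialization
y = x @ weight.T                                 # forward pass
logdet = H * W * np.log(np.abs(np.linalg.det(weight)))
x_rec = y @ np.linalg.inv(weight).T              # inverse pass recovers the input exactly
assert np.allclose(x, x_rec)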
Pros
- Reproduces the paper's quantitative likelihood results and qualitative samples on CIFAR-10, ImageNet, LSUN, and CelebA-HQ
- Flow-based architecture gives exact log-likelihoods and an invertible mapping between images and latents
- Scales from a single GPU to multi-GPU, multi-node training via Horovod and (Open)MPI
- Ships a pretrained CelebA-HQ model and an interactive demo for latent manipulation
Cons
- Archived: the code is provided as-is and no updates are expected
- Built on TensorFlow 1.x and an old Horovod release, which complicates installation in modern environments
- Large-scale experiments need very large datasets (up to 700GB for LSUN) and many GPUs (40 for CelebA-HQ 256x256)
- Documentation is limited to the README and the flags of train.py
Code Examples
- Quick single-GPU test run with a small model:
CUDA_VISIBLE_DEVICES=0 python train.py --depth 1
- Default multi-GPU training with MPI and Horovod:
mpiexec -n 8 python train.py
- Reproducing the CIFAR-10 quantitative result:
mpiexec -n 8 python train.py --problem cifar10 --image_size 32 --n_level 3 --depth 32 --flow_permutation 2 --flow_coupling 1 --seed 0 --learntop --lr 0.001 --n_bits_x 8
Getting Started
To get started with Glow:
- Clone the repository and install the Python dependencies:
git clone https://github.com/openai/glow.git
cd glow
pip install -r requirements.txt
- Install Horovod and (Open)MPI for multi-GPU training; see the setup instructions on the Horovod GitHub page.
- Run a small single-GPU training job to check the setup:
CUDA_VISIBLE_DEVICES=0 python train.py --depth 1
- For the larger experiments, download the preprocessed datasets described in the README below and point the --data_dir flag at the extracted folder.
Competitor Comparisons
tensor2tensor: Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Pros of tensor2tensor
- Broader scope, covering a wide range of machine learning tasks and models
- More active development and larger community support
- Extensive documentation and tutorials for easier adoption
Cons of tensor2tensor
- Steeper learning curve due to its comprehensive nature
- Heavier, more general framework with more moving parts than Glow's small, single-purpose codebase
- May require more computational resources for some tasks
Code comparison
tensor2tensor:
from tensor2tensor import problems
from tensor2tensor.utils import trainer_lib
problem = problems.problem("image_mnist")
hparams = trainer_lib.create_hparams("transformer_base")
experiment = trainer_lib.create_experiment(...)
Glow (the official repo is TensorFlow-based and driven by a single training script rather than an importable package):
CUDA_VISIBLE_DEVICES=0 python train.py --depth 1
The code snippets demonstrate the different approaches:
- tensor2tensor uses a problem-model-trainer structure
- Glow exposes flow-based generative training through a TensorFlow train.py script and its flags
Both repositories offer powerful tools for machine learning tasks, but tensor2tensor provides a more comprehensive framework while Glow specializes in flow-based generative models.
PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Wider adoption and larger community support
- More flexible and dynamic computational graph
- Extensive ecosystem of tools and libraries
Cons of PyTorch
- Steeper learning curve for beginners
- Slightly slower execution compared to static graph frameworks
Code Comparison
PyTorch:
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = x + y
print(z)
Glow (built on TensorFlow 1.x, so it uses the static-graph style shown here):
import tensorflow as tf  # TensorFlow 1.x
x = tf.placeholder(tf.float32, shape=[3])
y = tf.placeholder(tf.float32, shape=[3])
z = x + y
with tf.Session() as sess:
    print(sess.run(z, {x: [1, 2, 3], y: [4, 5, 6]}))
PyTorch offers a more Pythonic and intuitive approach, while Glow is built on TensorFlow 1.x and therefore works with a static computation graph. PyTorch's dynamic computation graph allows for easier debugging and more natural Python integration, whereas a static graph can provide optimization benefits for certain use cases.
JAX: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Pros of JAX
- More flexible and general-purpose, supporting a wider range of machine learning tasks
- Better performance on GPUs and TPUs through XLA compilation
- Active development with frequent updates and growing ecosystem
Cons of JAX
- Steeper learning curve, especially for those familiar with NumPy/PyTorch
- Less focus on specific generative models compared to Glow
- Smaller community and fewer pre-trained models available
Code Comparison
Glow (illustrative sketch; the official openai/glow code is TensorFlow-based and is run through train.py rather than imported as a package):
# Hypothetical PyTorch-style flow API, shown only to illustrate the z / log-det pattern
import torch
from glow import Glow
model = Glow(in_channel=3, n_flow=32, n_block=3)
x = torch.randn(16, 3, 64, 64)
z, logdet = model(x)
JAX:
import jax
def model(params, x):
    # Define your model architecture here
    return x
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16, 3, 64, 64))
output = jax.jit(model)(None, x)
Summary
JAX offers more flexibility and performance for general machine learning tasks, while Glow is specifically designed for generative models. JAX has a steeper learning curve but provides better optimization capabilities. Glow may be easier to use for specific generative tasks but is less versatile overall.
Horovod: Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Pros of Horovod
- Designed specifically for distributed deep learning, offering better scalability across multiple GPUs and nodes
- Supports multiple deep learning frameworks (TensorFlow, PyTorch, MXNet, Keras)
- Easier to integrate into existing deep learning workflows
Cons of Horovod
- More focused on distributed training, less versatile for general machine learning tasks
- Requires additional setup and configuration for distributed environments
- May have a steeper learning curve for users not familiar with distributed computing concepts
Code Comparison
Glow example (Glow itself relies on Horovod and (Open)MPI for multi-GPU training):
mpiexec -n 8 python train.py
Horovod example (distributed TensorFlow training):
import tensorflow as tf
import horovod.tensorflow as hvd
hvd.init()
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())
optimizer = hvd.DistributedOptimizer(tf.train.AdamOptimizer(0.001))  # gradients averaged across workers
fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- Broader focus on sequence modeling tasks, including machine translation, text summarization, and language modeling
- More extensive documentation and tutorials, making it easier for newcomers to get started
- Active community with frequent updates and contributions
Cons of fairseq
- Potentially steeper learning curve due to its broader scope and more complex architecture
- May require more computational resources for training and inference than Glow's smaller, single-purpose codebase
Code Comparison
fairseq example (PyTorch-based):
from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model', 'checkpoint.pt')
translated = model.translate('Hello world!')
Glow example (TensorFlow-based; an image model trained and sampled through train.py rather than a text API):
# Reproduce the CelebA-HQ 256x256 qualitative results; the demo/ folder then
# provides encoding of images to latents and attribute manipulation
mpiexec -n 40 python train.py --problem celeba --image_size 256 --n_level 6 --depth 32 --flow_permutation 2 --flow_coupling 0 --seed 0 --learntop --lr 0.001 --n_bits_x 5
Summary
fairseq offers a more comprehensive toolkit for various sequence modeling tasks, with better documentation and community support, but it may be more complex and resource-intensive than Glow. Glow focuses specifically on flow-based generative models of images, offering a more streamlined path for those tasks. The code examples also highlight the different frameworks (PyTorch vs. TensorFlow) and domains: fairseq encodes and translates text, while Glow is trained on images and explored through its demo tools.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Focuses on optimizing large-scale model training and inference
- Offers a wider range of optimization techniques, including ZeRO, pipeline parallelism, and 3D parallelism
- Actively maintained with frequent updates and extensive documentation
Cons of DeepSpeed
- Steeper learning curve due to its comprehensive feature set
- May introduce additional complexity for smaller projects or simpler models
- Requires more configuration and tuning to achieve optimal performance
Code Comparison
DeepSpeed:
import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(args=args,
model=model,
model_parameters=params)
Glow (TensorFlow-based; distributed training is handled by Horovod and launched with mpiexec rather than configured in code):
mpiexec -n 8 python train.py
Summary
DeepSpeed offers more advanced optimization techniques for large-scale models, while Glow focuses on generative flow models. DeepSpeed is more actively maintained and provides extensive documentation, but it may be overkill for smaller projects. Glow is simpler to run but has a narrower scope and is archived, so it no longer receives updates.
README
Status: Archive (code is provided as-is, no updates expected)
Glow
Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
To use the pretrained CelebA-HQ model, make your own manipulation vectors, and run our interactive demo, check the demo folder.
Requirements
- Tensorflow (tested with v1.8.0)
- Horovod (tested with v0.13.8) and (Open)MPI
Run
pip install -r requirements.txt
To set up (Open)MPI, check the instructions on the Horovod GitHub page.
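As a quick check that the pinned stack is importable and that Horovod can initialize (a minimal sketch; run it under mpiexec to see multiple ranks):
import tensorflow as tf
import horovod.tensorflow as hvd
# Print the library version and this process's Horovod rank.
print("tensorflow", tf.__version__)  # tested with 1.8.0, per the requirements above
hvd.init()
print("horovod rank", hvd.rank(), "of", hvd.size())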
Download datasets
For small-scale experiments, use MNIST/CIFAR-10 (downloaded directly by train.py using Keras).
For larger-scale experiments, the datasets used are hosted at https://openaipublic.azureedge.net/glow-demo/data/{dataset_name}-tfr.tar. The dataset names are listed below; we note the exact preprocessing/downsampling method for each so that likelihoods can be compared correctly.
Quantitative results
- imagenet-oord: 20GB. Unconditional ImageNet 32x32 and 64x64, as described in PixelRNN/RealNVP papers (we downloaded this processed version).
- lsun_realnvp: 140GB. LSUN 96x96. Random 64x64 crops taken at processing time, as described in RealNVP.
Qualitative results
- celeba: 4GB. CelebA-HQ 256x256 dataset, as described in Progressive Growing of GANs. For the 1024x1024 version (120GB), use celeba-full-tfr.tar when downloading.
- imagenet: 20GB. ImageNet 32x32 and 64x64 with class labels. Centre cropped, area downsampled.
- lsun: 700GB. LSUN 256x256. Centre cropped, area downsampled.
To download and extract celeba, for example, run
wget https://openaipublic.azureedge.net/glow-demo/data/celeba-tfr.tar
tar -xvf celeba-tfr.tar
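To sanity-check a downloaded archive before training, you can count the records in the extracted shards. This is only a sketch: the glob pattern below is a guess, and the actual parsing code lives in data_loaders/get_data.py.
import glob
import tensorflow as tf  # TensorFlow 1.x, as in the requirements
# Count records across the extracted shards (adjust the pattern to match the
# layout of the extracted folder).
paths = glob.glob("celeba-tfr/*/*.tfrecords")
total = sum(1 for p in paths for _ in tf.python_io.tf_record_iterator(p))
print(len(paths), "shards,", total, "records")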
Change hps.data_dir in train.py to point to the extracted folder (or use the --data_dir flag when you run train.py).
For lsun, since the download can be quite big, you can instead follow the instructions in data_loaders/generate_tfr/lsun.py to generate the tfr file directly from LSUN images. church_outdoor will be the smallest category.
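If you generate your own tfr files, the general TFRecord pattern looks like the sketch below. Note that the feature names here are placeholders; data_loaders/generate_tfr/lsun.py defines the exact format that train.py expects.
import numpy as np
import tensorflow as tf  # TensorFlow 1.x
# Generic sketch of writing images into a TFRecord shard (placeholder schema).
def write_shard(images, path):
    with tf.python_io.TFRecordWriter(path) as writer:
        for img in images:  # img: uint8 array of shape (H, W, 3)
            feats = {
                "shape": tf.train.Feature(int64_list=tf.train.Int64List(value=img.shape)),
                "data": tf.train.Feature(bytes_list=tf.train.BytesList(value=[img.tobytes()])),
            }
            ex = tf.train.Example(features=tf.train.Features(feature=feats))
            writer.write(ex.SerializeToString())
write_shard([np.zeros((64, 64, 3), np.uint8)], "example.tfrecords")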
Simple Train with 1 GPU
Run with a small depth to test:
CUDA_VISIBLE_DEVICES=0 python train.py --depth 1
Train with multiple GPUs using MPI and Horovod
Run the default training script with 8 GPUs:
mpiexec -n 8 python train.py
Ablation experiments
mpiexec -n 8 python train.py --problem cifar10 --image_size 32 --n_level 3 --depth 32 --flow_permutation [0/1/2] --flow_coupling [0/1] --seed [0/1/2] --learntop --lr 0.001
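The bracketed values indicate one choice per run. To launch the whole grid, a small helper like the sketch below (flags taken from the command above) prints one command per combination:
import itertools
base = ("mpiexec -n 8 python train.py --problem cifar10 --image_size 32 "
        "--n_level 3 --depth 32 --learntop --lr 0.001")
# One run per (flow_permutation, flow_coupling, seed) combination.
for perm, coupling, seed in itertools.product([0, 1, 2], [0, 1], [0, 1, 2]):
    print(f"{base} --flow_permutation {perm} --flow_coupling {coupling} --seed {seed}")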
Pretrained models, logs and samples
wget https://openaipublic.azureedge.net/glow-demo/logs/abl-[reverse/shuffle/1x1]-[add/aff].tar
CIFAR-10 Quantitative result
mpiexec -n 8 python train.py --problem cifar10 --image_size 32 --n_level 3 --depth 32 --flow_permutation 2 --flow_coupling 1 --seed 0 --learntop --lr 0.001 --n_bits_x 8
ImageNet 32x32 Quantitative result
mpiexec -n 8 python train.py --problem imagenet-oord --image_size 32 --n_level 3 --depth 48 --flow_permutation 2 --flow_coupling 1 --seed 0 --learntop --lr 0.001 --n_bits_x 8
ImageNet 64x64 Quantitative result
mpiexec -n 8 python train.py --problem imagenet-oord --image_size 64 --n_level 4 --depth 48 --flow_permutation 2 --flow_coupling 1 --seed 0 --learntop --lr 0.001 --n_bits_x 8
LSUN 64x64 Quantitative result
mpiexec -n 8 python train.py --problem lsun_realnvp --category [bedroom/church_outdoor/tower] --image_size 64 --n_level 3 --depth 48 --flow_permutation 2 --flow_coupling 1 --seed 0 --learntop --lr 0.001 --n_bits_x 8
Pretrained models, logs and samples
wget https://openaipublic.azureedge.net/glow-demo/logs/lsun-rnvp-[bdr/crh/twr].tar
CelebA-HQ 256x256 Qualitative result
mpiexec -n 40 python train.py --problem celeba --image_size 256 --n_level 6 --depth 32 --flow_permutation 2 --flow_coupling 0 --seed 0 --learntop --lr 0.001 --n_bits_x 5
LSUN 96x96 and 128x128 Qualitative result
mpiexec -n 40 python train.py --problem lsun --category [bedroom/church_outdoor/tower] --image_size [96/128] --n_level 5 --depth 64 --flow_permutation 2 --flow_coupling 0 --seed 0 --learntop --lr 0.001 --n_bits_x 5
Logs and samples
wget https://openaipublic.azureedge.net/glow-demo/logs/lsun-bdr-[96/128].tar
Conditional CIFAR-10 Qualitative result
mpiexec -n 8 python train.py --problem cifar10 --image_size 32 --n_level 3 --depth 32 --flow_permutation 2 --flow_coupling 0 --seed 0 --learntop --lr 0.001 --n_bits_x 5 --ycond --weight_y=0.01
Conditional ImageNet 32x32 Qualitative result
mpiexec -n 8 python train.py --problem imagenet --image_size 32 --n_level 3 --depth 48 --flow_permutation 2 --flow_coupling 0 --seed 0 --learntop --lr 0.001 --n_bits_x 5 --ycond --weight_y=0.01