giraffe

This repository contains the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"


Top Related Projects

imaginaire

4,058

NVIDIA's Deep Imagination Team's PyTorch Library

taming-transformers

6,229

Taming Transformers for High-Resolution Image Synthesis

pytorch3d

9,337

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

detr

14,567

End-to-End Object Detection with Transformers

Quick Overview

GIRAFFE is a 3D-aware image synthesis method that represents scenes as compositional generative neural feature fields. It combines a compositional 3D scene representation with neural rendering to generate images with explicit control over the camera pose and the individual objects in the scene.
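
As a rough illustration of what "compositional" means here: in the GIRAFFE paper, each object and the background is its own feature field that predicts a density and a feature vector at a 3D point, and the fields are merged by summing the densities and taking the density-weighted mean of the features. The sketch below shows that operator in NumPy. The function name `compose_fields`, the array shapes, and the small `eps` guard are assumptions made for illustration; this is not the repository's code.

```python
import numpy as np

def compose_fields(sigmas, feats, eps=1e-8):
    """Combine per-entity densities and features at a set of 3D points.

    sigmas: (N, P) non-negative densities for N entities at P points.
    feats:  (N, P, C) feature vectors for the same entities and points.
    Returns the summed density (P,) and the density-weighted mean feature (P, C).
    """
    sigmas = np.asarray(sigmas, dtype=np.float64)
    feats = np.asarray(feats, dtype=np.float64)
    sigma_total = sigmas.sum(axis=0)                      # (P,)
    weighted = (sigmas[..., None] * feats).sum(axis=0)    # (P, C)
    # eps (an assumption of this sketch) avoids 0/0 where no entity has density
    feat_total = weighted / (sigma_total[:, None] + eps)
    return sigma_total, feat_total

# Two entities evaluated at a single point: the denser entity dominates the feature.
sigma, feat = compose_fields([[1.0], [3.0]], [[[0.0, 0.0]], [[4.0, 8.0]]])
```

In this example the combined density is the sum of the two inputs, and the combined feature lies three quarters of the way toward the second entity's feature because that entity contributes three quarters of the density.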

Pros

Enables fine-grained control over 3D scene composition and camera viewpoints
Produces high-quality, realistic images with disentangled 3D properties
Allows for object-centric scene manipulation and editing
Demonstrates impressive results on complex datasets like CLEVR and CompCars

Cons

Requires significant computational resources for training and inference
May struggle with highly complex or diverse real-world scenes
Limited to a fixed number of objects in the scene
Potential difficulties in scaling to larger, more varied datasets

Code Examples

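# NOTE: illustrative pseudocode only. The names used below (load_model, generate,
# encode_scene, render_scene, interpolate_scenes) are hypothetical and are not the
# repository's API; the README documents the scripts render.py, train.py, and eval.py.

# Load a pre-trained GIRAFFE model
model = load_model('path/to/pretrained/model')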

# Generate an image with specific camera and object parameters
image = model.generate(
    camera_pose=[0, 0, 1],
    object_positions=[[0, 0, 0], [1, 1, 0]],
    object_rotations=[[0, 0, 0], [0, 45, 0]]
)

# Manipulate object properties in an existing scene
scene = model.encode_scene(input_image)
scene.objects[0].position = [1, 0, 0]
scene.objects[1].rotation = [0, 90, 0]
new_image = model.render_scene(scene)

# Interpolate between two scenes
scene1 = model.encode_scene(image1)
scene2 = model.encode_scene(image2)
interpolated_images = model.interpolate_scenes(scene1, scene2, steps=10)

Getting Started

To get started with GIRAFFE:

Clone the repository:

git clone https://github.com/autonomousvision/giraffe.git
cd giraffe

Install dependencies (the README uses an anaconda environment):
```
conda env create -f environment.yml
conda activate giraffe
```
Download pre-trained models or prepare your dataset for training.

Use the provided scripts for training or rendering, passing the config file as a positional argument:

python train.py CONFIG.yaml
python render.py configs/256res/cars_256_pretrained.yaml

Refer to the repository's README for more detailed instructions on dataset preparation, training, and inference.

Competitor Comparisons

imaginaire

4,058

NVIDIA's Deep Imagination Team's PyTorch Library

Pros of imaginaire

More comprehensive library with multiple GAN-based models and techniques
Better documentation and examples for various use cases
Active development and regular updates

Cons of imaginaire

Higher complexity and steeper learning curve
Requires more computational resources due to its extensive features

Code Comparison

imaginaire:

from imaginaire.trainers import BaseTrainer
from imaginaire.utils.distributed import init_dist
from imaginaire.utils.distributed import master_only_print as print

trainer = BaseTrainer(cfg)
trainer.train()

GIRAFFE (training is started from the command line, as documented in the README):

python train.py CONFIG.yaml

Summary

imaginaire offers a more comprehensive suite of GAN-based models and techniques, with better documentation and regular updates. However, it comes with increased complexity and resource requirements. GIRAFFE, on the other hand, provides a more focused implementation of the GIRAFFE model, which may be easier to use for specific tasks but lacks the broader feature set of imaginaire.

taming-transformers

6,229

Taming Transformers for High-Resolution Image Synthesis

Pros of taming-transformers

More versatile, supporting various image synthesis tasks beyond 3D-aware generation
Implements advanced techniques like VQGANs for high-quality image generation
Extensive documentation and examples for easier implementation

Cons of taming-transformers

Higher computational requirements due to complex architecture
Steeper learning curve for newcomers to the field
Less focused on 3D-aware generation compared to GIRAFFE

Code Comparison

GIRAFFE (3D-aware generation; schematic sketch rather than the repository's code):

def forward(self, z_shape, z_app, camera, **kwargs):
    batch_size = z_shape.shape[0]
    p = self.sample_points(batch_size, camera, **kwargs)
    feat = self.decode_shape(z_shape)
    rgb_sigma = self.decode_color(feat, z_app, p, **kwargs)
    return rgb_sigma

taming-transformers (VQGAN):

def get_input(self, batch, k):
    x = batch[k]
    if len(x.shape) == 3:
        x = x[..., None]
    x = x.permute(0, 3, 1, 2).to(memory_format=torch.contiguous_format)
    return x.float()

pytorch3d

9,337

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Pros of PyTorch3D

More comprehensive 3D deep learning library with a wider range of functionalities
Better integration with PyTorch ecosystem and GPU acceleration
Larger community support and more frequent updates

Cons of PyTorch3D

Steeper learning curve due to its extensive feature set
Heavier resource requirements for some operations
Less focused on specific 3D generative tasks compared to GIRAFFE

Code Comparison

PyTorch3D example:

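import torch
from pytorch3d.structures import Meshes

# Four random vertices and two triangular faces that index into them
verts = torch.randn(4, 3)
faces = torch.tensor([[0, 1, 2], [1, 2, 3]])
# Meshes takes lists of per-mesh tensors, so this is a batch containing one mesh
mesh = Meshes(verts=[verts], faces=[faces])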

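GIRAFFE example (illustrative sketch; the module path and constructor arguments are hypothetical, not the repository's API):

import torch
from giraffe.models import Generator  # hypothetical import path

generator = Generator(z_dim=256, img_size=64)  # hypothetical signature
z = torch.randn(1, 256)  # latent code sampled from a standard normal
img = generator(z)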

PyTorch3D offers a more general-purpose approach to 3D operations, while GIRAFFE focuses on generative 3D-aware image synthesis. PyTorch3D provides lower-level building blocks for 3D deep learning, whereas GIRAFFE offers a higher-level API for specific generative tasks.

detr

14,567

End-to-End Object Detection with Transformers

Pros of DETR

More widely adopted and supported by a large tech company (Facebook)
Extensive documentation and examples for various use cases
Better performance on standard object detection benchmarks

Cons of DETR

Higher computational requirements for training and inference
More complex architecture, potentially harder to modify or extend
Less flexibility for novel view synthesis tasks

Code Comparison

DETR (PyTorch):

class DETR(nn.Module):
    def __init__(self, num_classes, hidden_dim, nheads,
                 num_encoder_layers, num_decoder_layers):
        super().__init__()
        self.transformer = Transformer(
            d_model=hidden_dim,
            dropout=0.1,
            nhead=nheads,
            dim_feedforward=2048,
            num_encoder_layers=num_encoder_layers,
            num_decoder_layers=num_decoder_layers,
        )

GIRAFFE (PyTorch; schematic sketch, class and attribute names are illustrative):

class GIRAFFE(nn.Module):
    def __init__(self, device=None, **kwargs):
        super().__init__()
        self.device = device
        self.generator = Generator(**kwargs).to(device)
        self.discriminator = Discriminator(**kwargs).to(device)
        self.bds_discriminator = BDSDiscriminator(**kwargs).to(device)

DETR focuses on object detection using a transformer-based architecture, while GIRAFFE is designed for 3D-aware image synthesis with a generator-discriminator setup. DETR's code emphasizes the transformer structure, whereas GIRAFFE's code highlights its generative nature with separate generator and discriminator components.


README

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields

Project Page | Paper | Supplementary | Video | Slides | Blog | Talk

[Animations: adding objects (Clevr), horizontal translation (Cars), shape interpolation (Faces)]

If you find our code or paper useful, please cite as

@inproceedings{GIRAFFE,
    title = {GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields},
    author = {Niemeyer, Michael and Geiger, Andreas},
    booktitle = {Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
    year = {2021}
}

TL; DR - Quick Start

[Animations: rotation (Cars), horizontal translation (Cars)]

First, make sure that you have all dependencies in place. The simplest way to do so is to use anaconda.

You can create an anaconda environment called giraffe using

conda env create -f environment.yml
conda activate giraffe

You can now test our code on the provided pre-trained models. For example, simply run

python render.py configs/256res/cars_256_pretrained.yaml

This script should create a model output folder out/cars256_pretrained. The animations are then saved to the respective subfolders in out/cars256_pretrained/rendering.

Usage

Datasets

To train a model from scratch or to use our ground truth activations for evaluation, you have to download the respective dataset.

For this, please run

bash scripts/download_dataset.sh

and follow the instructions. This script should download and unpack the data automatically into the data/ folder.

Controllable Image Synthesis

To render images of a trained model, run

python render.py CONFIG.yaml

where you replace CONFIG.yaml with the correct config file. The easiest way is to use a pre-trained model. You can do this by using one of the config files which are indicated with *_pretrained.yaml.

For example, for our model trained on Cars at 256x256 pixels, run

python render.py configs/256res/cars_256_pretrained.yaml

or for CelebA-HQ at 256x256 pixels, run

python render.py configs/256res/celebahq_256_pretrained.yaml

Our script will automatically download the model checkpoints and render images. You can find the outputs in the out/*_pretrained folders.

Please note that the config files *_pretrained.yaml are only for evaluation or rendering, not for training new models: when these configs are used for training, the model will be trained from scratch, but during inference our code will still use the pre-trained model.

FID Evaluation

For evaluation of the models, we provide the script eval.py. You can run it using

python eval.py CONFIG.yaml

The script generates 20000 images and calculates the FID score.
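
For orientation, the FID is the Fréchet distance between two Gaussians fitted to network activations of real and generated images. The sketch below computes that distance from precomputed means and covariances using NumPy only. It is a minimal illustration of the standard formula, not the code in eval.py, and the function name `frechet_distance` is an assumption.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1) + Tr(sigma2) - 2 * Tr(sqrt(sigma1 @ sigma2))
    """
    mu1 = np.atleast_1d(mu1).astype(np.float64)
    mu2 = np.atleast_1d(mu2).astype(np.float64)
    sigma1 = np.atleast_2d(sigma1).astype(np.float64)
    sigma2 = np.atleast_2d(sigma2).astype(np.float64)
    diff = mu1 - mu2
    # For positive semi-definite covariances, the eigenvalues of sigma1 @ sigma2 are
    # real and non-negative, so the trace of the matrix square root is the sum of
    # their square roots. Clipping removes tiny negative values from round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_sqrt)

# Identical statistics give a distance of (numerically) zero.
same = frechet_distance([0.0, 0.0], np.eye(2), [0.0, 0.0], np.eye(2))
# Shifting one mean by a unit vector adds the squared shift to the distance.
shifted = frechet_distance([0.0, 0.0], np.eye(2), [1.0, 0.0], np.eye(2))
```

Lower values mean the two sets of activations have more similar statistics.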

Note: For some experiments, the numbers in the paper might slightly differ because we used the evaluation protocol from GRAF to fairly compare against the methods reported in GRAF.

Training

Finally, to train a new network from scratch, run

python train.py CONFIG.yaml

where you replace CONFIG.yaml with the name of the configuration file you want to use.

You can monitor the training process on http://localhost:6006 using tensorboard:

cd OUTPUT_DIR
tensorboard --logdir ./logs

where you replace OUTPUT_DIR with the respective output directory. For available training options, please take a look at configs/default.yaml.

2D-GAN Baseline

For convenience, we have implemented a 2D-GAN baseline which closely follows the GAN_stability repo. For example, you can train a 2D-GAN on CompCars at 64x64 pixels similar to our GIRAFFE method by running

python train.py configs/64res/cars_64_2dgan.yaml

Using Your Own Dataset

If you want to train a model on a new dataset, you first need to generate ground truth activations for the intermediate or final FID calculations. For this, you can use the script in scripts/calc_fid/precalc_fid.py. For example, if you want to generate an FID file for the comprehensive cars dataset at 64x64 pixels, you need to run

python scripts/precalc_fid.py  "data/comprehensive_cars/images/*.jpg" --regex True --gpu 0 --out-file "data/comprehensive_cars/fid_files/comprehensiveCars_64.npz" --img-size 64

or for LSUN churches, you need to run

python scripts/precalc_fid.py path/to/LSUN --class-name scene_categories/church_outdoor_train_lmdb --lsun True --gpu 0 --out-file data/church/fid_files/church_64.npz --img-size 64

Note: We apply the same transformations to the ground truth images for this FID calculation as we do during training. If you want to use your own dataset, you need to adjust the image transformations in the script accordingly. Further, you might need to adjust the object-level and camera transformations to your dataset.
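
What exactly the script stores in the .npz file is not spelled out here. A common convention for FID files is to keep the mean and covariance of the activations, and the sketch below illustrates that convention under stated assumptions: the key names `mu` and `sigma`, the helper names, and the random stand-in activations are all hypothetical and may differ from what precalc_fid.py writes.

```python
import os
import tempfile

import numpy as np

def activation_statistics(activations):
    """Return the mean (D,) and covariance (D, D) of an (N, D) activation matrix."""
    acts = np.asarray(activations, dtype=np.float64)
    mu = acts.mean(axis=0)
    sigma = np.cov(acts, rowvar=False)  # rows are samples, columns are feature dimensions
    return mu, sigma

def save_statistics(path, mu, sigma):
    """Store the statistics in a compressed .npz file (key names are assumptions)."""
    np.savez_compressed(path, mu=mu, sigma=sigma)

# Random numbers stand in for real network activations in this sketch.
rng = np.random.default_rng(0)
fake_activations = rng.normal(size=(500, 8))
mu, sigma = activation_statistics(fake_activations)

# Write to a temporary directory and read the file back.
out_file = os.path.join(tempfile.mkdtemp(), "example_fid_stats.npz")
save_statistics(out_file, mu, sigma)
loaded = np.load(out_file)
```

Storing only these two arrays keeps the file small, since the size depends on the feature dimension rather than on the number of images.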

Evaluating Generated Images

We provide the script eval_files.py for evaluating the FID score of your own generated images. For example, if you would like to evaluate your images on CompCars at 64x64 pixels, save them to an npy file and run

python eval_files.py --input-file "path/to/your/images.npy" --gt-file "data/comprehensive_cars/fid_files/comprehensiveCars_64.npz"

Further Information

More Work on Implicit Representations

If you like the GIRAFFE project, please check out the related work on neural representations from our group.
