google/nerfies

This is the code for Deformable Neural Radiance Fields, a.k.a. Nerfies.

Top Related Projects

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Instant neural graphics primitives: lightning fast NeRF and more

Code release for NeRF (Neural Radiance Fields)

NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning

A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.

PixelNeRF Official Repository

Quick Overview

Nerfies is a project by Google Research that introduces a novel method for creating free-viewpoint portraits from casually captured selfie videos. It combines neural radiance fields (NeRF) with a deformation model to handle non-rigid scenes, allowing for the creation of 3D selfie avatars that can be animated and viewed from any angle.

Pros

  • Enables creation of 3D selfie avatars from casual video captures
  • Handles non-rigid scenes and subject movement during capture
  • Produces high-quality, photorealistic results
  • Allows for novel view synthesis and animation of the captured subject

Cons

  • Requires significant computational resources for training and rendering
  • May struggle with extreme pose changes or complex lighting conditions
  • Limited to portrait-style captures, not suitable for full-body or environment reconstruction
  • Potential privacy concerns with creating detailed 3D models of individuals

Code Examples

Although the repository does release research code (a JAX implementation, described in the README below), it is not packaged as a pip-installable library with a stable public API. The following is a simplified sketch of how one might visualize camera poses and images while preparing data; the helper functions shown are placeholders rather than the actual nerfies API:

import numpy as np
import matplotlib.pyplot as plt

# NOTE: `plot_camera_poses` and `image_grid` are hypothetical helpers used for
# illustration; check the repository's own utilities for the real entry points.
from nerfies import visualization

# Load example data (placeholder files; real data loading depends on your dataset).
poses = np.load('example_poses.npy')
images = np.load('example_images.npy')

# Visualize camera poses.
visualization.plot_camera_poses(poses)
plt.show()

# Display an image grid.
visualization.image_grid(images, rows=2, cols=3)
plt.show()

Getting Started

As this is a research codebase rather than a packaged library, there isn't a standard pip-install "getting started" process. However, interested researchers or developers can:

  1. Read the full paper: "Nerfies: Deformable Neural Radiance Fields" available on arXiv.
  2. Explore the project website: https://nerfies.github.io/
  3. Check the GitHub repository for any updates or code releases: https://github.com/google/nerfies
  4. Implement the method based on the paper details if working on similar research projects.

Note that full implementation would require expertise in computer vision, deep learning, and 3D graphics.

Competitor Comparisons

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Pros of PyTorch3D

  • Broader scope: Offers a comprehensive set of tools for 3D deep learning, including rendering, mesh operations, and point cloud processing
  • Better integration with PyTorch ecosystem: Seamlessly works with PyTorch models and GPU acceleration
  • More active development: Regular updates and contributions from a larger community

Cons of PyTorch3D

  • Steeper learning curve: Requires more extensive knowledge of 3D computer vision concepts
  • Higher computational requirements: Some operations can be resource-intensive, especially for complex 3D scenes

Code Comparison

PyTorch3D example (rendering a mesh):

renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
    shader=SoftPhongShader(device=device, cameras=cameras)
)
images = renderer(meshes, lights=lights, materials=materials)

Nerfies example (optimizing a NeRF model; an illustrative TensorFlow-style loop, whereas the actual Nerfies codebase is implemented in JAX):

model = NerfModel()
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-4)
for step in range(num_steps):
    with tf.GradientTape() as tape:
        loss = compute_loss(model, batch)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

Instant neural graphics primitives: lightning fast NeRF and more

Pros of instant-ngp

  • Significantly faster rendering and training times
  • Supports a wider range of 3D representations (NeRF, SDF, density grids)
  • Achieves higher visual quality in less time

Cons of instant-ngp

  • Less focus on dynamic scenes and deformable objects
  • Requires more powerful hardware (CUDA-enabled GPU) for optimal performance
  • May be more complex to set up and use for beginners

Code Comparison

nerfies (illustrative PyTorch-style ray-rendering sketch; the actual codebase is implemented in JAX)

def render_rays(ray_batch, model, options):
    rays_o, rays_d = ray_batch[:, 0:3], ray_batch[:, 3:6]
    viewdirs = ray_batch[:, -3:] if ray_batch.shape[-1] > 8 else None
    bounds = torch.reshape(ray_batch[..., 6:8], [-1, 1, 2])
    near, far = bounds[..., 0], bounds[..., 1]
    return render_rays_helper(rays_o, rays_d, viewdirs, near, far, model, options)

instant-ngp

__global__ void render_kernel(
    const uint32_t n_elements,
    const uint32_t n_training_images,
    const TrainingImageMetadata* __restrict__ metadata,
    const float* __restrict__ transforms,
    const float* __restrict__ focal_lengths,
    float* __restrict__ render_buffer,
    float* __restrict__ depth_buffer
) {
    // Kernel implementation
}

Code release for NeRF (Neural Radiance Fields)

Pros of NeRF

  • Pioneered the original Neural Radiance Fields concept
  • Simpler implementation, easier to understand for beginners
  • Focuses on static scenes, which can be advantageous for certain applications

Cons of NeRF

  • Limited to static scenes, unlike Nerfies which handles dynamic subjects
  • Lacks advanced features like appearance modeling and background regularization
  • May require more images for high-quality results compared to Nerfies

Code Comparison

NeRF (simplified ray sampling):

def sample_along_rays(rays_o, rays_d, near, far, n_samples):
    t = torch.linspace(0., 1., n_samples)
    z = near * (1. - t) + far * t
    return rays_o[..., None, :] + rays_d[..., None, :] * z[..., :, None]

Nerfies (simplified pose optimization; an illustrative PyTorch-style sketch, whereas the actual codebase is JAX-based):

def optimize_camera_poses(model, optimizer, images, poses):
    optimizer.zero_grad()  # clear gradients accumulated from the previous step
    loss = 0
    for image, pose in zip(images, poses):
        pred = model(pose)
        loss += mse_loss(pred, image)
    loss.backward()
    optimizer.step()

Both repositories implement Neural Radiance Fields, but Nerfies extends the concept to handle dynamic scenes and includes additional features for improved quality and flexibility.

NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning

Pros of nerf_pl

  • Implemented in PyTorch Lightning, offering better organization and scalability
  • Includes more NeRF variants and extensions, providing a wider range of options
  • Offers a more comprehensive set of training and evaluation scripts

Cons of nerf_pl

  • Less focus on dynamic scenes and deformable objects compared to nerfies
  • May have a steeper learning curve due to the inclusion of multiple NeRF variants

Code Comparison

nerfies:

def render_rays(ray_batch,
                model,
                chunk=8192,
                near=0.0,
                far=1.0,
                noise_std=0.0,
                randomized=False,
                white_bkgd=False):
  # ... (implementation details)

nerf_pl:

def render_rays(models,
                embeddings,
                rays,
                N_samples=64,
                use_disp=False,
                perturb=0,
                noise_std=1,
                N_importance=0,
                chunk=1024*32,
                white_back=False,
                test_time=False,
                **kwargs):
  # ... (implementation details)

Both repositories implement NeRF (Neural Radiance Fields) techniques, but nerf_pl offers a broader range of implementations and is built on PyTorch Lightning for improved organization. nerfies focuses more on dynamic scenes and deformable objects, while nerf_pl provides a wider variety of NeRF variants and extensions.

A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.

Pros of nerf-pytorch

  • Simpler implementation, making it easier to understand and modify
  • Focuses on the core NeRF algorithm without additional features
  • More lightweight and potentially faster to train on smaller datasets

Cons of nerf-pytorch

  • Lacks advanced features and optimizations present in Nerfies
  • May not handle dynamic scenes or deformations as effectively
  • Limited documentation and examples compared to Nerfies

Code Comparison

nerf-pytorch:

def render_rays(ray_batch,
                network_fn,
                network_query_fn,
                N_samples,
                retraw=False,
                lindisp=False,
                perturb=0.,
                N_importance=0,
                network_fine=None,
                white_bkgd=False,
                raw_noise_std=0.,
                verbose=False):
    # ... (implementation details)

Nerfies:

def render_image(render_fn, rays, config, rng, verbose=False):
  """Render all the pixels of an image (in test mode).

  Args:
    render_fn: function, jit-ed render function.
    rays: a `Rays` namedtuple, the rays to be rendered.
    config: A config dict.
    rng: jnp.ndarray, random number generator (used in training mode only).
    verbose: bool, print progress indicators.

  Returns:
    rgb: jnp.ndarray, rendered color image.
    disp: jnp.ndarray, rendered disparity image.
    acc: jnp.ndarray, rendered accumulated weights per pixel.
  """
  # ... (implementation details)

PixelNeRF Official Repository

Pros of pixel-nerf

  • Faster rendering and inference times
  • Supports novel view synthesis from sparse input views
  • More memory-efficient implementation

Cons of pixel-nerf

  • Less detailed reconstructions compared to Nerfies
  • Limited ability to handle dynamic scenes or deformations
  • May struggle with complex lighting conditions

Code Comparison

pixel-nerf:

class PixelNeRF(nn.Module):
    def __init__(self, D=8, W=256, input_ch=3, input_ch_views=3, output_ch=4):
        super(PixelNeRF, self).__init__()
        self.D = D
        self.W = W
        self.input_ch = input_ch
        self.input_ch_views = input_ch_views
        self.output_ch = output_ch

Nerfies:

class NerfModel(nn.Module):
    def __init__(self, num_coarse_samples, num_fine_samples, use_viewdirs,
                 num_instances, latent_dim):
        super().__init__()
        self.num_coarse_samples = num_coarse_samples
        self.num_fine_samples = num_fine_samples
        self.use_viewdirs = use_viewdirs

README

Nerfies: Deformable Neural Radiance Fields

This is the code for Nerfies: Deformable Neural Radiance Fields.

This codebase is implemented using JAX, building on JaxNeRF.

This repository has been updated to reflect the version used for our ICCV 2021 submission.

Demo

We provide an easy-to-get-started demo using Google Colab!

These Colabs will allow you to train a basic version of our method using Cloud TPUs (or GPUs) on Google Colab.

Note that due to limited compute resources available, these are not the fully featured models. If you would like to train a fully featured Nerfie, please refer to the instructions below on how to train on your own machine.

  • Process a video into a Nerfie dataset (Open In Colab)
  • Train a Nerfie (Open In Colab)
  • Render a Nerfie video (Open In Colab)

Setup

The code can be run under any environment with Python 3.8 and above. (It may run with lower versions, but we have not tested it).

We recommend using Miniconda and setting up an environment:

conda create --name nerfies python=3.8

Next, install the required packages:

pip install -r requirements.txt

Install the appropriate JAX distribution for your environment by following the official JAX installation instructions. For example:

# For example, for CUDA 11.1
pip install --upgrade "jax[cuda111]" -f https://storage.googleapis.com/jax-releases/jax_releases.html

Training

After preparing a dataset, you can train a Nerfie by running:

export DATASET_PATH=/path/to/dataset
export EXPERIMENT_PATH=/path/to/save/experiment/to
python train.py \
    --data_dir $DATASET_PATH \
    --base_folder $EXPERIMENT_PATH \
    --gin_configs configs/test_vrig.gin

To plot telemetry to TensorBoard and render checkpoints on the fly, also launch an evaluation job by running:

python eval.py \
    --data_dir $DATASET_PATH \
    --base_folder $EXPERIMENT_PATH \
    --gin_configs configs/test_vrig.gin

The two jobs should use a mutually exclusive set of GPUs. This division allows the training job to run without having to stop for evaluation.
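
One way to enforce this split when launching both jobs from a single script is to give each process its own CUDA_VISIBLE_DEVICES. The launcher below is only a sketch (it is not part of the codebase) and assumes an 8-GPU machine with placeholder paths:

import os
import subprocess

# Hypothetical launcher: run train.py and eval.py with disjoint GPU sets so the
# evaluation job never competes with training for device memory.
def launch(script, gpus):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    return subprocess.Popen(
        ["python", script,
         "--data_dir", "/path/to/dataset",
         "--base_folder", "/path/to/save/experiment/to",
         "--gin_configs", "configs/test_vrig.gin"],
        env=env)

train_job = launch("train.py", "0,1,2,3,4,5,6")  # training on GPUs 0-6
eval_job = launch("eval.py", "7")                # evaluation on GPU 7
train_job.wait()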

Configuration

  • We use Gin for configuration.
  • We provide a couple preset configurations.
  • Please refer to config.py for documentation on what each configuration does.
  • Preset configs:
    • gpu_vrig_paper.gin: This is the configuration we used to generate the table in the paper. It requires 8 GPUs for training.
    • gpu_fullhd.gin: This is a high-resolution model and will take around 3 days to train on 8 GPUs.
    • gpu_quarterhd.gin: This is a low-resolution model and will take around 14 hours to train on 8 GPUs.
    • test_local.gin: This is a test configuration to see if the code runs. It probably will not result in a good looking result.
    • test_vrig.gin: This is a test configuration to see if the code runs for validation rig captures. It probably will not result in a good looking result.
  • Training on fewer GPUs will require tuning of the batch size and learning rates. We've provided an example configuration for 4 GPUs in gpu_quarterhd_4gpu.gin, but we have not tested it, so please use it only as a reference; a rough scaling sketch follows this list.
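
As a rough starting point for that tuning, a common heuristic is to shrink the batch size proportionally to the number of GPUs and scale the learning rate linearly with the batch size. The names and reference values below are illustrative placeholders, not the actual Gin parameters; see config.py and the preset .gin files for the real ones.

# Linear-scaling heuristic for adapting an 8-GPU preset to 4 GPUs.
# The reference values are placeholders, not the shipped configuration.
ref_gpus = 8
ref_batch_size = 6144   # placeholder batch size of the 8-GPU preset
ref_lr_init = 1e-3      # placeholder initial learning rate of that preset

target_gpus = 4
batch_size = ref_batch_size * target_gpus // ref_gpus   # -> 3072
lr_init = ref_lr_init * batch_size / ref_batch_size     # -> 5e-4
print(f"batch_size={batch_size}, lr_init={lr_init}")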

Datasets

A dataset is a directory with the following structure:

dataset
    ├── camera
    │   └── ${item_id}.json
    ├── camera-paths
    ├── rgb
    │   ├── ${scale}x
    │   │   └── ${item_id}.png
    ├── metadata.json
    ├── points.npy
    ├── dataset.json
    └── scene.json

At a high level, a dataset is simply the following:

  • A collection of images (e.g., from a video).
  • Camera parameters for each image.

We have a unique identifier for each image, which we call the item_id; it is used to match each camera to its image. An item_id can be any string, but it is typically an alphanumeric string such as 000054.
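
For example, a quick consistency check (a sketch, assuming full-resolution images live under rgb/1x as described below) is to verify that every camera JSON has a matching image with the same item_id:

import pathlib

# Compare item_ids between the camera directory and the full-resolution images.
root = pathlib.Path("/path/to/dataset")
camera_ids = {p.stem for p in (root / "camera").glob("*.json")}
image_ids = {p.stem for p in (root / "rgb" / "1x").glob("*.png")}

print("cameras without images:", sorted(camera_ids - image_ids))
print("images without cameras:", sorted(image_ids - camera_ids))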

camera

  • This directory contains cameras corresponding to each image.
  • We use a camera model identical to the OpenCV camera model, which is also supported by COLMAP.
  • Each camera is a serialized version of the Camera class defined in camera.py and looks like this (a minimal parsing sketch follows the example):
{
  // A 3x3 world-to-camera rotation matrix representing the camera orientation.
  "orientation": [
    [0.9839, -0.0968, 0.1499],
    [-0.0350, -0.9284, -0.3699],
    [0.1749, 0.358, -0.9168]
  ],
  // The 3D position of the camera in world-space.
  "position": [-0.3236, -3.26428, 5.4160],
  // The focal length of the camera.
  "focal_length": 2691,
  // The principal point [u_0, v_0] of the camera.
  "principal_point": [1220, 1652],
  // The skew of the camera.
  "skew": 0.0,
  // The aspect ratio for the camera pixels.
  "pixel_aspect_ratio": 1.0,
  // Parameters for the radial distortion of the camera.
  "radial_distortion": [0.1004, -0.2090, 0.0],
  // Parameters for the tangential distortion of the camera.
  "tangential": [0.001109, -2.5733e-05],
  // The image width and height in pixels.
  "image_size": [2448, 3264]
}
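
In the codebase these files are parsed by the Camera class in camera.py. The snippet below is only a minimal sketch of reading one file with NumPy and projecting a world-space point through the pinhole part of the model; distortion, skew, and pixel aspect ratio are ignored for brevity:

import json
import numpy as np

# Load one serialized camera and project a world-space point to pixel coordinates.
with open("/path/to/dataset/camera/000054.json") as f:
    cam = json.load(f)

R = np.array(cam["orientation"])         # 3x3 world-to-camera rotation
t = np.array(cam["position"])            # camera center in world space
focal = cam["focal_length"]
u0, v0 = cam["principal_point"]

point_world = np.array([0.0, 0.0, 1.0])  # arbitrary test point
point_cam = R @ (point_world - t)        # world space -> camera space
u = focal * point_cam[0] / point_cam[2] + u0
v = focal * point_cam[1] / point_cam[2] + v0
print(u, v)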

camera-paths

  • This directory contains test-time camera paths which can be used to render videos.
  • Each sub-directory in this path should contain a sequence of JSON files.
  • The naming scheme does not matter, but the cameras will be sorted by their filenames.

rgb

  • This directory contains images at various scales.
  • Each subdirectory should be named ${scale}x where ${scale} is an integer scaling factor. For example, 1x would contain the original images while 4x would contain images a quarter of the size.
  • We assume the images are in PNG format.
  • It is important that the scaled images are integer factors of the original so that area-based resampling (e.g., OpenCV's INTER_AREA) can be used when downscaling, which prevents Moiré artifacts. A simple way to ensure this is to trim the image borders so the dimensions are divisible by the largest scale factor you plan to use (see the sketch below).
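
A minimal sketch of generating the scaled copies with area-based resampling (this uses OpenCV's INTER_AREA and mirrors the border-trimming advice above; it is not the project's own preprocessing script):

import pathlib
import cv2

# Build rgb/2x and rgb/4x from rgb/1x using area resampling to avoid Moiré.
root = pathlib.Path("/path/to/dataset/rgb")
max_scale = 4
for src in sorted((root / "1x").glob("*.png")):
    image = cv2.imread(str(src), cv2.IMREAD_UNCHANGED)
    h, w = image.shape[:2]
    # Trim borders so the dimensions are divisible by the largest scale factor.
    image = image[: h - h % max_scale, : w - w % max_scale]
    for scale in (2, 4):
        out_dir = root / f"{scale}x"
        out_dir.mkdir(exist_ok=True)
        small = cv2.resize(image, (image.shape[1] // scale, image.shape[0] // scale),
                           interpolation=cv2.INTER_AREA)
        cv2.imwrite(str(out_dir / src.name), small)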

metadata.json

  • This defines the 'metadata' IDs used for embedding lookups.
  • Contains a dictionary of the following format (a generation sketch follows the example):
{
    "${item_id}": {
        // The embedding ID used to fetch the deformation latent code
        // passed to the deformation field.
        "warp_id": 0,
        // The embedding ID used to fetch the appearance latent code
        // which is passed to the second branch of the template NeRF.
        "appearance_id": 0,
        // For validation rig datasets, we use the camera ID instead
        // of the appearance ID. For example, this would be '0' for the
        // left camera and '1' for the right camera. This can potentially
        // also be used for multi-view setups as well.
        "camera_id": 0
    },
    ...
},
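
As an illustration, for a casually captured single-camera video one could give every frame its own warp and appearance ID and a constant camera ID. This is only a sketch; how the IDs should be assigned or shared depends on the capture setup and the configuration you train with:

import json
import pathlib

# Assign per-frame warp/appearance embedding IDs for a single-camera capture.
# Validation-rig captures would instead set camera_id per rig camera.
root = pathlib.Path("/path/to/dataset")
item_ids = sorted(p.stem for p in (root / "camera").glob("*.json"))

metadata = {
    item_id: {"warp_id": i, "appearance_id": i, "camera_id": 0}
    for i, item_id in enumerate(item_ids)
}
with open(root / "metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)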

scene.json

  • Contains information about how we will parse the scene.
  • See comments inline; a rough derivation sketch follows the example.
{
  // The scale factor we will apply to the pointcloud and cameras. This is
  // important since it controls what scale is used when computing the positional
  // encoding.
  "scale": 0.0387243672920458,
  // Defines the origin of the scene. The scene will be translated such that
  // this point becomes the origin. Defined in unscaled coordinates.
  "center": [
    1.1770838526103944e-08,
    -2.58235339289195,
    -1.29117656263135
  ],
  // The distance of the near plane from the camera center in scaled coordinates.
  "near": 0.02057418950149491,
  // The distance of the far plane from the camera center in scaled coordinates.
  "far": 0.8261601717667288
}
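
The dataset-processing Colab derives these values from the SfM reconstruction; the snippet below is only a rough illustration of one plausible heuristic (center on the mean camera position, scale so the cameras fit roughly within a unit sphere, and take near/far from camera-to-point distances), not the official procedure. The camera_positions.npy input is hypothetical:

import json
import numpy as np

# Illustrative heuristic for scene.json; treat this as a sketch, not the
# official computation performed by the dataset-processing Colab.
camera_positions = np.load("camera_positions.npy")   # hypothetical (M, 3) array
points = np.load("/path/to/dataset/points.npy")      # background points, (N, 3)

center = camera_positions.mean(axis=0)
scale = 1.0 / np.max(np.linalg.norm(camera_positions - center, axis=1))

# Near/far as scaled min/max camera-to-point distances, with a small margin.
dists = np.linalg.norm(points[None, :, :] - camera_positions[:, None, :], axis=-1)
near = float(dists.min() * scale * 0.9)
far = float(dists.max() * scale * 1.1)

scene = {"scale": float(scale), "center": center.tolist(), "near": near, "far": far}
with open("/path/to/dataset/scene.json", "w") as f:
    json.dump(scene, f, indent=2)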

dataset.json

  • Defines the training/validation split of the dataset.
  • See inline comments; a split-generation sketch follows the example:
{
  // The total number of images in the dataset.
  "count": 114,
  // The total number of training images (exemplars) in the dataset.
  "num_exemplars": 57,
  // A list containing all item IDs in the dataset.
  "ids": [...],
  // A list containing all training item IDs in the dataset.
  "train_ids": [...],
  // A list containing all validation item IDs in the dataset.
  // This should be mutually exclusive with `train_ids`.
  "val_ids": [...],
}
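
For instance, a simple alternating train/validation split could be written out as follows (a sketch; the dataset-processing Colab normally generates this file for you):

import json
import pathlib

# Write dataset.json with an alternating train/validation split over all item_ids.
root = pathlib.Path("/path/to/dataset")
ids = sorted(p.stem for p in (root / "camera").glob("*.json"))
train_ids = ids[::2]
val_ids = ids[1::2]

dataset = {
    "count": len(ids),
    "num_exemplars": len(train_ids),
    "ids": ids,
    "train_ids": train_ids,
    "val_ids": val_ids,
}
with open(root / "dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)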

points.npy

  • A NumPy file containing a single array of shape (N, 3) with the background points.
  • This is required if you want to use the background regularization loss (a minimal saving sketch follows).
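
For example, if you already have background points as an (N, 3) array (here a random placeholder standing in for an SfM point cloud), saving them in the expected format is just:

import numpy as np

# Placeholder background points; in practice these come from the sparse
# reconstruction produced while processing the capture.
background_points = np.random.uniform(-1.0, 1.0, size=(1000, 3)).astype(np.float32)
np.save("/path/to/dataset/points.npy", background_points)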

Citing

If you find our work useful, please consider citing:

@article{park2021nerfies,
  author    = {Park, Keunhong 
               and Sinha, Utkarsh 
               and Barron, Jonathan T. 
               and Bouaziz, Sofien 
               and Goldman, Dan B 
               and Seitz, Steven M. 
               and Martin-Brualla, Ricardo},
  title     = {Nerfies: Deformable Neural Radiance Fields},
  journal   = {ICCV},
  year      = {2021},
}