Top Related Projects
- NVlabs/instant-ngp: Instant neural graphics primitives: lightning fast NeRF and more
- facebookresearch/pytorch3d: PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
- bmild/nerf: Code release for NeRF (Neural Radiance Fields)
- yenchenlin/nerf-pytorch: A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results
- kwea123/nerf_pl: NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning
- sxyu/pixel-nerf: PixelNeRF Official Repository
Quick Overview
MultiNeRF is a collection of NeRF (Neural Radiance Fields) extensions developed by Google Research. It includes implementations of Mip-NeRF 360, Ref-NeRF, and RawNeRF, which are advanced techniques for 3D scene reconstruction and novel view synthesis from 2D images.
Pros
- Improved rendering quality and consistency compared to original NeRF
- Better handling of large-scale scenes and unbounded captures
- Support for reflections and view-dependent effects
- Ability to work with raw image data for enhanced realism
Cons
- Computationally intensive, requiring significant GPU resources
- Complex setup and configuration process
- Limited documentation for some advanced features
- Steep learning curve for users unfamiliar with NeRF concepts
Code Examples
- Loading a dataset:
from internal import datasets
dataset = datasets.get_dataset('blender', 'lego', 'train')
- Creating a model:
from internal import models
model = models.construct_model(config)
- Training the model:
from internal import train_utils
state = train_utils.create_train_state(rng, model, config.lr_init)
state = train_utils.train_step(state, batch, rng)
- Rendering a novel view:
from internal import utils
rays = utils.namedtuple_map(lambda x: x[0], batch['rays'])
rendering = models.render_image(state, rays, rng=rng)
Getting Started
- Clone the repository:
git clone https://github.com/google-research/multinerf.git
cd multinerf
- Install dependencies:
pip install -r requirements.txt
- Download a dataset (e.g., the Blender dataset):
bash scripts/download_example_data.sh
- Run training:
python train.py --gin_configs configs/blender_360.gin --exp_name my_experiment
- Render novel views:
python render.py --gin_configs configs/blender_360.gin --exp_name my_experiment
Competitor Comparisons
Instant neural graphics primitives: lightning fast NeRF and more
Pros of instant-ngp
- Significantly faster rendering and training times
- Supports real-time rendering and interactive visualization
- Fully fused CUDA implementation with multiresolution hash encoding for very high GPU throughput
Cons of instant-ngp
- Limited to smaller scenes and datasets
- Less flexible in handling complex, multi-view scenarios
- May produce lower quality results for intricate details
Code Comparison
multinerf:
def render_rays(ray_batch, model, options):
# Complex ray marching and volumetric rendering
# Handles multi-scale representations
# Supports various sampling strategies
instant-ngp:
__global__ void render_kernel(
const float3* __restrict__ rays_o,
const float3* __restrict__ rays_d,
float4* __restrict__ output
) {
// Efficient parallel rendering on GPU
// Uses hash-based encoding for fast queries
}
Both projects aim to create neural representations of 3D scenes, but instant-ngp focuses on speed and real-time performance, while multinerf emphasizes flexibility and quality for complex scenes.
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
Pros of pytorch3d
- Broader scope: Covers a wide range of 3D deep learning tasks, not limited to NeRF
- Better integration with PyTorch ecosystem
- More extensive documentation and tutorials
Cons of pytorch3d
- Less specialized for NeRF-specific tasks
- May have a steeper learning curve for NeRF-focused projects
- Potentially slower for NeRF-specific computations
Code Comparison
pytorch3d example (rendering a mesh):
renderer = MeshRenderer(
rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
shader=SoftPhongShader(device=device, cameras=cameras)
)
images = renderer(meshes, lights=lights, materials=materials)
multinerf example (rendering a NeRF):
rgb, disp, acc, extras = render_image(
render_fn,
rays,
config.chunk,
config.dataset.near,
config.dataset.far,
use_viewdirs=config.use_viewdirs,
rand=False,
)
Both repositories offer powerful tools for 3D rendering and deep learning, but pytorch3d provides a more general-purpose framework while multinerf focuses specifically on NeRF implementations.
Code release for NeRF (Neural Radiance Fields)
Pros of NeRF
- Simpler implementation, easier to understand and modify
- Original implementation, serving as a foundation for many subsequent works
- Lightweight and requires less computational resources
Cons of NeRF
- Best suited to bounded, object-centric scenes; struggles with large, unbounded captures
- Lower rendering quality compared to more advanced methods
- Slower rendering times for complex scenes
Code Comparison
NeRF:
def render_rays(ray_batch,
network_fn,
network_query_fn,
N_samples,
retraw=False,
lindisp=False,
perturb=0.,
N_importance=0,
network_fine=None,
white_bkgd=False,
raw_noise_std=0.,
verbose=False):
# ... (implementation details)
MultiNeRF:
def render_image(render_fn: Callable[[Rays], Dict[str, Any]],
rays: Rays,
config: configs.Config,
verbose: bool = False,
device: Optional[str] = None) -> Dict[str, Any]:
# ... (implementation details)
MultiNeRF offers a more modular and flexible approach to rendering, with support for multiple NeRF variants and advanced features such as anti-aliased multi-scale rendering, unbounded scenes, and raw HDR inputs. NeRF provides a straightforward implementation of the original concept, which can be beneficial for learning and experimentation.
A PyTorch implementation of NeRF (Neural Radiance Fields) that reproduces the results.
Pros of nerf-pytorch
- Simpler implementation, making it easier to understand and modify
- Lightweight and focused solely on the core NeRF algorithm
- Better suited for educational purposes and quick experimentation
Cons of nerf-pytorch
- Limited features compared to the more comprehensive multinerf
- May not perform as well on complex scenes or larger datasets
- Lacks the advanced techniques included in multinerf, such as mip-NeRF-style anti-aliased sampling and unbounded-scene parameterization
Code Comparison
nerf-pytorch:
def render_rays(ray_batch,
network_fn,
network_query_fn,
N_samples,
retraw=False,
lindisp=False,
perturb=0.,
N_importance=0,
network_fine=None,
white_bkgd=False,
raw_noise_std=0.,
verbose=False):
# ... (implementation details)
multinerf:
def render_image(render_fn: Callable[[Rays], Dict[str, Any]],
rays: Rays,
config: Config,
verbose: bool = False,
chunk: int = 8192) -> Dict[str, Any]:
# ... (implementation details)
The code comparison shows that multinerf uses more modern Python typing and has a more modular structure, while nerf-pytorch has a more straightforward implementation with fewer abstractions.
NeRF (Neural Radiance Fields) and NeRF in the Wild using pytorch-lightning
Pros of nerf_pl
- Simpler implementation, making it easier to understand and modify
- Uses PyTorch Lightning, which provides a more structured and scalable framework
- Includes a colab notebook for easy experimentation without local setup
Cons of nerf_pl
- Less advanced features compared to multinerf (e.g., no multi-scale representation)
- May not achieve the same level of rendering quality as multinerf
- Limited to simpler scenes and datasets
Code Comparison
multinerf:
def render_image(render_fn, rays, config, verbose=False):
"""Render all the pixels of an image (in test mode).
Args:
render_fn: function, jit-ed render function.
rays: a `Rays` namedtuple, the rays to be rendered.
config: A config dict.
verbose: print progress indicators.
Returns:
rgb: [H, W, 3] np.array, the rendered RGB image.
disp: [H, W] np.array, the disparity map.
acc: [H, W] np.array, the accumulated opacity.
nerf_pl:
def forward(self, rays_o, rays_d, viewdirs, global_step=None, **kwargs):
"""Do batched inference on rays using chunk."""
B = rays_o.shape[0]
results = defaultdict(list)
for i in range(0, B, self.chunk):
rendered_ray_chunks = \
render_rays(self.models,
self.embeddings,
rays_o[i:i+self.chunk],
rays_d[i:i+self.chunk],
viewdirs[i:i+self.chunk],
self.ndc,
self.near,
self.far,
use_viewdirs=self.use_viewdirs,
perturb=self.perturb,
noise_std=self.noise_std,
N_samples=self.N_samples,
N_importance=self.N_importance,
chunk=self.chunk, # chunk size is effective in val mode
white_back=self.white_back,
test_time=True,
**kwargs)
for k, v in rendered_ray_chunks.items():
results[k] += [v]
for k, v in results.items():
results[k] = torch.cat(v, 0)
return results
PixelNeRF Official Repository
Pros of pixel-nerf
- Simpler implementation, making it easier to understand and modify
- Faster training and inference times for single-object scenes
- Better generalization to novel views with limited input images
Cons of pixel-nerf
- Lower image quality and detail compared to multinerf
- Less suitable for complex, multi-object scenes
- Limited ability to handle large-scale environments
Code Comparison
pixel-nerf:
class PixelNeRF(nn.Module):
def __init__(self, D=8, W=256, input_ch=3, input_ch_views=3, output_ch=4):
super(PixelNeRF, self).__init__()
self.D = D
self.W = W
self.input_ch = input_ch
self.input_ch_views = input_ch_views
self.output_ch = output_ch
multinerf:
class NeRF(nn.Module):
def __init__(self, D=8, W=256, input_ch=3, input_ch_views=3, output_ch=4, skips=[4], use_viewdirs=False):
super(NeRF, self).__init__()
self.D = D
self.W = W
self.input_ch = input_ch
self.input_ch_views = input_ch_views
self.skips = skips
self.use_viewdirs = use_viewdirs
The code comparison shows that multinerf has additional parameters like skips and use_viewdirs, indicating a more complex architecture compared to pixel-nerf's simpler implementation.
README
MultiNeRF: A Code Release for Mip-NeRF 360, Ref-NeRF, and RawNeRF
This is not an officially supported Google product.
This repository contains the code release for three CVPR 2022 papers: Mip-NeRF 360, Ref-NeRF, and RawNeRF. This codebase was written by integrating our internal implementations of Ref-NeRF and RawNeRF into our mip-NeRF 360 implementation. As such, this codebase should exactly reproduce the results shown in mip-NeRF 360, but may differ slightly when reproducing Ref-NeRF or RawNeRF results.
This implementation is written in JAX, and is a fork of mip-NeRF. This is research code, and should be treated accordingly.
Setup
# Clone the repo.
git clone https://github.com/google-research/multinerf.git
cd multinerf
# Make a conda environment.
conda create --name multinerf python=3.9
conda activate multinerf
# Prepare pip.
conda install pip
pip install --upgrade pip
# Install requirements.
pip install -r requirements.txt
# Manually install rmbrualla's `pycolmap` (don't use pip's! It's different).
git clone https://github.com/rmbrualla/pycolmap.git ./internal/pycolmap
# Confirm that all the unit tests pass.
./scripts/run_all_unit_tests.sh
You'll probably also need to update your JAX installation to support GPUs or TPUs.
Running
Example scripts for training, evaluating, and rendering can be found in scripts/. You'll need to change the paths to point to wherever the datasets are located. Gin configuration files for our model and some ablations can be found in configs/.
After evaluating on the test set of each scene in one of the datasets, you can use scripts/generate_tables.ipynb to produce error metrics across all scenes in the same format as was used in the tables in the paper.
OOM errors
You may need to reduce the batch size (Config.batch_size) to avoid out-of-memory errors. If you do this but want to preserve quality, be sure to increase the number of training iterations and decrease the learning rate by whatever scale factor you decrease the batch size by.
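For example, halving the batch size while doubling the iteration count and halving the learning rates can be expressed as extra gin bindings on the training command. The field names (Config.max_steps, Config.lr_init, Config.lr_final) and the values below are illustrative; check configs.py and your chosen .gin file for the actual names and defaults:
python -m train \
  --gin_configs=configs/360.gin \
  --gin_bindings="Config.batch_size = 8192" \
  --gin_bindings="Config.max_steps = 500000" \
  --gin_bindings="Config.lr_init = 0.001" \
  --gin_bindings="Config.lr_final = 0.00001" \
  --logtostderr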
Using your own data
Summary: first, calculate poses. Second, train MultiNeRF. Third, render a result video from the trained NeRF model.
- Calculating poses (using COLMAP):
DATA_DIR=my_dataset_dir
bash scripts/local_colmap_and_resize.sh ${DATA_DIR}
- Training MultiNeRF:
python -m train \
--gin_configs=configs/360.gin \
--gin_bindings="Config.data_dir = '${DATA_DIR}'" \
--gin_bindings="Config.checkpoint_dir = '${DATA_DIR}/checkpoints'" \
--logtostderr
- Rendering MultiNeRF:
python -m render \
--gin_configs=configs/360.gin \
--gin_bindings="Config.data_dir = '${DATA_DIR}'" \
--gin_bindings="Config.checkpoint_dir = '${DATA_DIR}/checkpoints'" \
--gin_bindings="Config.render_dir = '${DATA_DIR}/render'" \
--gin_bindings="Config.render_path = True" \
--gin_bindings="Config.render_path_frames = 480" \
--gin_bindings="Config.render_video_fps = 60" \
--logtostderr
Your output video should now exist in the directory my_dataset_dir/render/.
See below for more detailed instructions on either using COLMAP to calculate poses or writing your own dataset loader (if you already have pose data from another source, like SLAM or RealityCapture).
Running COLMAP to get camera poses
In order to run MultiNeRF on your own captured images of a scene, you must first run COLMAP to calculate camera poses. You can do this using our provided script scripts/local_colmap_and_resize.sh. Just make a directory my_dataset_dir/ and copy your input images into a folder my_dataset_dir/images/, then run:
bash scripts/local_colmap_and_resize.sh my_dataset_dir
This will run COLMAP and create 2x, 4x, and 8x downsampled versions of your images. These lower-resolution images can be used in NeRF by setting, e.g., the Config.factor = 4 gin flag.
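For example, a minimal variant of the training command shown earlier that trains on the 4x downsampled images (keep whatever other bindings your setup needs):
python -m train \
  --gin_configs=configs/360.gin \
  --gin_bindings="Config.data_dir = '${DATA_DIR}'" \
  --gin_bindings="Config.checkpoint_dir = '${DATA_DIR}/checkpoints'" \
  --gin_bindings="Config.factor = 4" \
  --logtostderr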
By default, local_colmap_and_resize.sh uses the OPENCV camera model, which is a perspective pinhole camera with k1, k2 radial and t1, t2 tangential distortion coefficients. To switch to another COLMAP camera model, for example OPENCV_FISHEYE, you can run:
bash scripts/local_colmap_and_resize.sh my_dataset_dir OPENCV_FISHEYE
If you have a very large capture of more than around 500 images, we recommend switching from the exhaustive matcher to the vocabulary tree matcher in COLMAP (see the script for a commented-out example).
Our script is simply a thin wrapper for COLMAP--if you have run COLMAP yourself, all you need to do to load your scene in NeRF is ensure it has the following format:
my_dataset_dir/images/ <--- all input images
my_dataset_dir/sparse/0/ <--- COLMAP sparse reconstruction files (cameras, images, points)
Writing a custom dataloader
If you already have poses for your own data, you may prefer to write your own custom dataloader.
MultiNeRF includes a variety of dataloaders, all of which inherit from the base Dataset class.
The job of this class is to load all image and pose information from disk, then create batches of ray and color data for training or rendering a NeRF model.
Any inherited subclass is responsible for loading images and camera poses from disk by implementing the _load_renderings method (which is marked as abstract by the decorator @abc.abstractmethod). This data is then used to generate train and test batches of ray + color data for feeding through the NeRF model. The ray parameters are calculated in _make_ray_batch.
Existing data loaders
To work from an example, you can see how this function is overloaded for the different dataloaders we have already implemented:
- Blender
- DTU dataset
- Tanks and Temples, as processed by the NeRF++ paper
- Tanks and Temples, as processed by the Free View Synthesis paper
The main data loader we rely on is LLFF (named for historical reasons), which is the loader for a dataset that has been posed by COLMAP.
Making your own loader by implementing _load_renderings
To make a new dataset, make a class inheriting from Dataset and overload the _load_renderings method:
class MyNewDataset(Dataset):
  def _load_renderings(self, config):
    ...
In this function, you must set the following public attributes:
- images
- camtoworlds
- pixtocams
- height, width
Many of our dataset loaders also set other useful attributes, but these are the critical ones for generating rays. You can see how they are used (along with a batch of pixel coordinates) to create rays in camera_utils.pixels_to_rays.
Images
images = [N, height, width, 3] numpy array of RGB images. Currently we require all images to have the same resolution.
Extrinsic camera poses
camtoworlds = [N, 3, 4] numpy array of extrinsic pose matrices. camtoworlds[i] should be in camera-to-world format, such that we can run
pose = camtoworlds[i]
x_world = pose[:3, :3] @ x_camera + pose[:3, 3:4]
to convert a 3D camera space point x_camera into a world space point x_world.
These matrices must be stored in the OpenGL coordinate system convention for camera rotation: x-axis to the right, y-axis upward, and z-axis backward along the camera's focal axis.
The most common conventions are:
- [right, up, backwards]: OpenGL, NeRF, most graphics code.
- [right, down, forwards]: OpenCV, COLMAP, most computer vision code.
Fortunately, switching from OpenCV/COLMAP to NeRF is simple: you just need to right-multiply the OpenCV pose matrices by np.diag([1, -1, -1, 1]), which will flip the sign of the y-axis (from down to up) and z-axis (from forwards to backwards):
camtoworlds_opengl = camtoworlds_opencv @ np.diag([1, -1, -1, 1])
You may also want to scale your camera pose translations such that they all lie within the [-1, 1]^3 cube for best performance with the default mipnerf360 config files.
We provide a useful helper function camera_utils.transform_poses_pca that computes a translation/rotation/scaling transform for the input poses that aligns the world space x-y plane with the ground (based on PCA) and scales the scene so that all input pose positions lie within [-1, 1]^3. (This function is applied by default when loading mip-NeRF 360 scenes with the LLFF data loader.) For a scene where this transformation has been applied, camera_utils.generate_ellipse_path can be used to generate a nice elliptical camera path for rendering videos.
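A minimal sketch of how these helpers might be chained together for custom COLMAP poses. It assumes transform_poses_pca returns the recentered poses plus the applied transform and that generate_ellipse_path takes the recentered poses; the exact signatures may differ, so check camera_utils.py before relying on this:
import numpy as np
from internal import camera_utils

# COLMAP/OpenCV camera-to-world matrices, shape [N, 3, 4], converted to OpenGL/NeRF convention.
camtoworlds_opengl = camtoworlds_opencv @ np.diag([1, -1, -1, 1])

# Recenter/rescale so that all pose positions fall inside the [-1, 1]^3 cube.
poses, transform = camera_utils.transform_poses_pca(camtoworlds_opengl)

# Elliptical camera path for rendering a video of the recentered scene.
render_poses = camera_utils.generate_ellipse_path(poses)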
Intrinsic camera poses
pixtocams = [N, 3, 3] numpy array of inverse intrinsic matrices, OR a [3, 3] numpy array of a single shared inverse intrinsic matrix. These should be in OpenCV format, e.g.
camtopix = np.array([
  [focal,     0,  width/2],
  [    0, focal, height/2],
  [    0,     0,        1],
])
pixtocam = np.linalg.inv(camtopix)
Given a focal length and image size (and assuming a centered principal point), this matrix can be created using camera_utils.get_pixtocam. Alternatively, it can be created by using camera_utils.intrinsic_matrix and inverting the resulting matrix.
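A one-line sketch of the two routes just described, assuming get_pixtocam takes (focal, width, height) and intrinsic_matrix takes (fx, fy, cx, cy); verify the argument order against camera_utils.py:
# Hypothetical usage; the argument order is an assumption.
pixtocam = camera_utils.get_pixtocam(focal, width, height)
# Equivalent alternative via the forward intrinsic matrix:
pixtocam = np.linalg.inv(camera_utils.intrinsic_matrix(focal, focal, width / 2., height / 2.))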
Resolution
height = int, height of images.
width = int, width of images.
Distortion parameters (optional)
distortion_params = dict, camera lens distortion model parameters. This dictionary must map from strings -> floats, and the allowed keys are ['k1', 'k2', 'k3', 'k4', 'p1', 'p2'] (up to four radial coefficients and up to two tangential coefficients). By default, this is set to the empty dictionary {}, in which case undistortion is not run.
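Putting the attributes above together, a hypothetical _load_renderings for a dataset stored as a folder of images plus a poses.npy file of OpenCV-convention camera-to-world matrices might look like the following. The file layout, the fixed focal length, and the image-loading details are all assumptions for illustration; adapt them to your data and to the actual base Dataset class in internal/datasets.py:
import os
import numpy as np
from PIL import Image  # any image-loading library will do

class MyNewDataset(Dataset):  # Dataset is the base class from internal/datasets.py

  def _load_renderings(self, config):
    # RGB images as a single [N, height, width, 3] float array in [0, 1].
    image_dir = os.path.join(config.data_dir, 'images')
    filenames = sorted(os.listdir(image_dir))
    images = [np.asarray(Image.open(os.path.join(image_dir, f)), dtype=np.float32) / 255.
              for f in filenames]
    self.images = np.stack(images, axis=0)
    self.height, self.width = self.images.shape[1:3]

    # [N, 3, 4] camera-to-world matrices, here assumed to be stored in the
    # OpenCV convention, converted to the OpenGL/NeRF convention.
    camtoworlds_opencv = np.load(os.path.join(config.data_dir, 'poses.npy'))
    self.camtoworlds = camtoworlds_opencv @ np.diag([1., -1., -1., 1.])

    # A single shared inverse intrinsic matrix, assuming a centered principal
    # point and a known focal length in pixels.
    focal = 1111.0  # placeholder: use your calibrated focal length
    camtopix = np.array([[focal, 0., self.width / 2.],
                         [0., focal, self.height / 2.],
                         [0., 0., 1.]])
    self.pixtocams = np.linalg.inv(camtopix)

    # Optional lens distortion; the empty dict skips undistortion entirely.
    self.distortion_params = {}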
Details of the inner workings of Dataset
The public interface mimics the behavior of a standard machine learning pipeline dataset provider that can provide infinite batches of data to the training/testing pipelines without exposing any details of how the batches are loaded/created or how this is parallelized. Therefore, the initializer runs all setup, including data loading from disk using _load_renderings, and begins the thread using its parent start() method. After the initializer returns, the caller can request batches of data straight away.
The internal self._queue is initialized as queue.Queue(3), so the infinite loop in run() will block on the call self._queue.put(self._next_fn()) once there are 3 elements. The main thread training job runs in a loop that pops 1 element at a time off the front of the queue. The Dataset thread's run() loop will populate the queue with 3 elements, then wait until a batch has been removed and push one more onto the end.
This repeats indefinitely until the main thread's training loop completes (typically hundreds of thousands of iterations), then the main thread will exit and the Dataset thread will automatically be killed since it is a daemon.
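A stripped-down sketch of this producer/consumer pattern. This is not the actual Dataset class: the method names _next_fn and run and the queue size come from the description above, and everything else is simplified for illustration:
import queue
import threading

class PrefetchingDataset(threading.Thread):
  # Toy version of the threading model described above.

  def __init__(self):
    super().__init__()
    self.daemon = True            # killed automatically when the main thread exits
    self._queue = queue.Queue(3)  # holds at most 3 prefetched batches
    self.start()                  # the real class starts its thread during init

  def _next_fn(self):
    # Placeholder: build one batch of ray + color data here.
    return {'rays': None, 'rgb': None}

  def run(self):
    while True:
      # Blocks once 3 batches are queued, until the consumer removes one.
      self._queue.put(self._next_fn())

  def __next__(self):
    return self._queue.get()

# Consumer side: the training loop pops one prefetched batch per iteration.
dataset = PrefetchingDataset()
batch = next(dataset)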
Citation
If you use this software package, please cite whichever constituent paper(s) you build upon, or feel free to cite this entire codebase as:
@misc{multinerf2022,
title={{MultiNeRF}: {A} {Code} {Release} for {Mip-NeRF} 360, {Ref-NeRF}, and {RawNeRF}},
author={Ben Mildenhall and Dor Verbin and Pratul P. Srinivasan and Peter Hedman and Ricardo Martin-Brualla and Jonathan T. Barron},
year={2022},
url={https://github.com/google-research/multinerf},
}