
autonomousvision/occupancy_networks

This repository contains the code for the paper "Occupancy Networks - Learning 3D Reconstruction in Function Space"


Top Related Projects

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Google Research

Instant neural graphics primitives: lightning fast NeRF and more

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.


Open3D: A Modern Library for 3D Data Processing

Quick Overview

Occupancy Networks is a project that introduces a novel 3D representation for learning-based 3D reconstruction. It represents 3D geometry as a continuous decision boundary of a classifier, providing a memory-efficient and resolution-independent representation. This approach allows for detailed reconstructions from various input types like point clouds, images, and voxels.
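To make the idea of a continuous decision boundary concrete, here is a minimal sketch that uses a hand-written occupancy function for a sphere in place of a trained network. The function names, the sigmoid sharpness, and the 0.5 threshold are illustrative assumptions and are not taken from the repository. The point is that the same function can be queried at any resolution without storing a voxel grid.

```python
import math

def sphere_occupancy(x, y, z, radius=0.5, sharpness=20.0):
    """Toy occupancy function: probability that (x, y, z) lies inside a sphere.

    Stands in for a trained network; radius and sharpness are made-up values.
    """
    signed_dist = math.sqrt(x * x + y * y + z * z) - radius
    # Sigmoid of the negative signed distance: ~1 inside, ~0 outside, 0.5 on the surface.
    return 1.0 / (1.0 + math.exp(sharpness * signed_dist))

def occupied_fraction(resolution, threshold=0.5):
    """Query the function on a resolution^3 grid over [-1, 1]^3 and
    return the fraction of grid points classified as occupied."""
    step = 2.0 / (resolution - 1)
    coords = [-1.0 + i * step for i in range(resolution)]
    inside = sum(
        1
        for x in coords for y in coords for z in coords
        if sphere_occupancy(x, y, z) >= threshold
    )
    return inside / resolution ** 3

# The same function is sampled at two resolutions; nothing is stored per voxel.
coarse = occupied_fraction(9)
fine = occupied_fraction(33)
```

In a learned model the hand-written function would be replaced by a network conditioned on the input observation, and a mesh would be extracted from the thresholded values.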

Pros

  • Resolution-independent 3D representation
  • Memory-efficient compared to voxel-based approaches
  • Versatile input handling (point clouds, images, voxels)
  • Capable of producing high-quality 3D reconstructions

Cons

  • May require significant computational resources for training
  • Complexity in implementation and understanding the underlying concepts
  • Limited to static object reconstruction (not suitable for dynamic scenes)
  • Potential challenges in real-time applications due to computational demands

Code Examples

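  1. Loading a pre-trained model:
from im2mesh import config
from im2mesh.checkpoints import CheckpointIO

# Second argument: path to the default config
cfg = config.load_config('configs/img/onet_pretrained.yaml', 'configs/default.yaml')
# Build the model before restoring its weights
dataset = config.get_dataset('test', cfg, return_idx=True)
model = config.get_model(cfg, dataset=dataset)
checkpoint_io = CheckpointIO(cfg['training']['out_dir'], model=model)
checkpoint_io.load(cfg['test']['model_file'])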
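  2. Generating a 3D mesh from an image:
import torch
from im2mesh import config

# Assumes `model`, `cfg`, `device`, and `test_loader` are already set up
generator = config.get_generator(model, cfg, device=device)
with torch.no_grad():
    data = next(iter(test_loader))
    # generate_mesh may also return statistics alongside the mesh
    out = generator.generate_mesh(data)
    mesh = out[0] if isinstance(out, tuple) else out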
  3. Visualizing the generated mesh:
import trimesh

mesh_trimesh = trimesh.Trimesh(vertices=mesh.vertices, faces=mesh.faces)
mesh_trimesh.show()

Getting Started

  1. Clone the repository:

    git clone https://github.com/autonomousvision/occupancy_networks.git
    cd occupancy_networks
    
  2. Install dependencies (the README uses a conda environment):

    conda env create -f environment.yaml
    conda activate mesh_funcspace
    
  3. Compile the extension modules (pretrained checkpoints are downloaded automatically by generate.py):

    python setup.py build_ext --inplace
    
  4. Run the demo on the provided sample images:

    python generate.py configs/demo.yaml
    

Competitor Comparisons

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Pros of pytorch3d

  • Broader scope: Provides a comprehensive set of tools for 3D deep learning, including rendering, mesh operations, and point cloud processing
  • Active development: Regularly updated with new features and improvements
  • Extensive documentation and tutorials

Cons of pytorch3d

  • Steeper learning curve due to its broader scope
  • May be overkill for projects focused solely on 3D reconstruction

Code Comparison

occupancy_networks:

import torch
from im2mesh import config, data
from im2mesh.checkpoints import CheckpointIO

cfg = config.load_config('configs/img/onet_pretrained.yaml', 'configs/default.yaml')
dataset = config.get_dataset('test', cfg, return_idx=True)
model = config.get_model(cfg, dataset=dataset)

pytorch3d:

import torch
from pytorch3d.io import load_obj
from pytorch3d.structures import Meshes

# load_obj returns vertices, a faces structure, and auxiliary data
verts, faces, aux = load_obj("model.obj")
# Meshes expects the vertex indices of each face
mesh = Meshes(verts=[verts], faces=[faces.verts_idx])

Google Research

Pros of google-research

  • Broader scope, covering various research areas and projects
  • More active development with frequent updates and contributions
  • Larger community and potential for collaboration

Cons of google-research

  • Less focused on a specific topic, potentially harder to navigate
  • May require more time to find relevant information for specific tasks
  • Larger codebase might be overwhelming for newcomers

Code Comparison

occupancy_networks:

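import numpy as np

def compute_iou(occ1, occ2):
    ''' Computes the Intersection over Union (IoU) value for two sets of occupancy values.
    Args:
        occ1 (tensor): first set of occupancy values
        occ2 (tensor): second set of occupancy values
    '''
    occ1 = np.asarray(occ1)
    occ2 = np.asarray(occ2)
    # Remainder is an assumed completion: threshold at 0.5, then intersection / union
    occ1, occ2 = occ1 >= 0.5, occ2 >= 0.5
    return (occ1 & occ2).sum(axis=-1) / (occ1 | occ2).sum(axis=-1)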

google-research:

def compute_metrics(predictions, labels):
  """Computes evaluation metrics for a batch of predictions.

  Args:
    predictions: [batch_size, ...] Tensor of predictions.
    labels: [batch_size, ...] Tensor of labels.

  Returns:
    A dictionary of metric names to metric values.
  """

Summary

While occupancy_networks focuses specifically on 3D reconstruction using occupancy networks, google-research covers a wide range of research topics. The google-research repository offers more diverse content and active development, but may be less focused for specific tasks. occupancy_networks provides a more targeted approach to 3D reconstruction, potentially easier to navigate for those specifically interested in this area.

Instant neural graphics primitives: lightning fast NeRF and more

Pros of instant-ngp

  • Significantly faster rendering and training times
  • Supports real-time rendering of complex 3D scenes
  • Utilizes GPU acceleration for improved performance

Cons of instant-ngp

  • Requires more GPU memory for large-scale scenes
  • Less focus on explicit surface reconstruction
  • May produce artifacts in some complex geometries

Code Comparison

occupancy_networks:

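def compute_occupancy(self, p, inputs):
    # Encode the observation (image, point cloud, or voxels), not the query points
    c = self.encode_inputs(inputs)
    # Predict occupancy logits for the query points p, conditioned on c
    logits = self.decoder(p, c)
    return logits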

instant-ngp:

__device__ inline float network_to_density(float val, ENerfActivation activation) {
    switch (activation) {
        case ENerfActivation::None: return val;
        case ENerfActivation::ReLU: return max(0.0f, val);
        case ENerfActivation::Exponential: return __expf(val);
        default: return 0.0f;
    }
}

The code snippets highlight the different approaches: occupancy_networks uses a Python-based neural network for occupancy prediction, while instant-ngp employs CUDA for efficient density calculations on the GPU.

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Pros of Detectron2

  • More comprehensive and versatile, supporting a wide range of computer vision tasks
  • Backed by Facebook AI Research, ensuring regular updates and community support
  • Extensive documentation and pre-trained models available

Cons of Detectron2

  • Steeper learning curve due to its broader scope
  • Potentially higher computational requirements for some tasks

Code Comparison

Detectron2:

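from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

config_name = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(config_name))
# Load the COCO-trained weights for this config from the model zoo
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(config_name)
predictor = DefaultPredictor(cfg)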

Occupancy Networks:

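from im2mesh import config
from im2mesh.checkpoints import CheckpointIO

cfg = config.load_config('configs/img/onet_pretrained.yaml', 'configs/default.yaml')
# Build the model first so that the checkpoint can be loaded into it
dataset = config.get_dataset('test', cfg, return_idx=True)
model = config.get_model(cfg, dataset=dataset)
checkpoint_io = CheckpointIO(cfg['training']['out_dir'], model=model)
checkpoint_io.load(cfg['test']['model_file'])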

Both repositories focus on different aspects of computer vision. Detectron2 is a more general-purpose library for various tasks, while Occupancy Networks specifically targets 3D reconstruction. Detectron2 offers more flexibility and a wider range of applications, but Occupancy Networks may be more suitable for specialized 3D tasks.


Open3D: A Modern Library for 3D Data Processing

Pros of Open3D

  • Comprehensive 3D data processing library with a wide range of functionalities
  • Efficient C++ core with Python bindings for ease of use
  • Active development and community support

Cons of Open3D

  • Steeper learning curve due to its extensive feature set
  • May be overkill for projects focused solely on 3D reconstruction

Code Comparison

Open3D example (point cloud visualization):

import open3d as o3d

pcd = o3d.io.read_point_cloud("example.ply")
o3d.visualization.draw_geometries([pcd])

Occupancy Networks example (occupancy prediction):

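import torch

# Schematic only: `encoder`, `decoder`, and `inputs` are placeholders for
# trained network modules and an observation, not importable repository names.
points = torch.rand(1, 1000, 3)  # batch of 3D query points
c = encoder(inputs)              # condition code computed from the observation
occ = decoder(points, c)         # occupancy logits, one per query point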

Open3D offers a more general-purpose 3D processing toolkit, while Occupancy Networks focuses specifically on learning-based 3D reconstruction. Open3D provides ready-to-use functions for various 3D tasks, whereas Occupancy Networks requires more custom implementation for specific use cases. The choice between the two depends on the project's requirements and the developer's familiarity with deep learning techniques for 3D reconstruction.


README

Occupancy Networks


This repository contains the code to reproduce the results from the paper Occupancy Networks - Learning 3D Reconstruction in Function Space.

You can find detailed usage instructions for training your own models and using pretrained models below.

If you find our code or paper useful, please consider citing

@inproceedings{Occupancy Networks,
    title = {Occupancy Networks: Learning 3D Reconstruction in Function Space},
    author = {Mescheder, Lars and Oechsle, Michael and Niemeyer, Michael and Nowozin, Sebastian and Geiger, Andreas},
    booktitle = {Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)},
    year = {2019}
}

Installation

First, make sure that you have all dependencies in place. The simplest way to do so is to use anaconda.

You can create an anaconda environment called mesh_funcspace using

conda env create -f environment.yaml
conda activate mesh_funcspace

Next, compile the extension modules. You can do this via

python setup.py build_ext --inplace

To compile the dmc extension, you need a CUDA-enabled device set up. If you experience any errors, you can simply comment out the dmc_* dependencies in setup.py. You should then also comment out the dmc imports in im2mesh/config.py.

Demo


You can now test our code on the provided input images in the demo folder. To this end, simply run

python generate.py configs/demo.yaml

This script should create a folder demo/generation where the output meshes are stored. It copies the inputs into the demo/generation/inputs folder and creates the meshes in the demo/generation/meshes folder. It also creates a demo/generation/vis folder in which inputs and outputs are copied together.
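A quick way to check the demo results is to list what ended up in each output folder. The sketch below assumes the folder layout described above (inputs, meshes, and vis under demo/generation). The helper name summarize_generation_dir is hypothetical, and no assumption is made about file extensions.

```python
from pathlib import Path

def summarize_generation_dir(root="demo/generation"):
    """Return {subfolder: sorted list of file names} for the demo output folders.

    Subfolders that do not exist yet map to an empty list.
    """
    summary = {}
    for name in ("inputs", "meshes", "vis"):
        folder = Path(root) / name
        if folder.is_dir():
            summary[name] = sorted(p.name for p in folder.rglob("*") if p.is_file())
        else:
            summary[name] = []
    return summary

if __name__ == "__main__":
    for name, files in summarize_generation_dir().items():
        print(f"{name}: {len(files)} file(s)")
```

An empty meshes entry after running the demo usually means the generation step did not finish.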

Dataset

To evaluate a pretrained model or train a new model from scratch, you have to obtain the dataset. To this end, there are two options:

  1. you can download our preprocessed data
  2. you can download the ShapeNet dataset and run the preprocessing pipeline yourself

Keep in mind that running the preprocessing pipeline yourself requires a substantial amount of time and space on your hard drive. Unless you want to apply our method to a new dataset, we therefore recommend using the first option.

Preprocessed data

You can download our preprocessed data (73.4 GB) using

bash scripts/download_data.sh

This script should download and unpack the data automatically into the data/ShapeNet folder.

Building the dataset

Alternatively, you can also preprocess the dataset yourself. To this end, you have to follow the following steps:

You are now ready to build the dataset:

cd scripts
bash dataset_shapenet/build.sh

This command will build the dataset in data/ShapeNet.build. To install the dataset, run

bash dataset_shapenet/install.sh

If everything worked out, this will copy the dataset into data/ShapeNet.

Usage

When you have installed all binary dependencies and obtained the preprocessed data, you are ready to run our pretrained models and train new models from scratch.

Generation

To generate meshes using a trained model, use

python generate.py CONFIG.yaml

where you replace CONFIG.yaml with the correct config file.

The easiest way is to use a pretrained model. You can do this by using one of the config files

configs/img/onet_pretrained.yaml
configs/pointcloud/onet_pretrained.yaml
configs/voxels/onet_pretrained.yaml
configs/unconditional/onet_cars_pretrained.yaml
configs/unconditional/onet_airplanes_pretrained.yaml
configs/unconditional/onet_sofas_pretrained.yaml
configs/unconditional/onet_chairs_pretrained.yaml

which correspond to the experiments presented in the paper. Our script will automatically download the model checkpoints and run the generation. You can find the outputs in the out/*/*/pretrained folders.

Please note that the config files *_pretrained.yaml are only for generation, not for training new models: when these configs are used for training, the model will be trained from scratch, but during inference our code will still use the pretrained model.
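To run generation for several of the pretrained configurations in one go, the command above can be wrapped in a small driver. This is only a convenience sketch: it shells out to generate.py exactly as shown above, the helper names are hypothetical, and it is meant to be run from the repository root.

```python
import subprocess
import sys

# Config files listed above (conditional models only, as an example selection).
PRETRAINED_CONFIGS = [
    "configs/img/onet_pretrained.yaml",
    "configs/pointcloud/onet_pretrained.yaml",
    "configs/voxels/onet_pretrained.yaml",
]

def build_command(config_path, python=sys.executable):
    """Build the argument list for one generation run."""
    return [python, "generate.py", config_path]

def run_all(configs=PRETRAINED_CONFIGS, dry_run=True):
    """Run (or, with dry_run=True, only return) the generation commands."""
    commands = [build_command(c) for c in configs]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing run.
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_all() returns the commands without executing them; run_all(dry_run=False) executes them one after another.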

Evaluation

For evaluation of the models, we provide two scripts: eval.py and eval_meshes.py.

The main evaluation script is eval_meshes.py. You can run it using

python eval_meshes.py CONFIG.yaml

The script takes the meshes generated in the previous step and evaluates them using a standardized protocol. The output will be written to .pkl/.csv files in the corresponding generation folder which can be processed using pandas.
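As a sketch of that post-processing step, the following reads a results CSV and averages every numeric column. The file path and column names are placeholders, since they are not spelled out here. The standard library csv module is used only to keep the example dependency-free; pandas.read_csv would do the same job.

```python
import csv
from collections import defaultdict

def mean_numeric_columns(csv_path):
    """Average every column whose values all parse as floats.

    Non-numeric columns (e.g. class or model names) are skipped.
    """
    values = defaultdict(list)
    non_numeric = set()
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            for key, raw in row.items():
                try:
                    values[key].append(float(raw))
                except (TypeError, ValueError):
                    non_numeric.add(key)
    return {
        key: sum(vals) / len(vals)
        for key, vals in values.items()
        if key not in non_numeric and vals
    }
```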

For a quick evaluation, you can also run

python eval.py CONFIG.yaml

This script will run a fast method specific evaluation to obtain some basic quantities that can be easily computed without extracting the meshes. This evaluation will also be conducted automatically on the validation set during training.

All results reported in the paper were obtained using the eval_meshes.py script.

Training

Finally, to train a new network from scratch, run

python train.py CONFIG.yaml

where you replace CONFIG.yaml with the name of the configuration file you want to use.

You can monitor the training process at http://localhost:6006 using TensorBoard:

cd OUTPUT_DIR
tensorboard --logdir ./logs --port 6006

where you replace OUTPUT_DIR with the respective output directory.

For available training options, please take a look at configs/default.yaml.
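The two-argument load_config(path, default_path) call used elsewhere on this page suggests that a specific config is layered over configs/default.yaml. A minimal sketch of such layering with plain dictionaries is given below. The recursive-override rule and the example keys and values are assumptions for illustration, not a description of the repository's actual loader.

```python
import copy

def merge_config(default, override):
    """Return a new dict: `override` applied on top of `default`, recursing into nested dicts."""
    merged = copy.deepcopy(default)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# Hypothetical values; only the section names mirror keys seen on this page.
default_cfg = {
    "training": {"out_dir": "out/default", "batch_size": 64},
    "test": {"model_file": "model_best.pt"},
}
experiment_cfg = {"training": {"out_dir": "out/img/onet"}}

cfg = merge_config(default_cfg, experiment_cfg)
```

Under this rule an experiment file only needs to list the options it changes; everything else falls back to the defaults.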

Notes

  • In our paper we used random crops and scaling to augment the input images. However, we later found that this image augmentation decreases performance on the ShapeNet test set. The pretrained model that is loaded in configs/img/onet_pretrained.yaml was hence trained without data augmentation and has slightly better performance than the model from the paper. The updated table looks as follows: [Figure: updated table for single view 3D reconstruction experiment]. For completeness, we also provide the trained weights for the model which was used in the paper in configs/img/onet_legacy_pretrained.yaml.
  • Note that training and evaluation of both our model and the baselines is performed with respect to the watertight models, but that normalization into the unit cube is performed with respect to the non-watertight meshes (to be consistent with the voxelizations from Choy et al.). As a result, the bounding box of the sampled point cloud is usually slightly bigger than the unit cube and may differ a little bit from a point cloud that was sampled from the original ShapeNet mesh.
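The effect described in the second note can be reproduced with a few lines. The sketch below normalizes points with the bounding box of one point set (standing in for the non-watertight mesh) and applies the same transform to a slightly larger set (standing in for the watertight model). The coordinates are made up, and a unit cube centered at the origin is assumed.

```python
def bbox(points):
    """Axis-aligned bounding box as (mins, maxs) per coordinate."""
    mins = tuple(min(p[i] for p in points) for i in range(3))
    maxs = tuple(max(p[i] for p in points) for i in range(3))
    return mins, maxs

def unit_cube_transform(reference_points):
    """Return a function mapping points so that the reference bbox fits [-0.5, 0.5]^3."""
    mins, maxs = bbox(reference_points)
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    scale = max(hi - lo for lo, hi in zip(mins, maxs))  # longest bbox edge
    return lambda p: tuple((p[i] - center[i]) / scale for i in range(3))

# Made-up data: the "watertight" points extend slightly beyond the reference ones.
reference = [(0.0, 0.0, 0.0), (2.0, 1.0, 1.0)]
watertight = [(-0.02, 0.0, 0.0), (2.02, 1.0, 1.0)]

normalize = unit_cube_transform(reference)
mins, maxs = bbox([normalize(p) for p in watertight])
extent_x = maxs[0] - mins[0]  # slightly larger than 1.0
```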

Further Information

Please also check out the following concurrent papers that have proposed similar ideas: