AdelaiDepth
This repo contains the projects 'Virtual Normal', 'DiverseDepth', and '3D Scene Shape', which address monocular depth estimation and 3D scene reconstruction from a single image.
Quick Overview
AdelaiDepth is a GitHub repository that focuses on monocular depth estimation research. It contains implementations of depth estimation models and techniques developed by researchers at the University of Adelaide, and aims to provide a comprehensive toolkit for depth estimation tasks in computer vision.
Pros
- Offers multiple state-of-the-art depth estimation models
- Includes pre-trained models for easy use and evaluation
- Provides detailed documentation and instructions for each model
- Implemented in PyTorch, with reusable model and backbone components
Cons
- Limited to monocular depth estimation tasks
- May require significant computational resources for training and inference
- Some models might be complex for beginners to understand and modify
- Occasional updates may lead to compatibility issues with older versions
Code Examples
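Note that the snippets below are illustrative sketches rather than verbatim scripts from the repository: the module path, backbone name, and checkpoint filename follow the LeReS/DiverseDepth code layout but may differ between releases, so consult the repository's test scripts for the exact loading and preprocessing steps.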
- Loading a pre-trained model:
import torch
from lib.models.multi_depth_model_auxiv2 import RelDepthModel

model = RelDepthModel(backbone='resnext101')
# Checkpoint layout can vary between releases; inspect the file if loading fails
model.load_state_dict(torch.load('pretrained_model.pth'))
model.eval()
- Performing inference on an image:
import cv2
import torch

# Read the image and convert OpenCV's BGR ordering to RGB
image = cv2.imread('input_image.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# HWC uint8 -> 1x3xHxW float in [0, 1]; the repo's own test scripts also
# resize and normalize, so check them before relying on this preprocessing
image = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float() / 255.0
with torch.no_grad():
    depth = model(image)
- Visualizing the depth map:
import matplotlib.pyplot as plt
depth_map = depth.squeeze().cpu().numpy()
plt.imshow(depth_map, cmap='viridis')
plt.colorbar()
plt.show()
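Beyond plotting, it is often useful to keep the raw prediction on disk. The following is a minimal sketch, not taken from the repository, assuming depth_map from the previous snippet; the output filenames are placeholders:
import numpy as np
import cv2

# The network output is relative (affine-invariant) depth, so normalize it
# to the 16-bit range before writing a viewable PNG
d = depth_map.astype(np.float32)
d = (d - d.min()) / (d.max() - d.min() + 1e-8)
cv2.imwrite('depth_16bit.png', (d * 65535).astype(np.uint16))
np.save('depth_raw.npy', depth_map)  # lossless copy of the raw prediction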
Getting Started
To get started with AdelaiDepth:
- Clone the repository:
git clone https://github.com/aim-uofa/AdelaiDepth.git
cd AdelaiDepth
- Install dependencies:
pip install -r requirements.txt
- Download pre-trained models from the provided links in the repository.
- Run inference on an image:
from lib.models.multi_depth_model_auxiv2 import RelDepthModel
import torch
import cv2

model = RelDepthModel(backbone='resnext101')
model.load_state_dict(torch.load('pretrained_model.pth'))
model.eval()

image = cv2.imread('input_image.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float() / 255.0
with torch.no_grad():
    depth = model(image)
# Visualize or save the depth map
For more detailed instructions and model-specific usage, refer to the documentation in the repository.
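Since the 3D Scene Shape project targets point cloud reconstruction from the predicted depth, a common follow-up is back-projecting the depth map through a pinhole camera model. The sketch below is not from the repository; the focal length and principal point are placeholder assumptions and should be replaced with the intrinsics estimated or known for your image:
import numpy as np

def depth_to_point_cloud(depth_map, focal_length_px):
    """Back-project an HxW depth map into an Nx3 point cloud in the camera frame."""
    h, w = depth_map.shape
    cx, cy = w / 2.0, h / 2.0  # assume the principal point is at the image center
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_map
    x = (u - cx) * z / focal_length_px
    y = (v - cy) * z / focal_length_px
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example with a placeholder focal length (in pixels):
# points = depth_to_point_cloud(depth_map, focal_length_px=1000.0)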
Competitor Comparisons
Code for robust monocular depth estimation described in "Ranftl et al., Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer, TPAMI 2022"
Pros of MiDaS
- More versatile with support for various input resolutions and aspect ratios
- Faster inference time, especially with smaller models like MiDaS-small
- Extensive pre-trained models available for different use cases
Cons of MiDaS
- Less focus on high-resolution depth estimation compared to AdelaiDepth
- May struggle with complex scenes or fine details in some cases
- Limited options for custom training or fine-tuning
Code Comparison
MiDaS:
model_type = "DPT_Large"
midas = torch.hub.load("intel-isl/MiDaS", model_type)
midas.to(device)
midas.eval()
AdelaiDepth:
from lib.models.multi_depth_model_auxiv2 import RelDepthModel
depth_model = RelDepthModel(backbone='resnext101')
depth_model.eval()
Both repositories offer depth estimation solutions, but MiDaS provides more flexibility in terms of model sizes and input handling, while AdelaiDepth focuses on high-resolution depth maps. MiDaS is generally faster and easier to use out-of-the-box, whereas AdelaiDepth may offer better performance in scenarios requiring detailed depth information.
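To make the "out-of-the-box" point concrete, here is a rough end-to-end MiDaS sketch using the torch.hub entry points published by the MiDaS repository (the model name and the transforms entry point follow the MiDaS README; verify them against the current version):
import cv2
import torch

# Load a small MiDaS model and its matching preprocessing transform from torch.hub
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform

img = cv2.cvtColor(cv2.imread("input_image.jpg"), cv2.COLOR_BGR2RGB)
batch = transform(img)

with torch.no_grad():
    prediction = midas(batch)
    # Resize the prediction back to the original image resolution
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()

depth = prediction.cpu().numpy()  # relative (inverse) depth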
Dense Prediction Transformers
Pros of DPT
- More efficient architecture with Vision Transformers (ViT)
- Better performance on zero-shot transfer tasks
- Supports a wider range of input resolutions
Cons of DPT
- Requires more computational resources for training
- Less flexibility in terms of backbone architectures
- Fewer pre-trained models available for different tasks
Code Comparison
AdelaiDepth:
class DepthModel(nn.Module):
    def __init__(self, backbone='resnet50'):
        super(DepthModel, self).__init__()
        self.backbone = get_backbone(backbone)
        self.decoder = DepthDecoder()

    def forward(self, x):
        features = self.backbone(x)
        depth = self.decoder(features)
        return depth
DPT:
class DPTDepthModel(DPT):
    def __init__(self, path=None, non_negative=True, **kwargs):
        features = kwargs["features"] if "features" in kwargs else 256
        super().__init__(head=HeadDepth(features), **kwargs)
        self.scale_inv_depth = partial(scale_invariant_depth, non_negative=non_negative)

    def forward(self, x):
        return self.scale_inv_depth(super().forward(x))
The code comparison shows that DPT uses a more complex architecture based on Vision Transformers, while AdelaiDepth uses a more traditional encoder-decoder structure with a ResNet backbone. DPT also includes a scale-invariant depth calculation in its forward pass.
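One practical consequence of both toolkits predicting relative (affine-invariant) depth is that comparisons against metric ground truth normally require aligning scale and shift first. The snippet below is not from either repository; it is the standard least-squares alignment, sketched under the assumption that prediction and ground truth are NumPy arrays of the same shape:
import numpy as np

def align_scale_shift(pred, gt):
    """Solve min over (s, t) of ||s * pred + t - gt||^2 on valid pixels."""
    mask = np.isfinite(gt) & (gt > 0)
    p, g = pred[mask].ravel(), gt[mask].ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)  # [N, 2] design matrix
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred + t

# aligned = align_scale_shift(predicted_depth, gt_depth)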
README
AdelaiDepth
AdelaiDepth is an open source toolbox for monocular depth prediction. Relevant work from our group is open-sourced here.
AdelaiDepth contains the following algorithms:
- 3D Scene Shape (Best Paper Finalist): Code, Learning to Recover 3D Scene Shape from a Single Image
- DiverseDepth: Code, Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction, DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data
- Virtual Normal: Code, Enforcing geometric constraints of virtual normal for depth prediction
- Depth Estimation Using Deep Convolutional Neural Fields: Code, Learning Depth from Single Monocular Images Using Deep Convolutional Neural Fields, TPAMI'16, CVPR'15
News:
- [May. 31, 2022] Training code and data of LeReS project have been released.
- [Feb. 13, 2022] Training code and data of DiverseDepth project have been released.
- [Jun. 13, 2021] Our "Learning to Recover 3D Scene Shape from a Single Image" work is one of the CVPR'21 Best Paper Finalists.
- [Jun. 6, 2021] We have made the training data of DiverseDepth available.
Results and Dataset Examples:
- 3D Scene Shape
You may want to check the video linked in the repository, which provides a very brief introduction to the work.
[Example images in the repository: RGB | Depth | Point Cloud]
- DiverseDepth
- Results examples:
- DiverseDepth dataset examples:
BibTeX
@article{yin2022towards,
  title   = {Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image},
  author  = {Yin, Wei and Zhang, Jianming and Wang, Oliver and Niklaus, Simon and Chen, Simon and Liu, Yifan and Shen, Chunhua},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year    = {2022}
}

@inproceedings{Yin2019enforcing,
  title     = {Enforcing geometric constraints of virtual normal for depth prediction},
  author    = {Yin, Wei and Liu, Yifan and Shen, Chunhua and Yan, Youliang},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  year      = {2019}
}

@inproceedings{Wei2021CVPR,
  title     = {Learning to Recover 3D Scene Shape from a Single Image},
  author    = {Wei Yin and Jianming Zhang and Oliver Wang and Simon Niklaus and Long Mai and Simon Chen and Chunhua Shen},
  booktitle = {Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR)},
  year      = {2021}
}

@article{yin2021virtual,
  title   = {Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction},
  author  = {Yin, Wei and Liu, Yifan and Shen, Chunhua},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year    = {2021}
}
License
The 3D Scene Shape code is under a non-commercial license from Adobe Research. See the LICENSE file for details.
Other depth prediction projects are licensed under the 2-clause BSD License for non-commercial use -- see the LICENSE file for details. For commercial use, please contact Chunhua Shen.