SfMLearner
An unsupervised learning framework for depth and ego-motion estimation from monocular videos
Top Related Projects
Pytorch version of SfmLearner from Tinghui Zhou et al.
Quick Overview
SfMLearner is an unsupervised learning framework for depth and ego-motion estimation from monocular videos. It implements the method described in the paper "Unsupervised Learning of Depth and Ego-Motion from Video" by Zhou et al. The project provides a TensorFlow implementation of the SfMLearner model, which can predict depth and camera motion from unlabeled video sequences.
Pros
- Unsupervised learning approach, eliminating the need for ground truth depth or pose data
- Capable of estimating both depth and ego-motion simultaneously
- Provides pre-trained models for quick testing and evaluation
- Includes data preparation scripts for popular datasets like KITTI
Cons
- Limited to monocular video sequences, not suitable for stereo or multi-view setups
- Performance may vary depending on the quality and characteristics of input video
- Requires significant computational resources for training
- May struggle with dynamic objects or scenes with significant occlusions
Code Examples
- Loading a pre-trained model (the checkpoint is restored through a TensorFlow session and Saver, following the repo's demo.ipynb):
import tensorflow as tf
from SfMLearner import SfMLearner
sfm = SfMLearner()
sfm.setup_inference(img_height=128, img_width=416, mode='depth')
saver = tf.train.Saver([var for var in tf.model_variables()])
sess = tf.Session()
saver.restore(sess, 'models/model-190532')
- Predicting depth for a single image (inference takes the session created above; a visualization sketch follows these examples):
import cv2
import numpy as np
img = cv2.imread('path/to/image.jpg')            # note: OpenCV loads images as BGR
img = cv2.resize(img, (416, 128))                # (width, height) expected by the model
input_img = np.expand_dims(img, axis=0)          # add batch dimension -> (1, 128, 416, 3)
pred_depth = sfm.inference(input_img, sess, mode='depth')
- Estimating ego-motion over a short snippet (a sketch following test_kitti_pose.py: pose mode is configured with a sequence length, and the snippet's frames are concatenated along the width axis):
sfm_pose = SfMLearner()  # in practice, build pose mode in a fresh graph/session (see test_kitti_pose.py)
sfm_pose.setup_inference(img_height=128, img_width=416, mode='pose', seq_length=3)
frames = [cv2.resize(cv2.imread('path/to/frame%d.jpg' % i), (416, 128)) for i in range(3)]
input_seq = np.concatenate(frames, axis=1)       # shape (128, 3*416, 3)
pred_pose = sfm_pose.inference(input_seq[None], sess, mode='pose')
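For quick inspection, the predicted depth can be rendered with matplotlib. This is an illustrative sketch; the assumption (following the repo's demo notebook) is that inference returns a dict whose 'depth' entry has shape (1, 128, 416, 1):
import matplotlib.pyplot as plt
depth = pred_depth['depth'][0, :, :, 0]  # assumed output layout: (batch, H, W, 1)
plt.imshow(1.0 / depth, cmap='plasma')   # inverse depth (disparity) reads better visually
plt.axis('off')
plt.show()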
Getting Started
- Clone the repository:
git clone https://github.com/tinghuiz/SfMLearner.git
cd SfMLearner
- Install dependencies:
pip install -r requirements.txt
- Download pre-trained models:
bash ./models/download_model.sh
- Run inference on a sample image:
python inference.py --img_path path/to/image.jpg --output_dir ./output
Competitor Comparisons
Pytorch version of SfmLearner from Tinghui Zhou et al.
Pros of SfmLearner-Pytorch
- Implemented in PyTorch, offering better flexibility and ease of use for many researchers
- More active development and maintenance, with recent updates and contributions
- Includes additional features like custom transforms and improved data loading
Cons of SfmLearner-Pytorch
- May have slightly different results compared to the original TensorFlow implementation
- Requires familiarity with PyTorch, which might be a learning curve for some users
- Some reported issues with CUDA memory management in certain scenarios
Code Comparison
SfMLearner (TensorFlow):
def build_model(self):
    with tf.variable_scope(self.name):
        self.build_inference_model()
        self.build_loss()
        self.collect_summaries()
SfmLearner-Pytorch:
class SfmLearner(nn.Module):
    def __init__(self, options):
        super(SfmLearner, self).__init__()
        self.pose_net = PoseExpNet(options.num_scales)
        self.disp_net = DispNetS(options.num_scales)
The main difference in the code structure is the use of PyTorch's nn.Module
in SfmLearner-Pytorch, which provides a more object-oriented approach compared to TensorFlow's functional style in SfMLearner. This can lead to more intuitive model definition and easier customization for many developers.
README
SfMLearner
This codebase implements the system described in the paper:
Unsupervised Learning of Depth and Ego-Motion from Video
Tinghui Zhou, Matthew Brown, Noah Snavely, David G. Lowe
In CVPR 2017 (Oral).
See the project webpage for more details. Please contact Tinghui Zhou (tinghuiz@berkeley.edu) if you have any questions.
Prerequisites
This codebase was developed and tested with TensorFlow 1.0, CUDA 8.0, and Ubuntu 16.04.
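A quick way to confirm the environment roughly matches these prerequisites (an illustrative snippet, not part of the codebase):
import tensorflow as tf
print(tf.__version__)              # expect a 1.x release for this codebase
print(tf.test.is_gpu_available())  # True if CUDA and a GPU are visible to TensorFlow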
Running the single-view depth demo
We provide the demo code for running our single-view depth prediction model. First, download the pre-trained model from this Google Drive, and put the model files under models/. Then you can use the provided ipython-notebook demo.ipynb to run the demo.
Preparing training data
In order to train the model using the provided code, the data needs to be formatted in a certain manner.
For KITTI, first download the dataset using this script provided on the official website, and then run the following command
python data/prepare_train_data.py --dataset_dir=/path/to/raw/kitti/dataset/ --dataset_name='kitti_raw_eigen' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=128 --num_threads=4
For the pose experiments, we used the KITTI odometry split, which can be downloaded here. Then you can change the --dataset_name option to kitti_odom when preparing the data.
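After prepare_train_data.py finishes, a quick sanity check on the dumped samples can help catch path or resolution mistakes. This sketch assumes each dumped sample is a seq_length-frame snippet concatenated horizontally (hence width seq_length * img_width):
import glob
import cv2
dump_root = '/path/to/resulting/formatted/data/'
sample = sorted(glob.glob(dump_root + '*/*.jpg'))[0]  # first formatted snippet found
print(cv2.imread(sample).shape)  # expected (128, 1248, 3) for seq_length=3, img_width=416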
For Cityscapes, download the following packages: 1) leftImg8bit_sequence_trainvaltest.zip, 2) camera_trainvaltest.zip. Then run the following command
python data/prepare_train_data.py --dataset_dir=/path/to/cityscapes/dataset/ --dataset_name='cityscapes' --dump_root=/path/to/resulting/formatted/data/ --seq_length=3 --img_width=416 --img_height=171 --num_threads=4
Notice that for Cityscapes the img_height is set to 171 because we crop out the bottom part of the image that contains the car logo, and the resulting image will have height 128.
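In other words, each Cityscapes frame is resized to 416x171 and the bottom rows containing the car logo are then dropped, leaving the standard 416x128 network input. A small sketch of that arithmetic:
import numpy as np
frame = np.zeros((171, 416, 3), dtype=np.uint8)  # stand-in for a resized Cityscapes frame
cropped = frame[:128]                            # drop the bottom 43 rows (car-logo region)
assert cropped.shape == (128, 416, 3)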
Training
Once the data are formatted following the above instructions, you should be able to train the model by running the following command
python train.py --dataset_dir=/path/to/the/formatted/data/ --checkpoint_dir=/where/to/store/checkpoints/ --img_width=416 --img_height=128 --batch_size=4
You can then start a tensorboard session by
tensorboard --logdir=/path/to/tensorflow/log/files --port=8888
and visualize the training progress by opening http://localhost:8888 in your browser. If everything is set up properly, you should start seeing reasonable depth predictions after ~100K iterations when training on KITTI.
Notes
After adding data augmentation and removing batch normalization (along with some other minor tweaks), we have been able to train depth models better than what was originally reported in the paper even without using additional Cityscapes data or the explainability regularization. The provided pre-trained model was trained on KITTI only with smooth weight set to 0.5, and achieved the following performance on the Eigen test split (Table 1 of the paper):
Abs Rel | Sq Rel | RMSE | RMSE(log) | Acc.1 | Acc.2 | Acc.3 |
---|---|---|---|---|---|---|
0.183 | 1.595 | 6.709 | 0.270 | 0.734 | 0.902 | 0.959 |
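These columns are the standard single-view depth metrics on the Eigen split. A hedged sketch of their usual definitions is below; kitti_eval/eval_depth.py is the authoritative implementation:
import numpy as np

def depth_metrics(gt, pred):
    # gt, pred: matched 1-D arrays of valid ground-truth and predicted depths (meters).
    thresh = np.maximum(gt / pred, pred / gt)
    acc1 = (thresh < 1.25).mean()
    acc2 = (thresh < 1.25 ** 2).mean()
    acc3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, acc1, acc2, acc3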
When trained on 5-frame snippets, the pose model obtains the following performance on the KITTI odometry split (Table 3 of the paper):
Seq. 09 | Seq. 10 |
---|---|
0.016 (std. 0.009) | 0.013 (std. 0.009) |
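The pose numbers are the mean Absolute Trajectory Error (with standard deviation) over all 5-frame test snippets. A hedged sketch of the per-snippet computation follows; kitti_eval/eval_pose.py is the reference implementation:
import numpy as np

def snippet_ate(gt_xyz, pred_xyz):
    # gt_xyz, pred_xyz: (5, 3) arrays of camera positions for one snippet.
    pred = pred_xyz + (gt_xyz[0] - pred_xyz[0])        # align the first frames
    scale = np.sum(gt_xyz * pred) / np.sum(pred ** 2)  # least-squares scale alignment
    return np.sqrt(np.mean(np.sum((scale * pred - gt_xyz) ** 2, axis=1)))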
Evaluation on KITTI
Depth
We provide evaluation code for the single-view depth experiment on KITTI. First, download our predictions (~140MB) from this Google Drive and put them into kitti_eval/.
Then run
python kitti_eval/eval_depth.py --kitti_dir=/path/to/raw/kitti/dataset/ --pred_file=kitti_eval/kitti_eigen_depth_predictions.npy
If everything runs properly, you should get the numbers for Ours (CS+K) in Table 1 of the paper. To get the numbers for Ours cap 50m (CS+K), set an additional flag --max_depth=50 when executing the above command.
Pose
We provide evaluation code for the pose estimation experiment on KITTI. First, download the predictions and ground-truth pose data from this Google Drive.
Notice that all the predictions and ground truth are 5-frame snippets in the format timestamp tx ty tz qx qy qz qw, consistent with the TUM evaluation toolkit (a minimal parser sketch follows the commands below). Then you could run
python kitti_eval/eval_pose.py --gtruth_dir=/directory/of/groundtruth/trajectory/files/ --pred_dir=/directory/of/predicted/trajectory/files/
to obtain the results reported in Table 3 of the paper. For instance, to get the results of Ours for Seq. 10, you could run
python kitti_eval/eval_pose.py --gtruth_dir=kitti_eval/pose_data/ground_truth/10/ --pred_dir=kitti_eval/pose_data/ours_results/10/
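Since the snippet files use the TUM trajectory format mentioned above, a minimal parser sketch (a hypothetical helper, not part of the repo) looks like this:
import numpy as np

def load_tum_trajectory(path):
    # Each non-comment line: "timestamp tx ty tz qx qy qz qw".
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                rows.append([float(v) for v in line.split()])
    data = np.array(rows)
    return data[:, 0], data[:, 1:4], data[:, 4:8]  # timestamps, positions xyz, quaternions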
KITTI Testing code
Depth
Once you have a trained model, you can obtain single-view depth predictions on the KITTI Eigen test split, formatted properly for evaluation, by running
python test_kitti_depth.py --dataset_dir /path/to/raw/kitti/dataset/ --output_dir /path/to/output/directory --ckpt_file /path/to/pre-trained/model/file/
Pose
We also provide sample testing code for obtaining pose predictions on the KITTI dataset with a pre-trained model. You can obtain the predictions formatted as above for pose evaluation by running
python test_kitti_pose.py --test_seq [sequence_id] --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file /path/to/pre-trained/model/file/
A sample model trained on 5-frame snippets can be downloaded at this Google Drive.
Then you can obtain predictions on, say, Seq. 9 by running
python test_kitti_pose.py --test_seq 9 --dataset_dir /path/to/KITTI/odometry/set/ --output_dir /path/to/output/directory/ --ckpt_file models/model-100280
Other implementations
Pytorch (by Clement Pinard)
Disclaimer
This is the authors' implementation of the system described in the paper and not an official Google product.