Top Related Projects
Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
A Robust and Versatile Monocular Visual-Inertial State Estimator
Quick Overview
DROID-SLAM is an open-source visual SLAM (Simultaneous Localization and Mapping) system developed by the Princeton Vision & Learning Lab. It uses deep learning for camera tracking and dense reconstruction, and supports monocular, stereo, and RGB-D input.
Pros
- High accuracy and robustness in challenging environments
- Real-time performance on consumer-grade hardware
- Ability to handle both indoor and outdoor scenes
- Integrates deep learning with traditional SLAM techniques
Cons
- Requires a GPU: the README lists at least 11G of GPU memory for the demos and 24G for training
- May have higher computational requirements compared to traditional SLAM methods
- Limited documentation for advanced customization
- Dependency on specific deep learning frameworks
Code Examples
- Initializing DROID-SLAM (illustrative only: the class and method names in these snippets are simplified and may not match the repository, whose documented entry point is demo.py):
from droid_slam import DROID
# Initialize DROID-SLAM on the first GPU
slam = DROID(device="cuda:0")
- Processing a video sequence:
import cv2
# Open video file
video = cv2.VideoCapture("path/to/video.mp4")
while True:
    ret, frame = video.read()
    if not ret:
        break
    # Process frame with DROID-SLAM
    slam.track(frame)
video.release()
# Finalize reconstruction
poses, points, colors = slam.get_map()
- Visualizing the reconstructed map:
import open3d as o3d
# Create point cloud
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
pcd.colors = o3d.utility.Vector3dVector(colors / 255.0)
# Visualize
o3d.visualization.draw_geometries([pcd])
Getting Started
- Clone the repository together with its submodules:
git clone --recursive https://github.com/princeton-vl/DROID-SLAM.git
cd DROID-SLAM
- Create the conda environment (use environment_novis.yaml instead to skip the visualization dependencies):
conda env create -f environment.yaml
- Compile the extensions (the README estimates about 10 minutes):
python setup.py install
- Download the pre-trained weights (droid.pth), which the README links from Google Drive.
- Run the demo on a directory of images with a matching calibration file:
python demo.py --imagedir=data/abandonedfactory --calib=calib/tartan.txt --stride=2
Competitor Comparisons
Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
Pros of ORB_SLAM2
- Well-established and widely used in the robotics community
- Efficient feature-based SLAM algorithm with loop closure
- Supports monocular, stereo, and RGB-D cameras
Cons of ORB_SLAM2
- May struggle in low-texture environments
- Requires careful parameter tuning for optimal performance
- Less robust to dynamic objects in the scene
Code Comparison
ORB_SLAM2 (C++, simplified to the left-image case):
// Feature extraction: run the ORB extractor on the input image
void Frame::ExtractORB(int flag, const cv::Mat &im)
{
    (*mpORBextractorLeft)(im, cv::Mat(), mvKeys, mDescriptors);
}
DROID-SLAM (Python, illustrative method on the network class):
# Feature extraction using a learned feature network
def extract_features(self, image):
    return self.feature_net(image)
DROID-SLAM uses deep learning for feature extraction, potentially offering more robust performance in challenging environments. ORB_SLAM2 relies on traditional computer vision techniques, which can be faster but may be less adaptable to diverse scenes.
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
Pros of ORB_SLAM3
- Well-established and widely used in the SLAM community
- Supports various sensor inputs (monocular, stereo, RGB-D)
- Real-time performance on CPU
Cons of ORB_SLAM3
- Relies on handcrafted features, which may not be optimal for all environments
- Can struggle in low-texture or dynamic scenes
Code Comparison
ORB_SLAM3 (C++, simplified):
// Feature extraction and matching
void Frame::ExtractORB(int flag, const cv::Mat &im)
{
    (*mpORBextractorLeft)(im, cv::Mat(), mvKeys, mDescriptors);
}
DROID-SLAM (Python, illustrative):
# Feature extraction using deep learning
def extract_features(self, images):
    return self.fnet(images)
ORB_SLAM3 uses traditional computer vision techniques for feature extraction, while DROID-SLAM leverages deep learning for this task. This fundamental difference affects their performance in various scenarios and computational requirements.
A Robust and Versatile Monocular Visual-Inertial State Estimator
Pros of VINS-Mono
- Utilizes both visual and inertial data for more robust state estimation
- Developed by the HKUST Aerial Robotics Group and well suited to aerial robotics applications
- Includes loop closure for improved accuracy in long-term operation
Cons of VINS-Mono
- May have higher computational requirements due to fusion of multiple sensor inputs
- Potentially more complex setup and calibration process
- Depends on IMU data, so it is less suitable for camera-only (pure visual odometry) setups
Code Comparison
VINS-Mono (C++):
void FeatureManager::setDepth(const VectorXd &x)
{
    int feature_index = -1;
    for (auto &it_per_id : feature)
    {
        it_per_id.used_num = it_per_id.feature_per_frame.size();
        if (!(it_per_id.used_num >= 2 && it_per_id.start_frame < WINDOW_SIZE - 2))
            continue;
        it_per_id.estimated_depth = 1.0 / x(++feature_index);
    }
}
DROID-SLAM (Python):
def update(self, tstamp, image, depth=None, intrinsics=None):
    if intrinsics is not None:
        self.video.intrinsics[self.fidx] = intrinsics
    self.video.poses[self.fidx] = self.video.poses[self.fidx-1].clone()
    self.video.images[self.fidx] = image
    self.video.timestamps[self.fidx] = tstamp
    self.fidx += 1
    return self.fidx - 1
README
DROID-SLAM
DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras
Zachary Teed and Jia Deng
@article{teed2021droid,
title={{DROID-SLAM: Deep Visual SLAM for Monocular, Stereo, and RGB-D Cameras}},
author={Teed, Zachary and Deng, Jia},
journal={Advances in neural information processing systems},
year={2021}
}
Initial Code Release: This repo currently provides a single GPU implementation of our monocular, stereo, and RGB-D SLAM systems. It currently contains demos, training, and evaluation scripts.
Requirements
To run the code you will need:
- Inference: Running the demos will require a GPU with at least 11G of memory.
- Training: Training requires a GPU with at least 24G of memory. We train on 4 x RTX-3090 GPUs.
Getting Started
- Clone the repo using the --recursive flag:
git clone --recursive https://github.com/princeton-vl/DROID-SLAM.git
- Create a new anaconda environment using the provided .yaml file. Use environment_novis.yaml if you do not want to use the visualization:
conda env create -f environment.yaml
pip install evo --upgrade --no-binary evo
pip install gdown
- Compile the extensions (takes about 10 minutes):
python setup.py install
Demos
- Download the model from google drive: droid.pth
- Download some sample videos using the provided script:
./tools/download_sample_data.sh
Run the demo on any of the samples (all demos can be run on a GPU with 11G of memory). While running, press the "s" key to increase the filtering threshold (= more points) and "a" to decrease the filtering threshold (= fewer points). To save the reconstruction with full resolution depth maps, use the --reconstruction_path flag.
python demo.py --imagedir=data/abandonedfactory --calib=calib/tartan.txt --stride=2
python demo.py --imagedir=data/sfm_bench/rgb --calib=calib/eth.txt
python demo.py --imagedir=data/Barn --calib=calib/barn.txt --stride=1 --backend_nms=4
python demo.py --imagedir=data/mav0/cam0/data --calib=calib/euroc.txt --t0=150
python demo.py --imagedir=data/rgbd_dataset_freiburg3_cabinet/rgb --calib=calib/tum3.txt
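When running several sequences in a row, it can help to assemble these command lines programmatically. The following is a minimal sketch, not part of the repository: the helper name demo_command is hypothetical, and it only uses flags that appear in the sample commands above (--imagedir, --calib, --stride, --t0, --backend_nms).

```python
import shlex
from typing import List, Optional


def demo_command(
    imagedir: str,
    calib: str,
    stride: Optional[int] = None,
    t0: Optional[int] = None,
    backend_nms: Optional[int] = None,
) -> List[str]:
    """Build an argument list for demo.py (hypothetical helper).

    Optional flags are only appended when a value is given, so the
    script's own defaults apply otherwise.
    """
    cmd = ["python", "demo.py", f"--imagedir={imagedir}", f"--calib={calib}"]
    if stride is not None:
        cmd.append(f"--stride={stride}")
    if t0 is not None:
        cmd.append(f"--t0={t0}")
    if backend_nms is not None:
        cmd.append(f"--backend_nms={backend_nms}")
    return cmd


# Print a command that can be pasted into a shell.
print(shlex.join(demo_command("data/Barn", "calib/barn.txt", stride=1, backend_nms=4)))
```

The returned list can be passed directly to subprocess.run from the repository root if you prefer to launch the demo from Python.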
Running on your own data: All you need is a calibration file. Calibration files are in the form
fx fy cx cy [k1 k2 p1 p2 [ k3 [ k4 k5 k6 ]]]
with parameters in brackets optional.
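To sanity-check a calibration file before running the demo, the format above can be validated with a few lines of Python. This is a sketch under stated assumptions: the names Calibration and parse_calibration are hypothetical and not part of the repository, the values are assumed to be whitespace-separated, and the numbers in the example are illustrative only.

```python
from typing import NamedTuple, Tuple


class Calibration(NamedTuple):
    fx: float
    fy: float
    cx: float
    cy: float
    # Optional distortion terms in file order: k1 k2 p1 p2 [k3 [k4 k5 k6]]
    distortion: Tuple[float, ...]


# The nested brackets allow 4 intrinsics plus 0, 4, 5, or 8 distortion terms.
VALID_COUNTS = (4, 8, 9, 12)


def parse_calibration(text: str) -> Calibration:
    """Parse 'fx fy cx cy [k1 k2 p1 p2 [k3 [k4 k5 k6]]]' into a Calibration."""
    values = [float(token) for token in text.split()]
    if len(values) not in VALID_COUNTS:
        raise ValueError(
            f"expected {VALID_COUNTS} values, got {len(values)}"
        )
    fx, fy, cx, cy = values[:4]
    return Calibration(fx, fy, cx, cy, tuple(values[4:]))


# Illustrative pinhole calibration with no distortion terms.
calib = parse_calibration("320.0 320.0 320.0 240.0")
print(calib)
```

Rejecting unexpected value counts early catches a common mistake, such as supplying k3 without the four preceding distortion terms.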
Evaluation
We provide evaluation scripts for TartanAir, EuRoC, TUM, and ETH3D. EuRoC and TUM can be run on a 1080Ti. TartanAir and ETH3D will require 24G of memory.
TartanAir (Mono + Stereo)
Download the TartanAir dataset using the script thirdparty/tartanair_tools/download_training.py and put it in datasets/TartanAir
./tools/validate_tartanair.sh --plot_curve # monocular eval
./tools/validate_tartanair.sh --plot_curve --stereo # stereo eval
EuRoC (Mono + Stereo)
Download the EuRoC sequences (ASL format) and put them in datasets/EuRoC
./tools/evaluate_euroc.sh # monocular eval
./tools/evaluate_euroc.sh --stereo # stereo eval
TUM-RGBD (Mono)
Download the fr1 sequences from TUM-RGBD and put them in datasets/TUM-RGBD
./tools/evaluate_tum.sh # monocular eval
ETH3D (RGB-D)
Download the ETH3D dataset
./tools/evaluate_eth3d.sh # RGB-D eval
Training
First download the TartanAir dataset. The download script can be found in thirdparty/tartanair_tools/download_training.py. You will only need the rgb and depth data.
python download_training.py --rgb --depth
You can then run the training script. We use 4 x RTX-3090 GPUs for training, which takes approximately 1 week. If you use a different number of GPUs, adjust the learning rate accordingly.
Note: On the first training run, covisibility is computed between all pairs of frames. This can take several hours, but the results are cached so that future training runs will start immediately.
python train.py --datapath=<path to tartanair> --gpus=4 --lr=0.00025
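The README does not say how the learning rate should be adjusted for a different GPU count. One common heuristic, assumed here and not confirmed by the authors, is to scale it linearly with the number of GPUs, taking the documented 4-GPU setting (--lr=0.00025) as the reference. The function name scaled_learning_rate is hypothetical.

```python
# Reference configuration taken from the training command above.
REFERENCE_GPUS = 4
REFERENCE_LR = 0.00025


def scaled_learning_rate(num_gpus: int) -> float:
    """Linearly scale the reference learning rate (assumed heuristic)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return REFERENCE_LR * num_gpus / REFERENCE_GPUS


# Print candidate flag values for a few GPU counts.
for n in (1, 2, 4, 8):
    print(f"--gpus={n} --lr={scaled_learning_rate(n):g}")
```

Treat the result as a starting point and validate it against the training curves, since linear scaling is only a rule of thumb.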
Acknowledgements
Data from TartanAir was used to train our model. We additionally use evaluation tools from evo and tartanair_tools.