Top Related Projects
ORB_SLAM2: Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
LSD-SLAM
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator
slambook2: edition 2 of the slambook
Quick Overview
The JakobEngel/dso repository contains the source code for Direct Sparse Odometry (DSO), a visual odometry and mapping system that uses a direct, sparse, probabilistic approach to estimate the camera pose and the 3D structure of a scene from a monocular video stream.
Pros
- Direct Approach: DSO directly optimizes pixel intensities in the image rather than relying on feature extraction and matching, which can lead to more accurate and robust pose estimation (see the sketch after this list).
- Sparse Representation: DSO uses a sparse representation of the scene, which reduces the computational complexity and memory requirements compared to dense methods.
- Probabilistic Modeling: DSO models the uncertainty in the camera pose and 3D structure using a probabilistic framework, which can improve the robustness of the system.
- Real-time Performance: DSO is designed to run in real-time on modern hardware, making it suitable for applications that require low-latency pose estimation.
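To make the "direct" idea concrete, here is a minimal sketch of the per-pixel photometric residual a direct method minimizes. It is deliberately simplified: the full DSO model adds affine brightness correction, a Huber norm, gradient-dependent weighting, and an 8-pixel residual pattern (see the paper), and all names here are illustrative rather than taken from the DSO codebase.

#include <Eigen/Dense>

// Bilinear intensity lookup in a row-major float image of width w
// (assumes (u,v) lies inside the image).
static float interp(const float* img, int w, float u, float v)
{
    int x = (int)u, y = (int)v;
    float dx = u - x, dy = v - y;
    return (1 - dx) * (1 - dy) * img[y * w + x]
         + dx * (1 - dy) * img[y * w + x + 1]
         + (1 - dx) * dy * img[(y + 1) * w + x]
         + dx * dy * img[(y + 1) * w + x + 1];
}

// Residual of one point with inverse depth id, observed at pixel (u,v) in the
// reference image: back-project, transform by (R,t), reproject into the new
// image, and compare raw intensities -- no features are extracted or matched.
static float photometricResidual(const float* refImg, const float* newImg, int w,
                                 const Eigen::Matrix3f& K, const Eigen::Matrix3f& R,
                                 const Eigen::Vector3f& t, float u, float v, float id)
{
    Eigen::Vector3f p = K.inverse() * Eigen::Vector3f(u, v, 1.0f) / id; // back-project
    Eigen::Vector3f q = K * (R * p + t);                                // reproject
    return interp(newImg, w, q.x() / q.z(), q.y() / q.z())
         - interp(refImg, w, u, v);
}

DSO sums such residuals over a sparse set of selected points and jointly optimizes camera poses, inverse depths, and intrinsics over a sliding window of keyframes.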
Cons
- Computational Demands: DSO may require a fast modern CPU to achieve real-time performance (it makes heavy use of SSE instructions), which can limit its deployment in resource-constrained environments.
- Sensitivity to Lighting Conditions: Like other direct methods, DSO may be sensitive to changes in lighting conditions, which can affect the quality of the pose estimation.
- Lack of Loop Closure: DSO does not currently include a loop closure mechanism, which can lead to drift in the estimated trajectory over long-term operation.
- Limited Sensor Support: DSO is primarily designed for monocular video streams and may not work as well with other sensor modalities, such as stereo cameras or depth sensors.
Code Examples
The repository does not ship a high-level tutorial API; the snippets below are minimal sketches based on the FullSystem interface as used by main_dso_pangolin.cpp, and exact names should be checked against the headers of your DSO version. Feeding frames to the system:
// Create the system (global calibration must already be set, see below)
dso::FullSystem* fullSystem = new dso::FullSystem();
fullSystem->linearizeOperation = true; // deterministic, non-real-time mode

// addActiveFrame tracks the frame and creates keyframes as needed;
// optimization happens internally, there is no separate optimize() call
dso::ImageAndExposure* img = reader->getImage(i); // 'reader' is a placeholder
fullSystem->addActiveFrame(img, i);
Estimated poses are not polled from the system; they are delivered through Output3DWrapper callbacks (see section 3.4 of the README below).
Camera intrinsics are set once, globally, before the system is created. The shipped binary does this inside the Undistort class when parsing the calibration file; a direct sketch using setGlobalCalib (declared in util/globalCalib.h), with example values:
// Global calibration: image size and pinhole matrix K (example values)
Eigen::Matrix3f K;
K << 500.0f,   0.0f, 320.0f,
       0.0f, 500.0f, 240.0f,
       0.0f,   0.0f,   1.0f;
dso::setGlobalCalib(640, 480, K); // must be called before creating FullSystem
Note that DSO is monocular; there is no stereo baseline to configure.
Visualization uses the Pangolin-based viewer, attached as an output wrapper (requires Pangolin; see section 2.2 of the README below), following the pattern in main_dso_pangolin.cpp:
// Attach the 3D viewer; run() blocks, so DSO processing runs in another thread
IOWrap::PangolinDSOViewer* viewer =
    new IOWrap::PangolinDSOViewer(wG[0], hG[0], false); // wG/hG: global image size
fullSystem->outputWrapper.push_back(viewer);
viewer->run();
Getting Started
To get started with the DSO project, follow these steps:
- Clone the repository:
git clone https://github.com/JakobEngel/dso.git
- Install the required dependencies: suitesparse, eigen3, boost, and a C++11 compiler, plus OpenCV and Pangolin (highly recommended) for dataset reading and visualization. See section 2 of the README below.
- Build the project using CMake:
cd dso
mkdir build
cd build
cmake ..
make -j4
- Run the DSO system on a sample dataset (the binary takes key=value arguments):
bin/dso_dataset files=path/to/sequence/images.zip calib=path/to/sequence/camera.txt preset=0 mode=0
This will start the DSO system and display the 3D reconstruction in the viewer window. For live camera input, see https://github.com/JakobEngel/dso_ros for a minimal example of driving the library from your own input source.
For more detailed instructions and configuration options, please refer to the project's README file and documentation.
Competitor Comparisons
ORB_SLAM2: Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
Pros of ORB_SLAM2
- More robust in feature-rich environments due to feature-based approach
- Supports loop closure for improved accuracy in long trajectories
- Capable of relocalization if tracking is lost
Cons of ORB_SLAM2
- Less accurate in low-texture environments
- Higher computational requirements
- Requires careful parameter tuning for optimal performance
Code Comparison
ORB_SLAM2 (feature extraction):
cv::Mat im = imGray;
vector<cv::KeyPoint> vKeys;
cv::Mat Descriptors;
(*mpORBextractorLeft)(im, cv::Mat(), vKeys, Descriptors);
DSO (direct image alignment):
for(int i=0;i<pyrLevelsUsed;i++)
makeK(&Hcalib, i, pyrLevelsUsed, w[i], h[i], K[i]);
SE3 refToNew = lastToNew_out;
AffLight aff_g2l = aff_g2l_out;
ORB_SLAM2 focuses on extracting and matching visual features, while DSO performs direct image alignment without explicit feature extraction. This fundamental difference in approach affects their performance in various environments and computational requirements.
LSD-SLAM
Pros of LSD-SLAM
- Operates on direct image intensities, allowing for semi-dense mapping
- Capable of large-scale, consistent mapping with loop closures
- Provides more detailed reconstructions of the environment
Cons of LSD-SLAM
- Generally slower and more computationally intensive than DSO
- May struggle with rapid camera motions or highly dynamic scenes
- Requires more parameter tuning for optimal performance
Code Comparison
LSD-SLAM (C++):
void SlamSystem::trackFrame(Frame* newFrame, bool blockUntilMapped, float minResidual)
{
// Frame tracking logic
currentKeyFrame->trackFrame(newFrame, SE3(), minResidual);
// ...
}
DSO (C++):
void FullSystem::trackNewCoarse(FrameHessian* fh)
{
// Coarse tracking logic
Vec3 flowVecs = coarseTracker->makeCoarseDepthL0(fh);
// ...
}
Both systems use C++ and implement frame tracking, but DSO focuses on efficient, sparse direct tracking, while LSD-SLAM aims for semi-dense mapping with more comprehensive environmental reconstruction.
ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
Pros of ORB_SLAM3
- Supports multi-map and multi-session capabilities, allowing for more robust localization across different environments
- Includes visual-inertial odometry, enhancing accuracy in dynamic scenes
- Offers real-time loop closing and relocalization features
Cons of ORB_SLAM3
- Higher computational complexity due to feature extraction and matching
- May struggle in low-texture environments where feature detection is challenging
- Requires careful parameter tuning for optimal performance
Code Comparison
ORB_SLAM3:
void ORBextractor::ComputeKeyPointsOctTree(vector<vector<KeyPoint>>& allKeypoints)
{
allKeypoints.resize(nlevels);
const float W = 30;
for (int level = 0; level < nlevels; ++level)
{
const int minBorderX = EDGE_THRESHOLD-3;
const int minBorderY = minBorderX;
const int maxBorderX = mvImagePyramid[level].cols-EDGE_THRESHOLD+3;
const int maxBorderY = mvImagePyramid[level].rows-EDGE_THRESHOLD+3;
DSO:
void FullSystem::trackNewCoarse(FrameHessian* fh)
{
assert(allFrameHistory.size() > 0);
// set pose initialization.
for(IOWrap::Output3DWrapper* ow : outputWrapper)
ow->pushLiveFrame(fh);
FrameHessian* lastF = coarseTracker->lastRef;
VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator
Pros of VINS-Mono
- Integrates visual-inertial fusion for improved accuracy in challenging environments
- Supports loop closure for drift correction and global consistency
- Provides real-time performance on mobile devices
Cons of VINS-Mono
- More complex system architecture, potentially harder to set up and tune
- May have higher computational requirements due to additional sensor fusion
- Less suitable for purely vision-based scenarios where IMU data is unavailable
Code Comparison
VINS-Mono (C++):
void FeatureManager::setDepth(const VectorXd &x) {
int feature_index = -1;
for (auto &it_per_id : feature) {
it_per_id.used_num = it_per_id.feature_per_frame.size();
if (it_per_id.used_num < 4)
continue;
feature_index++;
it_per_id.estimated_depth = 1.0 / x(feature_index);
}
}
DSO (C++):
void FullSystem::makeNewTraces(FrameHessian* newFrame, float* gtDepth) {
pixelSelector->allowFast = true;
int numPointsTotal = pixelSelector->makeMaps(newFrame, selectionMap, setting_desiredImmatureDensity);
newFrame->pointHessians.reserve(numPointsTotal*1.2f);
newFrame->pointHessiansMarginalized.reserve(numPointsTotal*1.2f);
newFrame->pointHessiansOut.reserve(numPointsTotal*1.2f);
}
Both repositories use C++ and focus on visual odometry, but VINS-Mono incorporates IMU data for improved robustness, while DSO relies solely on visual information for a simpler, yet potentially less stable system in certain scenarios.
slambook2: edition 2 of the slambook
Pros of slambook2
- Comprehensive educational resource for learning SLAM algorithms
- Covers a wide range of SLAM topics and techniques
- Includes practical examples and exercises for hands-on learning
Cons of slambook2
- May not be as optimized for performance as DSO
- Focuses more on teaching concepts rather than providing a production-ready SLAM system
- Potentially less suitable for direct deployment in real-world applications
Code Comparison
DSO (Direct Sparse Odometry):
void FullSystem::trackNewCoarse(FrameHessian* fh) {
assert(allFrameHistory.size() > 0);
// ... (tracking implementation)
}
slambook2:
void Frontend::Track() {
cv::Mat img_left = current_frame_->left_img_;
cv::Mat img_right = current_frame_->right_img_;
// ... (tracking implementation)
}
Both repositories implement tracking functions, but DSO focuses on direct sparse methods, while slambook2 provides a more general approach to tracking in a stereo setup. DSO's implementation is likely more optimized for performance, while slambook2's code is designed to be more educational and easier to understand for beginners learning SLAM concepts.
README
DSO: Direct Sparse Odometry
For more information see https://vision.in.tum.de/dso
1. Related Papers
- Direct Sparse Odometry, J. Engel, V. Koltun, D. Cremers, In arXiv:1607.02565, 2016
- A Photometrically Calibrated Benchmark For Monocular Visual Odometry, J. Engel, V. Usenko, D. Cremers, In arXiv:1607.02555, 2016
Get some datasets from https://vision.in.tum.de/mono-dataset.
2. Installation
git clone https://github.com/JakobEngel/dso.git
2.1 Required Dependencies
suitesparse and eigen3 (required).
Install with
sudo apt-get install libsuitesparse-dev libeigen3-dev libboost-all-dev
2.2 Optional Dependencies
OpenCV (highly recommended).
Used to read / write / display images.
OpenCV is only used in IOWrapper/OpenCV/*. Without OpenCV, the respective dummy functions from IOWrapper/*_dummy.cpp will be compiled into the library, which do nothing. The main binary will not be created, since it is useless if it can't read the datasets from disk. Feel free to implement your own version of these functions with your preferred library if you want to stay away from OpenCV.
Install with
sudo apt-get install libopencv-dev
Pangolin (highly recommended).
Used for 3D visualization & the GUI.
Pangolin is only used in IOWrapper/Pangolin/*. You can compile without Pangolin, but then there will not be any visualization / GUI capability. Feel free to implement your own version of Output3DWrapper with your preferred library, and use it instead of PangolinDSOViewer.
Install from https://github.com/stevenlovegrove/Pangolin
ziplib (recommended).
Used to read datasets with images as .zip, as e.g. in the TUM monoVO dataset. You can compile without this, however then you can only read images directly (i.e., have to unzip the dataset image archives before loading them).
sudo apt-get install zlib1g-dev
cd dso/thirdparty
tar -zxvf libzip-1.1.1.tar.gz
cd libzip-1.1.1/
./configure
make
sudo make install
sudo cp lib/zipconf.h /usr/local/include/zipconf.h # (no idea why that is needed).
sse2neon (required for ARM builds).
After cloning, just run git submodule update --init to include this. It translates Intel-native SSE functions to ARM-native NEON functions during the compilation process.
2.3 Build
cd dso
mkdir build
cd build
cmake ..
make -j4
This will compile a library libdso.a, which can be linked from external projects. It will also build a binary dso_dataset, to run DSO on datasets. However, for this OpenCV and Pangolin need to be installed.
3. Usage
Run on a dataset from https://vision.in.tum.de/mono-dataset using
bin/dso_dataset \
files=XXXXX/sequence_XX/images.zip \
calib=XXXXX/sequence_XX/camera.txt \
gamma=XXXXX/sequence_XX/pcalib.txt \
vignette=XXXXX/sequence_XX/vignette.png \
preset=0 \
mode=0
See https://github.com/JakobEngel/dso_ros for a minimal example of how the library can be used from another project. It should be straightforward to implement extensions for other camera drivers, to use DSO interactively without ROS.
3.1 Dataset Format.
The format assumed is that of https://vision.in.tum.de/mono-dataset. However, it should be easy to adapt it to your needs, if required. The binary is run with:
- files=XXX, where XXX is either a folder or a .zip archive containing images. They are sorted alphabetically. For .zip archives to work, DSO needs to be compiled with ziplib support.
- gamma=XXX, where XXX is a gamma calibration file, containing a single row with 256 values, mapping [0..255] to the respective irradiance value, i.e. containing the discretized inverse response function. See the TUM monoVO dataset for an example, and the parsing sketch after this list.
- vignette=XXX, where XXX is a monochrome 16-bit or 8-bit image containing the vignette as pixelwise attenuation factors. See the TUM monoVO dataset for an example.
- calib=XXX, where XXX is a geometric camera calibration file. See below.
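For illustration, parsing such a gamma calibration file (a single whitespace-separated row of 256 values) could look like this. This is a sketch; in the shipped code, the PhotometricUndistorter class does this parsing for you.

#include <fstream>
#include <string>
#include <vector>

// Read the discretized inverse response function G^{-1}: [0..255] -> irradiance.
std::vector<float> readGammaFile(const std::string& path)
{
    std::ifstream f(path);
    std::vector<float> invResponse(256);
    for (int i = 0; i < 256; i++)
        f >> invResponse[i];  // one irradiance value per pixel intensity
    return invResponse;
}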
Geometric Calibration File.
Calibration File for Pre-Rectified Images
Pinhole fx fy cx cy 0
in_width in_height
"crop" / "full" / "none" / "fx fy cx cy 0"
out_width out_height
Calibration File for FOV camera model:
FOV fx fy cx cy omega
in_width in_height
"crop" / "full" / "fx fy cx cy 0"
out_width out_height
Calibration File for Radial-Tangential camera model
RadTan fx fy cx cy k1 k2 r1 r2
in_width in_height
"crop" / "full" / "fx fy cx cy 0"
out_width out_height
Calibration File for Equidistant camera model
EquiDistant fx fy cx cy k1 k2 k3 k4
in_width in_height
"crop" / "full" / "fx fy cx cy 0"
out_width out_height
(Note: for backwards-compatibility, "Pinhole", "FOV" and "RadTan" can be omitted.) See the respective ::distortCoordinates implementation in Undistorter.cpp for the exact corresponding projection function.
Furthermore, it should be straight-forward to implement other camera models.
Explanation:
Across all models, fx fy cx cy denotes the focal length / principal point relative to the image width / height, i.e., DSO computes the camera matrix K as
K(0,0) = width * fx
K(1,1) = height * fy
K(0,2) = width * cx - 0.5
K(1,2) = height * cy - 0.5
For backwards-compatibility, if the given cx and cy are larger than 1, DSO assumes all four parameters to directly be the entries of K, and omits the above computation.
That strange "0.5" offset:
Internally, DSO uses the convention that the pixel at integer position (1,1) in the image, i.e. the pixel in the second row and second column,
contains the integral over the continuous image function from (0.5,0.5) to (1.5,1.5), i.e., approximates a "point-sample" of the
continuous image functions at (1.0, 1.0).
In turn, there seems to be no unifying convention across calibration toolboxes whether the pixel at integer position (1,1)
contains the integral over (0.5,0.5) to (1.5,1.5), or the integral over (1,1) to (2,2). The above conversion assumes that
the given calibration in the calibration file uses the latter convention, and thus applies the -0.5 correction.
Note that this is also taken into account when creating the scale-pyramid (see globalCalib.cpp).
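As a concrete worked example of the formulas above (all values are illustrative): a calibration tool that reports a principal point of exactly half the image width under the integral-over-(1,1)-to-(2,2) convention ends up half a pixel to the left in DSO's convention.

// Illustrative conversion from relative calibration values (camera.txt)
// to pixel-unit K entries, including the -0.5 convention shift.
int width = 640, height = 480;
double fx = 0.5, fy = 0.6667, cx = 0.5, cy = 0.5; // relative values (example)
double K00 = width  * fx;        // 320.0   (focal length in pixels)
double K11 = height * fy;        // ~320.0
double K02 = width  * cx - 0.5;  // 319.5   (principal point, shifted)
double K12 = height * cy - 0.5;  // 239.5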
Rectification modes:
For image rectification, DSO either supports rectification to a user-defined pinhole model (fx fy cx cy 0), or can automatically crop the image to the maximal rectangular, well-defined region (crop).
full will preserve the full original field of view and is mainly meant for debugging - it will create black borders in undefined image regions, which DSO does NOT ignore (i.e., this option will generate additional outliers along those borders, and corrupt the scale-pyramid).
3.2 Commandline Options
There are many command line options available; see main_dso_pangolin.cpp. Some examples include:
- mode=0: use iff a photometric calibration exists (e.g. TUM monoVO dataset).
- mode=1: use iff NO photometric calibration exists (e.g. ETH EuRoC MAV dataset).
- mode=2: use iff images are not photometrically distorted (e.g. synthetic datasets).
- preset=0: default settings (2k pts etc.), not enforcing real-time execution.
- preset=1: default settings (2k pts etc.), enforcing 1x real-time execution.
- preset=2: fast settings (800 pts etc.), not enforcing real-time execution. WARNING: overwrites image resolution with 424 x 320.
- preset=3: fast settings (800 pts etc.), enforcing 5x real-time execution. WARNING: overwrites image resolution with 424 x 320.
- nolog=1: disable logging of eigenvalues etc. (good for performance).
- reverse=1: play sequence in reverse.
- nogui=1: disable gui (good for performance).
- nomt=1: single-threaded execution.
- prefetch=1: load into memory & rectify all images before running DSO.
- start=X: start at frame X.
- end=X: end at frame X.
- speed=X: force execution at X times real-time speed (0 = not enforcing real-time).
- save=1: save lots of images for video creation.
- quiet=1: disable most console output (good for performance).
- sampleoutput=1: register a "SampleOutputWrapper", printing some sample output data to the commandline; meant as an example.
3.3 Runtime Options
Some parameters can be reconfigured from the Pangolin GUI at runtime. Feel free to add more.
3.4 Accessing Data.
The easiest way to access the data (poses, pointclouds, etc.) computed by DSO in real time is to create your own Output3DWrapper and add it to the system, i.e., to FullSystem.outputWrapper.
The respective member functions will be called on various occasions (e.g., when a new KF is created, when a new frame is tracked, etc.), exposing the relevant data.
See IOWrapper/Output3DWrapper.h for a description of the different callbacks available, and some basic notes on where to find which data in the used classes. See IOWrapper/OutputWrapper/SampleOutputWrapper.h for an example implementation, which just prints some example data to the commandline (use the options sampleoutput=1 quiet=1 to see the result).
Note that these callbacks block the respective DSO thread; expensive computations should therefore not be performed in the callbacks. A better practice is to just copy over / publish / output the data you need.
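For example, a minimal custom wrapper that just records the most recent tracked pose might look like this. This is a sketch modeled on SampleOutputWrapper; check the exact callback signatures in IOWrapper/Output3DWrapper.h of your version.

#include "IOWrapper/Output3DWrapper.h"

class PoseLogger : public dso::IOWrap::Output3DWrapper
{
public:
    // Called for every tracked frame. This blocks a DSO thread, so we only
    // copy the pose out and return immediately.
    void publishCamPose(dso::FrameShell* frame,
                        dso::CalibHessian* HCalib) override
    {
        lastCamToWorld = frame->camToWorld.matrix(); // 4x4 camToWorld (SE3)
    }

    Eigen::Matrix<double, 4, 4> lastCamToWorld;
};

// Registration, before the first frame is fed:
// fullSystem->outputWrapper.push_back(new PoseLogger());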
By default, dso_dataset writes all keyframe poses to a file result.txt at the end of a sequence, using the TUM RGB-D / TUM monoVO format ([timestamp x y z qx qy qz qw] of the cameraToWorld transformation).
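For illustration, one line of result.txt consists of a timestamp in seconds, a translation, and a unit quaternion, e.g. (made-up values): 1305031102.1753 0.0120 -0.0340 0.0560 0.0001 0.0023 -0.0019 0.9999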
3.5 Notes
- the initializer is very slow and does not work very reliably. Maybe replace it with your own way to get an initialization.
- see https://github.com/JakobEngel/dso_ros for a minimal example project on how to use the library with your own input / output procedures.
- see settings.cpp for a LOT of settings parameters, most of which you shouldn't touch.
- setGlobalCalib(...) needs to be called once before anything is initialized; it globally sets the camera intrinsics and video resolution for convenience. Probably not the most portable way of doing this, though.
4. General Notes for Good Results
Accurate Geometric Calibration
- Please have a look at Chapter 4.3 of the DSO paper, in particular Figure 20 (Geometric Noise). Direct approaches suffer a LOT from bad geometric calibrations: geometric distortions of 1.5 pixels already reduce the accuracy by a factor of 10.
- Do not use a rolling shutter camera; the geometric distortions from a rolling shutter are huge, even at high frame-rates (over 60fps).
- Note that the reprojection RMSE reported by most calibration tools is the reprojection RMSE on the "training data", i.e., overfitted to the images you used for calibration. A low value does not imply that your calibration is good; you may just have used insufficient images.
- Try different camera / distortion models; not all lenses can be modelled by all models.
Photometric Calibration
Use a photometric calibration (e.g. using https://github.com/tum-vision/mono_dataset_code).
Translation vs. Rotation
DSO cannot do magic: if you rotate the camera too much without translation, it will fail. Since it is a pure visual odometry, it cannot recover by re-localizing, or track through strong rotations by using previously triangulated geometry. Everything that leaves the field of view is marginalized immediately.
Computation Speed
If your computer is slow, try to use the "fast" settings, or run DSO on a dataset without enforcing real-time execution.
Initialization
The current initializer is not very good: it is very slow and occasionally fails. Make sure the initial camera motion is slow and "nice" (i.e., a lot of translation and little rotation) during initialization. Possibly replace it with your own initializer.
5. License
DSO was developed at the Technical University of Munich and Intel. The open-source version is licensed under the GNU General Public License Version 3 (GPLv3). For commercial purposes, we also offer a professional version, see http://vision.in.tum.de/dso for details.