
JakobEngel/dso

Direct Sparse Odometry


Top Related Projects

  • ORB_SLAM2: Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
  • LSD-SLAM: Large-Scale Direct Monocular SLAM
  • ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM
  • VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator
  • slambook2: edition 2 of the slambook

Quick Overview

The JakobEngel/dso repository contains the reference implementation of Direct Sparse Odometry (DSO), a visual odometry system that estimates the camera trajectory and a sparse 3D structure of the scene from a monocular video stream, using a direct (photometric), sparse, probabilistic formulation.

Pros

  • Direct Approach: DSO directly optimizes a photometric error over pixel intensities rather than relying on feature extraction and matching, which can lead to more accurate and robust pose estimation (a simplified sketch of the residual follows this list).
  • Sparse Representation: DSO uses a sparse representation of the scene, which reduces the computational complexity and memory requirements compared to dense methods.
  • Probabilistic Modeling: DSO models the uncertainty in the camera pose and 3D structure using a probabilistic framework, which can improve the robustness of the system.
  • Real-time Performance: DSO is designed to run in real-time on modern hardware, making it suitable for applications that require low-latency pose estimation.
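
As a rough sketch of what "directly optimizing pixel intensities" means (heavily simplified; the formulation in the DSO paper additionally uses an affine brightness transfer, a small residual pattern around each point, gradient-dependent weights, and a Huber norm), the error minimized for a point p hosted in frame i and observed in frame j is approximately

	E_pj = || I_j[ p' ] - I_i[ p ] ||
	p'   = project( R_ji * unproject(p, d_p) + t_ji )

where d_p is the point's inverse depth and (R_ji, t_ji) is the relative camera pose. The total energy sums these residuals over all active points and observing keyframes, and is minimized jointly over camera poses, inverse depths, affine brightness parameters, and intrinsics.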

Cons

  • CPU Requirements: DSO needs a reasonably fast modern CPU to run in real time (it does not use a GPU), which can limit deployment on resource-constrained embedded platforms.
  • Sensitivity to Lighting Conditions: Like other direct methods, DSO may be sensitive to changes in lighting conditions, which can affect the quality of the pose estimation.
  • Lack of Loop Closure: DSO does not currently include a loop closure mechanism, which can lead to drift in the estimated trajectory over long-term operation.
  • Limited Sensor Support: DSO is primarily designed for monocular video streams and may not work as well with other sensor modalities, such as stereo cameras or depth sensors.

Code Examples

The snippets below are a rough sketch of how the library is driven from C++; they follow the pattern used in main_dso_pangolin.cpp, but check the headers in your checkout for the exact class names and signatures.

// Create the main system (sketch; names follow main_dso_pangolin.cpp)
dso::FullSystem* fullSystem = new dso::FullSystem();

// Feed frames one at a time: image is an ImageAndExposure*, frameID a running index.
// Tracking and the sliding-window optimization run inside this call;
// there is no separate optimize() step.
fullSystem->addActiveFrame(image, frameID);

This is the basic driving loop: construct a FullSystem and push photometrically corrected frames into it. The current camera pose and the sparse point cloud are exposed through Output3DWrapper callbacks rather than getter methods (see "Accessing Data" in the README below).

// Camera calibration is set globally before the system processes frames.
// In the dataset binary this is done by the Undistort class, which parses
// camera.txt and calls setGlobalCalib(); the values below are illustrative only.
#include <Eigen/Core>

int w = 640, h = 480;
Eigen::Matrix3f K;
K << 500.0f,   0.0f, 319.5f,
       0.0f, 500.0f, 239.5f,
       0.0f,   0.0f,   1.0f;
dso::setGlobalCalib(w, h, K);

This shows how the camera intrinsics reach DSO: there is no per-instance settings struct; setGlobalCalib() (see util/globalCalib.h) sets the resolution and camera matrix globally, and the remaining algorithm parameters live as globals in settings.cpp.

// Visualization goes through the Output3DWrapper interface; the bundled
// Pangolin viewer is registered as one such wrapper (sketch):
IOWrap::PangolinDSOViewer* viewer = new IOWrap::PangolinDSOViewer(w, h, true /* own render thread */);
fullSystem->outputWrapper.push_back(viewer);

This registers the built-in Pangolin viewer with the system, so the sparse 3D reconstruction and camera trajectory are displayed while frames are processed. Any custom Output3DWrapper can be registered the same way.

Getting Started

To get started with the DSO project, follow these steps:

  1. Clone the repository:

git clone https://github.com/JakobEngel/dso.git

  2. Install the required dependencies: suitesparse, Eigen, Boost, a C++11 compiler, and (for the dataset binary and GUI) OpenCV and Pangolin. See section 2 of the README below for details.

  3. Build the project using CMake:

cd dso
mkdir build
cd build
cmake ..
make -j4

  4. Run DSO on a sample dataset (the binary takes named arguments; see the Usage section of the README):

bin/dso_dataset files=path/to/sequence/images.zip calib=path/to/sequence/camera.txt mode=0 preset=0

This will start DSO and display the 3D reconstruction in the viewer window. To run DSO on your own camera input, implement an input wrapper around the library; see the dso_ros project referenced in the README for a minimal example.

For more detailed instructions and configuration options, please refer to the project's README file and documentation.

Competitor Comparisons

ORB_SLAM2: Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

Pros of ORB_SLAM2

  • More robust in feature-rich environments due to feature-based approach
  • Supports loop closure for improved accuracy in long trajectories
  • Capable of relocalization if tracking is lost

Cons of ORB_SLAM2

  • Less accurate in low-texture environments
  • Higher computational requirements
  • Requires careful parameter tuning for optimal performance

Code Comparison

ORB_SLAM2 (feature extraction):

cv::Mat im = imGray;
vector<cv::KeyPoint> vKeys;
cv::Mat Descriptors;
(*mpORBextractorLeft)(im, cv::Mat(), vKeys, Descriptors);

DSO (direct image alignment):

for(int i=0;i<pyrLevelsUsed;i++)
    makeK(&Hcalib, i, pyrLevelsUsed, w[i], h[i], K[i]);
SE3 refToNew = lastToNew_out;
AffLight aff_g2l = aff_g2l_out;

ORB_SLAM2 focuses on extracting and matching visual features, while DSO performs direct image alignment without explicit feature extraction. This fundamental difference in approach affects their performance in various environments and computational requirements.

LSD-SLAM

Pros of LSD-SLAM

  • Operates on direct image intensities, allowing for semi-dense mapping
  • Capable of large-scale, consistent mapping with loop closures
  • Provides more detailed reconstructions of the environment

Cons of LSD-SLAM

  • Generally slower and more computationally intensive than DSO
  • May struggle with rapid camera motions or highly dynamic scenes
  • Requires more parameter tuning for optimal performance

Code Comparison

LSD-SLAM (C++):

void SlamSystem::trackFrame(Frame* newFrame, bool blockUntilMapped, float minResidual)
{
    // Frame tracking logic
    currentKeyFrame->trackFrame(newFrame, SE3(), minResidual);
    // ...
}

DSO (C++):

void FullSystem::trackNewCoarse(FrameHessian* fh)
{
    // Coarse tracking logic
    Vec3 flowVecs = coarseTracker->makeCoarseDepthL0(fh);
    // ...
}

Both systems use C++ and implement frame tracking, but DSO focuses on efficient, sparse direct tracking, while LSD-SLAM aims for semi-dense mapping with more comprehensive environmental reconstruction.

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual-Inertial and Multi-Map SLAM

Pros of ORB_SLAM3

  • Supports multi-map and multi-session capabilities, allowing for more robust localization across different environments
  • Includes visual-inertial odometry, enhancing accuracy in dynamic scenes
  • Offers real-time loop closing and relocalization features

Cons of ORB_SLAM3

  • Higher computational complexity due to feature extraction and matching
  • May struggle in low-texture environments where feature detection is challenging
  • Requires careful parameter tuning for optimal performance

Code Comparison

ORB_SLAM3:

void ORBextractor::ComputeKeyPointsOctTree(vector<vector<KeyPoint>>& allKeypoints)
{
    allKeypoints.resize(nlevels);
    const float W = 30;
    for (int level = 0; level < nlevels; ++level)
    {
        const int minBorderX = EDGE_THRESHOLD-3;
        const int minBorderY = minBorderX;
        const int maxBorderX = mvImagePyramid[level].cols-EDGE_THRESHOLD+3;
        const int maxBorderY = mvImagePyramid[level].rows-EDGE_THRESHOLD+3;

DSO:

void FullSystem::trackNewCoarse(FrameHessian* fh)
{
    assert(allFrameHistory.size() > 0);
    // set pose initialization.
    for(IOWrap::Output3DWrapper* ow : outputWrapper)
        ow->pushLiveFrame(fh);

    FrameHessian* lastF = coarseTracker->lastRef;

VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator

Pros of VINS-Mono

  • Integrates visual-inertial fusion for improved accuracy in challenging environments
  • Supports loop closure for drift correction and global consistency
  • Provides real-time performance on mobile devices

Cons of VINS-Mono

  • More complex system architecture, potentially harder to set up and tune
  • May have higher computational requirements due to additional sensor fusion
  • Less suitable for purely vision-based scenarios where IMU data is unavailable

Code Comparison

VINS-Mono (C++):

void FeatureManager::setDepth(const VectorXd &x) {
    int feature_index = -1;
    for (auto &it_per_id : feature) {
        it_per_id.used_num = it_per_id.feature_per_frame.size();
        if (it_per_id.used_num < 4)
            continue;
        feature_index++;
        it_per_id.estimated_depth = 1.0 / x(feature_index);
    }
}

DSO (C++):

void FullSystem::makeNewTraces(FrameHessian* newFrame, float* gtDepth) {
    pixelSelector->allowFast = true;
    int numPointsTotal = pixelSelector->makeMaps(newFrame, selectionMap, settings.desiredImmatureDensity);
    newFrame->pointHessians.reserve(numPointsTotal*1.2f);
    newFrame->pointHessiansMarginalized.reserve(numPointsTotal*1.2f);
    newFrame->pointHessiansOut.reserve(numPointsTotal*1.2f);
}

Both repositories use C++ and focus on visual odometry, but VINS-Mono incorporates IMU data for improved robustness, while DSO relies solely on visual information for a simpler, yet potentially less stable system in certain scenarios.

slambook2: edition 2 of the slambook

Pros of slambook2

  • Comprehensive educational resource for learning SLAM algorithms
  • Covers a wide range of SLAM topics and techniques
  • Includes practical examples and exercises for hands-on learning

Cons of slambook2

  • May not be as optimized for performance as DSO
  • Focuses more on teaching concepts rather than providing a production-ready SLAM system
  • Potentially less suitable for direct deployment in real-world applications

Code Comparison

DSO (Direct Sparse Odometry):

void FullSystem::trackNewCoarse(FrameHessian* fh) {
    assert(allFrameHistory.size() > 0);
    // ... (tracking implementation)
}

slambook2:

void Frontend::Track() {
    cv::Mat img_left = current_frame_->left_img_;
    cv::Mat img_right = current_frame_->right_img_;
    // ... (tracking implementation)
}

Both repositories implement tracking functions, but DSO focuses on direct sparse methods, while slambook2 provides a more general approach to tracking in a stereo setup. DSO's implementation is likely more optimized for performance, while slambook2's code is designed to be more educational and easier to understand for beginners learning SLAM concepts.


README

DSO: Direct Sparse Odometry

For more information see https://vision.in.tum.de/dso

1. Related Papers

  • Direct Sparse Odometry, J. Engel, V. Koltun, D. Cremers, In arXiv:1607.02565, 2016
  • A Photometrically Calibrated Benchmark For Monocular Visual Odometry, J. Engel, V. Usenko, D. Cremers, In arXiv:1607.02555, 2016

Get some datasets from https://vision.in.tum.de/mono-dataset .

2. Installation

git clone https://github.com/JakobEngel/dso.git

2.1 Required Dependencies

suitesparse and eigen3 (required).

Required. Install with

	sudo apt-get install libsuitesparse-dev libeigen3-dev libboost-all-dev

2.2 Optional Dependencies

OpenCV (highly recommended).

Used to read / write / display images. OpenCV is only used in IOWrapper/OpenCV/*. Without OpenCV, the respective dummy functions from IOWrapper/*_dummy.cpp will be compiled into the library instead; they do nothing. The main binary will not be created, since it is useless if it cannot read datasets from disk. Feel free to implement your own version of these functions with your preferred library if you want to stay away from OpenCV.

Install with

sudo apt-get install libopencv-dev

Pangolin (highly recommended).

Used for 3D visualization & the GUI. Pangolin is only used in IOWrapper/Pangolin/*. You can compile without Pangolin, however then there will be no visualization / GUI capability. Feel free to implement your own version of Output3DWrapper with your preferred library, and use it instead of PangolinDSOViewer.

Install from https://github.com/stevenlovegrove/Pangolin

ziplib (recommended).

Used to read datasets with images as .zip, as e.g. in the TUM monoVO dataset. You can compile without this; however, you will then only be able to read images directly (i.e., you have to unzip the dataset image archives before loading them).

sudo apt-get install zlib1g-dev
cd dso/thirdparty
tar -zxvf libzip-1.1.1.tar.gz
cd libzip-1.1.1/
./configure
make
sudo make install
sudo cp lib/zipconf.h /usr/local/include/zipconf.h   # (no idea why that is needed).

sse2neon (required for ARM builds).

After cloning, just run git submodule update --init to include this. It translates Intel-native SSE functions to ARM-native NEON functions during the compilation process.

2.3 Build

	cd dso
	mkdir build
	cd build
	cmake ..
	make -j4

This will compile a library libdso.a, which can be linked from external projects. It will also build a binary dso_dataset to run DSO on datasets; for this, OpenCV and Pangolin need to be installed.

3 Usage

Run on a dataset from https://vision.in.tum.de/mono-dataset using

	bin/dso_dataset \
		files=XXXXX/sequence_XX/images.zip \
		calib=XXXXX/sequence_XX/camera.txt \
		gamma=XXXXX/sequence_XX/pcalib.txt \
		vignette=XXXXX/sequence_XX/vignette.png \
		preset=0 \
		mode=0

See https://github.com/JakobEngel/dso_ros for a minimal example of how the library can be used from another project. It should be straightforward to implement extensions for other camera drivers in order to use DSO interactively without ROS.

3.1 Dataset Format.

The format assumed is that of https://vision.in.tum.de/mono-dataset. However, it should be easy to adapt it to your needs, if required. The binary is run with:

  • files=XXX where XXX is either a folder or a .zip archive containing images, sorted alphabetically. For .zip archives to work, DSO needs to be compiled with ziplib support.

  • gamma=XXX where XXX is a gamma calibration file, containing a single row with 256 values, mapping [0..255] to the respective irradiance value, i.e. containing the discretized inverse response function. See TUM monoVO dataset for an example.

  • vignette=XXX where XXX is a monochrome 16bit or 8bit image containing the vignette as pixelwise attenuation factors. See TUM monoVO dataset for an example.

  • calib=XXX where XXX is a geometric camera calibration file. See below.

Geometric Calibration File.

Calibration File for Pre-Rectified Images:

	Pinhole fx fy cx cy 0
	in_width in_height
	"crop" / "full" / "none" / "fx fy cx cy 0"
	out_width out_height

Calibration File for the FOV camera model:

	FOV fx fy cx cy omega
	in_width in_height
	"crop" / "full" / "fx fy cx cy 0"
	out_width out_height

Calibration File for the Radial-Tangential (RadTan) camera model:

	RadTan fx fy cx cy k1 k2 r1 r2
	in_width in_height
	"crop" / "full" / "fx fy cx cy 0"
	out_width out_height

Calibration File for the Equidistant camera model:

	EquiDistant fx fy cx cy k1 k2 k3 k4
	in_width in_height
	"crop" / "full" / "fx fy cx cy 0"
	out_width out_height

(Note: for backwards-compatibility, "Pinhole", "FOV" and "RadTan" can be omitted.) See the respective ::distortCoordinates implementation in Undistorter.cpp for the exact corresponding projection function. Furthermore, it should be straightforward to implement other camera models.

Explanation: Across all models fx fy cx cy denotes the focal length / principal point relative to the image width / height, i.e., DSO computes the camera matrix K as

	K(0,0) = width * fx
	K(1,1) = height * fy
	K(0,2) = width * cx - 0.5
	K(1,2) = height * cy - 0.5

For backwards-compatibility, if the given cx and cy are larger than 1, DSO assumes all four parameters to directly be the entries of K, and omits the above computation.
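
For instance, with an illustrative relative calibration for a 640x480 image (the numbers here are made up for demonstration and do not come from any particular dataset), the conversion above gives:

	#include <Eigen/Core>

	// fx fy cx cy given relative to the image size, as in the calibration file.
	int width = 640, height = 480;
	float fx = 0.5f, fy = 0.7f, cx = 0.5f, cy = 0.5f;

	Eigen::Matrix3f K = Eigen::Matrix3f::Identity();
	K(0,0) = width  * fx;         // 320
	K(1,1) = height * fy;         // 336
	K(0,2) = width  * cx - 0.5f;  // 319.5
	K(1,2) = height * cy - 0.5f;  // 239.5

Since cx and cy are below 1 here, DSO treats all four values as relative and applies the conversion; had they been given in pixels (larger than 1), they would be used directly as the entries of K.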

That strange "0.5" offset: Internally, DSO uses the convention that the pixel at integer position (1,1) in the image, i.e. the pixel in the second row and second column, contains the integral over the continuous image function from (0.5,0.5) to (1.5,1.5), i.e., approximates a "point-sample" of the continuous image functions at (1.0, 1.0). In turn, there seems to be no unifying convention across calibration toolboxes whether the pixel at integer position (1,1) contains the integral over (0.5,0.5) to (1.5,1.5), or the integral over (1,1) to (2,2). The above conversion assumes that the given calibration in the calibration file uses the latter convention, and thus applies the -0.5 correction. Note that this also is taken into account when creating the scale-pyramid (see globalCalib.cpp).

Rectification modes: For image rectification, DSO either supports rectification to a user-defined pinhole model (fx fy cx cy 0), or has an option to automatically crop the image to the maximal rectangular, well-defined region (crop). full will preserve the full original field of view and is mainly meant for debugging - it will create black borders in undefined image regions, which DSO does NOT ignore (i.e., this option will generate additional outliers along those borders, and corrupt the scale-pyramid).

3.2 Commandline Options

There are many command-line options available; see main_dso_pangolin.cpp. Some examples include:

  • mode=X:

    • mode=0 use iff a photometric calibration exists (e.g. TUM monoVO dataset).
    • mode=1 use iff NO photometric calibration exists (e.g. ETH EuRoC MAV dataset).
    • mode=2 use iff images are not photometrically distorted (e.g. synthetic datasets).
  • preset=X

    • preset=0: default settings (2k pts etc.), not enforcing real-time execution
    • preset=1: default settings (2k pts etc.), enforcing 1x real-time execution
    • preset=2: fast settings (800 pts etc.), not enforcing real-time execution. WARNING: overwrites image resolution with 424 x 320.
    • preset=3: fast settings (800 pts etc.), enforcing 5x real-time execution. WARNING: overwrites image resolution with 424 x 320.
  • nolog=1: disable logging of eigenvalues etc. (good for performance)

  • reverse=1: play sequence in reverse

  • nogui=1: disable gui (good for performance)

  • nomt=1: single-threaded execution

  • prefetch=1: load into memory & rectify all images before running DSO.

  • start=X: start at frame X

  • end=X: end at frame X

  • speed=X: force execution at X times real-time speed (0 = not enforcing real-time)

  • save=1: save lots of images for video creation

  • quiet=1: disable most console output (good for performance)

  • sampleoutput=1: register a "SampleOutputWrapper", printing some sample output data to the commandline; meant as an example.

3.3 Runtime Options

Some parameters can be reconfigured from the Pangolin GUI at runtime. Feel free to add more.

3.4 Accessing Data.

The easiest way to access the data (poses, pointclouds, etc.) computed by DSO in real time is to create your own Output3DWrapper and add it to the system, i.e., to FullSystem.outputWrapper. The respective member functions will be called on various occasions (e.g., when a new KF is created, when a new frame is tracked, etc.), exposing the relevant data.

See IOWrapper/Output3DWrapper.h for a description of the different callbacks available, and some basic notes on where to find which data in the used classes. See IOWrapper/OutputWrapper/SampleOutputWrapper.h for an example implementation, which just prints some example data to the commandline (use the options sampleoutput=1 quiet=1 to see the result).

Note that these callbacks block the respective DSO thread; thus, expensive computations should not be performed in them. It is better practice to just copy over / publish / output the data you need.
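
As a minimal sketch of such a wrapper (the method name and signature below follow IOWrapper/Output3DWrapper.h and the SampleOutputWrapper example at the time of writing; check the headers in your checkout, since they may differ):

#include <iostream>

#include "IOWrapper/Output3DWrapper.h"
#include "util/FrameShell.h"

// Prints the pose of every newly tracked frame. Keep the body cheap:
// this callback blocks the DSO tracking thread (see the note above).
class PoseLogger : public dso::IOWrap::Output3DWrapper
{
public:
    virtual void publishCamPose(dso::FrameShell* frame, dso::CalibHessian* HCalib) override
    {
        std::cout << "frame " << frame->id << " camToWorld:\n"
                  << frame->camToWorld.matrix3x4() << "\n";
    }
};

// Register it before feeding frames to the system:
//   fullSystem->outputWrapper.push_back(new PoseLogger());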

By default, dso_dataset writes all keyframe poses to a file result.txt at the end of a sequence, using the TUM RGB-D / TUM monoVO format ([timestamp x y z qx qy qz qw] of the cameraToWorld transformation).

3.5 Notes

  • The initializer is very slow and does not work very reliably. Consider replacing it with your own way of getting an initialization.
  • See https://github.com/JakobEngel/dso_ros for a minimal example project on how to use the library with your own input / output procedures.
  • See settings.cpp for a LOT of settings parameters, most of which you shouldn't touch.
  • setGlobalCalib(...) needs to be called once before anything is initialized; it globally sets the camera intrinsics and video resolution for convenience. This is probably not the most portable way of doing it, though.

4 General Notes for Good Results

Accurate Geometric Calibration

  • Please have a look at Chapter 4.3 of the DSO paper, in particular Figure 20 (Geometric Noise). Direct approaches suffer a LOT from bad geometric calibrations: geometric distortions of 1.5 pixels already reduce the accuracy by a factor of 10.

  • Do not use a rolling shutter camera; the geometric distortions from a rolling shutter are huge, even at high frame rates (over 60 fps).

  • Note that the reprojection RMSE reported by most calibration tools is the reprojection RMSE on the "training data", i.e., overfitted to the images you used for calibration. If it is low, that does not imply that your calibration is good; you may just have used too few images.

  • Try different camera / distortion models; not all lenses can be modelled by all of them.

Photometric Calibration

Use a photometric calibration (e.g. using https://github.com/tum-vision/mono_dataset_code ).

Translation vs. Rotation

DSO cannot do magic: if you rotate the camera too much without translation, it will fail. Since it is a pure visual odometry, it cannot recover by re-localizing or track through strong rotations using previously triangulated geometry; everything that leaves the field of view is marginalized immediately.

Computation Speed

If your computer is slow, try the "fast" settings, or run DSO on a dataset without enforcing real-time execution.

Initialization

The current initializer is not very good: it is slow and occasionally fails. Make sure the initial camera motion is slow and "nice" (i.e., a lot of translation and little rotation) during initialization. Possibly replace it with your own initializer.

5 License

DSO was developed at the Technical University of Munich and Intel. The open-source version is licensed under the GNU General Public License Version 3 (GPLv3). For commercial purposes, we also offer a professional version, see http://vision.in.tum.de/dso for details.