RayTracingInVulkan

Implementation of Peter Shirley's Ray Tracing In One Weekend book using Vulkan and NVIDIA's RTX extension.


Top Related Projects

Vulkan-Samples

4,825

One stop solution for all Vulkan samples

Vulkan

11,231

C++ examples for the Vulkan graphics API

VulkanTutorial

3,444

Tutorial for the Vulkan graphics and compute API

renderdoc

9,759

RenderDoc is a stand-alone graphics debugging tool.

Quick Overview

The GPSnoopy/RayTracingInVulkan repository is a project that demonstrates the implementation of a ray tracing algorithm using the Vulkan graphics API. It showcases the use of Vulkan's ray tracing capabilities to create realistic 3D scenes with advanced lighting and rendering effects.

Pros

Vulkan-based: The project utilizes the Vulkan graphics API, which provides low-level control and high performance for graphics rendering.
Ray Tracing Showcase: The project demonstrates the power of ray tracing, a rendering technique that can produce highly realistic and accurate lighting effects.
Educational: The codebase can be a valuable resource for developers interested in learning about ray tracing and Vulkan programming.
Cross-platform: The project is designed to be cross-platform, allowing it to run on a variety of hardware and operating systems.

Cons

Complexity: Implementing a full-fledged ray tracing system in Vulkan can be a complex and challenging task, which may make it difficult for beginners to understand and contribute to the project.
Performance Requirements: Ray tracing is a computationally intensive process, and the project may require high-end hardware to achieve optimal performance.
Limited Documentation: The project's documentation could be more comprehensive, making it harder for new contributors to get started.
Lack of Active Maintenance: The project appears to have limited active maintenance, with the last commit being over a year old.

Code Examples

The GPSnoopy/RayTracingInVulkan repository is a standalone application rather than a reusable library. The short snippets below illustrate the kind of Vulkan ray tracing calls such a project relies on:

Ray Tracing Pipeline Creation:

VkRayTracingPipelineCreateInfoKHR pipelineInfo = {};
pipelineInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_PIPELINE_CREATE_INFO_KHR;
pipelineInfo.stageCount = static_cast<uint32_t>(shaderStages.size());
pipelineInfo.pStages = shaderStages.data();
pipelineInfo.groupCount = static_cast<uint32_t>(groups.size());
pipelineInfo.pGroups = groups.data();
pipelineInfo.maxPipelineRayRecursionDepth = 2;
pipelineInfo.layout = pipelineLayout;
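// Arguments: device, deferred operation (none), pipeline cache (none), create-info count, create infos, allocator, output pipelines.
VK_CHECK(vkCreateRayTracingPipelinesKHR(device, VK_NULL_HANDLE, VK_NULL_HANDLE, 1, &pipelineInfo, nullptr, &rtPipeline));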

This code snippet demonstrates the creation of a Vulkan ray tracing pipeline, which is a crucial component of the ray tracing process.

Shader Binding Table (SBT) Creation:

VkStridedDeviceAddressRegionKHR raygenShaderSBTEntry = {};
raygenShaderSBTEntry.deviceAddress = raygenShaderBindingTableAddress;
raygenShaderSBTEntry.stride = raygenShaderBindingTableStride;
raygenShaderSBTEntry.size = raygenShaderBindingTableSize;

VkStridedDeviceAddressRegionKHR missShaderSBTEntry = {};
missShaderSBTEntry.deviceAddress = missShaderBindingTableAddress;
missShaderSBTEntry.stride = missShaderBindingTableStride;
missShaderSBTEntry.size = missShaderBindingTableSize;

VkStridedDeviceAddressRegionKHR hitShaderSBTEntry = {};
hitShaderSBTEntry.deviceAddress = hitShaderBindingTableAddress;
hitShaderSBTEntry.stride = hitShaderBindingTableStride;
hitShaderSBTEntry.size = hitShaderBindingTableSize;

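This code snippet fills in the three VkStridedDeviceAddressRegionKHR structures that describe the ray generation, miss, and hit regions of the Shader Binding Table (SBT). The SBT itself is a buffer of shader group handles; these regions tell vkCmdTraceRaysKHR where each group of records lives so that the right shaders are selected during ray tracing.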
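The stride and size values used above have to respect limits that the device reports in VkPhysicalDeviceRayTracingPipelinePropertiesKHR (shaderGroupHandleSize, shaderGroupHandleAlignment and shaderGroupBaseAlignment). The following is a minimal sketch of that arithmetic with no Vulkan dependency. The names AlignUp, SbtRegion, MakeRaygenRegion and MakeRegion are hypothetical helpers for illustration, not code from the repository, and the sketch assumes records that contain only a shader group handle with no inline data.

```cpp
#include <cassert>
#include <cstdint>

// Round value up to the next multiple of alignment (alignment must be a power of two).
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stand-in for the stride and size fields of VkStridedDeviceAddressRegionKHR.
struct SbtRegion
{
    std::uint64_t stride;
    std::uint64_t size;
};

inline bool IsPowerOfTwo(std::uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

// Ray generation region: it holds a single record and its size must equal its stride.
// Padding the stride to the base alignment keeps the next region aligned when all
// regions are packed back to back in one buffer.
inline SbtRegion MakeRaygenRegion(std::uint32_t handleSize, std::uint32_t handleAlignment, std::uint32_t baseAlignment)
{
    assert(IsPowerOfTwo(handleAlignment) && IsPowerOfTwo(baseAlignment));
    const std::uint64_t stride = AlignUp(AlignUp(handleSize, handleAlignment), baseAlignment);
    return {stride, stride};
}

// Miss, hit or callable region: the stride is a multiple of the handle alignment,
// and the total size is padded to the base alignment.
inline SbtRegion MakeRegion(std::uint32_t recordCount, std::uint32_t handleSize, std::uint32_t handleAlignment, std::uint32_t baseAlignment)
{
    assert(IsPowerOfTwo(handleAlignment) && IsPowerOfTwo(baseAlignment));
    const std::uint64_t stride = AlignUp(handleSize, handleAlignment);
    const std::uint64_t size = AlignUp(recordCount * stride, baseAlignment);
    return {stride, size};
}
```

With assumed limits of a 32-byte handle, 32-byte handle alignment and 64-byte base alignment, MakeRegion(3, 32, 32, 64) gives a stride of 32 and a size of 128. Real values should always come from the physical device properties query.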

Ray Tracing Dispatch:

vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, rtPipeline);

Competitor Comparisons

Vulkan-Samples

4,825

One stop solution for all Vulkan samples

Pros of Vulkan-Samples

Comprehensive collection of Vulkan API usage examples
Actively maintained by the Khronos Group
Provides a good starting point for learning Vulkan

Cons of Vulkan-Samples

Primarily focused on basic Vulkan functionality, less emphasis on advanced techniques
May not cover the latest Vulkan features and extensions
Codebase can be more complex due to the wide range of samples

Code Comparison

RayTracingInVulkan

vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, pipeline);
vkCmdBindDescriptorSets(cmdBuffer, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, pipelineLayout, 0, 1, &descriptorSet, 0, nullptr);
vkCmdTraceRaysKHR(cmdBuffer, &raygenShaderBindingTable, &missShaderBindingTable, &hitShaderBindingTable, &callableShaderBindingTable, width, height, 1);

Vulkan-Samples

vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
vkCmdBindVertexBuffers(cmdBuffer, 0, 1, vertexBuffer.buffer.data(), vertexBuffer.offsets.data());
vkCmdBindIndexBuffer(cmdBuffer, indexBuffer.buffer, 0, VK_INDEX_TYPE_UINT32);

Vulkan

11,231

C++ examples for the Vulkan graphics API

Pros of Vulkan

Comprehensive set of examples and demos showcasing various Vulkan features and techniques
Well-documented and actively maintained repository
Includes both basic and advanced Vulkan samples, catering to developers of all skill levels

Cons of Vulkan

Primarily focused on general Vulkan samples, rather than specific ray tracing implementation
May require more effort to extract and adapt the ray tracing-specific code from the broader Vulkan examples

Code Comparison

RayTracingInVulkan (GPSnoopy/RayTracingInVulkan)

vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, pipeline);
vkCmdBindDescriptorSets(commandBuffer, VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR, pipelineLayout, 0, 1, &descriptorSet, 0, nullptr);
vkCmdTraceRaysKHR(commandBuffer, &raygenShaderBindingTable, &missShaderBindingTable, &hitShaderBindingTable, &callableShaderBindingTable, width, height, 1);

Vulkan (SaschaWillems/Vulkan)

vkCmdBindPipeline(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipeline);
vkCmdBindDescriptorSets(cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineLayout, 0, 1, &descriptorSet, 0, nullptr);
vkCmdDraw(cmdBuffer, 3, 1, 0, 0);

The code comparison contrasts the ray tracing pipeline commands used by RayTracingInVulkan, which per its README targets the VK_KHR_ray_tracing_pipeline extension, with the rasterization draw call of a more general graphics-oriented Vulkan sample (Vulkan).

VulkanTutorial

3,444

Tutorial for the Vulkan graphics and compute API

Pros of Overv/VulkanTutorial

Comprehensive and well-structured tutorial series covering a wide range of Vulkan concepts
Includes detailed explanations and code examples for each step
Provides a solid foundation for learning Vulkan programming

Cons of Overv/VulkanTutorial

Primarily focused on basic Vulkan concepts, with less emphasis on advanced topics like ray tracing
May not provide as much hands-on experience with complex Vulkan applications as RayTracingInVulkan

Code Comparison

RayTracingInVulkan (GPSnoopy/RayTracingInVulkan)

VkResult CreateRayTracingPipeline(VkDevice device, VkPipelineCache pipelineCache, const VkRayTracingPipelineCreateInfoKHR* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkPipeline* pPipeline) {
    // Extension entry points may not be exported by the loader, so resolve the function at runtime.
    auto func = (PFN_vkCreateRayTracingPipelinesKHR)vkGetDeviceProcAddr(device, "vkCreateRayTracingPipelinesKHR");
    if (func == nullptr) {
        return VK_ERROR_EXTENSION_NOT_PRESENT;
    }
    // The second argument is the optional VkDeferredOperationKHR; VK_NULL_HANDLE creates the pipeline immediately.
    return func(device, VK_NULL_HANDLE, pipelineCache, 1, pCreateInfo, pAllocator, pPipeline);
}

VulkanTutorial (Overv/VulkanTutorial)

VkResult CreateDebugUtilsMessengerEXT(VkInstance instance, const VkDebugUtilsMessengerCreateInfoEXT* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkDebugUtilsMessengerEXT* pDebugMessenger) {
    auto func = (PFN_vkCreateDebugUtilsMessengerEXT)vkGetInstanceProcAddr(instance, "vkCreateDebugUtilsMessengerEXT");
    if (func != nullptr) {
        return func(instance, pCreateInfo, pAllocator, pDebugMessenger);
    } else {
        return VK_ERROR_EXTENSION_NOT_PRESENT;
    }
}

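Both snippets follow the same pattern: look up an extension entry point at runtime and return VK_ERROR_EXTENSION_NOT_PRESENT if it is unavailable. The RayTracingInVulkan snippet resolves a device-level function through vkGetDeviceProcAddr to create a ray tracing pipeline, while the VulkanTutorial snippet resolves an instance-level function through vkGetInstanceProcAddr to create a debug messenger.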

renderdoc

9,759

RenderDoc is a stand-alone graphics debugging tool.

Pros of RenderDoc

RenderDoc is a comprehensive and feature-rich graphics debugging and profiling tool that supports a wide range of graphics APIs, including Vulkan, DirectX, OpenGL, and more.
It provides detailed information about the state of the graphics pipeline, including resource usage, shader code, and performance metrics.
RenderDoc has a user-friendly interface and a powerful set of tools for analyzing and debugging graphics applications.

Cons of RenderDoc

RenderDoc is a large and complex application, which can make it challenging to set up and use, especially for beginners.
The tool is a debugger and profiler rather than a rendering sample, so it does not itself demonstrate ray tracing or other advanced graphics techniques.

Code Comparison

Here's a brief comparison of the code structure between RayTracingInVulkan and RenderDoc:

RayTracingInVulkan (main.cpp):

int main() {
    RayTracingApp app;
    app.run();
    return 0;
}

RenderDoc (renderdoc.cpp):

int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow) {
    return RenderDoc::Inst().AppInitialize();
}

Both projects use a similar structure, with a main entry point that initializes and runs the application. However, RenderDoc has a more complex setup process, as it needs to integrate with the host application and provide debugging and profiling functionality.


README

Ray Tracing In Vulkan

My implementation of Peter Shirley's Ray Tracing in One Weekend books using Vulkan and NVIDIA's RTX extension (formerly VK_NV_ray_tracing, now ported to the Khronos cross-platform VK_KHR_ray_tracing_pipeline extension). This allows most scenes to be rendered at interactive speed on appropriate hardware.

The real-time ray tracer can also load full geometry from OBJ files as well as render the procedural spheres from the book. An accumulation buffer is used to increase the sample count when the camera is not moving while keeping the frame rate interactive. I have added a UI built using Dear ImGui to allow changing the renderer parameters on the fly. Unlike projects such as Q2VKPT, there is no denoising filter, so the image will get noisy when moving the camera.
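The accumulation idea can be sketched on the CPU as a running sum of per-frame samples that is reset whenever the camera moves. This is only an illustration under stated assumptions: the class AccumulationBuffer and its methods are hypothetical names, a single float channel per pixel is used for brevity, and the project itself performs this work on the GPU in a way that may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel accumulation buffer: stores the sum of all frames
// rendered since the camera last moved, plus the number of frames summed.
class AccumulationBuffer
{
public:
    explicit AccumulationBuffer(std::size_t pixelCount) : sum_(pixelCount, 0.0f) {}

    // Discard the history, for example after the camera moved.
    void Reset()
    {
        sum_.assign(sum_.size(), 0.0f);
        frameCount_ = 0;
    }

    // Add one noisy frame. If the camera moved, the old samples no longer
    // describe the same image, so the history is cleared first.
    void AddFrame(const std::vector<float>& frame, bool cameraMoved)
    {
        assert(frame.size() == sum_.size());
        if (cameraMoved)
        {
            Reset();
        }
        for (std::size_t i = 0; i != sum_.size(); ++i)
        {
            sum_[i] += frame[i];
        }
        ++frameCount_;
    }

    // Average of all accumulated samples for one pixel.
    float Resolve(std::size_t pixel) const
    {
        assert(frameCount_ > 0 && pixel < sum_.size());
        return sum_[pixel] / static_cast<float>(frameCount_);
    }

    unsigned FrameCount() const { return frameCount_; }

private:
    std::vector<float> sum_;
    unsigned frameCount_ = 0;
};
```

While the camera is still, every extra frame adds samples to the average and the noise falls. A camera move drops the count back to one frame, which matches the noisier image described above.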

This personal project follows my own attempts at CPU ray tracing following Peter Shirley's books (see here and here if you are interested).

Karim Sayed also wrote an excellent CUDA implementation, along with a detailed article recounting each iteration step by step. It is a thorough introduction to key CUDA performance considerations.

Gallery

Performance

Using a GeForce RTX 2080 Ti, the rendering speed is obscenely faster than using the CPU renderer. Obviously both implementations are still quite naive in some places, but I'm really impressed by the performance. The cover scene of the first book reaches ~140fps at 1280x720 using 8 rays per pixel and up to 16 bounces.
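To put the quoted figure in perspective, the number of camera paths started per second follows directly from resolution, rays per pixel and frame rate. The helper PathsPerSecond below is a hypothetical name used only for this back-of-the-envelope calculation, and the frame rate is the approximate value quoted above.

```cpp
#include <cstdint>

// Camera paths started per second: one path per pixel sample per frame.
// Each path may then bounce several times, so the number of traced ray
// segments is higher than this figure.
constexpr std::uint64_t PathsPerSecond(std::uint64_t width, std::uint64_t height, std::uint64_t samplesPerPixel, std::uint64_t framesPerSecond)
{
    return width * height * samplesPerPixel * framesPerSecond;
}

// 1280 x 720 pixels, 8 rays per pixel, roughly 140 frames per second.
constexpr std::uint64_t kCoverScenePaths = PathsPerSecond(1280, 720, 8, 140);
```

This works out to 1,032,192,000, or roughly one billion camera paths per second before counting the up to 16 bounces each path can take.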

I suspect performance could be improved further. I have created each object in the scene as a separate instance in the top level acceleration structure, which is probably not the best for data locality. The same goes for displaying multiple Lucy statues, where I have naively duplicated the geometry rather than instancing it multiple times.

Benchmarking

Command line arguments can be used to control various aspects of the application. Use --help to see all modes and arguments. For example, to run the ray tracer in benchmark mode in 2560x1440 fullscreen for scene #1 with vsync off:

RayTracer.exe --benchmark --width 2560 --height 1440 --fullscreen --scene 1 --present-mode 0

To benchmark all the scenes, starting from scene #1:

RayTracer.exe --benchmark --width 2560 --height 1440 --fullscreen --scene 1 --next-scenes --present-mode 0

Here are my results with the command above on a few different computers.

RayTracer Release 6 (NVIDIA drivers 461.40, AMD drivers 21.1.1)

Platform	Scene 1	Scene 2	Scene 3	Scene 4	Scene 5
Radeon RX 6900 XT	52.9 fps	52.2 fps	24.0 fps	41.0 fps	14.1 fps
GeForce RTX 3090 FE	42.8 fps	43.6 fps	38.9 fps	79.5 fps	40.0 fps
GeForce RTX 2080 Ti FE	37.7 fps	38.2 fps	24.2 fps	58.7 fps	21.4 fps

RayTracer Release 4 (NVIDIA drivers 436.48)

Platform	Scene 1	Scene 2	Scene 3	Scene 4	Scene 5
GeForce RTX 2080 Ti FE	36.1 fps	35.7 fps	19.9 fps	54.9 fps	15.1 fps
GeForce RTX 2070	19.9 fps	19.9 fps	11.7 fps	30.4 fps	9.5 fps
GeForce GTX 1080 Ti FE	3.4 fps	3.4 fps	1.9 fps	3.8 fps	1.3 fps
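One way to read the Release 4 table is as a per-scene speedup of a card with RT cores (GeForce RTX 2080 Ti FE) over one without them (GeForce GTX 1080 Ti FE). The sketch below only divides the frame rates copied from the table; the names SceneFps and Speedup are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSceneCount = 5;
using SceneFps = std::array<double, kSceneCount>;

// Frame rates copied from the Release 4 table, scenes 1 to 5.
constexpr SceneFps kRtx2080TiFe = {36.1, 35.7, 19.9, 54.9, 15.1};
constexpr SceneFps kGtx1080TiFe = {3.4, 3.4, 1.9, 3.8, 1.3};

// Per-scene ratio of two sets of frame rates.
inline SceneFps Speedup(const SceneFps& fast, const SceneFps& slow)
{
    SceneFps ratio{};
    for (std::size_t i = 0; i != kSceneCount; ++i)
    {
        assert(slow[i] > 0.0);
        ratio[i] = fast[i] / slow[i];
    }
    return ratio;
}
```

For these numbers the ratio is a little over 10x in scenes 1, 2 and 3, about 14.4x in scene 4 and about 11.6x in scene 5.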

Building

First you will need to install the Vulkan SDK. For Windows, LunarG provides installers. For Ubuntu LTS, they have native packages available. For other Linux distributions, they only provide tarballs. The rest of the third party dependencies can be built using Microsoft's vcpkg as provided by the scripts below.

If in doubt, please check the GitHub Actions continuous integration configurations for more details.

Windows (Visual Studio 2022 x64 solution)

vcpkg_windows.bat
build_windows.bat

Linux (GCC 9+ Makefile)

For example, on Ubuntu 20.04 (same as the CI pipeline, build steps on other distributions may vary):

sudo apt-get install curl unzip tar libxi-dev libxinerama-dev libxcursor-dev xorg-dev
./vcpkg_linux.sh
./build_linux.sh

Fedora Installation

sudo dnf install libXinerama-devel libXcursor-devel libX11-devel libXrandr-devel mesa-libGLU-devel pkgconfig ninja-build cmake gcc gcc-c++ vulkan-validation-layers-devel vulkan-headers vulkan-tools vulkan-loader-devel vulkan-loader glslang glslc
./vcpkg_linux.sh
./build_linux.sh

Random Thoughts

I suspect that the RTX 2000 series RT cores implement ray-AABB collision detection using reduced float precision. Early in the development, when trying to get the sphere procedural rendering to work, reporting an intersection every time the rint shader was invoked made it possible to visualise the AABB of each procedural instance. The rendering of the bounding volume had many artifacts around the box edges, typical of reduced precision.
When I upgraded the drivers to 430.86, performance significantly improved (+50%). This was around the same time Quake II RTX was released by NVIDIA. Coincidence?
When looking at the benchmark results of an RTX 2070 and an RTX 2080 Ti, the performance differences are mostly in line with the number of CUDA cores and RT cores rather than being influenced by other metrics. However, I do not know at this point whether the CUDA cores or the RT cores are the main bottleneck.
UPDATE 2021-01-07: the RTX 30xx results seem to imply that performance is mostly dictated by the number of RT cores. Compared to Turing, Ampere achieves 2x RT performance only when using ray-triangle intersection (as expected per the NVIDIA Ampere whitepaper), otherwise performance per RT core is the same. This leads to situations such as an RTX 2080 Ti being faster than an RTX 3080 when using procedural geometry.
UPDATE 2021-01-31: the 6900 XT results show the RDNA 2 architecture performing surprisingly well in procedural geometry scenes. Is it because the RDNA 2 BVH-ray intersections are done using the generic computing units (and there are plenty of those), whereas Ampere is bottlenecked by its small number of RT cores in these simple scenes? Or is the RDNA 2 Infinity Cache really shining here? The triangle-based geometry scenes highlight how efficient Ampere RT cores are in handling triangle-ray intersections; unsurprisingly, as these scenes are more representative of what video games would do in practice.
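For readers unfamiliar with the ray-AABB test mentioned in the first note above, the following is a sketch of the common slab method in full float precision. It is a generic CPU illustration, not the hardware's algorithm (which is not documented in this README) and not code from the repository; Ray, Aabb and HitsAabb are hypothetical names, and edge cases such as a ray origin lying exactly on a slab boundary with a zero direction component are not handled.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>

struct Ray
{
    std::array<float, 3> origin;
    std::array<float, 3> direction;
};

struct Aabb
{
    std::array<float, 3> min;
    std::array<float, 3> max;
};

// Slab test: intersect the ray with the pair of planes on each axis and keep
// the overlap of the three parameter intervals. The box is hit if the overlap
// is non-empty within [tMin, tMax].
inline bool HitsAabb(const Ray& ray, const Aabb& box, float tMin, float tMax)
{
    assert(tMin < tMax);
    for (std::size_t axis = 0; axis != 3; ++axis)
    {
        // A zero direction component yields +/- infinity here, which the
        // comparisons below handle when the origin is strictly inside or outside the slab.
        const float invD = 1.0f / ray.direction[axis];
        float t0 = (box.min[axis] - ray.origin[axis]) * invD;
        float t1 = (box.max[axis] - ray.origin[axis]) * invD;
        if (invD < 0.0f)
        {
            std::swap(t0, t1);
        }
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMax <= tMin)
        {
            return false;
        }
    }
    return true;
}
```

Because the result depends on comparing nearly equal interval endpoints close to the box edges, small rounding differences flip the outcome there, which is consistent with the edge artifacts described in the note.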

References

Initial Implementation (NVIDIA vendor specific extension)

Vulkan Khronos Ray Tracing (cross platform extension)
