DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.

Top Related Projects

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

onnx

19,372

Open standard for machine learning interoperability

coremltools

4,869

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

jax

32,985

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

TensorRT

11,912

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Quick Overview

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. It provides low-level APIs for executing machine learning primitives on GPUs and other hardware accelerators, enabling developers to integrate machine learning into their applications with optimal performance.

Pros

Hardware acceleration for machine learning tasks on DirectX 12 compatible devices
Seamless integration with DirectX 12 graphics pipelines
Supports a wide range of machine learning operations and primitives
Cross-platform compatibility (Windows and Xbox)

Cons

Limited to DirectX 12 compatible hardware
Steeper learning curve compared to higher-level machine learning frameworks
Less extensive documentation and community support compared to more popular ML libraries
Primarily focused on inference rather than training

Code Examples

Creating a DirectML device:

ComPtr<ID3D12Device> d3d12Device;
D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&d3d12Device));

ComPtr<IDMLDevice> dmlDevice;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&dmlDevice));

Executing a simple addition operation:

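ComPtr<IDMLOperator> addOperator; // "operator" is a reserved word in C++
ComPtr<IDMLCompiledOperator> compiledOperator;

// Describe the element-wise add. inputTensor1Desc, inputTensor2Desc, and
// outputTensorDesc are DML_TENSOR_DESC values assumed to be defined elsewhere.
DML_ELEMENT_WISE_ADD_OPERATOR_DESC addDesc = {};
addDesc.ATensor = &inputTensor1Desc;
addDesc.BTensor = &inputTensor2Desc;
addDesc.OutputTensor = &outputTensorDesc;

// CreateOperator takes a generic DML_OPERATOR_DESC that wraps the typed description
DML_OPERATOR_DESC opDesc = { DML_OPERATOR_ELEMENT_WISE_ADD, &addDesc };
dmlDevice->CreateOperator(&opDesc, IID_PPV_ARGS(&addOperator));
dmlDevice->CompileOperator(addOperator.Get(), DML_EXECUTION_FLAG_NONE, IID_PPV_ARGS(&compiledOperator));

// Record the dispatch. The compiled operator must already have been initialized,
// and bindingTable is an IDMLBindingTable describing its inputs and outputs.
dmlCommandRecorder->RecordDispatch(commandList.Get(), compiledOperator.Get(), bindingTable.Get());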
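
Each tensor description used above also has to state how many bytes its buffer occupies. The following is a minimal sketch of that calculation, written in Python for brevity and modeled on the DMLCalcBufferTensorSize helper shown in Microsoft's DirectML documentation. The function name, the data-type table, and the 4-byte rounding are assumptions made for illustration, not part of the DirectML API itself.

```python
# Bytes per element for a few tensor data types (illustrative subset).
ELEMENT_SIZE_BYTES = {
    "FLOAT32": 4,
    "FLOAT16": 2,
    "INT32": 4,
    "UINT8": 1,
}


def calc_buffer_tensor_size(data_type, sizes, strides=None):
    """Return the minimum buffer size in bytes implied by a tensor layout."""
    element_size = ELEMENT_SIZE_BYTES[data_type]
    if strides is None:
        # Packed layout: the element count is the product of all dimension sizes.
        element_count = 1
        for size in sizes:
            element_count *= size
    else:
        # Strided layout: find the index of the last addressable element, then add one.
        last_index = sum((size - 1) * stride for size, stride in zip(sizes, strides))
        element_count = last_index + 1
    size_in_bytes = element_count * element_size
    # Round up to the next multiple of 4 bytes (assumed alignment requirement).
    return (size_in_bytes + 3) & ~3


# A packed 1x2x3x4 float32 tensor holds 24 elements of 4 bytes each.
print(calc_buffer_tensor_size("FLOAT32", [1, 2, 3, 4]))  # 96
```

Passing explicit strides covers padded or broadcast layouts, where the buffer can be larger or smaller than the packed element count suggests.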

Creating a convolution operator:

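DML_CONVOLUTION_OPERATOR_DESC convDesc = {};
convDesc.InputTensor = &inputTensorDesc;
convDesc.FilterTensor = &filterTensorDesc;
convDesc.BiasTensor = nullptr; // optional; no bias in this example
convDesc.OutputTensor = &outputTensorDesc;
convDesc.Mode = DML_CONVOLUTION_MODE_CROSS_CORRELATION;
convDesc.Direction = DML_CONVOLUTION_DIRECTION_FORWARD;
convDesc.DimensionCount = 2; // assumes a 2D convolution; the arrays below hold one value per spatial dimension
convDesc.Strides = strides;
convDesc.Dilations = dilations;
convDesc.StartPadding = startPadding;
convDesc.EndPadding = endPadding;
convDesc.OutputPadding = outputPadding;
convDesc.GroupCount = 1;

// Wrap the typed description in a generic DML_OPERATOR_DESC before creating the operator
DML_OPERATOR_DESC opDesc = { DML_OPERATOR_CONVOLUTION, &convDesc };
dmlDevice->CreateOperator(&opDesc, IID_PPV_ARGS(&convOperator));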
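
The output tensor description has to match the size that the strides, dilations, and padding imply. As a rough sketch, the standard forward cross-correlation formula can be used to work out each spatial dimension ahead of time. The helper below is hypothetical, written in Python for brevity, and assumes the usual floor-based formula rather than anything confirmed against DirectML's own validation rules.

```python
def conv_output_size(input_size, kernel_size, stride=1, dilation=1,
                     start_pad=0, end_pad=0):
    """Output length of one spatial dimension for a forward convolution."""
    # A dilated kernel spans dilation * (kernel_size - 1) + 1 input elements.
    effective_kernel = dilation * (kernel_size - 1) + 1
    padded_size = input_size + start_pad + end_pad
    if padded_size < effective_kernel:
        raise ValueError("kernel does not fit inside the padded input")
    # Count how many stride-sized steps fit, plus the initial position.
    return (padded_size - effective_kernel) // stride + 1


# A 3-wide kernel with stride 2 and one element of padding on each side
# halves a 224-element dimension.
print(conv_output_size(224, 3, stride=2, start_pad=1, end_pad=1))  # 112
```

Applying the function once per spatial dimension gives the sizes to place in the output tensor description.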

Getting Started

Install the DirectML NuGet package:
```
nuget install Microsoft.AI.DirectML
```
Include the DirectML header in your C++ project:
```
#include <DirectML.h>
```

Initialize DirectML device:

ComPtr<ID3D12Device> d3d12Device;
D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&d3d12Device));

ComPtr<IDMLDevice> dmlDevice;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&dmlDevice));

Create and execute operators as needed for your machine learning tasks.

Competitor Comparisons

tensorflow

190,523

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

Larger community and ecosystem, with more resources and third-party libraries
Supports a wider range of platforms and hardware accelerators
More comprehensive documentation and tutorials

Cons of TensorFlow

Steeper learning curve for beginners
Can be slower for certain operations compared to DirectML
Larger file size and memory footprint

Code Comparison

TensorFlow:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

DirectML:

# Assumes the tensorflow-directml package (a TensorFlow 1.15 build) is installed
import tensorflow as tf

# Optional: pin the model to the first DirectML device
with tf.device('/device:DML:0'):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

The model definition is identical in both cases. With the tensorflow-directml package installed, the only DirectML-specific step shown here is the optional tf.device scope that pins the model to a DirectML device. The device string is an assumption based on how that package names its devices.

pytorch

91,080

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

Widely adopted in the research community with extensive ecosystem
Supports dynamic computational graphs for flexible model development
Offers a more Pythonic and intuitive API

Cons of PyTorch

No built-in DirectX 12 backend; DirectML acceleration requires the separate torch-directml plugin
Not specifically optimized for DirectX-based hardware acceleration
Higher-level framework, so it exposes less direct control over GPU resources than the DirectML API

Code Comparison

PyTorch:

import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.matmul(x, y)

DirectML:

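import torch
import torch_directml  # provided by the torch-directml plugin

dml = torch_directml.device()  # default DirectML device
x = torch.tensor([1.0, 2.0, 3.0]).to(dml)
y = torch.tensor([4.0, 5.0, 6.0]).to(dml)
z = torch.matmul(x, y)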

Summary

PyTorch is a popular deep learning framework with a rich ecosystem and flexible design, making it well suited to research and complex model development. DirectML is a lower-level library that provides hardware acceleration on DirectX 12 devices, and it can serve as a PyTorch backend through the torch-directml plugin. The two are therefore complementary rather than direct alternatives: PyTorch supplies the modeling API, while DirectML supplies GPU acceleration on Windows and the Windows Subsystem for Linux.

onnx

19,372

Open standard for machine learning interoperability

Pros of ONNX

Broader ecosystem support and compatibility across multiple frameworks
More extensive model zoo and pre-trained models available
Active community-driven development with frequent updates

Cons of ONNX

Steeper learning curve for beginners
May require additional tools for optimization and deployment
Less integrated with DirectX and Windows-specific hardware acceleration

Code Comparison

ONNX example:

import onnx
model = onnx.load("model.onnx")
onnx.checker.check_model(model)
print(onnx.helper.printable_graph(model.graph))

DirectML example:

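ComPtr<IDMLDevice> dmlDevice;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&dmlDevice));
// The initializer takes an array of compiled operators (compiledOperator is assumed to exist)
IDMLCompiledOperator* compiledOperators[] = { compiledOperator.Get() };
ComPtr<IDMLOperatorInitializer> initializer;
dmlDevice->CreateOperatorInitializer(ARRAYSIZE(compiledOperators), compiledOperators, IID_PPV_ARGS(&initializer));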

ONNX focuses on model representation and interoperability, while DirectML provides low-level GPU acceleration for machine learning operations. ONNX is more versatile across platforms, whereas DirectML is optimized for Windows and DirectX-compatible hardware. ONNX has a larger community and more extensive tooling, but DirectML offers tighter integration with Microsoft's ecosystem and potentially better performance on supported devices.

coremltools

4,869

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

Pros of Core ML Tools

Specifically designed for iOS and macOS, providing seamless integration with Apple's ecosystem
Supports a wide range of popular machine learning frameworks, including TensorFlow, PyTorch, and scikit-learn
Offers comprehensive tools for model conversion, optimization, and deployment on Apple devices

Cons of Core ML Tools

Limited to Apple platforms, lacking cross-platform support
Converted models target Apple hardware only, so other devices need a separate deployment path
Focused on model conversion rather than low-level operator execution

Code Comparison

Core ML Tools:

import coremltools as ct

model = ct.convert('model.h5', source='keras')
model.save('converted_model.mlmodel')

DirectML:

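ComPtr<IDMLDevice> device;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&device));

// The initializer takes an array of compiled operators (compiledOperator is assumed to exist)
IDMLCompiledOperator* compiledOperators[] = { compiledOperator.Get() };
ComPtr<IDMLOperatorInitializer> initializer;
device->CreateOperatorInitializer(ARRAYSIZE(compiledOperators), compiledOperators, IID_PPV_ARGS(&initializer));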

Note: The code snippets demonstrate basic usage and may not be directly comparable due to the different nature and purposes of the libraries.

jax

32,985

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Pros of JAX

More flexible and general-purpose, supporting a wider range of machine learning tasks
Better support for automatic differentiation and GPU/TPU acceleration
Larger and more active community, with frequent updates and contributions

Cons of JAX

Steeper learning curve, especially for those not familiar with NumPy
Less optimized for DirectX-specific hardware and scenarios
May have higher overhead for simple operations compared to DirectML

Code Comparison

JAX example:

import jax.numpy as jnp
from jax import grad, jit

def f(x):
    return jnp.sum(jnp.sin(x))

grad_f = jit(grad(f))

DirectML example:

DML_TENSOR_DESC inputDesc = {};
// ... (initialize tensor description)
DML_ELEMENT_WISE_SIN_OPERATOR_DESC sinDesc = {};
sinDesc.InputTensor = &inputDesc;
// ... (create and execute operator)

The JAX example showcases its simplicity in defining and differentiating functions, while the DirectML example demonstrates lower-level control over tensor operations. JAX provides a more Pythonic interface, whereas DirectML offers fine-grained control for DirectX-specific optimizations.

TensorRT

11,912

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Pros of TensorRT

Highly optimized for NVIDIA GPUs, offering superior performance on supported hardware
Extensive support for various deep learning frameworks and models
Robust quantization and precision calibration tools for model optimization

Cons of TensorRT

Limited to NVIDIA hardware, lacking cross-platform support
Steeper learning curve and more complex setup process
Less frequent updates and potentially slower bug fixes

Code Comparison

TensorRT:

IBuilder* builder = createInferBuilder(gLogger);
INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
IOptimizationProfile* profile = builder->createOptimizationProfile();

DirectML:

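ComPtr<IDMLDevice> dmlDevice;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&dmlDevice));
// The initializer takes an array of compiled operators (compiledOperator is assumed to exist)
IDMLCompiledOperator* compiledOperators[] = { compiledOperator.Get() };
ComPtr<IDMLOperatorInitializer> initializer;
dmlDevice->CreateOperatorInitializer(ARRAYSIZE(compiledOperators), compiledOperators, IID_PPV_ARGS(&initializer));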

Both libraries provide APIs for creating and optimizing deep learning models, but TensorRT focuses on NVIDIA GPUs, while DirectML offers a more hardware-agnostic approach.

README

DirectML

When used standalone, the DirectML API is a low-level DirectX 12 library suitable for high-performance, low-latency applications such as frameworks, games, and other real-time applications. Its seamless interoperability with Direct3D 12, low overhead, and conformance across hardware make DirectML ideal for accelerating machine learning when high performance is desired and the reliability and predictability of results across hardware are critical.

More information about DirectML can be found in Introduction to DirectML.

Getting Started with DirectML
DirectML Samples
DxDispatch Tool
Windows ML on DirectML
ONNX Runtime on DirectML
PyTorch with DirectML
TensorFlow with DirectML
Feedback
External Links
- Documentation
- More information
Contributing

Visit the DirectX Landing Page for more resources for DirectX developers.

Getting Started with DirectML

DirectML is distributed as a system component of Windows 10 and is available in Windows 10, version 1903 (10.0; Build 18362), and newer.

Starting with DirectML version 1.4.0, DirectML is also available as a standalone redistributable package (see Microsoft.AI.DirectML), which is useful for applications that wish to use a fixed version of DirectML, or when running on older versions of Windows 10.

Hardware requirements

DirectML requires a DirectX 12 capable device. Almost all commercially-available graphics cards released in the last several years support DirectX 12. Examples of compatible hardware include:

AMD GCN 1st Gen (Radeon HD 7000 series) and above
Intel Haswell (4th-gen core) HD Integrated Graphics and above
NVIDIA Kepler (GTX 600 series) and above
Qualcomm Adreno 600 and above

For application developers

DirectML exposes a native C++ DirectX 12 API. The header and library (DirectML.h/DirectML.lib) are available as part of the redistributable NuGet package, and are also included in the Windows 10 SDK version 10.0.18362 or newer.

The Windows 10 SDK can be downloaded from the Windows Dev Center
Microsoft.AI.DirectML on the NuGet Gallery
DirectML programming guide
DirectML API reference

For users, data scientists, and researchers

DirectML is built-in as a backend to several frameworks such as Windows ML, ONNX Runtime, and TensorFlow.

See the following sections for more information:

Windows ML on DirectML
ONNX Runtime on DirectML
TensorFlow with DirectML
PyTorch with DirectML

DirectML Samples

DirectML C++ sample code is available under Samples.

HelloDirectML: A minimal "hello world" application that executes a single DirectML operator.
DirectMLNpuInference: A sample that showcases how to utilize NPU hardware with DirectML.
DirectMLSuperResolution: A sample that uses DirectML to execute a basic super-resolution model to upscale video from 540p to 1080p in real time.
yolov4: YOLOv4 is an object detection model capable of recognizing up to 80 different classes of objects in an image. This sample contains a complete end-to-end implementation of the model using DirectML, and is able to run in real time on a user-provided video stream.

DirectML Python sample code is available under Python/samples. The samples require PyDirectML, an open source Python projection library for DirectML, which can be built and installed to a Python executing environment from Python/src. Refer to the Python/README.md file for more details.

MobileNet: Adapted from the ONNX MobileNet model. MobileNet classifies an image into 1000 different classes. It is highly efficient in speed and size, ideal for mobile applications.
MNIST: Adapted from the ONNX MNIST model. MNIST predicts handwritten digits using a convolutional neural network.
SqueezeNet: Based on the ONNX SqueezeNet model. SqueezeNet performs image classification trained on the ImageNet dataset. It is highly efficient and provides results with good accuracy.
FNS-Candy: Adapted from the Windows ML Style Transfer model sample, FNS-Candy re-applies specific artistic styles on regular images.
Super Resolution: Adapted from the ONNX Super Resolution model, Super-Res upscales and sharpens the input images to refine the details and improve image quality.

DxDispatch Tool

DxDispatch is a simple command-line executable for launching DirectX 12 compute programs (including DirectML operators) without writing all the C++ boilerplate.

Windows ML on DirectML

Windows ML (WinML) is a high-performance, reliable API for deploying hardware-accelerated ML inferences on Windows devices. DirectML provides the GPU backend for Windows ML.

DirectML acceleration can be enabled in Windows ML using the LearningModelDevice with any one of the DirectX DeviceKinds.

For more information, see Get Started with Windows ML.

Windows Machine Learning Overview (docs.microsoft.com)
Windows Machine Learning GitHub
WinMLRunner, a tool for executing ONNX models using WinML with DirectML

ONNX Runtime on DirectML

ONNX Runtime is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks, including PyTorch, TensorFlow/Keras, scikit-learn, and more.

DirectML is available as an optional execution provider for ONNX Runtime that provides hardware acceleration when running on Windows 10.

For more information about getting started, see Using the DirectML execution provider.
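
As a rough illustration of what opting in to the execution provider can look like from Python, the sketch below prefers DirectML when the installed ONNX Runtime build offers it and otherwise falls back to the CPU. The helper name choose_providers and the model path are hypothetical, and the sketch assumes an ONNX Runtime build that includes DirectML support.

```python
# Providers in order of preference: DirectML first, CPU as the fallback.
PREFERRED_PROVIDERS = ["DmlExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available):
    """Keep only the preferred providers that this build actually offers."""
    return [name for name in PREFERRED_PROVIDERS if name in available]


try:
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers())
    # "model.onnx" is a placeholder path; uncomment once a real model is available.
    # session = ort.InferenceSession("model.onnx", providers=providers)
except ImportError:
    # ONNX Runtime is not installed in this environment.
    providers = []

print(providers)
```

Listing the CPU provider last keeps the same script usable on machines without a DirectX 12 device.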

PyTorch with DirectML

PyTorch with DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware. This is done through torch-directml, a plugin for PyTorch.

PyTorch with DirectML is supported on both the latest versions of Windows and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started with torch-directml, see our Windows or WSL 2 guidance on Microsoft Learn.
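
A script that should also run on machines without the plugin can select its device defensively. The sketch below is one possible approach, assuming the torch-directml package exposes torch_directml.device() as its entry point. The select_device helper is hypothetical and simply falls back to the CPU when the plugin is not installed.

```python
def select_device():
    """Return a DirectML device when torch-directml is installed, else "cpu"."""
    try:
        import torch_directml  # provided by the torch-directml PyPI package
    except ImportError:
        # Plugin not installed: fall back to the CPU device string.
        return "cpu"
    return torch_directml.device()


device = select_device()
print(device)
# Tensors and models can then be moved with .to(device), for example:
# x = torch.ones(3).to(device)
```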

TensorFlow with DirectML

TensorFlow is a popular open source platform for machine learning and is a leading framework for training of machine learning models.

DirectML acceleration for TensorFlow 1.15 is currently available for Public Preview. TensorFlow on DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware.

TensorFlow on DirectML is supported on both the latest versions of Windows 10 and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started, see GPU accelerated ML training (docs.microsoft.com).

Feedback

We look forward to hearing from you!

For TensorFlow with DirectML issues, bugs, and feedback; or for general DirectML issues and feedback, please file an issue or contact us directly at askdirectml@microsoft.com.
For PyTorch with DirectML issues, bugs, and feedback; or for general DirectML issues and feedback, please file an issue or contact us directly at askdirectml@microsoft.com.
For Windows ML issues, please file a GitHub issue at microsoft/Windows-Machine-Learning or contact us directly at askwindowsml@microsoft.com.
For ONNX Runtime issues, please file an issue at microsoft/onnxruntime.

External Links

Documentation

DirectML programming guide
DirectML API reference

More information

Introducing DirectML (Game Developers Conference '19)
Accelerating GPU Inferencing with DirectML and DirectX 12 (SIGGRAPH '18)
Windows AI: hardware-accelerated ML on Windows devices (Microsoft Build '20)
Gaming with Windows ML (DirectX Developer Blog)
DirectML at GDC 2019 (DirectX Developer Blog)
DirectX ❤ Linux (DirectX Developer Blog)

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
