DirectML
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
Top Related Projects
- TensorFlow: An Open Source Machine Learning Framework for Everyone
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- ONNX: Open standard for machine learning interoperability
- Core ML Tools: Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
- JAX: Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
- TensorRT: NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Quick Overview
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. It provides low-level APIs for executing machine learning primitives on GPUs and other hardware accelerators, enabling developers to integrate machine learning into their applications with optimal performance.
Pros
- Hardware acceleration for machine learning tasks on DirectX 12 compatible devices
- Seamless integration with DirectX 12 graphics pipelines
- Supports a wide range of machine learning operations and primitives
- Cross-platform compatibility (Windows and Xbox)
Cons
- Limited to DirectX 12 compatible hardware
- Steeper learning curve compared to higher-level machine learning frameworks
- Less extensive documentation and community support compared to more popular ML libraries
- Primarily focused on inference rather than training
Code Examples
- Creating a DirectML device:
ComPtr<ID3D12Device> d3d12Device;
D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&d3d12Device));
ComPtr<IDMLDevice> dmlDevice;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&dmlDevice));
- Executing a simple addition operation:
ComPtr<IDMLOperator> addOperator;
ComPtr<IDMLCompiledOperator> compiledOperator;
// Describe the element-wise add; the struct names its two inputs ATensor and BTensor
DML_ELEMENT_WISE_ADD_OPERATOR_DESC addDesc = {};
addDesc.ATensor = &inputTensor1Desc;
addDesc.BTensor = &inputTensor2Desc;
addDesc.OutputTensor = &outputTensorDesc;
// CreateOperator expects the typed description wrapped in a DML_OPERATOR_DESC
DML_OPERATOR_DESC opDesc = { DML_OPERATOR_ELEMENT_WISE_ADD, &addDesc };
dmlDevice->CreateOperator(&opDesc, IID_PPV_ARGS(&addOperator));
dmlDevice->CompileOperator(addOperator.Get(), DML_EXECUTION_FLAG_NONE, IID_PPV_ARGS(&compiledOperator));
// Record the dispatch; the compiled operator must already be initialized and its resources bound through a binding table
dmlCommandRecorder->RecordDispatch(commandList.Get(), compiledOperator.Get(), bindingTable.Get());
- Creating a convolution operator:
DML_CONVOLUTION_OPERATOR_DESC convDesc = {};
convDesc.InputTensor = &inputTensorDesc;
convDesc.FilterTensor = &filterTensorDesc;
convDesc.OutputTensor = &outputTensorDesc;
convDesc.Mode = DML_CONVOLUTION_MODE_CROSS_CORRELATION;
convDesc.Direction = DML_CONVOLUTION_DIRECTION_FORWARD;
// Number of spatial dimensions; each array below must hold this many elements
convDesc.DimensionCount = 2;
convDesc.Strides = strides;
convDesc.Dilations = dilations;
convDesc.StartPadding = startPadding;
convDesc.EndPadding = endPadding;
convDesc.OutputPadding = outputPadding;
convDesc.GroupCount = 1;
// CreateOperator expects the typed description wrapped in a DML_OPERATOR_DESC
DML_OPERATOR_DESC opDesc = { DML_OPERATOR_CONVOLUTION, &convDesc };
dmlDevice->CreateOperator(&opDesc, IID_PPV_ARGS(&convOperator));
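The output tensor description given to a convolution has to carry sizes that are consistent with the input, filter, strides, dilations, and padding. As a minimal sketch, assuming the conventional forward-convolution size formula, one spatial dimension can be computed as shown here. The helper name ConvOutputSize is hypothetical and is not part of the DirectML API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a DirectML function): output extent of one spatial
// dimension of a forward convolution, using the conventional size formula.
// Precondition: the padded input is at least as large as the dilated filter.
constexpr uint32_t ConvOutputSize(uint32_t inputSize, uint32_t filterSize,
                                  uint32_t stride, uint32_t dilation,
                                  uint32_t startPadding, uint32_t endPadding)
{
    // A dilated filter spans dilation * (filterSize - 1) + 1 input elements.
    const uint32_t effectiveFilter = dilation * (filterSize - 1) + 1;
    const uint32_t paddedInput = inputSize + startPadding + endPadding;
    return (paddedInput - effectiveFilter) / stride + 1;
}

// A 3x3 filter with stride 1 and one element of padding on each side keeps the size.
static_assert(ConvOutputSize(224, 3, 1, 1, 1, 1) == 224, "same-size convolution");
```

Calling the helper once per spatial dimension gives the height and width to place in the output tensor's sizes; the batch and channel sizes come from the input batch and the filter count respectively.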
Getting Started
- Install the DirectML NuGet package:
nuget install Microsoft.AI.DirectML
- Include the DirectML header in your C++ project:
#include <DirectML.h>
- Initialize a DirectML device:
ComPtr<ID3D12Device> d3d12Device;
D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&d3d12Device));
ComPtr<IDMLDevice> dmlDevice;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&dmlDevice));
- Create and execute operators as needed for your machine learning tasks.
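Creating an operator starts with tensor descriptions, and a buffer tensor description (DML_BUFFER_TENSOR_DESC) includes the total size of the tensor in bytes. The sketch below computes that size for a densely packed tensor with no custom strides. It assumes the size should be rounded up to a multiple of 4 bytes, which matches the behavior of the helper code in the DirectML samples as far as we know. The name PackedTensorSizeInBytes is hypothetical and is not part of the DirectML API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a DirectML function): bytes required to store a
// densely packed tensor, rounded up to the next multiple of 4 bytes.
inline uint64_t PackedTensorSizeInBytes(const std::vector<uint32_t>& sizes,
                                        uint32_t elementSizeInBytes)
{
    uint64_t elementCount = 1;
    for (uint32_t size : sizes)
    {
        elementCount *= size;  // product of all dimension sizes
    }
    const uint64_t bytes = elementCount * elementSizeInBytes;
    // Assumption: round up to 4-byte alignment.
    return (bytes + 3) & ~uint64_t{3};
}
```

For example, a float32 tensor with sizes {1, 2, 3, 4} holds 24 elements and needs 96 bytes, while a float16 tensor with sizes {1, 1, 1, 3} needs 6 bytes of data and is rounded up to 8.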
Competitor Comparisons
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Larger community and ecosystem, with more resources and third-party libraries
- Supports a wider range of platforms and hardware accelerators
- More comprehensive documentation and tutorials
Cons of TensorFlow
- Steeper learning curve for beginners
- Can be slower for certain operations compared to DirectML
- Larger file size and memory footprint
Code Comparison
TensorFlow:
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
DirectML (through the tensorflow-directml package):
import tensorflow as tf
# Assumes tensorflow-directml is installed and exposes DirectML devices under the "DML" device type
with tf.device('/device:DML:0'):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
The model definition is identical in both cases. The only difference is that the DirectML variant runs against the tensorflow-directml build of TensorFlow and, in this example, places the model on a DirectML device explicitly.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Widely adopted in the research community with extensive ecosystem
- Supports dynamic computational graphs for flexible model development
- Offers a more Pythonic and intuitive API
Cons of PyTorch
- Generally slower performance on Windows compared to DirectML
- Less optimized for DirectX-based hardware acceleration
- DirectX 12 acceleration is not built in and requires the separate torch-directml plugin
Code Comparison
PyTorch:
import torch
x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.matmul(x, y)
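DirectML (through the torch-directml plugin):
import torch
import torch_directml
dml = torch_directml.device()  # default DirectML device
x = torch.tensor([1.0, 2.0, 3.0]).to(dml)
y = torch.tensor([4.0, 5.0, 6.0]).to(dml)
z = torch.matmul(x, y)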
Summary
PyTorch is a popular deep learning framework with a rich ecosystem and flexible design, making it well suited to research and complex model development. DirectML, on the other hand, is a low-level library that provides hardware acceleration on DirectX 12 devices. The two are not mutually exclusive: the torch-directml plugin lets PyTorch models run on DirectML. PyTorch offers the higher-level, more feature-rich API, while DirectML targets optimization on Microsoft platforms, making it a good choice for Windows-centric development and deployment.
Open standard for machine learning interoperability
Pros of ONNX
- Broader ecosystem support and compatibility across multiple frameworks
- More extensive model zoo and pre-trained models available
- Active community-driven development with frequent updates
Cons of ONNX
- Steeper learning curve for beginners
- May require additional tools for optimization and deployment
- Less integrated with DirectX and Windows-specific hardware acceleration
Code Comparison
ONNX example:
import onnx
model = onnx.load("model.onnx")
onnx.checker.check_model(model)
print(onnx.helper.printable_graph(model.graph))
DirectML example:
ComPtr<IDMLDevice> dmlDevice;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&dmlDevice));
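ComPtr<IDMLOperatorInitializer> initializer;
// CreateOperatorInitializer takes an array of compiled operators, not an operator description
IDMLCompiledOperator* compiledOperators[] = { compiledOperator.Get() };
dmlDevice->CreateOperatorInitializer(1, compiledOperators, IID_PPV_ARGS(&initializer));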
ONNX focuses on model representation and interoperability, while DirectML provides low-level GPU acceleration for machine learning operations. ONNX is more versatile across platforms, whereas DirectML is optimized for Windows and DirectX-compatible hardware. ONNX has a larger community and more extensive tooling, but DirectML offers tighter integration with Microsoft's ecosystem and potentially better performance on supported devices.
Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
Pros of Core ML Tools
- Specifically designed for iOS and macOS, providing seamless integration with Apple's ecosystem
- Supports a wide range of popular machine learning frameworks, including TensorFlow, PyTorch, and scikit-learn
- Offers comprehensive tools for model conversion, optimization, and deployment on Apple devices
Cons of Core ML Tools
- Limited to Apple platforms, lacking cross-platform support
- May require more manual optimization for performance on non-Apple hardware
- Smaller community and ecosystem compared to DirectML
Code Comparison
Core ML Tools:
import coremltools as ct
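# 'tensorflow' is the source value used for Keras .h5 models; 'neuralnetwork' targets the .mlmodel format
model = ct.convert('model.h5', source='tensorflow', convert_to='neuralnetwork')
model.save('converted_model.mlmodel')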
DirectML:
ComPtr<IDMLDevice> device;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&device));
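ComPtr<IDMLOperatorInitializer> initializer;
// CreateOperatorInitializer takes an array of compiled operators, not an operator description
IDMLCompiledOperator* compiledOperators[] = { compiledOperator.Get() };
device->CreateOperatorInitializer(1, compiledOperators, IID_PPV_ARGS(&initializer));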
Note: The code snippets demonstrate basic usage and may not be directly comparable due to the different nature and purposes of the libraries.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Pros of JAX
- More flexible and general-purpose, supporting a wider range of machine learning tasks
- Better support for automatic differentiation and GPU/TPU acceleration
- Larger and more active community, with frequent updates and contributions
Cons of JAX
- Steeper learning curve, especially for those not familiar with NumPy
- Less optimized for DirectX-specific hardware and scenarios
- May have higher overhead for simple operations compared to DirectML
Code Comparison
JAX example:
import jax.numpy as jnp
from jax import grad, jit
def f(x):
    return jnp.sum(jnp.sin(x))
grad_f = jit(grad(f))
DirectML example:
DML_TENSOR_DESC inputDesc = {};
// ... (initialize tensor description)
DML_ELEMENT_WISE_SIN_OPERATOR_DESC sinDesc = {};
sinDesc.InputTensor = &inputDesc;
// ... (create and execute operator)
The JAX example showcases its simplicity in defining and differentiating functions, while the DirectML example demonstrates lower-level control over tensor operations. JAX provides a more Pythonic interface, whereas DirectML offers fine-grained control for DirectX-specific optimizations.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Pros of TensorRT
- Highly optimized for NVIDIA GPUs, offering superior performance on supported hardware
- Extensive support for various deep learning frameworks and models
- Robust quantization and precision calibration tools for model optimization
Cons of TensorRT
- Limited to NVIDIA hardware, lacking cross-platform support
- Steeper learning curve and more complex setup process
- Less frequent updates and potentially slower bug fixes
Code Comparison
TensorRT:
IBuilder* builder = createInferBuilder(gLogger);
INetworkDefinition* network = builder->createNetworkV2(1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH));
IOptimizationProfile* profile = builder->createOptimizationProfile();
DirectML:
ComPtr<IDMLDevice> dmlDevice;
DMLCreateDevice(d3d12Device.Get(), DML_CREATE_DEVICE_FLAG_NONE, IID_PPV_ARGS(&dmlDevice));
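ComPtr<IDMLOperatorInitializer> initializer;
// CreateOperatorInitializer takes an array of compiled operators, not an operator description
IDMLCompiledOperator* compiledOperators[] = { compiledOperator.Get() };
dmlDevice->CreateOperatorInitializer(1, compiledOperators, IID_PPV_ARGS(&initializer));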
Both libraries provide APIs for creating and optimizing deep learning models, but TensorRT focuses on NVIDIA GPUs, while DirectML offers a more hardware-agnostic approach.
README
DirectML
DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
When used standalone, the DirectML API is a low-level DirectX 12 library and is suitable for high-performance, low-latency applications such as frameworks, games, and other real-time applications. The seamless interoperability of DirectML with Direct3D 12 as well as its low overhead and conformance across hardware makes DirectML ideal for accelerating machine learning when both high performance is desired, and the reliability and predictability of results across hardware is critical.
More information about DirectML can be found in Introduction to DirectML.
- Getting Started with DirectML
- DirectML Samples
- DxDispatch Tool
- Windows ML on DirectML
- ONNX Runtime on DirectML
- PyTorch with DirectML
- TensorFlow with DirectML
- Feedback
- External Links
- Contributing
Visit the DirectX Landing Page for more resources for DirectX developers.
Getting Started with DirectML
DirectML is distributed as a system component of Windows 10, and is available as part of the Windows 10 operating system (OS) in Windows 10, version 1903 (10.0; Build 18362), and newer.
Starting with DirectML version 1.4.0, DirectML is also available as a standalone redistributable package (see Microsoft.AI.DirectML), which is useful for applications that wish to use a fixed version of DirectML, or when running on older versions of Windows 10.
Hardware requirements
DirectML requires a DirectX 12 capable device. Almost all commercially-available graphics cards released in the last several years support DirectX 12. Examples of compatible hardware include:
- AMD GCN 1st Gen (Radeon HD 7000 series) and above
- Intel Haswell (4th-gen core) HD Integrated Graphics and above
- NVIDIA Kepler (GTX 600 series) and above
- Qualcomm Adreno 600 and above
For application developers
DirectML exposes a native C++ DirectX 12 API. The header and library (DirectML.h/DirectML.lib) are available as part of the redistributable NuGet package, and are also included in the Windows 10 SDK version 10.0.18362 or newer.
- The Windows 10 SDK can be downloaded from the Windows Dev Center
- Microsoft.AI.DirectML on the NuGet Gallery
- DirectML programming guide
- DirectML API reference
For users, data scientists, and researchers
DirectML is built-in as a backend to several frameworks such as Windows ML, ONNX Runtime, and TensorFlow.
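See the following sections for more information:
- Windows ML on DirectML
- ONNX Runtime on DirectML
- PyTorch with DirectML
- TensorFlow with DirectML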
DirectML Samples
DirectML C++ sample code is available under Samples.
- HelloDirectML: A minimal "hello world" application that executes a single DirectML operator.
- DirectMLNpuInference: A sample that showcases how to utilize NPU hardware with DirectML.
- DirectMLSuperResolution: A sample that uses DirectML to execute a basic super-resolution model to upscale video from 540p to 1080p in real time.
- yolov4: YOLOv4 is an object detection model capable of recognizing up to 80 different classes of objects in an image. This sample contains a complete end-to-end implementation of the model using DirectML, and is able to run in real time on a user-provided video stream.
DirectML Python sample code is available under Python/samples. The samples require PyDirectML, an open source Python projection library for DirectML, which can be built and installed to a Python executing environment from Python/src. Refer to the Python/README.md file for more details.
- MobileNet: Adapted from the ONNX MobileNet model. MobileNet classifies an image into 1000 different classes. It is highly efficient in speed and size, ideal for mobile applications.
- MNIST: Adapted from the ONNX MNIST model. MNIST predicts handwritten digits using a convolution neural network.
- SqueezeNet: Based on the ONNX SqueezeNet model. SqueezeNet performs image classification trained on the ImageNet dataset. It is highly efficient and provides results with good accuracy.
- FNS-Candy: Adapted from the Windows ML Style Transfer model sample, FNS-Candy re-applies specific artistic styles on regular images.
- Super Resolution: Adapted from the ONNX Super Resolution model, Super-Res upscales and sharpens the input images to refine the details and improve image quality.
DxDispatch Tool
DxDispatch is a simple command-line executable for launching DirectX 12 compute programs (including DirectML operators) without writing all the C++ boilerplate.
Windows ML on DirectML
Windows ML (WinML) is a high-performance, reliable API for deploying hardware-accelerated ML inferences on Windows devices. DirectML provides the GPU backend for Windows ML.
DirectML acceleration can be enabled in Windows ML using the LearningModelDevice with any one of the DirectX DeviceKinds.
For more information, see Get Started with Windows ML.
- Windows Machine Learning Overview (docs.microsoft.com)
- Windows Machine Learning GitHub
- WinMLRunner, a tool for executing ONNX models using WinML with DirectML
ONNX Runtime on DirectML
ONNX Runtime is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks, including PyTorch, TensorFlow/Keras, scikit-learn, and more.
DirectML is available as an optional execution provider for ONNX Runtime that provides hardware acceleration when running on Windows 10.
For more information about getting started, see Using the DirectML execution provider.
PyTorch with DirectML
PyTorch with DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware. This is done through torch-directml, a plugin for PyTorch.
PyTorch with DirectML is supported on both the latest versions of Windows and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started with torch-directml, see our Windows or WSL 2 guidance on Microsoft Learn.
TensorFlow with DirectML
TensorFlow is a popular open source platform for machine learning and is a leading framework for training of machine learning models.
DirectML acceleration for TensorFlow 1.15 is currently available for Public Preview. TensorFlow on DirectML enables training and inference of complex machine learning models on a wide range of DirectX 12-compatible hardware.
TensorFlow on DirectML is supported on both the latest versions of Windows 10 and the Windows Subsystem for Linux, and is available for download as a PyPI package. For more information about getting started, see GPU accelerated ML training (docs.microsoft.com).
- TensorFlow on DirectML GitHub repo
- TensorFlow on DirectML samples
- tensorflow-directml PyPI project
- TensorFlow GitHub | RFC: TensorFlow on DirectML
- TensorFlow homepage
Feedback
We look forward to hearing from you!
- For TensorFlow with DirectML issues, bugs, and feedback; or for general DirectML issues and feedback, please file an issue or contact us directly at askdirectml@microsoft.com.
- For PyTorch with DirectML issues, bugs, and feedback; or for general DirectML issues and feedback, please file an issue or contact us directly at askdirectml@microsoft.com.
- For Windows ML issues, please file a GitHub issue at microsoft/Windows-Machine-Learning or contact us directly at askwindowsml@microsoft.com.
- For ONNX Runtime issues, please file an issue at microsoft/onnxruntime.
External Links
Documentation
DirectML programming guide
DirectML API reference
More information
Introducing DirectML (Game Developers Conference '19)
Accelerating GPU Inferencing with DirectML and DirectX 12 (SIGGRAPH '18)
Windows AI: hardware-accelerated ML on Windows devices (Microsoft Build '20)
Gaming with Windows ML (DirectX Developer Blog)
DirectML at GDC 2019 (DirectX Developer Blog)
DirectX ❤ Linux (DirectX Developer Blog)
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.