uxlfoundation/oneDNN

oneAPI Deep Neural Network Library (oneDNN)

Top Related Projects

  • TensorFlow (188,828 stars): An Open Source Machine Learning Framework for Everyone
  • PyTorch (88,135 stars): Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • TVM (11,937 stars): Open deep learning compiler stack for cpu, gpu and specialized accelerators
  • ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
  • CUTLASS (7,173 stars): CUDA Templates for Linear Algebra Subroutines

Quick Overview

oneDNN (Deep Neural Network Library) is an open-source performance library for deep learning applications. It accelerates performance-critical operations such as convolution, matrix multiplication, and pooling on various hardware platforms, including CPUs and GPUs. oneDNN is part of the oneAPI initiative and aims to provide optimized primitives for deep learning frameworks and applications.
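
To give a feel for the API, here is a minimal sketch (assuming the oneDNN 3.x C++ API) that creates a CPU engine and a stream, the two objects every oneDNN program starts with, and runs a ReLU (eltwise) primitive in place on a small buffer:

#include <oneapi/dnnl/dnnl.hpp>
#include <vector>

using namespace dnnl;

int main() {
    // An engine represents a device; a stream is an execution queue on it.
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    // Wrap a 1-D f32 buffer in a oneDNN memory object.
    std::vector<float> data = {-2.f, -1.f, 0.f, 1.f, 2.f};
    memory::desc md({(memory::dim)data.size()},
            memory::data_type::f32, memory::format_tag::a);
    memory mem(md, eng, data.data());

    // ReLU via the eltwise primitive, executed in place (src == dst).
    auto relu_pd = eltwise_forward::primitive_desc(eng,
            prop_kind::forward_inference, algorithm::eltwise_relu,
            md, md, 0.f /* alpha */, 0.f /* beta */);
    eltwise_forward(relu_pd).execute(s,
            {{DNNL_ARG_SRC, mem}, {DNNL_ARG_DST, mem}});
    s.wait(); // data is now {0, 0, 0, 1, 2}
    return 0;
}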

Pros

  • High performance across multiple hardware platforms (CPUs, GPUs)
  • Extensive support for various deep learning operations and primitives
  • Integration with popular deep learning frameworks (TensorFlow, PyTorch)
  • Active development and maintenance by Intel and the open-source community

Cons

  • Steeper learning curve compared to higher-level deep learning libraries
  • Documentation can be complex for beginners
  • Limited support for some specialized hardware accelerators
  • May require manual tuning for optimal performance in certain scenarios

Code Examples

  1. Creating and executing a simple matrix multiplication operation:
#include <oneapi/dnnl/dnnl.hpp>
#include <vector>

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    const memory::dim M = 128, N = 128, K = 128;
    std::vector<float> a(M * K), b(K * N), c(M * N);

    // Plain row-major ("ab") f32 matrices.
    memory::desc a_md({M, K}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc b_md({K, N}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc c_md({M, N}, memory::data_type::f32, memory::format_tag::ab);

    // oneDNN 3.x API: the primitive descriptor is created directly from the engine.
    auto matmul_pd = matmul::primitive_desc(eng, a_md, b_md, c_md);
    auto matmul_prim = matmul(matmul_pd);

    memory a_mem(a_md, eng, a.data());
    memory b_mem(b_md, eng, b.data());
    memory c_mem(c_md, eng, c.data());

    matmul_prim.execute(s, {{DNNL_ARG_SRC, a_mem},
            {DNNL_ARG_WEIGHTS, b_mem}, {DNNL_ARG_DST, c_mem}});
    s.wait();

    return 0;
}
  2. Creating and executing a 2D convolution operation:
#include <oneapi/dnnl/dnnl.hpp>
#include <vector>

using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0);
    stream s(eng);

    const memory::dim N = 1, IC = 3, IH = 224, IW = 224, OC = 64, KH = 3, KW = 3;
    const memory::dim OH = IH - KH + 1, OW = IW - KW + 1; // stride 1, no padding
    std::vector<float> src(N * IC * IH * IW), weights(OC * IC * KH * KW),
            dst(N * OC * OH * OW);

    memory::desc src_md({N, IC, IH, IW}, memory::data_type::f32, memory::format_tag::nchw);
    memory::desc weights_md({OC, IC, KH, KW}, memory::data_type::f32, memory::format_tag::oihw);
    memory::desc dst_md({N, OC, OH, OW}, memory::data_type::f32, memory::format_tag::nchw);

    // oneDNN 3.x API: descriptor and primitive descriptor are merged.
    auto conv_pd = convolution_forward::primitive_desc(eng,
            prop_kind::forward_inference, algorithm::convolution_direct,
            src_md, weights_md, dst_md,
            {1, 1} /* strides */, {0, 0} /* padding_l */, {0, 0} /* padding_r */);
    auto conv = convolution_forward(conv_pd);

    memory src_mem(src_md, eng, src.data());
    memory weights_mem(weights_md, eng, weights.data());
    memory dst_mem(dst_md, eng, dst.data());

    conv.execute(s, {{DNNL_ARG_SRC, src_mem},
            {DNNL_ARG_WEIGHTS, weights_mem}, {DNNL_ARG_DST, dst_mem}});
    s.wait();

    return 0;
}

Competitor Comparisons

TensorFlow (188,828 stars): An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • Comprehensive ecosystem with tools for model development, training, and deployment
  • Extensive documentation and large community support
  • Flexible architecture supporting various platforms and devices

Cons of TensorFlow

  • Steeper learning curve for beginners
  • Can be slower for certain operations compared to specialized libraries
  • Larger footprint and resource requirements

Code Comparison

TensorFlow:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

oneDNN:

#include "dnnl.hpp"

auto fc1 = dnnl::inner_product_forward::primitive_desc(
    {{64, 784}, dnnl::memory::data_type::f32, dnnl::memory::format_tag::nc},
    {{64, 64}, dnnl::memory::data_type::f32, dnnl::memory::format_tag::nc},
    engine);

oneDNN focuses on low-level, high-performance primitives for deep learning, while TensorFlow provides a higher-level API for building and training neural networks. TensorFlow offers more abstraction and ease of use for general machine learning tasks, whereas oneDNN is optimized for specific hardware and can be integrated into larger frameworks for performance gains.

PyTorch (88,135 stars): Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • Higher-level API for deep learning, easier for beginners
  • Dynamic computation graphs allow for more flexible model architectures
  • Extensive ecosystem with pre-built models and libraries

Cons of PyTorch

  • Generally slower performance compared to low-level libraries
  • Larger memory footprint for some operations
  • Less optimized for specific hardware architectures

Code Comparison

PyTorch example:

import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])
z = torch.matmul(x, y)

oneDNN example:

#include <dnnl.hpp>

dnnl::engine eng(dnnl::engine::kind::cpu, 0);
dnnl::memory::dims dims = {3};
// A memory descriptor needs a format tag; "a" denotes a plain 1-D layout.
auto md = dnnl::memory::desc(dims, dnnl::memory::data_type::f32,
        dnnl::memory::format_tag::a);
auto x = dnnl::memory(md, eng);
auto y = dnnl::memory(md, eng);
auto z = dnnl::memory(md, eng);

PyTorch provides a more intuitive, high-level API for deep learning tasks, while oneDNN offers lower-level control and potentially better performance for specific hardware configurations. PyTorch is more suitable for research and rapid prototyping, whereas oneDNN is better for production environments requiring fine-tuned performance.

TVM (11,937 stars): Open deep learning compiler stack for cpu, gpu and specialized accelerators

Pros of TVM

  • Broader scope: Supports a wide range of hardware targets and ML frameworks
  • Automatic optimization: Uses machine learning to optimize tensor operations
  • Extensibility: Easily customizable for new hardware and algorithms

Cons of TVM

  • Steeper learning curve: More complex to use due to its broader scope
  • Potentially slower for specific use cases: May not be as optimized as oneDNN for certain Intel hardware

Code Comparison

TVM example:

import tvm
from tvm import te
A = te.placeholder((1000, 1000), name='A')
B = te.compute((1000, 1000), lambda i, j: A[i, j] * 2, name='B')
s = te.create_schedule(B.op)

oneDNN example:

#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;
auto eng = engine(engine::kind::cpu, 0);
auto src_md = memory::desc({2, 2}, memory::data_type::f32, memory::format_tag::ab);
auto dst_md = memory::desc({2, 2}, memory::data_type::f32, memory::format_tag::ab);

Both libraries provide high-performance implementations for deep learning operations, but TVM focuses on a more general approach to tensor computation optimization across various hardware targets, while oneDNN specializes in optimized primitives for Intel architectures.

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Pros of ONNX Runtime

  • Broader support for various ML frameworks and hardware accelerators
  • Easier integration with existing ML pipelines and models
  • More extensive documentation and community support

Cons of ONNX Runtime

  • Potentially higher overhead for simple models or specific use cases
  • Less fine-grained control over low-level optimizations
  • May require additional steps for model conversion and compatibility

Code Comparison

oneDNN example:

dnnl::engine eng(dnnl::engine::kind::cpu, 0);
dnnl::memory::dims dims = {2, 2, 3, 3};
auto src_md = dnnl::memory::desc(dims, dnnl::memory::data_type::f32, dnnl::memory::format_tag::nchw);
auto src_mem = dnnl::memory(src_md, eng);

ONNX Runtime example:

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output_name = session.get_outputs()[0].name
result = session.run([output_name], {input_name: input_data})

Both libraries aim to optimize deep learning performance, but oneDNN focuses on low-level primitives, while ONNX Runtime provides a higher-level interface for running models across different frameworks and hardware.

CUTLASS (7,173 stars): CUDA Templates for Linear Algebra Subroutines

Pros of CUTLASS

  • Highly optimized for NVIDIA GPUs, offering excellent performance on supported hardware
  • Extensive support for various data types and tensor operations
  • Flexible template-based design allows for easy customization and integration

Cons of CUTLASS

  • Limited to NVIDIA GPUs, lacking support for other hardware platforms
  • Steeper learning curve due to its template-heavy C++ implementation
  • May require more manual optimization compared to oneDNN's automated approach

Code Comparison

CUTLASS example (matrix multiplication):

// Define a single-precision GEMM with row-major A/C and column-major B.
using Gemm = cutlass::gemm::device::Gemm<float, cutlass::layout::RowMajor,
                                         float, cutlass::layout::ColumnMajor,
                                         float, cutlass::layout::RowMajor>;
Gemm gemm_op;
gemm_op(args); // 'args' is a Gemm::Arguments struct (problem size, pointers)

oneDNN example (matrix multiplication):

// oneDNN 3.x API: build the primitive descriptor from memory descriptors.
auto matmul_pd = matmul::primitive_desc(eng, src_md, weights_md, dst_md);
auto matmul_prim = matmul(matmul_pd);
matmul_prim.execute(strm, args);

Both libraries offer high-performance implementations for deep learning and linear algebra operations. CUTLASS excels in NVIDIA GPU environments with its flexible, template-based design, while oneDNN provides a more portable solution with broader hardware support and easier integration across different platforms.

README

oneAPI Deep Neural Network Library (oneDNN)

oneAPI Deep Neural Network Library (oneDNN) is an open-source cross-platform performance library of basic building blocks for deep learning applications. The oneDNN project is part of the UXL Foundation and is an implementation of the oneAPI specification for the oneDNN component.

The library is optimized for Intel(R) Architecture Processors, Intel Graphics, and Arm(R) 64-bit Architecture (AArch64)-based processors. oneDNN has experimental support for the following architectures: NVIDIA* GPU, AMD* GPU, OpenPOWER* Power ISA (PPC64), IBMz* (s390x), and RISC-V.

oneDNN is intended for deep learning applications and framework developers interested in improving application performance on CPUs and GPUs.

Deep learning practitioners should use one of the many applications and frameworks already enabled with oneDNN.

Documentation

  • oneDNN Developer Guide and Reference explains the programming model, supported functionality, implementation details, and includes annotated examples.
  • API Reference provides a comprehensive reference of the library API.
  • Release Notes explains the new features, performance optimizations, and improvements implemented in each version of oneDNN.

System Requirements

oneDNN supports platforms based on the Intel 64/AMD64, Arm 64-bit (AArch64), OpenPOWER Power ISA (PPC64), IBMz (s390x), and RISC-V (RV64) architectures.

WARNING

Power ISA (PPC64), IBMz (s390x), and RISC-V (RV64) support is experimental with limited testing validation.

The library is optimized for the following CPUs:

  • Intel 64/AMD64 architecture
    • Intel Atom(R) processor (at least Intel SSE4.1 support is required)
    • Intel Core(TM) processor (at least Intel SSE4.1 support is required)
    • Intel Xeon(R) processor E3, E5, and E7 family (formerly Sandy Bridge, Ivy Bridge, Haswell, and Broadwell)
    • Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, Ice Lake, Sapphire Rapids, and Emerald Rapids)
    • Intel Xeon CPU Max Series (formerly Sapphire Rapids HBM)
    • Intel Core Ultra processors (formerly Meteor Lake, Arrow Lake, and Lunar Lake)
    • Intel Xeon 6 processors (formerly Sierra Forest and Granite Rapids)
  • AArch64 architecture
    • Arm Neoverse(TM) N1 and V1 processors

On a CPU based on Intel 64 or on AMD64 architecture, oneDNN detects the instruction set architecture (ISA) at runtime and uses just-in-time (JIT) code generation to deploy the code optimized for the latest supported ISA. Future ISAs may have initial support in the library disabled by default and require the use of run-time controls to enable them. See CPU dispatcher control for more details.
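
As an illustrative sketch of these controls (assuming the oneDNN 3.x C++ API; the same cap can also be applied without recompiling through the ONEDNN_MAX_CPU_ISA environment variable):

#include <oneapi/dnnl/dnnl.hpp>

int main() {
    // Cap JIT code generation at AVX2, e.g. to reproduce results measured on
    // older hardware. Must be called before the first primitive is created.
    dnnl::set_max_cpu_isa(dnnl::cpu_isa::avx2);

    // Query which ISA the dispatcher actually selected at run time.
    dnnl::cpu_isa effective = dnnl::get_effective_cpu_isa();
    (void)effective; // e.g. compare against dnnl::cpu_isa::avx2
    return 0;
}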

WARNING

On macOS, applications that use oneDNN may need to request special entitlements if they use the hardened runtime. See the Linking Guide for more details.

The library is optimized for the following GPUs:

  • Intel Graphics for 11th-14th Generation Intel Core Processors
  • Intel Iris Xe MAX Graphics (formerly DG1)
  • Intel Arc(TM) graphics (formerly Alchemist)
  • Intel Data Center GPU Flex Series (formerly Arctic Sound)
  • Intel Data Center GPU Max Series (formerly Ponte Vecchio)
  • Intel Graphics and Intel Arc graphics for Intel Core Ultra processors (formerly Meteor Lake, Arrow Lake and Lunar Lake)
  • Future Intel Arc graphics (code name Battlemage)

Requirements for Building from Source

oneDNN supports systems meeting the following requirements:

  • Operating system with Intel 64 / Arm 64 / Power / IBMz architecture support
  • C++ compiler with C++11 standard support
  • CMake 3.13 or later

Building the oneDNN documentation requires additional tools.

Configurations of CPU and GPU engines may introduce additional build time dependencies.

CPU Engine

The oneDNN CPU engine is used to execute primitives on Intel Architecture Processors, 64-bit Arm Architecture (AArch64) processors, 64-bit Power ISA (PPC64) processors, IBMz (s390x) systems, and compatible devices.

The CPU engine is built by default but can be disabled at build time by setting DNNL_CPU_RUNTIME to NONE; in that case, the GPU engine must be enabled. The CPU engine can be configured to use the OpenMP, TBB, or SYCL runtime, each of which introduces additional requirements.

Some implementations rely on OpenMP 4.0 SIMD extensions. For the best performance results on Intel Architecture Processors we recommend using the Intel C++ Compiler.
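
As a small, hedged sketch (assuming the cpu_runtime and gpu_runtime fields of the version structure, present in recent oneDNN releases), an application can check at run time which runtimes its oneDNN binary was built with:

#include <oneapi/dnnl/dnnl.h>
#include <cstdio>

int main() {
    // dnnl_version() reports the library version and the CPU/GPU runtime
    // kinds the binary was configured with (e.g. OMP, TBB, SYCL, OCL, NONE).
    const dnnl_version_t *v = dnnl_version();
    std::printf("oneDNN %d.%d.%d, cpu_runtime=%u, gpu_runtime=%u\n",
            v->major, v->minor, v->patch,
            (unsigned)v->cpu_runtime, (unsigned)v->gpu_runtime);
    return 0;
}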

On a CPU based on the Arm AArch64 architecture, the oneDNN CPU engine can be built with Arm Compute Library (ACL) integration. ACL is an open-source library for machine learning applications that provides AArch64-optimized implementations of core functions. This functionality currently requires that ACL is downloaded and built separately; see the Build from Source section of the Developer Guide for details. oneDNN only supports Compute Library versions 24.11.1 or later.

GPU Engine

Intel Processor Graphics and Xe Architecture graphics are supported by the oneDNN GPU engine. The GPU engine is disabled in the default build configuration; enabling it introduces additional build time dependencies.

WARNING

Linux resets the GPU when a kernel's run time exceeds several seconds. You can prevent this behavior by disabling hangcheck for the Intel GPU driver. Windows has a built-in timeout detection and recovery mechanism that results in similar behavior; you can prevent it by increasing the TdrDelay value.

WARNING

NVIDIA GPU support is experimental. General information, build instructions, and implementation limitations are available in the NVIDIA backend readme.

WARNING

AMD GPU support is experimental. General information, build instructions, and implementation limitations are available in the AMD backend readme.

Runtime Dependencies

When oneDNN is built from source, the library runtime dependencies and specific versions are defined by the build environment.

Linux

Common dependencies:

  • GNU C Library (libc.so)
  • GNU Standard C++ Library v3 (libstdc++.so)
  • Dynamic Linking Library (libdl.so)
  • C Math Library (libm.so)
  • POSIX Threads Library (libpthread.so)

Runtime-specific dependencies:

Runtime configuration    Compiler                     Dependency
DNNL_CPU_RUNTIME=OMP     GCC                          GNU OpenMP runtime (libgomp.so)
DNNL_CPU_RUNTIME=OMP     Intel C/C++ Compiler         Intel OpenMP runtime (libiomp5.so)
DNNL_CPU_RUNTIME=OMP     Clang                        Intel OpenMP runtime (libiomp5.so)
DNNL_CPU_RUNTIME=TBB     any                          TBB (libtbb.so)
DNNL_CPU_RUNTIME=SYCL    Intel oneAPI DPC++ Compiler  Intel oneAPI DPC++ Compiler runtime (libsycl.so), TBB (libtbb.so), OpenCL loader (libOpenCL.so)
DNNL_GPU_RUNTIME=OCL     any                          OpenCL loader (libOpenCL.so)
DNNL_GPU_RUNTIME=SYCL    Intel oneAPI DPC++ Compiler  Intel oneAPI DPC++ Compiler runtime (libsycl.so), OpenCL loader (libOpenCL.so), oneAPI Level Zero loader (libze_loader.so)

Windows

Common dependencies:

  • Microsoft Visual C++ Redistributable (msvcrt.dll)

Runtime-specific dependencies:

Runtime configuration    Compiler                       Dependency
DNNL_CPU_RUNTIME=OMP     Microsoft Visual C++ Compiler  No additional requirements
DNNL_CPU_RUNTIME=OMP     Intel C/C++ Compiler           Intel OpenMP runtime (iomp5.dll)
DNNL_CPU_RUNTIME=TBB     any                            TBB (tbb.dll)
DNNL_CPU_RUNTIME=SYCL    Intel oneAPI DPC++ Compiler    Intel oneAPI DPC++ Compiler runtime (sycl.dll), TBB (tbb.dll), OpenCL loader (OpenCL.dll)
DNNL_GPU_RUNTIME=OCL     any                            OpenCL loader (OpenCL.dll)
DNNL_GPU_RUNTIME=SYCL    Intel oneAPI DPC++ Compiler    Intel oneAPI DPC++ Compiler runtime (sycl.dll), OpenCL loader (OpenCL.dll), oneAPI Level Zero loader (ze_loader.dll)

macOS

Common dependencies:

  • System C/C++ runtime (libc++.dylib, libSystem.dylib)

Runtime-specific dependencies:

Runtime configuration    Compiler                 Dependency
DNNL_CPU_RUNTIME=OMP     Intel C/C++ Compiler     Intel OpenMP runtime (libiomp5.dylib)
DNNL_CPU_RUNTIME=TBB     any                      TBB (libtbb.dylib)

Installation

You can download and install the oneDNN library either as prebuilt binary packages or by building it from source.

Validated Configurations

The x86-64 CPU engine was validated on RedHat* Enterprise Linux 8, on Windows Server* 2019, and on macOS 11 (Big Sur) with Apple LLVM version 13.0.

The AArch64 CPU engine was validated on Ubuntu 22.04 and on macOS 14 (Sonoma) with Apple LLVM version 15.0.

The GPU engine was validated on Ubuntu* 22.04 and on Windows Server 2019.

Support

Submit questions, feature requests, and bug reports on the GitHub issues page.

You can also contact oneDNN developers via the UXL Foundation Slack using the #onednn channel.

Governance

The oneDNN project is governed by the UXL Foundation, and you can get involved in multiple ways. You can join the AI Special Interest Group (SIG) meetings, where the group discusses and demonstrates work using this project. Members can also join the Open Source and Specification Working Group meetings.

You can also join the UXL Foundation mailing lists to be informed when meetings are happening and to receive the latest information and discussions.

Contributing

We welcome community contributions to oneDNN. You can find the oneDNN release schedule and work already in progress towards future milestones in GitHub's Milestones section. If you are looking for a specific task to start with, consider selecting from issues marked with the help wanted label.

See the contribution guidelines to start contributing to oneDNN. You can also contact oneDNN developers and maintainers via the UXL Foundation Slack using the #onednn channel.

This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

oneDNN is licensed under Apache License Version 2.0. Refer to the "LICENSE" file for the full license text and copyright notice.

This distribution includes third party software governed by separate license terms.

The third party components are covered by the 3-clause BSD license, the 2-clause BSD license, the Apache License Version 2.0, the Boost Software License Version 1.0, and the MIT License.

This third-party software, even if included with the distribution of the Intel software, may be governed by separate license terms, including without limitation, third party license terms, other Intel software license terms, and open source software license terms. These separate license terms govern your use of the third party programs as set forth in the "THIRD-PARTY-PROGRAMS" file.

Security

The Security Policy outlines our guidelines and procedures for ensuring security and trust for users who consume oneDNN.

Trademark Information

Intel, the Intel logo, Arc, Intel Atom, Intel Core, Iris, OpenVINO, the OpenVINO logo, Pentium, VTune, and Xeon are trademarks of Intel Corporation or its subsidiaries.

Arm and Neoverse are trademarks, or registered trademarks of Arm Ltd.

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.

(C) Intel Corporation