
XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Top Related Projects

  • tensorflow/tensorflow: An Open Source Machine Learning Framework for Everyone
  • pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • apple/coremltools: Core ML tools contain supporting tools for Core ML model conversion, editing, and validation
  • microsoft/onnxruntime: ONNX Runtime is a cross-platform, high performance ML inferencing and training accelerator
  • alibaba/MNN: MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
  • Tencent/ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform

Quick Overview

XNNPACK is a highly optimized library for neural network inference on ARM, x86, WebAssembly, and RISC-V platforms. It provides a set of low-level kernels for common neural network operations behind a portable API, which higher-level frameworks use to build high-performance inference pipelines.

Pros

  • High Performance: XNNPACK is designed to deliver state-of-the-art performance on a wide range of hardware, leveraging platform-specific optimizations and low-level hardware features.
  • Hardware Agnostic: The library provides a hardware-agnostic API, allowing developers to easily integrate it into their projects without worrying about the underlying hardware.
  • Extensive Kernel Support: XNNPACK supports a comprehensive set of neural network operations, including convolution, pooling, activation functions, and more, making it a versatile choice for various neural network architectures.
  • Open-Source: XNNPACK is an open-source project, allowing developers to contribute, customize, and extend the library to fit their specific needs.

Cons

  • Complexity: The library's extensive feature set and hardware-specific optimizations can make it challenging for newcomers to understand and integrate into their projects.
  • Dependency Management: XNNPACK relies on several external dependencies, which can add complexity to the build and deployment process, especially in constrained environments.
  • Limited Documentation: While the project has good technical documentation, the overall user experience and getting started guides could be improved to make it more accessible to a wider audience.
  • Licensing: XNNPACK is licensed under the Apache License 2.0, which may not be compatible with all project requirements.

Code Examples

Below are sketches of XNNPACK's classic create/setup/run operator workflow. Exact function signatures vary between XNNPACK releases, so treat these as illustrative and consult xnnpack.h for your version:

  1. Performing Convolution:
#include <stdlib.h>
#include <xnnpack.h>

// A sketch of the classic create/setup/run operator workflow. Check every
// returned xnn_status in real code (error handling is elided here).
void convolution_example(void) {
    const size_t input_channels = 3;
    const size_t output_channels = 16;
    const size_t kernel_size = 3;
    const size_t batch_size = 1;
    const size_t input_height = 224;
    const size_t input_width = 224;
    // A valid (no-padding) convolution with stride 1 shrinks each spatial dim.
    const size_t output_height = input_height - kernel_size + 1;
    const size_t output_width = input_width - kernel_size + 1;

    // Heap-allocate tensors; buffers this large would overflow the stack.
    // Input/output are NHWC; the kernel is [out_ch, kh, kw, in_ch].
    float* input = malloc(batch_size * input_height * input_width * input_channels * sizeof(float));
    float* kernel = malloc(output_channels * kernel_size * kernel_size * input_channels * sizeof(float));
    float* bias = malloc(output_channels * sizeof(float));
    float* output = malloc(batch_size * output_height * output_width * output_channels * sizeof(float));

    // Initialize XNNPACK once per process; NULL selects the default allocator.
    xnn_initialize(/*allocator=*/NULL);

    xnn_operator_t convolution_op = NULL;
    xnn_create_convolution2d_nhwc_f32(
        /*input_padding top/right/bottom/left=*/0, 0, 0, 0,
        /*kernel_height=*/kernel_size, /*kernel_width=*/kernel_size,
        /*subsampling (stride) height/width=*/1, 1,
        /*dilation height/width=*/1, 1,
        /*groups=*/1, /*group_input_channels=*/input_channels,
        /*group_output_channels=*/output_channels,
        /*input_channel_stride=*/input_channels,
        /*output_channel_stride=*/output_channels,
        kernel, bias,
        /*output_min=*/0.0f, /*output_max=*/6.0f,  // fused ReLU6-style clamp
        /*flags=*/0, &convolution_op);

    // Bind shapes and buffers, then execute (NULL threadpool = single-threaded).
    xnn_setup_convolution2d_nhwc_f32(convolution_op, batch_size,
                                     input_height, input_width,
                                     input, output, /*threadpool=*/NULL);
    xnn_run_operator(convolution_op, /*threadpool=*/NULL);

    // Clean up.
    xnn_delete_operator(convolution_op);
    xnn_deinitialize();
    free(input); free(kernel); free(bias); free(output);
}
  2. Performing Pooling:
#include <math.h>
#include <stdlib.h>
#include <xnnpack.h>

// The same create/setup/run flow, sketched for 2x2 max pooling with stride 2.
void pooling_example(void) {
    const size_t batch_size = 1, channels = 16;
    const size_t input_height = 224, input_width = 224;

    // Heap-allocate NHWC buffers; pooling halves each spatial dimension.
    float* input = malloc(batch_size * input_height * input_width * channels * sizeof(float));
    float* output = malloc(batch_size * (input_height / 2) * (input_width / 2) * channels * sizeof(float));

    xnn_initialize(/*allocator=*/NULL);
    xnn_operator_t pooling_op = NULL;
    xnn_create_max_pooling2d_nhwc_f32(
        /*input_padding top/right/bottom/left=*/0, 0, 0, 0,
        /*pooling height/width=*/2, 2, /*stride height/width=*/2, 2,
        /*dilation height/width=*/1, 1, channels,
        /*input_pixel_stride=*/channels, /*output_pixel_stride=*/channels,
        /*output_min=*/-INFINITY, /*output_max=*/INFINITY,
        /*flags=*/0, &pooling_op);
    xnn_setup_max_pooling2d_nhwc_f32(pooling_op, batch_size, input_height,
                                     input_width, input, output, /*threadpool=*/NULL);
    xnn_run_operator(pooling_op, /*threadpool=*/NULL);
    xnn_delete_operator(pooling_op);
    xnn_deinitialize();
    free(input); free(output);
}

Competitor Comparisons

tensorflow/tensorflow: An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • Comprehensive ecosystem with high-level APIs, tools, and extensive documentation
  • Supports a wide range of platforms and devices
  • Large community and extensive third-party library support

Cons of TensorFlow

  • Steeper learning curve for beginners
  • Can be slower for certain operations compared to XNNPACK's optimized kernels
  • Larger footprint and resource requirements

Code Comparison

XNNPACK (low-level operator implementation):

void xnn_f32_vmax_ukernel__avx_x8(
    size_t n,
    const float* a,
    const float* b,
    float* y,
    const union xnn_f32_output_params params[restrict static 1])
{
  // Implementation details...
}

TensorFlow (high-level API usage):

import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0])
b = tf.constant([4.0, 5.0, 6.0])
result = tf.maximum(a, b)

XNNPACK focuses on low-level, optimized kernels for neural network operators, while TensorFlow provides a high-level API for building and training machine learning models. XNNPACK is more suitable for performance-critical, embedded applications, whereas TensorFlow offers a more comprehensive solution for general machine learning tasks.

pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • Comprehensive deep learning framework with high-level APIs
  • Large ecosystem and community support
  • Flexible and dynamic computational graph

Cons of PyTorch

  • Larger footprint and resource requirements
  • Steeper learning curve for beginners
  • Less optimized for mobile and edge devices

Code Comparison

XNNPACK (low-level operator implementation):

xnn_status xnn_create_convolution2d_nhwc_f32(
    uint32_t input_padding_top,
    uint32_t input_padding_right,
    uint32_t input_padding_bottom,
    uint32_t input_padding_left,
    uint32_t kernel_height,
    uint32_t kernel_width,
    // ... (additional parameters)
)

PyTorch (high-level API):

import torch.nn as nn

conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding)
output = conv(input)

XNNPACK focuses on low-level, highly optimized implementations of neural network operators, particularly for mobile and edge devices. It provides fine-grained control over performance-critical parameters.

PyTorch offers a more user-friendly, high-level API for building and training neural networks. It abstracts away many low-level details, making it easier to prototype and experiment with different architectures.

apple/coremltools: Core ML tools contain supporting tools for Core ML model conversion, editing, and validation

Pros of coremltools

  • Specifically designed for Apple platforms, offering seamless integration with iOS, macOS, and other Apple devices
  • Provides tools for converting models from various frameworks (TensorFlow, PyTorch, etc.) to Core ML format
  • Includes features for model optimization and quantization tailored for Apple hardware

Cons of coremltools

  • Limited to Apple ecosystem, lacking cross-platform support
  • May have a steeper learning curve for developers not familiar with Apple's ML ecosystem
  • Less flexible for low-level optimizations compared to XNNPACK

Code Comparison

XNNPACK (C):

// One-time library initialization; NULL selects the default allocator.
enum xnn_status status = xnn_initialize(/*allocator=*/NULL);
if (status != xnn_status_success) {
    // handle initialization failure
}

coremltools (Python):

import coremltools as ct

model = ct.convert(keras_model, source='keras')
model.save('my_model.mlmodel')

The code snippets highlight the different approaches: XNNPACK focuses on low-level initialization and optimization, while coremltools emphasizes high-level model conversion and deployment for Apple platforms.

microsoft/onnxruntime: ONNX Runtime is a cross-platform, high performance ML inferencing and training accelerator

Pros of ONNX Runtime

  • Broader ecosystem support with ONNX format compatibility
  • More comprehensive, supporting a wider range of ML models and operations
  • Cross-platform support for various hardware accelerators (CPU, GPU, etc.)

Cons of ONNX Runtime

  • Larger footprint and potentially higher resource usage
  • May have more overhead for simpler models or specific use cases

Code Comparison

XNNPACK (C):

xnn_status xnn_initialize(const struct xnn_allocator* allocator);
xnn_status xnn_create_convolution2d_nhwc_f32(...);
xnn_status xnn_setup_convolution2d_nhwc_f32(...);

ONNX Runtime (C++):

Ort::Env env;
Ort::Session session(env, model_path, session_options);
auto output_tensors = session.Run(run_options, input_names, input_tensors, output_names);

XNNPACK focuses on low-level, high-performance neural network operators, while ONNX Runtime provides a higher-level interface for running entire ML models. XNNPACK is more suitable for fine-grained control and optimization of specific operations, whereas ONNX Runtime offers a more comprehensive solution for deploying and running various ML models across different platforms.

alibaba/MNN: MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba

Pros of MNN

  • Supports a wider range of platforms, including mobile, embedded, and IoT devices
  • Offers a comprehensive set of tools for model conversion, visualization, and benchmarking
  • Provides higher-level APIs for easier integration and usage

Cons of MNN

  • Less specialized for low-level optimizations compared to XNNPACK
  • May have a steeper learning curve due to its broader feature set
  • Potentially larger binary size due to more comprehensive functionality

Code Comparison

XNNPACK (C):

xnn_status xnn_initialize(const struct xnn_allocator* allocator);
xnn_status xnn_create_convolution2d_nhwc_f32(...);
xnn_status xnn_setup_convolution2d_nhwc_f32(...);
xnn_status xnn_run_operator(xnn_operator_t op, pthreadpool_t threadpool);

MNN (C++):

auto net = std::shared_ptr<MNN::Interpreter>(MNN::Interpreter::createFromFile(modelFile));
auto session = net->createSession(config);
net->runSession(session);
auto tensor = net->getSessionOutput(session, "output");

Both libraries offer efficient neural network inference, but XNNPACK focuses on low-level optimizations for specific operations, while MNN provides a more comprehensive solution with higher-level abstractions. XNNPACK may be preferred for fine-grained control and performance optimization, whereas MNN offers a more user-friendly approach with broader platform support and tools for end-to-end deployment.

Tencent/ncnn: ncnn is a high-performance neural network inference framework optimized for the mobile platform

Pros of ncnn

  • Broader platform support, including mobile and embedded devices
  • More comprehensive model conversion tools
  • Larger community and ecosystem, with more pre-trained models available

Cons of ncnn

  • Generally slower performance compared to XNNPACK
  • Less focus on low-precision inference optimizations
  • More complex API and setup process

Code Comparison

XNNPACK example (C++):

xnn_initialize(nullptr);
xnn_operator_t conv_op = nullptr;
xnn_status status = xnn_create_convolution2d_nhwc_f32(
    /* ... parameters ... */
    &conv_op);

ncnn example (C++):

ncnn::Net net;
net.load_param("model.param");
net.load_model("model.bin");
ncnn::Mat in(224, 224, 3);
ncnn::Mat out;
// ncnn runs inference through an Extractor; blob names come from the model
ncnn::Extractor ex = net.create_extractor();
ex.input("input", in);
ex.extract("output", out);

Both libraries provide efficient neural network inference, but XNNPACK focuses on low-level optimizations for specific hardware, while ncnn offers a higher-level API with broader device support. XNNPACK generally provides better performance, especially for low-precision operations, while ncnn offers more flexibility and easier integration for a wider range of applications and platforms.


README

XNNPACK

XNNPACK is a highly optimized solution for neural network inference on ARM, x86, WebAssembly, and RISC-V platforms. XNNPACK is not intended for direct use by deep learning practitioners and researchers; instead it provides low-level performance primitives for accelerating high-level machine learning frameworks, such as TensorFlow Lite, TensorFlow.js, PyTorch, ONNX Runtime, and MediaPipe.
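
In practice, an application reaches XNNPACK through one of these frameworks rather than calling it directly. The following is a minimal C++ sketch of enabling TensorFlow Lite's XNNPACK delegate; the available options and defaults depend on the TensorFlow Lite version, and the interpreter is assumed to be already built from a model:

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"

// Route supported operators of an already-constructed interpreter to XNNPACK.
void EnableXnnpack(tflite::Interpreter* interpreter) {
  TfLiteXNNPackDelegateOptions options = TfLiteXNNPackDelegateOptionsDefault();
  options.num_threads = 4;  // size of the XNNPACK thread pool

  TfLiteDelegate* delegate = TfLiteXNNPackDelegateCreate(&options);
  if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) {
    // Delegation failed; the interpreter falls back to default CPU kernels.
  }
  // The delegate must outlive the interpreter; release it afterwards with
  // TfLiteXNNPackDelegateDelete(delegate).
}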

Supported Architectures

  • ARM64 on Android, iOS, macOS, Linux, and Windows
  • ARMv7 (with NEON) on Android
  • ARMv6 (with VFPv2) on Linux
  • x86 and x86-64 (up to AVX512) on Windows, Linux, macOS, Android, and iOS simulator
  • WebAssembly MVP
  • WebAssembly SIMD
  • WebAssembly Relaxed SIMD (experimental)
  • RISC-V (RV32GC and RV64GC)

Operator Coverage

XNNPACK implements the following neural network operators:

  • 2D Convolution (including grouped and depthwise)
  • 2D Deconvolution (AKA Transposed Convolution)
  • 2D Average Pooling
  • 2D Max Pooling
  • 2D ArgMax Pooling (Max Pooling + indices)
  • 2D Unpooling
  • 2D Bilinear Resize
  • 2D Depth-to-Space (AKA Pixel Shuffle)
  • Add (including broadcasting, two inputs only)
  • Subtract (including broadcasting)
  • Divide (including broadcasting)
  • Maximum (including broadcasting)
  • Minimum (including broadcasting)
  • Multiply (including broadcasting)
  • Squared Difference (including broadcasting)
  • Global Average Pooling
  • Channel Shuffle
  • Fully Connected
  • Abs (absolute value)
  • Bankers' Rounding (rounding to nearest, ties to even)
  • Ceiling (rounding to integer above)
  • Clamp (includes ReLU and ReLU6)
  • Convert (includes fixed-point and half-precision quantization and dequantization)
  • Copy
  • ELU
  • Floor (rounding to integer below)
  • HardSwish
  • Leaky ReLU
  • Negate
  • Sigmoid
  • Softmax
  • Square
  • Tanh
  • Transpose
  • Truncation (rounding to integer towards zero)
  • PReLU

All operators in XNNPACK support NHWC layout, and additionally allow a custom stride along the Channel dimension. Operators can therefore consume a subset of channels in the input tensor and produce a subset of channels in the output tensor, providing zero-cost Channel Split and Channel Concatenation operations, as illustrated below.
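
The sketch below illustrates the addressing that makes this possible; nhwc_at is a hypothetical helper for exposition, not part of the XNNPACK API:

#include <cstddef>

// Element (n, h, w, c) of an NHWC tensor whose channel stride may exceed its
// logical channel count lives at base[((n*H + h)*W + w)*stride + c].
inline const float* nhwc_at(const float* base, std::size_t height,
                            std::size_t width, std::size_t channel_stride,
                            std::size_t n, std::size_t h, std::size_t w,
                            std::size_t c) {
  return &base[((n * height + h) * width + w) * channel_stride + c];
}

// Zero-cost Channel Split of a 64-channel tensor into two 32-channel views:
//   view A: base = data,      channels = 32, channel stride = 64
//   view B: base = data + 32, channels = 32, channel stride = 64
// An operator given view B consumes channels 32..63 in place; Channel
// Concatenation works the same way on the output side.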

Performance

Mobile phones

The table below presents single-threaded performance of the XNNPACK library on three generations of MobileNet models and three generations of Pixel phones.

Model                      Pixel, ms   Pixel 2, ms   Pixel 3a, ms
FP32 MobileNet v1 1.0X            82            86             88
FP32 MobileNet v2 1.0X            49            53             55
FP32 MobileNet v3 Large           39            42             44
FP32 MobileNet v3 Small           12            14             14

The following table presents multi-threaded (using as many threads as there are big cores) performance of the XNNPACK library on three generations of MobileNet models and three generations of Pixel phones.

Model                      Pixel, ms   Pixel 2, ms   Pixel 3a, ms
FP32 MobileNet v1 1.0X            43            27             46
FP32 MobileNet v2 1.0X            26            18             28
FP32 MobileNet v3 Large           22            16             24
FP32 MobileNet v3 Small            7             6              8

Benchmarked on March 27, 2020 with end2end_bench --benchmark_min_time=5 on an Android/ARM64 build with Android NDK r21 (bazel build -c opt --config android_arm64 :end2end_bench) and neural network models with randomized weights and inputs.

Raspberry Pi

The table below presents multi-threaded performance of the XNNPACK library on three generations of MobileNet models and five Raspberry Pi configurations.

Model                      RPi Zero W      RPi 2           RPi 3+            RPi 4           RPi 4 (ARM64)
                           (BCM2835), ms   (BCM2836), ms   (BCM2837B0), ms   (BCM2711), ms   (BCM2711), ms
FP32 MobileNet v1 1.0X              3919             302               114              72              77
FP32 MobileNet v2 1.0X              1987             191                79              41              46
FP32 MobileNet v3 Large             1658             161                67              38              40
FP32 MobileNet v3 Small              474              50                22              13              15
INT8 MobileNet v1 1.0X              2589             128                46              29              24
INT8 MobileNet v2 1.0X              1495              82                30              20              17

Benchmarked on Feb 8, 2022 with end2end-bench --benchmark_min_time=5 on a Raspbian Buster build with CMake (./scripts/build-local.sh) and neural network models with randomized weights and inputs. INT8 inference was evaluated with a per-channel quantization scheme.

Minimum build requirements

  • C11
  • C++14
  • Python 3


Acknowledgements

XNNPACK is based on the QNNPACK library. However, the codebase has diverged significantly over time, and the XNNPACK API is no longer compatible with QNNPACK.