FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Top Related Projects
A library for efficient similarity search and clustering of dense vectors.
Low-precision matrix multiplication
High-efficiency floating-point neural network inference operators for mobile, server, and Web
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
An Open Source Machine Learning Framework for Everyone
oneAPI Deep Neural Network Library (oneDNN)
Quick Overview
FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference. It is optimized for x86 CPUs and focuses on delivering efficient performance for quantized neural networks.
Pros
- Highly optimized for x86 architectures, providing excellent performance for server-side inference
- Supports various low-precision data types, enabling efficient quantized neural network computations
- Integrates well with PyTorch, allowing seamless use in deep learning workflows
- Includes specialized kernels for common operations in neural networks, such as fully connected layers and convolutions
Cons
- Limited to x86 architectures, not suitable for other platforms like ARM or GPUs
- Requires expertise in low-level optimization and quantization techniques for optimal usage
- May have a steeper learning curve compared to higher-level deep learning libraries
- Documentation could be more comprehensive for newcomers to the project
Code Examples
- Basic matrix multiplication:
#include <fbgemm/Fbgemm.h>
#include <vector>

// Initialize row-major matrices: A is m x k, B is k x n, C is m x n
fbgemm::matrix_op_t trans_A = fbgemm::matrix_op_t::NoTranspose;
fbgemm::matrix_op_t trans_B = fbgemm::matrix_op_t::NoTranspose;
int m = 5, n = 6, k = 4;
std::vector<float> A(m * k, 1.0f);
std::vector<float> B(k * n, 2.0f);
std::vector<float> C(m * n, 0.0f);
// Compute C = 1.0 * A * B + 0.0 * C using FBGEMM's reference FP32 GEMM
fbgemm::cblas_sgemm_ref(trans_A, trans_B, m, n, k, 1.0f, A.data(), k, B.data(), n, 0.0f, C.data(), n);
- Quantized fully connected layer:
#include <fbgemm/Fbgemm.h>
#include <vector>

// Initialize quantized activations (uint8) and weights (int8); accumulate into int32
int m = 128, n = 512, k = 256;
std::vector<uint8_t> A(m * k);
std::vector<int8_t> B(k * n);
std::vector<int32_t> C(m * n);
// Pack both operands, then run the quantized GEMM with raw int32 output
fbgemm::PackAMatrix<uint8_t> packA(fbgemm::matrix_op_t::NoTranspose, m, k, A.data(), k);
fbgemm::PackBMatrix<int8_t> packB(fbgemm::matrix_op_t::NoTranspose, k, n, B.data(), n);
fbgemm::DoNothing<int32_t, int32_t> doNothingObj{};
fbgemm::memCopy<> outputProcObj(doNothingObj);
fbgemm::fbgemmPacked(packA, packB, C.data(), C.data(), n, outputProcObj, 0 /* thread_id */, 1 /* num_threads */);
- Convolution operation:
#include <fbgemm/FbgemmI8DepthwiseAvx2.h>
#include <vector>

// Initialize depthwise convolution parameters (groups == channels, 3x3 filter, NHWC layout)
int N = 1, IC = 64, OC = 64, H = 28, W = 28, G = 64, R = 3, S = 3;
std::vector<uint8_t> A(N * H * W * IC);
std::vector<int8_t> B(OC * R * S);
std::vector<int32_t> C(N * H * W * OC);
// Pack the depthwise weights, then run the vectorized depthwise kernel
fbgemm::PackedDepthWiseConvMatrix packedB(OC, R * S, B.data());
fbgemm::depthwise_2d_same_pad<fbgemm::QuantizationGranularity::TENSOR>(
N, H, W, IC, OC, /* stride */ 1, 1,
/* ... A.data(), packedB, C.data(), zero points, requantization multipliers, col offsets, bias ... */);
Getting Started
To use FBGEMM in your project:
- Clone the repository (including its submodules):
git clone --recursive https://github.com/pytorch/FBGEMM.git
- Build FBGEMM with CMake:
cd FBGEMM
mkdir build && cd build
cmake ..
make
See the full Documentation for detailed build options and requirements.
Competitor Comparisons
A library for efficient similarity search and clustering of dense vectors.
Pros of Faiss
- Specialized for efficient similarity search and clustering of dense vectors
- Supports GPU acceleration for faster processing of large datasets
- Offers a wide range of indexing algorithms for different use cases
Cons of Faiss
- More focused on vector search, less versatile for general matrix operations
- May require more setup and configuration for specific use cases
- Limited integration with deep learning frameworks compared to FBGEMM
Code Comparison
FBGEMM (matrix multiplication):
fbgemm::cblas_sgemm_ref(
matrix_op_t::NoTranspose, matrix_op_t::NoTranspose,
m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
Faiss (vector search):
index = faiss.IndexFlatL2(d)
index.add(xb)
D, I = index.search(xq, k)
FBGEMM focuses on optimized matrix operations for deep learning, while Faiss specializes in efficient similarity search and clustering of dense vectors. FBGEMM is more tightly integrated with PyTorch and offers broader support for various quantization schemes. Faiss, on the other hand, excels in vector search tasks and provides GPU acceleration for large-scale operations. The choice between the two depends on the specific requirements of your project, whether it's optimizing deep learning computations or performing efficient similarity searches.
Low-precision matrix multiplication
Pros of gemmlowp
- Designed specifically for low-precision GEMM operations
- Highly portable, works on various platforms including mobile devices
- Extensive documentation and examples for ease of use
Cons of gemmlowp
- Limited to integer arithmetic, not suitable for floating-point operations
- Less actively maintained compared to FBGEMM
- Narrower focus on GEMM operations, while FBGEMM offers a broader range of optimizations
Code Comparison
gemmlowp:
#include "gemmlowp/public/gemmlowp.h"
typedef gemmlowp::MatrixMap<const std::uint8_t, gemmlowp::MapOrder::RowMajor> InputMap;
typedef gemmlowp::MatrixMap<std::int32_t, gemmlowp::MapOrder::RowMajor> OutputMap;
gemmlowp::GemmContext context;
gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::int32_t, gemmlowp::DefaultL8R8BitDepthParams>(
&context, lhs, rhs, &result, lhs_offset, rhs_offset, output_pipeline);
FBGEMM:
#include "fbgemm/FbgemmI8DepthwiseAvx2.h"
fbgemm::conv_param_t<2> conv_p(
1, // output channels
3, // kernel height
3, // kernel width
1, // stride height
1, // stride width
0, // pad height
0 // pad width
);
fbgemm::depthwise_2d_same_pad<QuantizationGranularity::TENSOR>(
conv_p, A_zero_point, A, B_zero_point, B, C_multiplier, C_zero_point, C, act_times_w_scale);
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Pros of XNNPACK
- Broader platform support, including mobile and web
- More extensive documentation and examples
- Actively maintained with frequent updates
Cons of XNNPACK
- Less optimized for high-performance server-side inference
- Smaller community and ecosystem compared to FBGEMM
- Limited support for quantization techniques
Code Comparison
XNNPACK example:
xnn_initialize(NULL);
xnn_operator_t conv_op = NULL;
xnn_status status = xnn_create_convolution2d_nhwc_f32(
/* ... parameters ... */
&conv_op);
FBGEMM example:
fbgemm::conv_param_t<2> conv_p(
/* ... parameters ... */
);
fbgemm::PackWeightsForConv<2> pack_w(conv_p);
fbgemm::ConvFastPath<float, int32_t, float> conv(conv_p);
Both libraries provide low-level optimizations for neural network operations, but XNNPACK focuses on broader platform support and ease of use, while FBGEMM emphasizes high-performance server-side inference. XNNPACK offers more extensive documentation and examples, making it more accessible for developers. However, FBGEMM provides better optimization for specific use cases, particularly in PyTorch integration and quantization techniques.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Pros of ONNX Runtime
- Broader ecosystem support and compatibility with multiple frameworks
- Extensive optimization capabilities for various hardware platforms
- Robust production-ready deployment options
Cons of ONNX Runtime
- Potentially higher overhead for simple models or specific use cases
- Less specialized for Facebook-specific optimizations compared to FBGEMM
Code Comparison
ONNX Runtime example:
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})
FBGEMM example:
#include "fbgemm/FbgemmI8DepthwiseAvx2.h"
fbgemm::depthwise_3x3_pad_1(
N, H, W, IC, OC, stride_h, stride_w,
A_zero_point, A, B_zero_point, B, C_multiplier, C_zero_point, C);
ONNX Runtime offers a higher-level API for general inference, while FBGEMM provides low-level optimized functions for specific operations. ONNX Runtime is more versatile across different models and frameworks, whereas FBGEMM is tailored for Facebook's specific use cases and optimizations.
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Broader ecosystem with more tools and libraries
- Better support for production deployment and serving models
- More extensive documentation and community resources
Cons of TensorFlow
- Steeper learning curve for beginners
- Less dynamic and flexible than PyTorch for research and experimentation
- Slower development cycle for new features
Code Comparison
FBGEMM (PyTorch):
import torch
import fbgemm_gpu
tensor = fbgemm_gpu.new_empty_tensor(torch.device("cuda"), [2, 3], dtype=torch.float32)
TensorFlow:
import tensorflow as tf
tensor = tf.zeros([2, 3], dtype=tf.float32)
Summary
FBGEMM is a specialized library for optimizing matrix multiplication and convolution operations, primarily used within PyTorch. TensorFlow, on the other hand, is a comprehensive machine learning framework with a wider range of applications. While FBGEMM focuses on performance optimizations for specific operations, TensorFlow offers a more complete ecosystem for developing and deploying machine learning models. The choice between the two depends on the specific requirements of your project and your familiarity with each framework's ecosystem.
oneAPI Deep Neural Network Library (oneDNN)
Pros of oneDNN
- Broader hardware support, including CPUs, GPUs, and FPGAs
- More comprehensive set of deep learning primitives and operations
- Better integration with other Intel oneAPI tools and libraries
Cons of oneDNN
- Potentially more complex setup and configuration for non-Intel hardware
- May have less specialized optimizations for Facebook-specific workloads
Code Comparison
FBGEMM (C++):
fbgemm::PackAMatrix<int8_t> packA(
matrix_op_t::NoTranspose, M, K, A, K, nullptr, 1);
fbgemm::PackBMatrix<int8_t> packB(
matrix_op_t::NoTranspose, K, N, B, N, nullptr, 1);
oneDNN (C++):
auto src_md = memory::desc({N, IC, IH, IW}, memory::data_type::f32, memory::format_tag::nhwc);
auto weights_md = memory::desc({OC, IC, KH, KW}, memory::data_type::f32, memory::format_tag::ohwi);
auto conv_desc = convolution_forward::desc(prop_kind::forward_inference, algorithm::convolution_direct,
src_md, weights_md, dst_md, strides, padding_l, padding_r);
Both libraries provide optimized primitives for deep learning workloads, but FBGEMM focuses more on quantized operations for Facebook's specific needs, while oneDNN offers a broader range of primitives and hardware support within the Intel ecosystem.
README
FBGEMM
FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference.
The library provides efficient low-precision general matrix multiplication for small batch sizes and support for accuracy-loss minimizing techniques such as row-wise quantization and outlier-aware quantization. FBGEMM also exploits fusion opportunities in order to overcome the unique challenges of matrix multiplication at lower precision with bandwidth-bound operations.
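To make the row-wise quantization idea concrete, here is a small illustrative sketch in plain NumPy (not FBGEMM's API): each row of a weight matrix gets its own scale and zero point, so one outlier row does not widen the quantization range of every other row.
import numpy as np

def rowwise_quantize_uint8(W):
    # Illustrative row-wise asymmetric quantization to uint8 (not FBGEMM's API)
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 255.0
    scale = np.where(scale == 0, 1.0, scale)      # guard against constant rows
    zero_point = np.round(-w_min / scale)         # per-row zero point
    Wq = np.clip(np.round(W / scale) + zero_point, 0, 255).astype(np.uint8)
    return Wq, scale, zero_point

W = np.random.randn(4, 8).astype(np.float32)
Wq, scale, zp = rowwise_quantize_uint8(W)
W_dequant = (Wq.astype(np.float32) - zp) * scale  # reconstruct to check the error
print(np.abs(W - W_dequant).max())                # small per-row quantization error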
FBGEMM is used as a backend of PyTorch quantized operators for x86 machines.
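As an illustration of that integration, the following minimal sketch uses standard PyTorch quantization APIs (dynamic int8 quantization of a Linear layer) with the fbgemm engine selected; it is not part of FBGEMM's own C++ interface:
import torch

# Select the FBGEMM backend for quantized operators on x86 servers
torch.backends.quantized.engine = "fbgemm"

# Dynamically quantize the Linear layers of a small model to int8;
# the quantized matrix multiplications dispatch to FBGEMM kernels on x86 CPUs
model = torch.nn.Sequential(torch.nn.Linear(256, 512), torch.nn.ReLU())
qmodel = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

out = qmodel(torch.randn(8, 256))
print(out.shape)  # torch.Size([8, 512])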
See the full Documentation for more information on building, installing, and developing with FBGEMM, as well as the most up-to-date support matrix and API documentation for this library.
What's New?
- New Features and Recent Improvements (January, 2020)
Citation
For a high-level overview, design philosophy and brief descriptions of various parts of FBGEMM please see our blog post.
For those looking for the appropriate article to cite regarding FBGEMM, we recommend citing our paper:
@article{fbgemm,
title={FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference},
author={Khudia, Daya and Huang, Jianyu and Basu, Protonu and Deng, Summer and Liu, Haixin and Park, Jongsoo and Smelyanskiy, Mikhail},
journal={arXiv preprint arXiv:2101.05615},
year={2021}
}
Join the FBGEMM community
For questions, support, news updates, or feature requests, please feel free to:
- File a ticket in GitHub Issues
- Post a discussion in GitHub Discussions
- Reach out to us on the #fbgemm channel in PyTorch Slack
For contributions, please see the CONTRIBUTING file for ways to help out.
License
FBGEMM is BSD licensed, as found in the LICENSE file.