FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Top Related Projects
A library for efficient similarity search and clustering of dense vectors.
Low-precision matrix multiplication
High-efficiency floating-point neural network inference operators for mobile, server, and Web
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
An Open Source Machine Learning Framework for Everyone
oneAPI Deep Neural Network Library (oneDNN)
Quick Overview
FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference. It is optimized for x86 CPUs and focuses on delivering efficient performance for quantized neural networks.
Pros
- Highly optimized for x86 architectures, providing excellent performance for server-side inference
- Supports various low-precision data types, enabling efficient quantized neural network computations
- Integrates well with PyTorch, allowing seamless use in deep learning workflows
- Includes specialized kernels for common operations in neural networks, such as fully connected layers and convolutions
Cons
- Limited to x86 architectures, not suitable for other platforms like ARM or GPUs
- Requires expertise in low-level optimization and quantization techniques for optimal usage
- May have a steeper learning curve compared to higher-level deep learning libraries
- Documentation could be more comprehensive for newcomers to the project
Code Examples
- Basic matrix multiplication:
#include <fbgemm/Fbgemm.h>
#include <vector>

// Initialize row-major matrices: A is m x k, B is k x n, C is m x n
fbgemm::matrix_op_t trans_A = fbgemm::matrix_op_t::NoTranspose;
fbgemm::matrix_op_t trans_B = fbgemm::matrix_op_t::NoTranspose;
int m = 5, n = 6, k = 4;
std::vector<float> A(m * k, 1.0f);
std::vector<float> B(k * n, 2.0f);
std::vector<float> C(m * n, 0.0f);
// Compute C = 1.0 * A * B + 0.0 * C using FBGEMM's reference FP32 GEMM
fbgemm::cblas_sgemm_ref(trans_A, trans_B, m, n, k, 1.0f, A.data(), k, B.data(), n, 0.0f, C.data(), n);
- Quantized fully connected layer:
#include <fbgemm/Fbgemm.h>
#include <vector>

// Initialize quantized activations (uint8) and weights (int8); accumulate into int32
int m = 128, n = 512, k = 256;
std::vector<uint8_t> A(m * k);
std::vector<int8_t> B(k * n);
std::vector<int32_t> C(m * n);
// Pack both operands, then run the quantized GEMM with raw int32 output
fbgemm::PackAMatrix<uint8_t> packA(fbgemm::matrix_op_t::NoTranspose, m, k, A.data(), k);
fbgemm::PackBMatrix<int8_t> packB(fbgemm::matrix_op_t::NoTranspose, k, n, B.data(), n);
fbgemm::DoNothing<int32_t, int32_t> doNothingObj{};
fbgemm::memCopy<> outputProcObj(doNothingObj);
fbgemm::fbgemmPacked(packA, packB, C.data(), C.data(), n, outputProcObj, 0 /* thread_id */, 1 /* num_threads */);
- Convolution operation:
#include <fbgemm/FbgemmI8DepthwiseAvx2.h>
#include <vector>

// Initialize depthwise convolution parameters (groups == channels, 3x3 filter, NHWC layout)
int N = 1, IC = 64, OC = 64, H = 28, W = 28, G = 64, R = 3, S = 3;
std::vector<uint8_t> A(N * H * W * IC);
std::vector<int8_t> B(OC * R * S);
std::vector<int32_t> C(N * H * W * OC);
// Pack the depthwise weights, then run the vectorized depthwise kernel
fbgemm::PackedDepthWiseConvMatrix packedB(OC, R * S, B.data());
fbgemm::depthwise_2d_same_pad<fbgemm::QuantizationGranularity::TENSOR>(
N, H, W, IC, OC, /* stride */ 1, 1,
/* ... A.data(), packedB, C.data(), zero points, requantization multipliers, col offsets, bias ... */);
Getting Started
To use FBGEMM in your project:
- Clone the repository (including its submodules):
git clone --recursive https://github.com/pytorch/FBGEMM.git
- Build FBGEMM with CMake:
cd FBGEMM
mkdir build && cd build
cmake ..
make
See the full Documentation for detailed build options and requirements.
Competitor Comparisons
A library for efficient similarity search and clustering of dense vectors.
Pros of Faiss
- Specialized for efficient similarity search and clustering of dense vectors
- Supports GPU acceleration for faster processing of large datasets
- Offers a wide range of indexing algorithms for different use cases
Cons of Faiss
- More focused on vector search, less versatile for general matrix operations
- May require more setup and configuration for specific use cases
- Limited integration with deep learning frameworks compared to FBGEMM
Code Comparison
FBGEMM (matrix multiplication):
fbgemm::cblas_sgemm_ref(
matrix_op_t::NoTranspose, matrix_op_t::NoTranspose,
m, n, k, alpha, A, lda, B, ldb, beta, C, ldc);
Faiss (vector search):
index = faiss.IndexFlatL2(d)
index.add(xb)
D, I = index.search(xq, k)
FBGEMM focuses on optimized matrix operations for deep learning, while Faiss specializes in efficient similarity search and clustering of dense vectors. FBGEMM is more tightly integrated with PyTorch and offers broader support for various quantization schemes. Faiss, on the other hand, excels in vector search tasks and provides GPU acceleration for large-scale operations. The choice between the two depends on the specific requirements of your project, whether it's optimizing deep learning computations or performing efficient similarity searches.
Low-precision matrix multiplication
Pros of gemmlowp
- Designed specifically for low-precision GEMM operations
- Highly portable, works on various platforms including mobile devices
- Extensive documentation and examples for ease of use
Cons of gemmlowp
- Limited to integer arithmetic, not suitable for floating-point operations
- Less actively maintained compared to FBGEMM
- Narrower focus on GEMM operations, while FBGEMM offers a broader range of optimizations
Code Comparison
gemmlowp:
#include "gemmlowp/public/gemmlowp.h"
typedef gemmlowp::MatrixMap<const std::uint8_t, gemmlowp::MapOrder::RowMajor> InputMap;
typedef gemmlowp::MatrixMap<std::int32_t, gemmlowp::MapOrder::RowMajor> OutputMap;
gemmlowp::GemmContext context;
gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::int32_t, gemmlowp::DefaultL8R8BitDepthParams>(
&context, lhs, rhs, &result, lhs_offset, rhs_offset, output_pipeline);
FBGEMM:
#include "fbgemm/FbgemmI8DepthwiseAvx2.h"
fbgemm::conv_param_t<2> conv_p(
1, // output channels
3, // kernel height
3, // kernel width
1, // stride height
1, // stride width
0, // pad height
0 // pad width
);
fbgemm::depthwise_2d_same_pad<QuantizationGranularity::TENSOR>(
conv_p, A_zero_point, A, B_zero_point, B, C_multiplier, C_zero_point, C, act_times_w_scale);
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Pros of XNNPACK
- Broader platform support, including mobile and web
- More extensive documentation and examples
- Actively maintained with frequent updates
Cons of XNNPACK
- Less optimized for high-performance server-side inference
- Smaller community and ecosystem compared to FBGEMM
- Limited support for quantization techniques
Code Comparison
XNNPACK example:
xnn_initialize(NULL);
xnn_operator_t conv_op = NULL;
xnn_status status = xnn_create_convolution2d_nhwc_f32(
/* ... parameters ... */
&conv_op);
FBGEMM example:
fbgemm::conv_param_t<2> conv_p(
/* ... parameters ... */
);
fbgemm::PackWeightsForConv<2> pack_w(conv_p);
fbgemm::ConvFastPath<float, int32_t, float> conv(conv_p);
Both libraries provide low-level optimizations for neural network operations, but XNNPACK focuses on broader platform support and ease of use, while FBGEMM emphasizes high-performance server-side inference. XNNPACK offers more extensive documentation and examples, making it more accessible for developers. However, FBGEMM provides better optimization for specific use cases, particularly in PyTorch integration and quantization techniques.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Pros of ONNX Runtime
- Broader ecosystem support and compatibility with multiple frameworks
- Extensive optimization capabilities for various hardware platforms
- Robust production-ready deployment options
Cons of ONNX Runtime
- Potentially higher overhead for simple models or specific use cases
- Less specialized for Facebook-specific optimizations compared to FBGEMM
Code Comparison
ONNX Runtime example:
import onnxruntime as ort
session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})
FBGEMM example:
#include "fbgemm/FbgemmI8DepthwiseAvx2.h"
fbgemm::depthwise_3x3_pad_1(
N, H, W, IC, OC, stride_h, stride_w,
A_zero_point, A, B_zero_point, B, C_multiplier, C_zero_point, C);
ONNX Runtime offers a higher-level API for general inference, while FBGEMM provides low-level optimized functions for specific operations. ONNX Runtime is more versatile across different models and frameworks, whereas FBGEMM is tailored for Facebook's specific use cases and optimizations.
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Broader ecosystem with more tools and libraries
- Better support for production deployment and serving models
- More extensive documentation and community resources
Cons of TensorFlow
- Steeper learning curve for beginners
- Less dynamic and flexible than PyTorch for research and experimentation
- Slower development cycle for new features
Code Comparison
FBGEMM (PyTorch):
import torch
import fbgemm_gpu
tensor = fbgemm_gpu.new_empty_tensor(torch.device("cuda"), [2, 3], dtype=torch.float32)
TensorFlow:
import tensorflow as tf
tensor = tf.zeros([2, 3], dtype=tf.float32)
Summary
FBGEMM is a specialized library for optimizing matrix multiplication and convolution operations, primarily used within PyTorch. TensorFlow, on the other hand, is a comprehensive machine learning framework with a wider range of applications. While FBGEMM focuses on performance optimizations for specific operations, TensorFlow offers a more complete ecosystem for developing and deploying machine learning models. The choice between the two depends on the specific requirements of your project and your familiarity with each framework's ecosystem.
oneAPI Deep Neural Network Library (oneDNN)
Pros of oneDNN
- Broader hardware support, including CPUs, GPUs, and FPGAs
- More comprehensive set of deep learning primitives and operations
- Better integration with other Intel oneAPI tools and libraries
Cons of oneDNN
- Potentially more complex setup and configuration for non-Intel hardware
- May have less specialized optimizations for Facebook-specific workloads
Code Comparison
FBGEMM (C++):
fbgemm::PackAMatrix<int8_t> packA(
matrix_op_t::NoTranspose, M, K, A, K, nullptr, 1);
fbgemm::PackBMatrix<int8_t> packB(
matrix_op_t::NoTranspose, K, N, B, N, nullptr, 1);
oneDNN (C++):
auto src_md = memory::desc({N, IC, IH, IW}, memory::data_type::f32, memory::format_tag::nhwc);
auto weights_md = memory::desc({OC, IC, KH, KW}, memory::data_type::f32, memory::format_tag::ohwi);
auto conv_desc = convolution_forward::desc(prop_kind::forward_inference, algorithm::convolution_direct,
src_md, weights_md, dst_md, strides, padding_l, padding_r);
Both libraries provide optimized primitives for deep learning workloads, but FBGEMM focuses more on quantized operations for Facebook's specific needs, while oneDNN offers a broader range of primitives and hardware support within the Intel ecosystem.
README
FBGEMM
FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference.
The library provides efficient low-precision general matrix multiplication for small batch sizes and support for accuracy-loss minimizing techniques such as row-wise quantization and outlier-aware quantization. FBGEMM also exploits fusion opportunities in order to overcome the unique challenges of matrix multiplication at lower precision with bandwidth-bound operations.
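To make the row-wise quantization idea concrete, here is a small illustrative sketch in plain NumPy (not FBGEMM's API): each row of a weight matrix gets its own scale and zero point, so one outlier row does not widen the quantization range of every other row.
import numpy as np

def rowwise_quantize_uint8(W):
    # Illustrative row-wise asymmetric quantization to uint8 (not FBGEMM's API)
    w_min = W.min(axis=1, keepdims=True)
    w_max = W.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 255.0
    scale = np.where(scale == 0, 1.0, scale)      # guard against constant rows
    zero_point = np.round(-w_min / scale)         # per-row zero point
    Wq = np.clip(np.round(W / scale) + zero_point, 0, 255).astype(np.uint8)
    return Wq, scale, zero_point

W = np.random.randn(4, 8).astype(np.float32)
Wq, scale, zp = rowwise_quantize_uint8(W)
W_dequant = (Wq.astype(np.float32) - zp) * scale  # reconstruct to check the error
print(np.abs(W - W_dequant).max())                # small per-row quantization error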
FBGEMM is used as a backend of PyTorch quantized operators for x86 machines.
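As an illustration of that integration, the following minimal sketch uses standard PyTorch quantization APIs (dynamic int8 quantization of a Linear layer) with the fbgemm engine selected; it is not part of FBGEMM's own C++ interface:
import torch

# Select the FBGEMM backend for quantized operators on x86 servers
torch.backends.quantized.engine = "fbgemm"

# Dynamically quantize the Linear layers of a small model to int8;
# the quantized matrix multiplications dispatch to FBGEMM kernels on x86 CPUs
model = torch.nn.Sequential(torch.nn.Linear(256, 512), torch.nn.ReLU())
qmodel = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

out = qmodel(torch.randn(8, 256))
print(out.shape)  # torch.Size([8, 512])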
See the full Documentation for more information on building, installing, and developing with FBGEMM, as well as the most up-to-date support matrix and API documentation for this library.
What's New?
- New Features and Recent Improvements (January, 2020)
Citation
For a high-level overview, design philosophy and brief descriptions of various parts of FBGEMM please see our blog post.
For those looking for the appropriate article to cite regarding FBGEMM, we recommend citing our paper:
@article{fbgemm,
title={FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference},
author={Khudia, Daya and Huang, Jianyu and Basu, Protonu and Deng, Summer and Liu, Haixin and Park, Jongsoo and Smelyanskiy, Mikhail},
journal={arXiv preprint arXiv:2101.05615},
year={2021}
}
Join the FBGEMM community
For questions, support, news updates, or feature requests, please feel free to:
- File a ticket in GitHub Issues
- Post a discussion in GitHub Discussions
- Reach out to us on the #fbgemm channel in PyTorch Slack
For contributions, please see the CONTRIBUTING file for ways to help out.
License
FBGEMM is BSD licensed, as found in the LICENSE file.