
pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/


Top Related Projects

Faiss (30,928 stars) - A library for efficient similarity search and clustering of dense vectors.

gemmlowp - Low-precision matrix multiplication

XNNPACK (1,845 stars) - High-efficiency floating-point neural network inference operators for mobile, server, and Web

ONNX Runtime - cross-platform, high performance ML inferencing and training accelerator

TensorFlow (185,446 stars) - An Open Source Machine Learning Framework for Everyone

oneDNN (3,577 stars) - oneAPI Deep Neural Network Library (oneDNN)

Quick Overview

FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference. It is optimized for x86 CPUs and focuses on delivering efficient performance for quantized neural networks.
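
FBGEMM most commonly reaches end users through PyTorch's quantized operators. As a quick check (a minimal sketch, assuming a standard x86 PyTorch build), you can confirm the engine is available and select it:

import torch

# FBGEMM ships inside standard x86 PyTorch builds as a quantized-inference engine
print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm', ...]
torch.backends.quantized.engine = "fbgemm"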

Pros

  • Highly optimized for x86 architectures, providing excellent performance for server-side inference
  • Supports various low-precision data types, enabling efficient quantized neural network computations
  • Integrates well with PyTorch, allowing seamless use in deep learning workflows
  • Includes specialized kernels for common operations in neural networks, such as fully connected layers and convolutions

Cons

  • Limited to x86 architectures, not suitable for other platforms like ARM or GPUs
  • Requires expertise in low-level optimization and quantization techniques for optimal usage
  • May have a steeper learning curve compared to higher-level deep learning libraries
  • Documentation could be more comprehensive for newcomers to the project

Code Examples

  1. Basic matrix multiplication (single-precision reference GEMM):
#include <fbgemm/Fbgemm.h>
#include <vector>

// Initialize matrices for C (m x n) = 1.0 * A (m x k) * B (k x n) + 0.0 * C
fbgemm::matrix_op_t trans_A = fbgemm::matrix_op_t::NoTranspose;
fbgemm::matrix_op_t trans_B = fbgemm::matrix_op_t::NoTranspose;
int m = 5, n = 6, k = 4;
std::vector<float> A(m * k, 1.0f);
std::vector<float> B(k * n, 2.0f);
std::vector<float> C(m * n, 0.0f);

// Perform matrix multiplication (cblas_sgemm_ref is FBGEMM's reference implementation;
// matrices are row-major, with leading dimensions k and n)
fbgemm::cblas_sgemm_ref(trans_A, trans_B, m, n, k, 1.0f, A.data(), k,
                        B.data(), n, 0.0f, C.data(), n);
  2. Quantized fully connected layer (uint8 activations x int8 weights, int32 accumulation):
#include <fbgemm/Fbgemm.h>
#include <cstdint>
#include <vector>

// Initialize weights and input for C (m x n) = A (m x k) * B (k x n)
int m = 128, n = 512, k = 256;
std::vector<uint8_t> A(m * k);
std::vector<int8_t> B(k * n);
std::vector<int32_t> C(m * n);
std::vector<int32_t> C_buffer(m * n);  // scratch buffer used by the output pipeline

// Pack both operands into FBGEMM's cache-friendly blocked layouts
fbgemm::PackAMatrix<uint8_t> packA(fbgemm::matrix_op_t::NoTranspose, m, k, A.data(), k);
fbgemm::PackBMatrix<int8_t> packB(fbgemm::matrix_op_t::NoTranspose, k, n, B.data(), n);

// Output pipeline: copy the raw int32 accumulators into C (no requantization)
fbgemm::DoNothing<int32_t, int32_t> doNothingObj{};
fbgemm::memCopy<> outputProcObj(doNothingObj);

// Execute the packed GEMM on a single thread (thread_id = 0, num_threads = 1)
fbgemm::fbgemmPacked(packA, packB, C.data(), C_buffer.data(), n, outputProcObj, 0, 1);
  3. Depthwise convolution (3x3, stride 1, "same" padding):
#include <fbgemm/FbgemmI8DepthwiseAvx2.h>
#include <cstdint>
#include <vector>

// NHWC layout; depthwise conv, so groups == input channels == output channels
int N = 1, C = 64, H = 28, W = 28, R = 3, S = 3;
std::vector<uint8_t> A(N * H * W * C);    // quantized input activations
std::vector<int8_t> B(C * R * S);         // one 3x3 filter per channel
std::vector<uint8_t> Out(N * H * W * C);  // quantized output

// Pack the depthwise filters, then run the AVX2 kernel
fbgemm::PackedDepthWiseConvMatrix Bp(C, R * S, B.data());
fbgemm::depthwise_2d_same_pad<fbgemm::QuantizationGranularity::TENSOR>(
    /* ... shape, zero points, requantization parameters, A.data(), Bp, Out.data() ... */);
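
A note on the pattern in these examples: FBGEMM expects operands, and especially weights, to be packed into its blocked layouts ahead of time, so the cost of reformatting is paid once and the inner kernels can stream data in a cache- and register-friendly order. The optimized int8 kernels also target CPUs with AVX2 or AVX-512 support, which is worth checking before benchmarking.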

Getting Started

To use FBGEMM in your project:

  1. Clone the repository:

    git clone --recursive https://github.com/pytorch/FBGEMM.git
    
  2. Build FBGEMM:

    cd FBGEMM
    mkdir build && cd build
    cmake ..
    make

Competitor Comparisons

Faiss (30,928 stars)

A library for efficient similarity search and clustering of dense vectors.

Pros of Faiss

  • Specialized for efficient similarity search and clustering of dense vectors
  • Supports GPU acceleration for faster processing of large datasets
  • Offers a wide range of indexing algorithms for different use cases

Cons of Faiss

  • More focused on vector search, less versatile for general matrix operations
  • May require more setup and configuration for specific use cases
  • Limited integration with deep learning frameworks compared to FBGEMM

Code Comparison

FBGEMM (matrix multiplication):

// FP16 path: pack B once up front, then reuse it across GEMM calls
fbgemm::PackedGemmMatrixFP16 Bp(
    fbgemm::matrix_op_t::NoTranspose, k, n, /* alpha */ 1.0f, B);
fbgemm::cblas_gemm_compute(
    fbgemm::matrix_op_t::NoTranspose, m, A, Bp, beta, C);

Faiss (vector search):

index = faiss.IndexFlatL2(d)
index.add(xb)
D, I = index.search(xq, k)

FBGEMM focuses on optimized matrix operations for deep learning, while Faiss specializes in efficient similarity search and clustering of dense vectors. FBGEMM is more tightly integrated with PyTorch and offers broader support for various quantization schemes. Faiss, on the other hand, excels in vector search tasks and provides GPU acceleration for large-scale operations. The choice between the two depends on the specific requirements of your project, whether it's optimizing deep learning computations or performing efficient similarity searches.

gemmlowp

Low-precision matrix multiplication

Pros of gemmlowp

  • Designed specifically for low-precision GEMM operations
  • Highly portable, works on various platforms including mobile devices
  • Extensive documentation and examples for ease of use

Cons of gemmlowp

  • Limited to integer arithmetic, not suitable for floating-point operations
  • Less actively maintained compared to FBGEMM
  • Narrower focus on GEMM operations, while FBGEMM offers a broader range of optimizations

Code Comparison

gemmlowp:

#include "gemmlowp/public/gemmlowp.h"

typedef gemmlowp::MatrixMap<const std::uint8_t, gemmlowp::MapOrder::RowMajor> InputMap;
typedef gemmlowp::MatrixMap<std::int32_t, gemmlowp::MapOrder::RowMajor> OutputMap;

gemmlowp::GemmContext context;
gemmlowp::GemmWithOutputPipeline<std::uint8_t, std::int32_t, gemmlowp::DefaultL8R8BitDepthParams>(
    &context, lhs, rhs, &result, lhs_offset, rhs_offset, output_pipeline);

FBGEMM:

#include "fbgemm/FbgemmI8DepthwiseAvx2.h"

fbgemm::conv_param_t<2> conv_p(
    1, // output channels
    3, // kernel height
    3, // kernel width
    1, // stride height
    1, // stride width
    0, // pad height
    0  // pad width
);

fbgemm::depthwise_2d_same_pad<QuantizationGranularity::TENSOR>(
    conv_p, A_zero_point, A, B_zero_point, B, C_multiplier, C_zero_point, C, act_times_w_scale);
1,845

XNNPACK (1,845 stars)

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Pros of XNNPACK

  • Broader platform support, including mobile and web
  • More extensive documentation and examples
  • Actively maintained with frequent updates

Cons of XNNPACK

  • Less optimized for high-performance server-side inference
  • Smaller community and ecosystem compared to FBGEMM
  • Limited support for quantization techniques

Code Comparison

XNNPACK example:

xnn_initialize(NULL);
xnn_operator_t conv_op = NULL;
xnn_status status = xnn_create_convolution2d_nhwc_f32(
  /* ... parameters ... */
  &conv_op);

FBGEMM example:

fbgemm::conv_param_t<2> conv_p(
  /* ... parameters ... */
);
fbgemm::PackWeightsForConv<2> pack_w(conv_p, /* int8 weights */ W);
// ConvFastPath reports which specialized kernel fbgemmConv() will dispatch to
fbgemm::optimized_conv_t path = fbgemm::ConvFastPath<2, std::int32_t>(conv_p);

Both libraries provide low-level optimizations for neural network operations, but XNNPACK focuses on broader platform support and ease of use, while FBGEMM emphasizes high-performance server-side inference. XNNPACK offers more extensive documentation and examples, making it more accessible for developers. However, FBGEMM provides better optimization for specific use cases, particularly in PyTorch integration and quantization techniques.

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

Pros of ONNX Runtime

  • Broader ecosystem support and compatibility with multiple frameworks
  • Extensive optimization capabilities for various hardware platforms
  • Robust production-ready deployment options

Cons of ONNX Runtime

  • Potentially higher overhead for simple models or specific use cases
  • Less specialized for Facebook-specific optimizations compared to FBGEMM

Code Comparison

ONNX Runtime example:

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
output = session.run(None, {input_name: input_data})

FBGEMM example:

#include "fbgemm/FbgemmI8DepthwiseAvx2.h"

fbgemm::depthwise_3x3_pad_1(
    N, H, W, channels, stride_h, stride_w,
    A_zero_point, A, B_zero_point, packed_B, C_multiplier, C_zero_point, C,
    /* ... column offsets, bias, threading ... */);

ONNX Runtime offers a higher-level API for general inference, while FBGEMM provides low-level optimized functions for specific operations. ONNX Runtime is more versatile across different models and frameworks, whereas FBGEMM is tailored for Facebook's specific use cases and optimizations.

TensorFlow (185,446 stars)

An Open Source Machine Learning Framework for Everyone

Pros of TensorFlow

  • Broader ecosystem with more tools and libraries
  • Better support for production deployment and serving models
  • More extensive documentation and community resources

Cons of TensorFlow

  • Steeper learning curve for beginners
  • Less dynamic and flexible than PyTorch for research and experimentation
  • Slower development cycle for new features

Code Comparison

FBGEMM (PyTorch):

import torch
import fbgemm_gpu  # importing registers the FBGEMM operators with PyTorch

tensor = fbgemm_gpu.new_empty_tensor(torch.device("cuda"), [2, 3], dtype=torch.float32)

TensorFlow:

import tensorflow as tf
tensor = tf.zeros([2, 3], dtype=tf.float32)

Summary

FBGEMM is a specialized library for optimizing matrix multiplication and convolution operations, primarily used within PyTorch. TensorFlow, on the other hand, is a comprehensive machine learning framework with a wider range of applications. While FBGEMM focuses on performance optimizations for specific operations, TensorFlow offers a more complete ecosystem for developing and deploying machine learning models. The choice between the two depends on the specific requirements of your project and your familiarity with each framework's ecosystem.

oneDNN (3,577 stars)

oneAPI Deep Neural Network Library (oneDNN)

Pros of oneDNN

  • Broader hardware support, including CPUs, GPUs, and FPGAs
  • More comprehensive set of deep learning primitives and operations
  • Better integration with other Intel oneAPI tools and libraries

Cons of oneDNN

  • Potentially more complex setup and configuration for non-Intel hardware
  • May have less specialized optimizations for Facebook-specific workloads

Code Comparison

FBGEMM (C++):

fbgemm::PackAMatrix<int8_t> packA(
    matrix_op_t::NoTranspose, M, K, A, K, nullptr, 1);
fbgemm::PackBMatrix<int8_t> packB(
    matrix_op_t::NoTranspose, K, N, B, N, nullptr, 1);

oneDNN (C++):

auto src_md = memory::desc({N, IC, IH, IW}, memory::data_type::f32, memory::format_tag::nhwc);
auto weights_md = memory::desc({OC, IC, KH, KW}, memory::data_type::f32, memory::format_tag::ohwi);
auto conv_desc = convolution_forward::desc(prop_kind::forward_inference, algorithm::convolution_direct,
    src_md, weights_md, dst_md, strides, padding_l, padding_r);

Both libraries provide optimized primitives for deep learning workloads, but FBGEMM focuses more on quantized operations for Facebook's specific needs, while oneDNN offers a broader range of primitives and hardware support within the Intel ecosystem.


README

FBGEMM


FBGEMM (Facebook GEneral Matrix Multiplication) is a low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference.

The library provides efficient low-precision general matrix multiplication for small batch sizes and support for accuracy-loss minimizing techniques such as row-wise quantization and outlier-aware quantization. FBGEMM also exploits fusion opportunities in order to overcome the unique challenges of matrix multiplication at lower precision with bandwidth-bound operations.
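
To make the row-wise quantization idea concrete, here is a small illustrative sketch in plain NumPy (not FBGEMM's API): each row of a weight matrix gets its own scale and zero point, which typically loses less accuracy than using a single scale for the whole tensor.

import numpy as np

# Illustrative only: row-wise asymmetric quantization of a weight matrix to uint8
W = np.random.randn(4, 8).astype(np.float32)

w_min = W.min(axis=1, keepdims=True)   # per-row minimum
w_max = W.max(axis=1, keepdims=True)   # per-row maximum
scale = (w_max - w_min) / 255.0        # one scale per row
zero_point = np.round(-w_min / scale)  # one zero point per row

W_q = np.clip(np.round(W / scale) + zero_point, 0, 255).astype(np.uint8)

# Dequantize to inspect the per-row quantization error
W_dq = (W_q.astype(np.float32) - zero_point) * scale
print("max abs error:", np.abs(W - W_dq).max())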

FBGEMM is used as a backend of PyTorch quantized operators for x86 machines.
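
For example (a minimal sketch, assuming a recent PyTorch build with FBGEMM enabled), selecting the fbgemm engine and dynamically quantizing a model's linear layers routes their int8 matrix multiplications through FBGEMM kernels:

import torch

# Select FBGEMM as the quantized inference engine (the default on x86 servers)
torch.backends.quantized.engine = "fbgemm"

model = torch.nn.Sequential(torch.nn.Linear(256, 128), torch.nn.ReLU()).eval()

# Dynamic quantization: weights are stored as int8 and the matmuls run through FBGEMM
quantized_model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

print(quantized_model(torch.randn(1, 256)).shape)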

See the full Documentation for more information on building, installing, and developing with FBGEMM, as well as the most up-to-date support matrix and API documentation for this library.

What's New?

Citation

For a high-level overview, design philosophy, and brief descriptions of the various parts of FBGEMM, please see our blog post.

For those looking for the appropriate article to cite regarding FBGEMM, we recommend citing our paper:

@article{fbgemm,
  title={FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference},
  author={Khudia, Daya and Huang, Jianyu and Basu, Protonu and Deng, Summer and Liu, Haixin and Park, Jongsoo and Smelyanskiy, Mikhail},
  journal={arXiv preprint arXiv:2101.05615},
  year={2021}
}

Join the FBGEMM community

For questions, support, news updates, or feature requests, please feel free to:

For contributions, please see the CONTRIBUTING file for ways to help out.

License

FBGEMM is BSD licensed, as found in the LICENSE file.