facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Top Related Projects

  • PyTorch (85,015): Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
  • DeepSpeed (35,868): DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
  • Apex (8,460): A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
  • Horovod (14,221): Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
  • BERT (38,368): TensorFlow code and pre-trained models for BERT

Quick Overview

xformers is a library developed by Facebook Research that focuses on optimizing and extending Transformer models. It provides a collection of composable Transformer building blocks, efficient attention mechanisms, and various optimizations to improve the performance and flexibility of Transformer-based architectures.

Pros

  • Highly modular and composable architecture for building custom Transformer models
  • Implements efficient attention mechanisms and optimizations for improved performance
  • Built for PyTorch, with its own CUDA kernels and dispatch to other libraries when relevant
  • Actively maintained and regularly updated by Facebook Research

Cons

  • Steeper learning curve compared to more straightforward Transformer implementations
  • Documentation can be sparse or outdated in some areas
  • May require more setup and configuration compared to simpler libraries
  • Some features are still experimental and subject to change

Code Examples

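  1. Creating a basic Transformer encoder block:
import torch
from xformers.factory import xFormerEncoderBlock, xFormerEncoderConfig

# Key names follow the xFormers factory tutorial and may differ between releases
config = xFormerEncoderConfig(
    dim_model=512,
    multi_head_config={
        "num_heads": 8,
        "residual_dropout": 0.1,
        "attention": {"name": "scaled_dot_product", "dropout": 0.1, "causal": False},
    },
    feedforward_config={
        "name": "MLP",
        "dropout": 0.1,
        "activation": "relu",
        "hidden_layer_multiplier": 4,  # 4 * 512 = 2048 hidden units
    },
)

encoder = xFormerEncoderBlock(config)  # a single layer; stack several for a deeper encoder
x = torch.randn(32, 100, 512)  # (batch_size, seq_len, dim_model)
output = encoder(x)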
  2. Using an efficient attention mechanism:
import torch
from xformers.components import MultiHeadDispatch
from xformers.components.attention import ScaledDotProduct

# MultiHeadDispatch wraps an attention module instance and handles the head split
efficient_attention = MultiHeadDispatch(
    dim_model=512,
    num_heads=8,
    attention=ScaledDotProduct(dropout=0.1),
)

q = k = v = torch.randn(32, 100, 512)  # (batch_size, seq_len, dim_model)
output = efficient_attention(q, k, v)
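  3. Applying memory-efficient attention:
import torch
from xformers.ops import memory_efficient_attention

# Inputs are (batch_size, seq_len, num_heads, head_dim); the fused kernels typically need a CUDA device
q = k = v = torch.randn(32, 100, 8, 64, device="cuda", dtype=torch.float16)
output = memory_efficient_attention(q, k, v)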

Getting Started

To get started with xformers, follow these steps:

  1. Install xformers:
pip install xformers
  2. Import and use xformers components in your PyTorch project:
import torch
from xformers.components import MultiHeadDispatch
from xformers.components.attention import ScaledDotProduct

# Create a multi-head attention layer around a scaled dot-product attention module
attention = MultiHeadDispatch(
    dim_model=512,
    num_heads=8,
    attention=ScaledDotProduct(),
)

# Use the attention mechanism
x = torch.randn(32, 100, 512)
output = attention(x, x, x)

For more advanced usage and customization, refer to the xformers documentation and examples in the GitHub repository.

Competitor Comparisons

PyTorch (85,015)

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Pros of PyTorch

  • Broader scope and functionality, covering a wide range of deep learning tasks
  • Larger community and ecosystem, with more resources and third-party libraries
  • More mature and stable, with regular updates and long-term support

Cons of PyTorch

  • Larger codebase and installation size, potentially slower for specific use cases
  • Steeper learning curve for beginners due to its comprehensive feature set
  • May have higher memory usage for certain operations compared to optimized libraries

Code Comparison

PyTorch:

import torch

x = torch.randn(3, 3)
y = torch.matmul(x, x.t())
z = torch.relu(y)

xformers:

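import torch
from xformers.ops import memory_efficient_attention

# Three (batch, seq_len, num_heads, head_dim) tensors; the fused kernels typically need a CUDA device
q, k, v = torch.randn(3, 2, 16, 4, 64, device="cuda", dtype=torch.float16).unbind(0)
output = memory_efficient_attention(q, k, v)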

xformers focuses on efficient transformer implementations, while PyTorch provides a more general-purpose deep learning framework. xformers offers memory-efficient attention operations, which can be beneficial for large-scale transformer models. PyTorch, on the other hand, provides a comprehensive set of tools for various deep learning tasks beyond transformers.

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of transformers

  • Extensive model support: Covers a wide range of transformer architectures and pre-trained models
  • Rich documentation and community support
  • Easy-to-use high-level APIs for various NLP tasks

Cons of transformers

  • Can be slower for certain operations compared to xformers
  • May have higher memory usage for large models
  • Less focus on performance optimizations for specific hardware

Code Comparison

transformers:

from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

xformers:

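import torch
from xformers.components import MultiHeadDispatch
from xformers.components.attention import ScaledDotProduct

# The attention argument is a module instance, not a string
mha = MultiHeadDispatch(
    dim_model=512,
    num_heads=8,
    attention=ScaledDotProduct(),
)
q, k, v = torch.rand(3, 1, 4, 512)  # three (batch, seq_len, dim_model) tensors
output = mha(q, k, v)

DeepSpeed (35,868)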

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

  • More comprehensive optimization toolkit, including ZeRO optimizer and pipeline parallelism
  • Integrates directly with existing PyTorch models and training loops
  • Extensive documentation and tutorials for various use cases

Cons of DeepSpeed

  • Steeper learning curve due to its broader feature set
  • May require more configuration and setup for simpler use cases
  • Less focused on specific transformer optimizations compared to xformers

Code Comparison

xformers:

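from xformers.components import MultiHeadDispatch
from xformers.components.attention import ScaledDotProduct

# Attention dropout is set on the attention module, residual dropout on the wrapper
attention = MultiHeadDispatch(
    dim_model=512,
    num_heads=8,
    attention=ScaledDotProduct(dropout=0.1),
    residual_dropout=0.1,
)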

DeepSpeed:

import deepspeed

model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=params
)

Both libraries aim to optimize transformer-based models, but they approach it differently. xformers focuses on efficient implementations of transformer components, while DeepSpeed provides a broader set of optimization techniques for large-scale model training. The choice between them depends on specific project requirements and the level of optimization needed.

Apex (8,460)

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch

Pros of Apex

  • Mature and well-established library with extensive NVIDIA GPU optimizations
  • Supports mixed precision training and distributed training out of the box
  • Offers a wider range of optimization techniques beyond just transformers

Cons of Apex

  • Limited to NVIDIA GPUs, reducing portability across different hardware
  • Requires separate installation and setup, which can be complex
  • Less focused on transformer-specific optimizations compared to xformers

Code Comparison

Apex (Mixed Precision Training):

from apex import amp

# `model`, `optimizer`, and `loss` come from an existing training loop
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()

xformers (Memory Efficient Attention):

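import torch
from xformers.ops import memory_efficient_attention

# (batch, seq_len, num_heads, head_dim) tensors; the fused kernels typically need a CUDA device
query = key = value = torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)
output = memory_efficient_attention(query, key, value, p=0.1)  # p is the attention dropout probability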

Both libraries aim to improve performance and efficiency in deep learning tasks, but they focus on different aspects. Apex provides a broader set of optimization tools for NVIDIA GPUs, while xformers specializes in transformer-specific optimizations with a focus on memory efficiency and hardware flexibility.

Horovod (14,221)

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Pros of Horovod

  • Designed specifically for distributed deep learning, offering excellent scalability across multiple GPUs and nodes
  • Supports multiple deep learning frameworks (TensorFlow, PyTorch, MXNet) with a unified API
  • Integrates well with existing codebases, requiring minimal changes to enable distributed training

Cons of Horovod

  • Primarily focused on data parallelism, lacking built-in support for model parallelism
  • May have a steeper learning curve for users not familiar with distributed computing concepts
  • Less emphasis on memory efficiency optimizations compared to xformers

Code Comparison

Horovod (distributed training):

import torch.optim as optim
import horovod.torch as hvd

hvd.init()
# `model` is an existing torch.nn.Module
optimizer = optim.SGD(model.parameters(), lr=0.01)
optimizer = hvd.DistributedOptimizer(optimizer)
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

xformers (memory-efficient attention):

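import torch
from xformers.ops import memory_efficient_attention

# (batch, seq_len, num_heads, head_dim) tensors; the fused kernels typically need a CUDA device
query = key = value = torch.randn(2, 128, 8, 64, device="cuda", dtype=torch.float16)
output = memory_efficient_attention(query, key, value)

BERT (38,368)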

TensorFlow code and pre-trained models for BERT

Pros of BERT

  • Widely adopted and well-established in the NLP community
  • Extensive pre-trained models available for various languages and tasks
  • Comprehensive documentation and numerous tutorials available

Cons of BERT

  • Less flexible for non-NLP tasks compared to xformers
  • Higher computational requirements for fine-tuning and inference
  • Limited support for optimizations and custom attention mechanisms

Code Comparison

BERT example:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

xformers example:

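import torch
from xformers.components import MultiHeadDispatch
from xformers.components.attention import ScaledDotProduct

attention = MultiHeadDispatch(
    dim_model=512,
    num_heads=8,
    attention=ScaledDotProduct(dropout=0.1),
    residual_dropout=0.1,
)
q, k, v = torch.rand(3, 1, 16, 512)  # three (batch, seq_len, dim_model) tensors
output = attention(q, k, v)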

README

xFormers - Toolbox to Accelerate Research on Transformers

xFormers is:

  • Customizable building blocks: Independent/customizable building blocks that can be used without boilerplate code. The components are domain-agnostic and xFormers is used by researchers in vision, NLP and more.
  • Research first: xFormers contains bleeding-edge components that are not yet available in mainstream libraries like PyTorch.
  • Built with efficiency in mind: Because speed of iteration matters, components are as fast and memory-efficient as possible. xFormers contains its own CUDA kernels, but dispatches to other libraries when relevant.

Installing xFormers

  • (RECOMMENDED, linux & win) Install latest stable with pip: Requires PyTorch 2.5.1
# [linux only] cuda 11.8 version
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118
# [linux only] cuda 12.1 version
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu121
# [linux & win] cuda 12.4 version
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu124
# [linux only] (EXPERIMENTAL) rocm 6.1 version
pip3 install -U xformers --index-url https://download.pytorch.org/whl/rocm6.1
  • Development binaries:
# Same requirements as for the stable version above
pip install --pre -U xformers
  • Install from source: for instance, if you want to use xFormers with another version of PyTorch (including nightly releases)
# (Optional) Makes the build much faster
pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)

Benchmarks

Memory-efficient MHA

[Figure: Benchmarks for ViTS]

Setup: A100 on f16, measured total time for a forward+backward pass

Note that this is exact attention, not an approximation, obtained just by calling xformers.ops.memory_efficient_attention.
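To illustrate how attention can save memory and still be exact, here is a minimal pure-Python sketch of the general streaming-softmax idea: keys and values are visited in chunks while a running maximum and normalizer are maintained, so the full row of attention scores is never held at once. This is not the xFormers kernel. The names `naive_attention` and `chunked_attention`, the chunk size, and the toy data are assumptions made for illustration.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax(q . k / sqrt(d)) weighted sum of values, for one query."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim_v)]


def chunked_attention(q, keys, values, chunk_size=2):
    """Same result, but only `chunk_size` scores are held at any time."""
    d = len(q)
    running_max = float("-inf")
    normalizer = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new reference maximum
        correction = math.exp(running_max - new_max)
        normalizer *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            normalizer += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / normalizer for a in acc]


# Toy data (hypothetical): one query, five keys of size 3, five values of size 2
q = [0.1, -0.4, 0.7]
keys = [[0.5, 0.1, -0.2], [1.0, -1.0, 0.3], [-0.6, 0.2, 0.9], [0.0, 0.4, 0.4], [0.8, 0.8, -0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 2.0], [3.0, -1.0]]

exact = naive_attention(q, keys, values)
streamed = chunked_attention(q, keys, values, chunk_size=2)
```

Both functions return the same vector up to floating-point rounding, which is the sense in which a chunked computation is exact rather than approximate.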

More benchmarks

xFormers provides many components, and more benchmarks are available in BENCHMARKS.md.

(Optional) Testing the installation

This command will provide information on an xFormers installation, and what kernels are built/available:

python -m xformers.info
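The same check can be scripted, for example in a setup or CI step. The sketch below uses only the standard library; the helper names `xformers_available` and `print_xformers_info` are hypothetical, and the script simply shells out to the command shown above when the package is found.

```python
import importlib.util
import subprocess
import sys


def xformers_available():
    """True if an `xformers` package can be found by the current interpreter."""
    return importlib.util.find_spec("xformers") is not None


def print_xformers_info():
    """Run `python -m xformers.info` with the current interpreter, if installed."""
    if not xformers_available():
        print("xformers is not installed in this environment")
        return None
    result = subprocess.run(
        [sys.executable, "-m", "xformers.info"],
        capture_output=True,
        text=True,
        check=False,
    )
    print(result.stdout)
    return result.returncode


status = print_xformers_info()
```

A return value of None means the package was not found; otherwise the value is the exit code of the info command.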

Using xFormers

Key Features

  1. Optimized building blocks, beyond PyTorch primitives
    1. Memory-efficient exact attention - up to 10x faster
    2. sparse attention
    3. block-sparse attention
    4. fused softmax
    5. fused linear layer
    6. fused layer norm
    7. fused dropout(activation(x+bias))
    8. fused SwiGLU
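As a reference point for what a fused SwiGLU layer replaces, here is an unfused pure-Python sketch of the usual SwiGLU feed-forward definition: a SiLU-gated product of two linear projections followed by an output projection. It assumes the fused operator computes this same function; the helper names and toy weights are hypothetical and nothing here calls xFormers.

```python
import math


def silu(x):
    """SiLU / swish activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def linear(x, weight, bias):
    """y[j] = sum_i x[i] * weight[j][i] + bias[j]; weight is (out_features, in_features)."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weight, bias)]


def swiglu_reference(x, w1, b1, w2, b2, w3, b3):
    """Unfused SwiGLU: linear3(silu(linear1(x)) * linear2(x))."""
    gate = [silu(h) for h in linear(x, w1, b1)]
    up = linear(x, w2, b2)
    hidden = [g * u for g, u in zip(gate, up)]
    return linear(hidden, w3, b3)


# Toy weights (hypothetical): 2 input features, 3 hidden features, 2 output features
x = [1.0, -2.0]
w1 = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
b2 = [0.0, 0.0, 0.0]
w3 = [[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
b3 = [0.0, 0.0]

y = swiglu_reference(x, w1, b1, w2, b2, w3, b3)
```

Written this way, each step materializes an intermediate list; a fused kernel computes the same result while avoiding most of those intermediate reads and writes.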

Install troubleshooting

  • Check that NVCC and the current CUDA runtime match. Depending on your setup, you may be able to change the CUDA runtime with module unload cuda; module load cuda/xx.x, possibly also nvcc
  • Check that the version of GCC that you're using matches the current NVCC capabilities
  • Check that the TORCH_CUDA_ARCH_LIST env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6"
  • If the build from source OOMs, it's possible to reduce the parallelism of ninja with MAX_JOBS (e.g. MAX_JOBS=2)
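When the source build is driven from a script, the two environment variables mentioned above can be prepared in one place. The following is a small sketch under that assumption; `build_env` is a hypothetical helper, and it only constructs the environment mapping without starting a build.

```python
import os


def build_env(arch_list, max_jobs=None, base_env=None):
    """Return a copy of the environment with the source-build variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Semicolon-separated compute capabilities, e.g. "8.0;8.6"
    env["TORCH_CUDA_ARCH_LIST"] = ";".join(arch_list)
    if max_jobs is not None:
        # Caps ninja's parallelism when the build runs out of memory
        env["MAX_JOBS"] = str(max_jobs)
    return env


# An empty base environment is used here only to keep the example output small
env = build_env(["8.0", "8.6"], max_jobs=2, base_env={})
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` together with the pip command from the installation section.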

License

xFormers has a BSD-style license, as found in the LICENSE file. It includes code from the triton-lang/kernels repo.

Citing xFormers

If you use xFormers in your publication, please cite it by using the following BibTeX entry.

@Misc{xFormers2022,
  author =       {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Wenhan Xiong and Vittorio Caggiano and Sean Naren and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang and Patrick Labatut and Daniel Haziza and Luca Wehrstedt and Jeremy Reizenstein and Grigory Sizov},
  title =        {xFormers: A modular and hackable Transformer modelling library},
  howpublished = {\url{https://github.com/facebookresearch/xformers}},
  year =         {2022}
}

Credits

The following repositories are used in xFormers, either in close to original form or as an inspiration: