fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Top Related Projects

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

OpenNMT-py

6,923

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

tensor2tensor

16,358

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

text-to-text-transfer-transformer

6,384

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

DeepSpeed

39,112

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Quick Overview

Fairseq is a sequence modeling toolkit developed by Facebook AI Research. It provides a flexible and extensible framework for training and evaluating various sequence-to-sequence models, particularly for natural language processing tasks such as machine translation, text summarization, and speech recognition.

Pros

Highly modular and customizable architecture
Supports a wide range of state-of-the-art models and techniques
Efficient implementation with support for distributed training
Active development and maintenance by Facebook AI Research

Cons

Steep learning curve for beginners
Documentation can be sparse or outdated in some areas
Requires significant computational resources for large-scale tasks
Some features may be experimental or not thoroughly tested

Code Examples

Loading a pre-trained model:

from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('/path/to/roberta/model', checkpoint_file='model.pt')
roberta.eval()  # disable dropout for inference

Tokenizing and encoding text:

tokens = roberta.encode('Hello world!')
assert tokens.tolist() == [0, 31414, 232, 328, 2]

Extracting features from the model:

import torch

# Extract the last layer's features for the given tokens
last_layer_features = roberta.extract_features(tokens)
# Shape is (batch, tokens, hidden size); the hidden size is 768 for roberta.base
assert last_layer_features.size() == torch.Size([1, 5, 768])

Fine-tuning a model for classification:

from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('/path/to/roberta/model', checkpoint_file='model.pt')

# register_classification_head takes a head name and the number of classes;
# it builds the head internally rather than accepting a head object
roberta.register_classification_head('sentence_classification_head', num_classes=2)

# Now you can fine-tune the model on your classification task

Getting Started

To get started with Fairseq, follow these steps:

Install Fairseq:

pip install fairseq

Download a pre-trained model:

wget https://dl.fbaipublicfiles.com/fairseq/models/roberta.base.tar.gz
tar -xzvf roberta.base.tar.gz

Use the model in your Python script:

from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('./roberta.base', checkpoint_file='model.pt')
roberta.eval()

tokens = roberta.encode('Hello world!')
features = roberta.extract_features(tokens)

This will load a pre-trained RoBERTa model and extract features from the input text. You can then use these features for various downstream tasks or fine-tune the model for your specific application.
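One common way to turn per-token features into a single vector for a downstream classifier is mean pooling. The sketch below is an illustration only and is not part of fairseq: it uses plain Python lists as a stand-in for the feature tensor, and `mean_pool` is a hypothetical helper name. With the real tensor returned by `extract_features`, the equivalent would be `features.mean(dim=1)`.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Average per-token feature vectors into one sentence-level vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(vec) != dim for vec in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    n_tokens = len(token_vectors)
    # Sum each feature dimension across tokens, then divide by the token count.
    return [sum(vec[i] for vec in token_vectors) / n_tokens for i in range(dim)]


# Toy stand-in for extract_features output: 3 tokens, 4 dimensions each.
toy_features = [
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0],
]
sentence_vector = mean_pool(toy_features)
print(sentence_vector)  # [2.0, 2.0, 2.0, 2.0]
```

The pooled vector can then be fed to any standard classifier; whether mean pooling or the first-token representation works better depends on the task.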

Competitor Comparisons

fairseq

31,682

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

Extensive documentation and examples
Large community support and active development
Wide range of pre-trained models available

Cons of fairseq

Can be complex for beginners
Requires more computational resources
May have a steeper learning curve

Code Comparison

fairseq:

from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
model.translate('Hello world!')

This entry compares fairseq with itself, so there is no second codebase to compare against: facebookresearch/fairseq is the project's main and only repository.

Additional Notes

fairseq is a powerful sequence modeling toolkit that supports training custom models for translation, summarization, language modeling, and other text generation tasks. It provides a flexible and modular codebase for developing state-of-the-art natural language processing models.

The project is widely used in both research and industry, offering a comprehensive set of tools and pre-trained models. While it may require some initial effort to understand and set up, fairseq's capabilities and community support make it a valuable resource for NLP practitioners and researchers.

transformers

146,142

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Pros of Transformers

Broader model support across various architectures and tasks
More extensive documentation and community support
Easier integration with popular deep learning frameworks

Cons of Transformers

Can be slower for certain tasks compared to Fairseq
May have a steeper learning curve for beginners
Less focus on specialized sequence-to-sequence tasks

Code Comparison

Fairseq example:

from fairseq.models.roberta import RobertaModel

roberta = RobertaModel.from_pretrained('path/to/roberta.base')
tokens = roberta.encode('Hello world!')
features = roberta.extract_features(tokens)

Transformers example:

from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')
inputs = tokenizer('Hello world!', return_tensors='pt')
outputs = model(**inputs)

Both repositories offer powerful tools for working with transformer models, but Transformers provides a more versatile and user-friendly experience across a wider range of tasks and architectures. Fairseq, on the other hand, excels in specific sequence-to-sequence applications and may offer performance advantages in certain scenarios.

OpenNMT-py

6,923

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

Pros of OpenNMT-py

More user-friendly and easier to get started with for beginners
Extensive documentation and tutorials available
Flexible and modular architecture for easy customization

Cons of OpenNMT-py

Less optimized for large-scale training compared to fairseq
Fewer pre-trained models and benchmarks available
Limited support for some advanced NLP tasks

Code Comparison

OpenNMT-py:

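import onmt

# Schematic only: opt, fields, checkpoint and src_data come from OpenNMT-py's
# option parsing and data loading, and exact signatures vary between releases
model = onmt.models.build_model(opt, fields, checkpoint)
translator = onmt.translate.Translator(model, fields, opt)
translated = translator.translate(src_data, src_dir=opt.src_dir, batch_size=opt.batch_size)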

fairseq:

from fairseq.models.transformer import TransformerModel

en2de = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
en2de.translate('Hello world!')

Both repositories offer powerful tools for neural machine translation and sequence-to-sequence tasks. OpenNMT-py is more beginner-friendly and flexible, while fairseq provides better performance for large-scale training and offers more pre-trained models. The code examples demonstrate the different approaches to loading and using models in each library.

tensor2tensor

16,358

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Pros of tensor2tensor

More comprehensive library with a wider range of models and tasks
Better integration with TensorFlow ecosystem
Large historical user base, although the project is now in maintenance mode and its README recommends Trax as the successor

Cons of tensor2tensor

Steeper learning curve due to complexity
Less focus on specific sequence-to-sequence tasks
Potentially slower training and inference compared to PyTorch-based fairseq

Code Comparison

tensor2tensor:

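import tensorflow as tf
from tensor2tensor import models, problems  # importing models registers "transformer"
from tensor2tensor.utils import registry, trainer_lib

problem = problems.problem("translate_ende_wmt32k")
# data_dir is a placeholder for a directory holding the generated problem data
hparams = trainer_lib.create_hparams(
    "transformer_base", data_dir="/path/to/data", problem_name="translate_ende_wmt32k")
model = registry.model("transformer")(hparams, tf.estimator.ModeKeys.PREDICT)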

fairseq:

from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained(
    'data-bin/wmt14_en_de',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='data-bin/wmt14_en_de'
)

Both libraries provide high-level APIs for working with transformer models, but tensor2tensor offers a more generalized approach, while fairseq focuses on sequence-to-sequence tasks. tensor2tensor integrates closely with TensorFlow, whereas fairseq is built on PyTorch, which may influence performance and ease of use depending on your familiarity with these frameworks.

text-to-text-transfer-transformer

6,384

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"

Pros of text-to-text-transfer-transformer

Unified approach for multiple NLP tasks
Extensive pre-training on diverse datasets
Flexible and adaptable to various downstream tasks

Cons of text-to-text-transfer-transformer

Higher computational requirements for training and inference
Less specialized for specific NLP tasks
Steeper learning curve for newcomers

Code Comparison

text-to-text-transfer-transformer:

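import torch
import t5.models

# HfPyTorchModel is the library's experimental Hugging Face-backed API; paths are placeholders
model = t5.models.HfPyTorchModel("t5-base", "/tmp/hft5/", torch.device("cpu"))
inputs = ["translate English to German: Hello, how are you?"]
model.predict(inputs, sequence_length={"inputs": 32}, batch_size=1,
              output_file="/tmp/hft5/predictions.txt")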

fairseq:

from fairseq.models.transformer import TransformerModel
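# The single_model variant ships one checkpoint named model.pt, which matches the default checkpoint_file
model = TransformerModel.from_pretrained('transformer.wmt19.en-de.single_model')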
translated = model.translate('Hello, how are you?')

Both repositories offer powerful tools for natural language processing tasks. text-to-text-transfer-transformer provides a more versatile approach, suitable for a wide range of NLP tasks, while fairseq offers more specialized models and tools for specific tasks like machine translation. The choice between them depends on the specific requirements of your project and the level of customization needed.

DeepSpeed

39,112

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Pros of DeepSpeed

Focuses on optimizing large-scale model training, offering advanced techniques like ZeRO and 3D parallelism
Provides a more comprehensive set of optimization techniques for distributed training
Integrates well with popular frameworks like PyTorch and offers easy-to-use APIs

Cons of DeepSpeed

Steeper learning curve due to its advanced features and optimization techniques
May require more configuration and tuning to achieve optimal performance
Less focus on sequence-to-sequence tasks compared to Fairseq

Code Comparison

DeepSpeed:

import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(args=args,
                                                     model=model,
                                                     model_parameters=params)

Fairseq:

from fairseq import checkpoint_utils, distributed_utils, options, tasks

parser = options.get_training_parser()
args = options.parse_args_and_arch(parser)

Both libraries offer powerful tools for training large-scale models, but they have different focuses. DeepSpeed excels in optimizing distributed training for massive models, while Fairseq specializes in sequence-to-sequence tasks and provides a rich set of pre-built models. The choice between them depends on the specific requirements of your project and the scale of your training needs.

README

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.

We provide reference implementations of various sequence modeling papers:

List of implemented papers

Convolutional Neural Networks (CNN)
LightConv and DynamicConv models
- Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)
Long Short-Term Memory (LSTM) networks
- Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
Transformer (self-attention) networks
Non-autoregressive Transformers
- Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
- Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al. 2018)
- Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al. 2019)
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
- Levenshtein Transformer (Gu et al., 2019)
Finetuning
- Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al. 2020)

What's New:

May 2023: Released models for Scaling Speech Technology to 1,000+ Languages (Pratap, et al., 2023)
June 2022: Released code for wav2vec-U 2.0 from Towards End-to-end Unsupervised Speech Recognition (Liu, et al., 2022)
May 2022: Integration with xFormers
December 2021: Released Direct speech-to-speech translation code
October 2021: Released VideoCLIP and VLM models
October 2021: Released multilingual finetuned XLSR-53 model
September 2021: master branch renamed to main.
July 2021: Released DrNMT code
July 2021: Released Robust wav2vec 2.0 model
June 2021: Released XLMR-XL and XLMR-XXL models
May 2021: Released Unsupervised Speech Recognition code
March 2021: Added full parameter and optimizer state sharding + CPU offloading
February 2021: Added LASER training code
December 2020: Added Adaptive Attention Span code
December 2020: GottBERT model and code released
November 2020: Adopted the Hydra configuration framework
- see documentation explaining how to use it for new and existing projects
November 2020: fairseq 0.10.0 released
October 2020: Added R3F/R4F (Better Fine-Tuning) code
October 2020: Deep Transformer with Latent Depth code released
October 2020: Added CRISS models and code

Previous updates

September 2020: Added Linformer code
September 2020: Added pointer-generator networks
August 2020: Added lexically constrained decoding
August 2020: wav2vec2 models and code released
July 2020: Unsupervised Quality Estimation code released
May 2020: Follow fairseq on Twitter
April 2020: Monotonic Multihead Attention code released
April 2020: Quant-Noise code released
April 2020: Initial model parallel support and 11B parameters unidirectional LM released
March 2020: Byte-level BPE code released
February 2020: mBART model and code released
February 2020: Added tutorial for back-translation
December 2019: fairseq 0.9.0 released
November 2019: VizSeq released (a visual analysis toolkit for evaluating fairseq models)
November 2019: CamemBERT model and code released
November 2019: BART model and code released
November 2019: XLM-R models and code released
September 2019: Nonautoregressive translation code released
August 2019: WMT'19 models released
July 2019: fairseq relicensed under MIT license
July 2019: RoBERTa models and code released
June 2019: wav2vec models and code released

Features:

multi-GPU training on one machine or across multiple machines (data and model parallel)
fast generation on both CPU and GPU with multiple search algorithms implemented:
- beam search
- Diverse Beam Search (Vijayakumar et al., 2016)
- sampling (unconstrained, top-k and top-p/nucleus)
- lexically constrained decoding (Post & Vilar, 2018)
gradient accumulation enables training with large mini-batches even on a single GPU
mixed precision training (trains faster with less GPU memory on NVIDIA tensor cores)
extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
flexible configuration based on Hydra allowing a combination of code, command-line and file based configuration
full parameter and optimizer state sharding
offloading parameters to CPU
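To make the gradient accumulation item concrete: for a loss averaged over examples, summing suitably scaled gradients from several small micro-batches gives the same gradient as one large batch, which is why a single GPU can emulate a large mini-batch. The following is a minimal plain-Python sketch with a one-parameter linear model. The function names and data are hypothetical, and this is not fairseq's implementation (in fairseq the behaviour is controlled by the `--update-freq` option).

```python
from typing import List, Tuple

Example = Tuple[float, float]  # (input x, target y)


def grad_mse(w: float, batch: List[Example]) -> float:
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: List[Example], micro_batch_size: int) -> float:
    """Accumulate micro-batch gradients, weighting each by its share of the data."""
    total = 0.0
    for start in range(0, len(data), micro_batch_size):
        micro = data[start:start + micro_batch_size]
        # Scale by len(micro) / len(data) so the sum matches the full-batch mean.
        total += grad_mse(w, micro) * len(micro) / len(data)
    return total


data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.0)]
w = 0.5
full = grad_mse(w, data)
accum = accumulated_grad(w, data, micro_batch_size=2)
print(abs(full - accum) < 1e-12)  # True
```

Only one micro-batch needs to be held in memory at a time, so the effective batch size is limited by training time rather than by GPU memory.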

We also provide pre-trained models for translation and language modeling with a convenient torch.hub interface:

en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'
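The `beam=5` argument selects beam search with five hypotheses. As a rough illustration of the idea only (fairseq's generator is batched and considerably more involved), here is a plain-Python sketch over a hypothetical toy model. `toy_next_log_probs`, the vocabulary and the probabilities are all invented for this example.

```python
import math
from typing import Dict, List, Tuple

EOS = "</s>"
Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, total log-probability)


def toy_next_log_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical next-token distribution that depends only on the last token."""
    table = {
        None: {"Hallo": 0.6, "Guten": 0.4},
        "Hallo": {"Welt": 0.7, EOS: 0.3},
        "Guten": {"Tag": 0.9, EOS: 0.1},
        "Welt": {EOS: 1.0},
        "Tag": {EOS: 1.0},
    }
    last = prefix[-1] if prefix else None
    return {tok: math.log(p) for tok, p in table[last].items()}


def beam_search(beam: int, max_len: int = 5) -> List[Hypothesis]:
    """Return finished hypotheses sorted by total log-probability (best first)."""
    live: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates: List[Hypothesis] = []
        for prefix, score in live:
            for tok, logp in toy_next_log_probs(prefix).items():
                candidates.append((prefix + (tok,), score + logp))
        # Keep only the `beam` highest-scoring expansions at each step.
        candidates.sort(key=lambda item: item[1], reverse=True)
        live = []
        for hyp in candidates[:beam]:
            (finished if hyp[0][-1] == EOS else live).append(hyp)
        if not live:
            break
    return sorted(finished, key=lambda item: item[1], reverse=True)


best_tokens, best_score = beam_search(beam=2)[0]
print(" ".join(best_tokens[:-1]))  # Hallo Welt
```

A larger beam keeps more partial translations alive at each step, trading extra computation for a lower chance of discarding a prefix that later turns out to score best.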

See the PyTorch Hub tutorials for translation and RoBERTa for more examples.

Requirements and Installation

PyTorch version >= 1.10.0
Python version >= 3.8
For training new models, you'll also need an NVIDIA GPU and NCCL
To install fairseq and develop locally:

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./

# to install the latest stable release (0.10.x)
# pip install fairseq

For faster training install NVIDIA's apex library:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

For large datasets install PyArrow: pip install pyarrow
If you use Docker, make sure to increase the shared memory size by passing either --ipc=host or --shm-size as a command-line option to nvidia-docker run.
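A quick way to check an environment against the version requirements above is sketched below. It uses only the standard library; `parse_version` and `meets_minimum` are hypothetical helper names, and the parsing is deliberately simple (it keeps the leading numeric components and ignores suffixes such as `+cu117`).

```python
import re
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the leading dotted numeric part, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions as integer tuples, padding the shorter one with zeros."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Python requirement from the list above.
print("Python >= 3.8:", sys.version_info[:2] >= (3, 8))

# PyTorch requirement; the import is guarded so the script runs without torch.
try:
    import torch
    print("PyTorch >= 1.10.0:", meets_minimum(torch.__version__, "1.10.0"))
except ImportError:
    print("PyTorch is not installed")
```

Comparing integer tuples rather than raw strings matters here, because a string comparison would rank "1.9.1" above "1.10.0".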

Getting Started

The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.
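Extending fairseq with new model types relies on a decorator-based registry (fairseq exposes decorators such as `register_model` for this). The toy sketch below illustrates the general pattern in plain Python. `MODEL_REGISTRY`, `register_toy_model` and `build_model` are hypothetical stand-ins written for this example, not fairseq's own code.

```python
from typing import Callable, Dict, List, Type

MODEL_REGISTRY: Dict[str, Type] = {}


def register_toy_model(name: str) -> Callable[[Type], Type]:
    """Class decorator that records a model class under a string name."""
    def decorator(cls: Type) -> Type:
        if name in MODEL_REGISTRY:
            raise ValueError(f"cannot register duplicate model ({name})")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


@register_toy_model("reverse")
class ReverseModel:
    """Trivial 'model' that reverses its input tokens."""

    def __call__(self, tokens: List[str]) -> List[str]:
        return list(reversed(tokens))


def build_model(name: str):
    """Look a model up by name, as a command-line option would."""
    try:
        return MODEL_REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown model: {name}") from None


model = build_model("reverse")
print(model(["a", "b", "c"]))  # ['c', 'b', 'a']
```

The benefit of this design is that new components become selectable by name without editing any central dispatch code; importing the module that defines the class is enough to register it.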

Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands.

Translation: convolutional and transformer models are available
Language Modeling: convolutional and transformer models are available

We also have more detailed READMEs to reproduce results from specific papers:

Join the fairseq community

Twitter: https://twitter.com/fairseq
Facebook page: https://www.facebook.com/groups/fairseq.users
Google group: https://groups.google.com/forum/#!forum/fairseq-users

License

fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.

Citation

Please cite as:

@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}
