Top Related Projects
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Quick Overview
Fairseq is a sequence modeling toolkit developed by Facebook AI Research. It provides a flexible and extensible framework for training and evaluating various sequence-to-sequence models, particularly for natural language processing tasks such as machine translation, text summarization, and speech recognition.
Pros
- Highly modular and customizable architecture
- Supports a wide range of state-of-the-art models and techniques
- Efficient implementation with support for distributed training
- Active development and maintenance by Facebook AI Research
Cons
- Steep learning curve for beginners
- Documentation can be sparse or outdated in some areas
- Requires significant computational resources for large-scale tasks
- Some features may be experimental or not thoroughly tested
Code Examples
- Loading a pre-trained model:
from fairseq.models.roberta import RobertaModel
roberta = RobertaModel.from_pretrained('/path/to/roberta/model', checkpoint_file='model.pt')
roberta.eval() # disable dropout for inference
- Tokenizing and encoding text:
tokens = roberta.encode('Hello world!')
assert tokens.tolist() == [0, 31414, 232, 328, 2]
- Extracting features from the model:
import torch
# Extract the last layer's features for the given tokens
last_layer_features = roberta.extract_features(tokens)
# Shape is (batch, tokens, hidden size)
assert last_layer_features.size() == torch.Size([1, 5, 768])
- Fine-tuning a model for classification:
from fairseq.models.roberta import RobertaModel
model = RobertaModel.from_pretrained('/path/to/roberta/model', checkpoint_file='model.pt')
# register_classification_head takes a head name and the number of classes;
# fairseq builds the classification head internally
model.register_classification_head('sentence_classification_head', num_classes=2)
# Now you can fine-tune the model on your classification task
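For intuition, a classification head of this kind takes the feature vector of the first token and maps it to class scores through a dense layer, a tanh, and an output projection. The following is a minimal pure-Python sketch with tiny made-up dimensions and weights. The function names and all values are illustrative assumptions, not fairseq code, and dropout is omitted.

```python
import math

def matvec(weights, vec, bias):
    """Compute weights @ vec + bias for plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def classification_head(features, dense_w, dense_b, out_w, out_b):
    """Score classes from the first token's feature vector.

    features: list of per-token vectors (seq_len x input_dim).
    """
    first_token = features[0]  # used as a sentence-level summary
    hidden = [math.tanh(h) for h in matvec(dense_w, first_token, dense_b)]
    return matvec(out_w, hidden, out_b)  # one logit per class

# Hypothetical toy sizes: input_dim=3, inner_dim=2, num_classes=2
features = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
dense_w = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
dense_b = [0.0, 0.0]
out_w = [[1.0, 0.0], [0.0, 1.0]]
out_b = [0.0, 0.0]
logits = classification_head(features, dense_w, dense_b, out_w, out_b)
```

Fine-tuning then amounts to training these head weights (and usually the encoder) against labeled examples with a cross-entropy loss over the logits.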
Getting Started
To get started with Fairseq, follow these steps:
- Install Fairseq:
pip install fairseq
- Download a pre-trained model:
wget https://dl.fbaipublicfiles.com/fairseq/models/roberta.base.tar.gz
tar -xzvf roberta.base.tar.gz
- Use the model in your Python script:
from fairseq.models.roberta import RobertaModel
roberta = RobertaModel.from_pretrained('./roberta.base', checkpoint_file='model.pt')
roberta.eval()
tokens = roberta.encode('Hello world!')
features = roberta.extract_features(tokens)
This will load a pre-trained RoBERTa model and extract features from the input text. You can then use these features for various downstream tasks or fine-tune the model for your specific application.
Competitor Comparisons
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- Extensive documentation and examples
- Large community support and active development
- Wide range of pre-trained models available
Cons of fairseq
- Can be complex for beginners
- Requires more computational resources
- May have a steeper learning curve
Code Comparison
fairseq:
from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
model.translate('Hello world!')
Both repositories are actually the same project (fairseq), so there is no code comparison to be made between them. The repository facebookresearch/fairseq is the main and only repository for the fairseq project.
Additional Notes
fairseq is a powerful sequence modeling toolkit that supports training custom models for translation, summarization, language modeling, and other text generation tasks. It provides a flexible and modular codebase for developing state-of-the-art natural language processing models.
The project is widely used in both research and industry, offering a comprehensive set of tools and pre-trained models. While it may require some initial effort to understand and set up, fairseq's capabilities and community support make it a valuable resource for NLP practitioners and researchers.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of Transformers
- Broader model support across various architectures and tasks
- More extensive documentation and community support
- Easier integration with popular deep learning frameworks
Cons of Transformers
- Can be slower for certain tasks compared to Fairseq
- May have a steeper learning curve for beginners
- Less focus on specialized sequence-to-sequence tasks
Code Comparison
Fairseq example:
from fairseq.models.roberta import RobertaModel
roberta = RobertaModel.from_pretrained('path/to/roberta.base')
tokens = roberta.encode('Hello world!')
features = roberta.extract_features(tokens)
Transformers example:
from transformers import RobertaTokenizer, RobertaModel
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')
inputs = tokenizer('Hello world!', return_tensors='pt')
outputs = model(**inputs)
Both repositories offer powerful tools for working with transformer models, but Transformers provides a more versatile and user-friendly experience across a wider range of tasks and architectures. Fairseq, on the other hand, excels in specific sequence-to-sequence applications and may offer performance advantages in certain scenarios.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
Pros of OpenNMT-py
- More user-friendly and easier to get started with for beginners
- Extensive documentation and tutorials available
- Flexible and modular architecture for easy customization
Cons of OpenNMT-py
- Less optimized for large-scale training compared to fairseq
- Fewer pre-trained models and benchmarks available
- Limited support for some advanced NLP tasks
Code Comparison
OpenNMT-py:
import onmt
model = onmt.models.build_model(opt, fields, checkpoint)
translator = onmt.translate.Translator(model, fields, opt)
translated = translator.translate(src_data, src_dir=opt.src_dir, batch_size=opt.batch_size)
fairseq:
from fairseq.models.transformer import TransformerModel
en2de = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
en2de.translate('Hello world!')
Both repositories offer powerful tools for neural machine translation and sequence-to-sequence tasks. OpenNMT-py is more beginner-friendly and flexible, while fairseq provides better performance for large-scale training and offers more pre-trained models. The code examples demonstrate the different approaches to loading and using models in each library.
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Pros of tensor2tensor
- More comprehensive library with a wider range of models and tasks
- Better integration with TensorFlow ecosystem
- Mature, widely cited codebase (note: now deprecated in favor of its successor, Trax, and receiving only bug fixes)
Cons of tensor2tensor
- Steeper learning curve due to complexity
- Less focus on specific sequence-to-sequence tasks
- Potentially slower training and inference compared to PyTorch-based fairseq
Code Comparison
tensor2tensor:
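import tensorflow as tf
from tensor2tensor import problems
from tensor2tensor.models import transformer
from tensor2tensor.utils import trainer_lib
problem = problems.problem("translate_ende_wmt32k")
# '/path/to/data' is a placeholder for the directory holding the generated data and vocabulary
hparams = trainer_lib.create_hparams("transformer_base", data_dir="/path/to/data", problem_name="translate_ende_wmt32k")
model = transformer.Transformer(hparams, tf.estimator.ModeKeys.EVAL)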
fairseq:
from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained(
'data-bin/wmt14_en_de',
checkpoint_file='checkpoint_best.pt',
data_name_or_path='data-bin/wmt14_en_de'
)
Both libraries provide high-level APIs for working with transformer models, but tensor2tensor offers a more generalized approach, while fairseq focuses on sequence-to-sequence tasks. tensor2tensor integrates closely with TensorFlow, whereas fairseq is built on PyTorch, which may influence performance and ease of use depending on your familiarity with these frameworks.
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
Pros of text-to-text-transfer-transformer
- Unified approach for multiple NLP tasks
- Extensive pre-training on diverse datasets
- Flexible and adaptable to various downstream tasks
Cons of text-to-text-transfer-transformer
- Higher computational requirements for training and inference
- Less specialized for specific NLP tasks
- Steeper learning curve for newcomers
Code Comparison
text-to-text-transfer-transformer:
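import t5.models
import torch
# HfPyTorchModel(model_spec, model_dir, device); '/tmp/hft5/' is a placeholder directory
model = t5.models.HfPyTorchModel("t5-base", "/tmp/hft5/", torch.device("cpu"))
inputs = ["translate English to German: Hello, how are you?"]
# Predictions are written to output_file rather than returned
model.predict(inputs, sequence_length={"inputs": 32}, batch_size=1, output_file="/tmp/hft5/predictions.txt")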
fairseq:
from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('transformer.wmt19.en-de')
translated = model.translate('Hello, how are you?')
Both repositories offer powerful tools for natural language processing tasks. text-to-text-transfer-transformer provides a more versatile approach, suitable for a wide range of NLP tasks, while fairseq offers more specialized models and tools for specific tasks like machine translation. The choice between them depends on the specific requirements of your project and the level of customization needed.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Focuses on optimizing large-scale model training, offering better performance for very large models
- Provides more advanced distributed training features, including ZeRO optimizer stages
- Offers a more flexible API for customizing training pipelines
Cons of DeepSpeed
- Steeper learning curve due to its focus on advanced optimization techniques
- Less comprehensive documentation compared to Fairseq
- Tied to PyTorch; this is not a differentiator, since Fairseq is also PyTorch-only
Code Comparison
DeepSpeed:
import deepspeed
# initialize returns (engine, optimizer, training dataloader, lr scheduler)
model_engine, optimizer, _, _ = deepspeed.initialize(
args=args,
model=model,
model_parameters=model.parameters()
)
Fairseq:
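from fairseq.trainer import Trainer
trainer = Trainer(args, task, model, criterion)
# Trainer has no train() method; the training loop calls train_step() on each group of batches
trainer.train_step(samples)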
DeepSpeed focuses on initializing the model with its optimization features, while Fairseq provides a higher-level abstraction for training. DeepSpeed offers more fine-grained control over the training process, whereas Fairseq simplifies the training setup with its Trainer class.
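To make the ZeRO stages mentioned above concrete, the ZeRO paper gives a simple per-GPU estimate for model-state memory under mixed-precision Adam training: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for optimizer state, with each stage dividing one more of these terms by the number of data-parallel workers. The sketch below only evaluates that arithmetic. The function name is hypothetical, and activations and temporary buffers are ignored.

```python
def zero_model_state_gb(num_params, num_gpus, stage):
    """Per-GPU model-state memory in GB (1e9 bytes) under ZeRO stage 0-3."""
    weights = 2 * num_params   # fp16 parameters
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 copy, momentum, and variance for Adam
    if stage >= 1:
        optim /= num_gpus      # stage 1: shard optimizer state
    if stage >= 2:
        grads /= num_gpus      # stage 2: also shard gradients
    if stage >= 3:
        weights /= num_gpus    # stage 3: also shard parameters
    return (weights + grads + optim) / 1e9

# Example: a 7.5B-parameter model on 64 GPUs
estimates = {stage: zero_model_state_gb(7.5e9, 64, stage) for stage in range(4)}
```

For this example the estimate drops from 120 GB per GPU without sharding to under 2 GB with all three stages enabled, which is why sharding matters mainly for very large models.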
README
Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.
We provide reference implementations of various sequence modeling papers:
List of implemented papers
- Convolutional Neural Networks (CNN)
- Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)
- Convolutional Sequence to Sequence Learning (Gehring et al., 2017)
- Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)
- Hierarchical Neural Story Generation (Fan et al., 2018)
- wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)
- LightConv and DynamicConv models
- Long Short-Term Memory (LSTM) networks
- Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
- Transformer (self-attention) networks
- Attention Is All You Need (Vaswani et al., 2017)
- Scaling Neural Machine Translation (Ott et al., 2018)
- Understanding Back-Translation at Scale (Edunov et al., 2018)
- Adaptive Input Representations for Neural Language Modeling (Baevski and Auli, 2018)
- Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Dai et al., 2019)
- Adaptive Attention Span in Transformers (Sukhbaatar et al., 2019)
- Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)
- Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)
- Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)
- Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)
- Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)
- Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)
- Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)
- Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)
- Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)
- Deep Transformers with Latent Depth (Li et al., 2020)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition (Conneau et al., 2020)
- Self-training and Pre-training are Complementary for Speech Recognition (Xu et al., 2020)
- Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training (Hsu et al., 2021)
- Unsupervised Speech Recognition (Baevski et al., 2021)
- Simple and Effective Zero-shot Cross-lingual Phoneme Recognition (Xu et al., 2021)
- VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding (Xu et al., 2021)
- VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding (Xu et al., 2021)
- NormFormer: Improved Transformer Pretraining with Extra Normalization (Shleifer et al., 2021)
- Non-autoregressive Transformers
- Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
- Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al. 2018)
- Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al. 2019)
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
- Levenshtein Transformer (Gu et al., 2019)
- Finetuning
What's New:
- May 2023 Released models for Scaling Speech Technology to 1,000+ Languages (Pratap, et al., 2023)
- June 2022 Released code for wav2vec-U 2.0 from Towards End-to-end Unsupervised Speech Recognition (Liu, et al., 2022)
- May 2022 Integration with xFormers
- December 2021 Released Direct speech-to-speech translation code
- October 2021 Released VideoCLIP and VLM models
- October 2021 Released multilingual finetuned XLSR-53 model
- September 2021 master branch renamed to main.
- July 2021 Released DrNMT code
- July 2021 Released Robust wav2vec 2.0 model
- June 2021 Released XLMR-XL and XLMR-XXL models
- May 2021 Released Unsupervised Speech Recognition code
- March 2021 Added full parameter and optimizer state sharding + CPU offloading
- February 2021 Added LASER training code
- December 2020: Added Adaptive Attention Span code
- December 2020: GottBERT model and code released
- November 2020: Adopted the Hydra configuration framework
- November 2020: fairseq 0.10.0 released
- October 2020: Added R3F/R4F (Better Fine-Tuning) code
- October 2020: Deep Transformer with Latent Depth code released
- October 2020: Added CRISS models and code
Previous updates
- September 2020: Added Linformer code
- September 2020: Added pointer-generator networks
- August 2020: Added lexically constrained decoding
- August 2020: wav2vec2 models and code released
- July 2020: Unsupervised Quality Estimation code released
- May 2020: Follow fairseq on Twitter
- April 2020: Monotonic Multihead Attention code released
- April 2020: Quant-Noise code released
- April 2020: Initial model parallel support and 11B parameters unidirectional LM released
- March 2020: Byte-level BPE code released
- February 2020: mBART model and code released
- February 2020: Added tutorial for back-translation
- December 2019: fairseq 0.9.0 released
- November 2019: VizSeq released (a visual analysis toolkit for evaluating fairseq models)
- November 2019: CamemBERT model and code released
- November 2019: BART model and code released
- November 2019: XLM-R models and code released
- September 2019: Nonautoregressive translation code released
- August 2019: WMT'19 models released
- July 2019: fairseq relicensed under MIT license
- July 2019: RoBERTa models and code released
- June 2019: wav2vec models and code released
Features:
- multi-GPU training on one machine or across multiple machines (data and model parallel)
- fast generation on both CPU and GPU with multiple search algorithms implemented:
- beam search
- Diverse Beam Search (Vijayakumar et al., 2016)
- sampling (unconstrained, top-k and top-p/nucleus)
- lexically constrained decoding (Post & Vilar, 2018)
- gradient accumulation enables training with large mini-batches even on a single GPU
- mixed precision training (trains faster with less GPU memory on NVIDIA tensor cores)
- extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
- flexible configuration based on Hydra allowing a combination of code, command-line and file based configuration
- full parameter and optimizer state sharding
- offloading parameters to CPU
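The top-k and top-p (nucleus) sampling options in the feature list above both work by restricting the next-token distribution before drawing from it. The following is a minimal pure-Python sketch on a toy distribution. The function names and probabilities are illustrative assumptions, and this is not fairseq's implementation.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose mass reaches p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return {i: probs[i] / mass for i in keep}

probs = [0.5, 0.25, 0.125, 0.125]      # toy next-token distribution
top2 = top_k_filter(probs, k=2)        # keeps tokens 0 and 1
nucleus = top_p_filter(probs, p=0.8)   # keeps tokens 0, 1, 2 (mass 0.875)
```

A token can then be drawn from either filtered distribution, for example with `random.choices(list(nucleus), weights=list(nucleus.values()))`. Unconstrained sampling corresponds to skipping the filter entirely.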
We also provide pre-trained models for translation and language modeling with a convenient torch.hub interface:
import torch
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'
See the PyTorch Hub tutorials for translation and RoBERTa for more examples.
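To illustrate what the beam argument in the snippet above controls, here is a minimal beam search over a toy two-step model in pure Python. The `toy_model` table and its probabilities are made up for illustration, and the sketch leaves out length normalization, end-of-sentence handling, and batching, so it should not be read as fairseq's implementation.

```python
import math

def beam_search(step_log_probs, beam):
    """Return the `beam` best (sequence, score) pairs.

    step_log_probs(prefix) -> {token: log_prob} for the next position;
    an empty dict means the prefix is finished.
    """
    hyps = [((), 0.0)]
    while True:
        candidates = []
        expanded = False
        for seq, score in hyps:
            nxt = step_log_probs(seq)
            if not nxt:
                candidates.append((seq, score))  # finished hypothesis carries over
                continue
            expanded = True
            for tok, lp in nxt.items():
                candidates.append((seq + (tok,), score + lp))
        if not expanded:
            return hyps
        # Keep only the highest-scoring hypotheses for the next step
        hyps = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam]

def toy_model(prefix):
    """Hypothetical next-token log-probabilities for a two-token output."""
    table = {
        (): {"a": math.log(0.6), "b": math.log(0.4)},
        ("a",): {"x": math.log(0.5), "y": math.log(0.5)},
        ("b",): {"x": math.log(0.9), "y": math.log(0.1)},
    }
    return table.get(prefix, {})

greedy = beam_search(toy_model, beam=1)[0][0]  # commits to "a" after step one
best = beam_search(toy_model, beam=2)[0][0]    # keeps "b" alive and finds ("b", "x")
```

With a beam of 1 the search is greedy and ends on a sequence with probability 0.30, while a beam of 2 recovers the sequence with probability 0.36. Larger beams trade decoding time for a wider search.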
Requirements and Installation
- PyTorch version >= 1.10.0
- Python version >= 3.8
- For training new models, you'll also need an NVIDIA GPU and NCCL
- To install fairseq and develop locally:
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
# on MacOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
# to install the latest stable release (0.10.x)
# pip install fairseq
- For faster training install NVIDIA's apex library:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
--global-option="--deprecated_fused_adam" --global-option="--xentropy" \
--global-option="--fast_multihead_attn" ./
- For large datasets install PyArrow:
pip install pyarrow
- If you use Docker make sure to increase the shared memory size either with --ipc=host or --shm-size as command line options to nvidia-docker run.
Getting Started
The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.
Pre-trained models and examples
We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands.
- Translation: convolutional and transformer models are available
- Language Modeling: convolutional and transformer models are available
We also have more detailed READMEs to reproduce results from specific papers:
- XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale (Babu et al., 2021)
- Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)
- Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)
- Training with Quantization Noise for Extreme Model Compression ({Fan*, Stock*} et al., 2020)
- Neural Machine Translation with Byte-Level Subwords (Wang et al., 2020)
- Multilingual Denoising Pre-training for Neural Machine Translation (Liu et al., 2020)
- Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)
- Jointly Learning to Align and Translate with Transformer Models (Garg et al., 2019)
- Levenshtein Transformer (Gu et al., 2019)
- Facebook FAIR's WMT19 News Translation Task Submission (Ng et al., 2019)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Liu et al., 2019)
- wav2vec: Unsupervised Pre-training for Speech Recognition (Schneider et al., 2019)
- Mixture Models for Diverse Machine Translation: Tricks of the Trade (Shen et al., 2019)
- Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)
- Understanding Back-Translation at Scale (Edunov et al., 2018)
- Classical Structured Prediction Losses for Sequence to Sequence Learning (Edunov et al., 2018)
- Hierarchical Neural Story Generation (Fan et al., 2018)
- Scaling Neural Machine Translation (Ott et al., 2018)
- Convolutional Sequence to Sequence Learning (Gehring et al., 2017)
- Language Modeling with Gated Convolutional Networks (Dauphin et al., 2017)
Join the fairseq community
- Twitter: https://twitter.com/fairseq
- Facebook page: https://www.facebook.com/groups/fairseq.users
- Google group: https://groups.google.com/forum/#!forum/fairseq-users
License
fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.
Citation
Please cite as:
@inproceedings{ott2019fairseq,
title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
year = {2019},
}