transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Top Related Projects
- TensorFlow: An Open Source Machine Learning Framework for Everyone
- PyTorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
- DeepSpeed: a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- BERT: TensorFlow code and pre-trained models for BERT
- fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
- spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python
Quick Overview
Hugging Face's Transformers library is a state-of-the-art natural language processing (NLP) toolkit. It provides thousands of pre-trained models for various NLP tasks, supporting multiple deep learning frameworks like PyTorch, TensorFlow, and JAX. The library offers a unified API for using these models, making it easy to download, train, and deploy cutting-edge NLP models.
Pros
- Extensive collection of pre-trained models for various NLP tasks
- Easy-to-use API with support for multiple deep learning frameworks
- Active community and frequent updates
- Comprehensive documentation and examples
Cons
- Can be resource-intensive, especially for larger models
- Learning curve for beginners due to the vast array of models and features
- Dependency management can be complex
- Some advanced features may require in-depth understanding of NLP concepts
Code Examples
- Loading and using a pre-trained model for sentiment analysis:
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
result = classifier("I love using Transformers!")
print(result)
- Fine-tuning a pre-trained model on a custom dataset:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer, TrainingArguments
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Assume 'train_dataset' and 'eval_dataset' are prepared
training_args = TrainingArguments(output_dir="./results", num_train_epochs=3, per_device_train_batch_size=16)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset, eval_dataset=eval_dataset)
trainer.train()
- Using a pre-trained model for text generation:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
input_text = "Once upon a time"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
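For the fine-tuning example above, it can help to estimate how many optimizer steps a run will take before launching it. The sketch below is a back-of-the-envelope calculation rather than Trainer's own logic: it assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept. The dataset size of 1,000 examples and the helper name estimate_training_steps are hypothetical; the batch size and epoch count are the ones passed to TrainingArguments above.

```python
import math


def estimate_training_steps(num_examples, batch_size, num_epochs):
    """Rough count of optimizer steps: batches per epoch times epochs.

    Assumes one device, no gradient accumulation, and that the last
    (possibly smaller) batch of each epoch is still used.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


# Hypothetical dataset size; batch size and epochs match the example above.
total_steps = estimate_training_steps(num_examples=1000, batch_size=16, num_epochs=3)
print(total_steps)
```

An estimate like this is mainly useful for choosing logging, evaluation, and checkpoint intervals that divide the run sensibly.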
Getting Started
To get started with Transformers, follow these steps:
- Install the library:
pip install transformers
- Import and use a pre-trained model:
from transformers import pipeline
# Use a pre-trained model for named entity recognition
ner = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")
text = "My name is Sarah and I work at Google in London."
result = ner(text)
print(result)
This quick start example demonstrates how to install the library and use a pre-trained model for named entity recognition. The Transformers library offers many more features and models, which you can explore in their documentation.
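By default, the NER pipeline reports one prediction per token, so a name split into several word pieces comes back as several entries. The pipeline accepts an aggregation_strategy argument to group them, but the idea can also be sketched in plain Python. In the sketch below, merge_entities is a hypothetical helper and the token list is hand-written for illustration (it is not real model output); it assumes each entry carries the entity, index, and word keys and that continuation pieces start with "##".

```python
def merge_entities(tokens):
    """Group adjacent token-level predictions that share a label into spans."""
    entities = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        word = tok["word"]
        last = entities[-1] if entities else None
        adjacent = last is not None and tok["index"] == last["end_index"] + 1
        same_label = last is not None and last["label"] == label
        if adjacent and same_label and prefix != "B":
            # Word pieces ("##...") attach directly; whole words get a space.
            last["text"] += word[2:] if word.startswith("##") else " " + word
            last["end_index"] = tok["index"]
        else:
            entities.append({"label": label, "text": word, "end_index": tok["index"]})
    return [(e["text"], e["label"]) for e in entities]


# Hand-written, illustrative token predictions (not actual model output).
tokens = [
    {"entity": "I-PER", "index": 4, "word": "Sarah"},
    {"entity": "I-ORG", "index": 9, "word": "Hu"},
    {"entity": "I-ORG", "index": 10, "word": "##gging"},
    {"entity": "I-ORG", "index": 11, "word": "Face"},
    {"entity": "I-LOC", "index": 13, "word": "London"},
]

merged = merge_entities(tokens)
print(merged)
```

Using the token index to test adjacency keeps two separate mentions of the same entity type from being glued together when the tokens between them were labeled as non-entities and dropped from the output.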
Competitor Comparisons
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- More comprehensive ecosystem for end-to-end machine learning
- Better support for deployment and production environments
- Stronger performance optimization capabilities
Cons of TensorFlow
- Steeper learning curve for beginners
- Less focus on natural language processing tasks
- More complex API compared to Transformers
Code Comparison
TensorFlow:
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
Transformers:
from transformers import BertModel, BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
TensorFlow is a comprehensive machine learning framework that offers a wide range of tools and libraries for various ML tasks. It excels in performance optimization and deployment scenarios. However, it has a steeper learning curve and a more complex API.
Transformers, on the other hand, focuses specifically on natural language processing tasks and provides easy-to-use interfaces for working with pre-trained models. It offers a more straightforward API for NLP tasks but may not be as versatile for other machine learning applications.
The code comparison illustrates the difference in complexity and focus between the two libraries. TensorFlow requires more setup for creating a basic model, while Transformers allows for quick implementation of pre-trained NLP models.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- More flexible and lower-level framework, allowing for greater customization
- Broader scope, supporting a wide range of deep learning applications beyond NLP
- Larger community and ecosystem, with more third-party libraries and tools
Cons of PyTorch
- Steeper learning curve for beginners in machine learning
- Requires more boilerplate code for common NLP tasks
- Less streamlined API for working with pre-trained models and datasets
Code Comparison
PyTorch:
import torch
import torch.nn as nn
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)
Transformers:
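from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
# Tokenize a sample sentence so that `inputs` is defined before the forward pass
inputs = tokenizer("Hello world!", return_tensors="pt")
outputs = model(**inputs)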
The PyTorch example shows a basic model definition, while the Transformers example demonstrates how easily pre-trained models can be loaded and used.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Focuses on optimizing training speed and efficiency for large models
- Offers advanced distributed training techniques like ZeRO and 3D parallelism
- Provides memory optimization features for training larger models on limited hardware
Cons of DeepSpeed
- Steeper learning curve compared to Transformers' user-friendly API
- Less extensive model library and pre-trained models
- Primarily designed for training, with less emphasis on inference and deployment
Code Comparison
Transformers:
from transformers import BertModel, BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
DeepSpeed:
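import deepspeed
import torch
# MyModel and ds_config are placeholders for your own nn.Module and DeepSpeed config
model = MyModel()
# deepspeed.initialize returns a tuple: (engine, optimizer, dataloader, lr_scheduler)
engine, optimizer, _, _ = deepspeed.initialize(model=model, model_parameters=model.parameters(), config=ds_config)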
DeepSpeed focuses on optimizing training performance and enabling large-scale model training, while Transformers provides a comprehensive library of pre-trained models and a user-friendly API for various NLP tasks. DeepSpeed is ideal for researchers and organizations working with massive models and distributed training, whereas Transformers is more accessible for general NLP tasks and rapid prototyping.
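The DeepSpeed snippet above leaves ds_config undefined. As a minimal sketch, assuming the configuration is passed as a Python dict, it might look like the following. The helper name make_ds_config and every numeric value are illustrative assumptions; the one relationship the sketch encodes is that the global batch size equals the per-GPU micro-batch size times the gradient accumulation steps times the number of GPUs. Check the keys against the DeepSpeed configuration documentation for the version you use.

```python
import json


def make_ds_config(micro_batch_per_gpu, grad_accum_steps, world_size, zero_stage=2):
    """Build an illustrative DeepSpeed-style config dict with a consistent batch size."""
    return {
        # Global batch size = micro batch * accumulation steps * number of GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": zero_stage},
    }


# Illustrative values: 8 samples per GPU, 4 accumulation steps, 2 GPUs.
ds_config = make_ds_config(micro_batch_per_gpu=8, grad_accum_steps=4, world_size=2)
print(json.dumps(ds_config, indent=2))
```

Deriving the global batch size from the other three values avoids hand-editing three numbers that have to stay consistent with each other.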
TensorFlow code and pre-trained models for BERT
Pros of BERT
- Original implementation by Google Research, providing a reference point for the BERT model
- Focused specifically on BERT, offering a streamlined codebase for this particular architecture
- Includes pre-training scripts, allowing users to train BERT models from scratch
Cons of BERT
- Limited to BERT model only, lacking support for other transformer architectures
- Less actively maintained compared to Transformers, with fewer updates and contributions
- Fewer features and utilities for downstream tasks and fine-tuning
Code Comparison
BERT:
import modeling
import tokenization
bert_config = modeling.BertConfig.from_json_file("bert_config.json")
tokenizer = tokenization.FullTokenizer(vocab_file="vocab.txt", do_lower_case=True)
Transformers:
from transformers import BertConfig, BertTokenizer
config = BertConfig.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
The Transformers library offers a more user-friendly API with pre-trained models readily available, while BERT requires more manual setup and configuration.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- More focused on sequence-to-sequence tasks and machine translation
- Offers advanced features for distributed training and mixed precision
- Includes implementations of cutting-edge research papers from FAIR
Cons of fairseq
- Steeper learning curve and less beginner-friendly documentation
- Smaller community and fewer pre-trained models compared to Transformers
- Less frequent updates and maintenance
Code Comparison
fairseq:
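from fairseq.models.transformer import TransformerModel
# args, task, criterion, and sample come from fairseq's own training setup
model = TransformerModel.build_model(args, task)
loss, sample_size, logging_output = criterion(model, sample)
loss.backward()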
Transformers:
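from transformers import AutoModelForSeq2SeqLM
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
# input_ids and labels are tokenized source and target batches prepared beforehand
outputs = model(input_ids=input_ids, labels=labels)
loss = outputs.loss
loss.backward()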
Both libraries provide high-level APIs for working with transformer models, but Transformers offers a more streamlined approach with its AutoModel classes and easier access to pre-trained models. fairseq provides more flexibility in model architecture and training configurations, which can be beneficial for advanced users and researchers.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Pros of spaCy
- Lightweight and efficient, optimized for production use
- Comprehensive linguistic features (tokenization, POS tagging, dependency parsing)
- Easy-to-use API with built-in visualizers
Cons of spaCy
- Limited support for deep learning models compared to Transformers
- Smaller community and ecosystem
- Less flexibility for custom model architectures
Code Comparison
spaCy:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)
Transformers:
from transformers import pipeline
ner = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")
text = "Apple is looking at buying U.K. startup for $1 billion"
results = ner(text)
for result in results:
    print(f"{result['word']} - {result['entity']}")
Both libraries offer NLP capabilities, but spaCy is more focused on efficient, production-ready processing of linguistic features, while Transformers provides a wider range of state-of-the-art deep learning models for various NLP tasks.
README
State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
These models can be applied on:
- 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages.
- 🖼️ Images, for tasks like image classification, object detection, and segmentation.
- 🗣️ Audio, for tasks like speech recognition and audio classification.
Transformer models can also perform tasks on several modalities combined, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
🤗 Transformers is backed by the three most popular deep learning libraries (JAX, PyTorch, and TensorFlow), with seamless integration between them. It's straightforward to train your models with one before loading them for inference with another.
Online demos
You can test most of our models directly on their pages from the model hub. We also offer private model hosting, versioning, & an inference API for public and private models.
Here are a few examples:
In Natural Language Processing:
- Masked word completion with BERT
- Named Entity Recognition with Electra
- Text generation with Mistral
- Natural Language Inference with RoBERTa
- Summarization with BART
- Question answering with DistilBERT
- Translation with T5
In Computer Vision:
- Image classification with ViT
- Object Detection with DETR
- Semantic Segmentation with SegFormer
- Panoptic Segmentation with Mask2Former
- Depth Estimation with Depth Anything
- Video Classification with VideoMAE
- Universal Segmentation with OneFormer
In Audio:
- Automatic Speech Recognition with Whisper
- Keyword Spotting with Wav2Vec2
- Audio Classification with Audio Spectrogram Transformer
In Multimodal tasks:
- Table Question Answering with TAPAS
- Visual Question Answering with ViLT
- Image captioning with LLaVa
- Zero-shot Image Classification with SigLIP
- Document Question Answering with LayoutLM
- Zero-shot Video Classification with X-CLIP
- Zero-shot Object Detection with OWLv2
- Zero-shot Image Segmentation with CLIPSeg
- Automatic Mask Generation with SAM
100 projects using Transformers
Transformers is more than a toolkit to use pretrained models: it's a community of projects built around it and the Hugging Face Hub. We want Transformers to enable developers, researchers, students, professors, engineers, and anyone else to build their dream projects.
In order to celebrate the 100,000 stars of transformers, we have decided to put the spotlight on the community, and we have created the awesome-transformers page which lists 100 incredible projects built in the vicinity of transformers.
If you own or use a project that you believe should be part of the list, please open a PR to add it!
Serious about AI in your organisation? Build faster with the Hugging Face Enterprise Hub.
Quick tour
To immediately use a model on a given input (text, image, audio, ...), we provide the pipeline API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
>>> from transformers import pipeline
# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]
The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here, the answer is "positive" with a confidence of 99.97%.
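To make the step from the raw score to the quoted percentage explicit, here is a small sketch that formats a prediction the way the sentence above describes it. The result list is copied from the output shown above; the helper name describe is a hypothetical one introduced for illustration.

```python
# Output copied from the sentiment-analysis example above.
result = [{'label': 'POSITIVE', 'score': 0.9996980428695679}]


def describe(prediction):
    """Format one pipeline prediction as 'label (xx.xx%)'."""
    return f"{prediction['label'].lower()} ({prediction['score'] * 100:.2f}%)"


summary = describe(result[0])
print(summary)
```

The pipeline returns a list because it can be called on several texts at once, which is why the example indexes the first element.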
Many tasks have a pre-trained pipeline ready to go, in NLP but also in computer vision and speech. For example, we can easily extract detected objects in an image:
>>> import requests
>>> from PIL import Image
>>> from transformers import pipeline
# Download an image with cute cats
>>> url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/coco_sample.png"
>>> image_data = requests.get(url, stream=True).raw
>>> image = Image.open(image_data)
# Allocate a pipeline for object detection
>>> object_detector = pipeline('object-detection')
>>> object_detector(image)
[{'score': 0.9982201457023621,
  'label': 'remote',
  'box': {'xmin': 40, 'ymin': 70, 'xmax': 175, 'ymax': 117}},
 {'score': 0.9960021376609802,
  'label': 'remote',
  'box': {'xmin': 333, 'ymin': 72, 'xmax': 368, 'ymax': 187}},
 {'score': 0.9954745173454285,
  'label': 'couch',
  'box': {'xmin': 0, 'ymin': 1, 'xmax': 639, 'ymax': 473}},
 {'score': 0.9988006353378296,
  'label': 'cat',
  'box': {'xmin': 13, 'ymin': 52, 'xmax': 314, 'ymax': 470}},
 {'score': 0.9986783862113953,
  'label': 'cat',
  'box': {'xmin': 345, 'ymin': 23, 'xmax': 640, 'ymax': 368}}]
Here, we get a list of objects detected in the image, with a box surrounding the object and a confidence score. Here is the original image on the left, with the predictions displayed on the right:
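Because the detector returns plain Python dictionaries, post-processing needs nothing beyond the standard library. The sketch below reuses the detections printed above and shows one possible way to filter them by label and confidence and to compute a box area in pixels. The helper names filter_detections and box_area, and the 0.99 threshold, are illustrative choices rather than part of the library.

```python
# Detections copied from the object-detection output above.
detections = [
    {"score": 0.9982201457023621, "label": "remote",
     "box": {"xmin": 40, "ymin": 70, "xmax": 175, "ymax": 117}},
    {"score": 0.9960021376609802, "label": "remote",
     "box": {"xmin": 333, "ymin": 72, "xmax": 368, "ymax": 187}},
    {"score": 0.9954745173454285, "label": "couch",
     "box": {"xmin": 0, "ymin": 1, "xmax": 639, "ymax": 473}},
    {"score": 0.9988006353378296, "label": "cat",
     "box": {"xmin": 13, "ymin": 52, "xmax": 314, "ymax": 470}},
    {"score": 0.9986783862113953, "label": "cat",
     "box": {"xmin": 345, "ymin": 23, "xmax": 640, "ymax": 368}},
]


def filter_detections(detections, label=None, min_score=0.0):
    """Keep detections that match `label` (if given) and reach `min_score`."""
    return [
        d for d in detections
        if (label is None or d["label"] == label) and d["score"] >= min_score
    ]


def box_area(box):
    """Area of a bounding box in pixels, from its corner coordinates."""
    return (box["xmax"] - box["xmin"]) * (box["ymax"] - box["ymin"])


cats = filter_detections(detections, label="cat", min_score=0.99)
for cat in cats:
    print(cat["label"], round(cat["score"], 3), box_area(cat["box"]))
```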
You can learn more about the tasks supported by the pipeline API in this tutorial.
In addition to pipeline, to download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:
>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = AutoModel.from_pretrained("google-bert/bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)
And here is the equivalent code for TensorFlow:
>>> from transformers import AutoTokenizer, TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("google-bert/bert-base-uncased")
>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)
The tokenizer is responsible for all the preprocessing the pretrained model expects and can be called directly on a single string (as in the above examples) or a list. It will output a dictionary that you can use in downstream code or simply directly pass to your model using the ** argument unpacking operator.
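The ** unpacking mentioned above is ordinary Python: each key of the dictionary becomes a keyword argument of the call. The sketch below shows the equivalence without loading any model. The function toy_model is a hypothetical stand-in for a model's forward call, and the token ids are placeholder values shaped like tokenizer output, not values to rely on.

```python
def toy_model(input_ids, attention_mask=None, token_type_ids=None):
    """Stand-in for a model call; returns the arguments it actually received."""
    received = {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }
    return {name: value for name, value in received.items() if value is not None}


# Placeholder values shaped like tokenizer output (one sequence of five tokens).
inputs = {
    "input_ids": [[101, 7592, 2088, 999, 102]],
    "attention_mask": [[1, 1, 1, 1, 1]],
}

# These two calls are equivalent: ** spreads the dict into keyword arguments.
unpacked = toy_model(**inputs)
explicit = toy_model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
print(unpacked == explicit)
```

This is why the same model(**inputs) line works across architectures: the tokenizer only emits the keys its model expects, and the call site does not need to know which ones those are.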
The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model (depending on your backend) which you can use as usual. This tutorial explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune on a new dataset.
Why should I use transformers?
- Easy-to-use state-of-the-art models:
  - High performance on natural language understanding & generation, computer vision, and audio tasks.
  - Low barrier to entry for educators and practitioners.
  - Few user-facing abstractions with just three classes to learn.
  - A unified API for using all our pretrained models.
- Lower compute costs, smaller carbon footprint:
  - Researchers can share trained models instead of always retraining.
  - Practitioners can reduce compute time and production costs.
  - Dozens of architectures with over 400,000 pretrained models across all modalities.
- Choose the right framework for every part of a model's lifetime:
  - Train state-of-the-art models in 3 lines of code.
  - Move a single model between TF2.0/PyTorch/JAX frameworks at will.
  - Seamlessly pick the right framework for training, evaluation, and production.
- Easily customize a model or an example to your needs:
  - We provide examples for each architecture to reproduce the results published by its original authors.
  - Model internals are exposed as consistently as possible.
  - Model files can be used independently of the library for quick experiments.
Why shouldn't I use transformers?
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library (possibly, Accelerate).
- While we strive to present as many use cases as possible, the scripts in our examples folder are just that: examples. It is expected that they won't work out-of-the-box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.
Installation
With pip
This repository is tested on Python 3.9+, Flax 0.4.1+, PyTorch 1.11+, and TensorFlow 2.6+.
You should install 🤗 Transformers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.
First, create a virtual environment with the version of Python you're going to use and activate it.
Then, you will need to install at least one of Flax, PyTorch, or TensorFlow. Please refer to the TensorFlow installation page, the PyTorch installation page, and/or the Flax and Jax installation pages for the specific installation command for your platform.
When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:
pip install transformers
If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you must install the library from source.
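Since the library needs Python 3.9+ and at least one backend, a quick environment check can save a failed install. The following is a minimal sketch using only the standard library; the helper names installed_version and check_environment are hypothetical, and it assumes the backends are published under the distribution names torch, tensorflow, and flax. It only reports what is installed and does not compare versions against the minimums listed above.

```python
import sys
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def check_environment(backends=("torch", "tensorflow", "flax")):
    """Summarize the Python version check and which packages are present."""
    return {
        "python_ok": sys.version_info >= (3, 9),
        "backends": {name: installed_version(name) for name in backends},
        "transformers": installed_version("transformers"),
    }


report = check_environment()
print(report)
```

A backend whose value is None in the report is simply not installed in the active environment, which is the usual cause of import errors right after a fresh install.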
With conda
🤗 Transformers can be installed using conda as follows:
conda install conda-forge::transformers
NOTE: Installing transformers from the huggingface channel is deprecated.
Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.
NOTE: On Windows, you may be prompted to activate Developer Mode in order to benefit from caching. If this is not an option for you, please let us know in this issue.
Model architectures
All the model checkpoints provided by 🤗 Transformers are seamlessly integrated from the huggingface.co model hub, where they are uploaded directly by users and organizations.
🤗 Transformers currently provides the following architectures: see here for a high-level summary of each of them.
To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.
Learn more
| Section | Description |
|---|---|
| Documentation | Full API documentation and tutorials |
| Task summary | Tasks supported by 🤗 Transformers |
| Preprocessing tutorial | Using the Tokenizer class to prepare data for the models |
| Training and fine-tuning | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the Trainer API |
| Quick tour: Fine-tuning/usage scripts | Example scripts for fine-tuning models on a wide range of tasks |
| Model sharing and uploading | Upload and share your fine-tuned models with the community |
Citation
We now have a paper you can cite for the 🤗 Transformers library:
@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
    pages = "38--45"
}