Top Related Projects
microsoft/LoRA: Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
microsoft/DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
google-research/t5x: A JAX/Flax-based framework for training, evaluating, and running inference with T5 sequence models.
facebookresearch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
OpenNMT/OpenNMT-py: Open Source Neural Machine Translation and (Large) Language Models in PyTorch
Quick Overview
PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting pre-trained language models to various downstream applications. It provides state-of-the-art parameter-efficient fine-tuning methods, allowing users to fine-tune large language models with significantly fewer trainable parameters.
Pros
- Reduces computational resources required for fine-tuning large language models
- Supports multiple parameter-efficient methods like LoRA, Prefix Tuning, and P-Tuning
- Integrates seamlessly with Hugging Face's Transformers library
- Enables fine-tuning of models that wouldn't fit in GPU memory otherwise
Cons
- Limited to specific architectures and tasks supported by the library
- May require some understanding of parameter-efficient methods for optimal use
- Performance improvements can vary depending on the specific task and model
- Documentation could be more comprehensive for advanced use cases
Code Examples
- Loading a pre-trained model and applying LoRA:
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType

model = AutoModelForCausalLM.from_pretrained("gpt2")
# r sets the LoRA rank; lora_alpha scales the low-rank update
peft_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=32, lora_dropout=0.1)
peft_model = get_peft_model(model, peft_config)  # wraps the base model with trainable adapters
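As a quick sanity check after wrapping, you can print how many parameters are actually trainable:
peft_model.print_trainable_parameters()  # for LoRA, typically well under 1% of all parameters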
- Training a PEFT model:
from transformers import Trainer, TrainingArguments

trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(output_dir="output", num_train_epochs=3),
    train_dataset=train_dataset,  # placeholder: a tokenized dataset prepared beforehand
)
trainer.train()
- Saving and loading a PEFT model:
peft_model.save_pretrained("peft_model")  # writes only the adapter weights, not the base model

from peft import PeftModel, PeftConfig

config = PeftConfig.from_pretrained("peft_model")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)  # reload the base model
peft_model = PeftModel.from_pretrained(model, "peft_model")  # re-attach the saved adapter
Getting Started
To get started with PEFT, first install the library:
pip install peft
Then, you can use PEFT with a pre-trained model:
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig, TaskType
model = AutoModelForCausalLM.from_pretrained("gpt2")
peft_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=32, lora_dropout=0.1)
peft_model = get_peft_model(model, peft_config)
# Your fine-tuning code here
This sets up a GPT-2 model with LoRA for parameter-efficient fine-tuning. You can then proceed with your specific task and training loop.
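If you prefer an explicit loop over the Trainer API, a minimal sketch looks like this (train_dataloader is a placeholder for a DataLoader yielding tokenized batches with input_ids, attention_mask, and labels; the learning rate is illustrative):
import torch

# only the adapter parameters require gradients, so optimize just those
optimizer = torch.optim.AdamW(
    (p for p in peft_model.parameters() if p.requires_grad), lr=2e-4
)
peft_model.train()
for batch in train_dataloader:  # placeholder dataloader
    loss = peft_model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()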
Competitor Comparisons
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Pros of LoRA
- Original implementation of the LoRA technique, potentially offering more direct alignment with the research paper
- Focused specifically on LoRA, which may result in a more streamlined and optimized implementation
- Includes examples and implementations for various model architectures beyond language models
Cons of LoRA
- Less actively maintained compared to PEFT
- Narrower scope, focusing primarily on LoRA rather than a broader range of parameter-efficient fine-tuning techniques
- May have less integration with popular deep learning frameworks and ecosystems
Code Comparison
PEFT:
from peft import get_peft_model, LoraConfig
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(model, peft_config)
LoRA:
import loralib as lora

# loralib provides a drop-in replacement for nn.Linear with low-rank update matrices
lora_linear = lora.Linear(in_features, out_features, r=8, lora_alpha=32)
output = lora_linear(input_tensor)
Both repositories implement the LoRA technique, but PEFT offers a broader toolkit of parameter-efficient fine-tuning methods, while LoRA focuses on the LoRA method alone. PEFT is also more actively maintained and tightly integrated with the Hugging Face ecosystem, making it easier to use alongside popular pre-trained models and libraries.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Pros of DeepSpeed
- Offers more comprehensive optimization techniques, including ZeRO, pipeline parallelism, and 3D parallelism
- Provides better support for distributed training across multiple GPUs and nodes
- Includes advanced features like DeepSpeed Inference for optimized model serving
Cons of DeepSpeed
- Steeper learning curve and more complex setup compared to PEFT
- Less focus on parameter-efficient fine-tuning methods
- May be overkill for smaller-scale projects or simpler fine-tuning tasks
Code Comparison
PEFT example:
from peft import get_peft_model, LoraConfig
peft_config = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=32, lora_dropout=0.1)
peft_model = get_peft_model(model, peft_config)
DeepSpeed example:
import deepspeed
model_engine, optimizer, _, _ = deepspeed.initialize(
    args=args, model=model, model_parameters=params
)
Both libraries aim to improve the efficiency of training large language models, but they focus on different aspects. PEFT specializes in parameter-efficient fine-tuning techniques, while DeepSpeed offers a broader range of optimization and distributed training features. The choice between them depends on the specific requirements of your project and the scale of your training infrastructure.
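The two are not mutually exclusive: a common pattern is to train a PEFT-wrapped model under DeepSpeed ZeRO via the Transformers Trainer. A minimal sketch, assuming a peft_model and train_dataset prepared as in the PEFT example above, and with "ds_config.json" as a placeholder for your own ZeRO config file:
from transformers import Trainer, TrainingArguments

# hand the DeepSpeed config to the Trainer; ZeRO shards optimizer state
# while PEFT keeps the number of trainable parameters small
training_args = TrainingArguments(
    output_dir="output",
    deepspeed="ds_config.json",  # placeholder: your DeepSpeed ZeRO config
)
trainer = Trainer(model=peft_model, args=training_args, train_dataset=train_dataset)
trainer.train()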
t5x is Google's JAX/Flax-based framework for training, evaluating, and running inference with T5 sequence models.
Pros of t5x
- Specialized for T5 models, offering optimized performance and features
- Integrated with JAX and Flax for efficient training on TPUs
- Comprehensive documentation and examples for T5-specific tasks
Cons of t5x
- Limited to T5 architecture, less flexible for other model types
- Steeper learning curve due to JAX/Flax ecosystem
- Less active community compared to PEFT
Code Comparison
t5x example:
from t5x import models, trainer

# illustrative sketch; the actual Trainer and train call take many JAX/Flax-specific arguments
model = models.EncoderDecoderModel(...)
t5x_trainer = trainer.Trainer(model=model, ...)
t5x_trainer.train(...)
PEFT example:
from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_model, LoraConfig
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
peft_config = LoraConfig(task_type="SEQ_2_SEQ_LM", ...)
model = get_peft_model(model, peft_config)
Summary
t5x is tailored for T5 models with JAX/Flax integration, offering optimized performance but limited flexibility. PEFT provides a more versatile approach for various model types, with easier integration into existing workflows. The choice between them depends on specific project requirements and familiarity with the respective ecosystems.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- Comprehensive toolkit for sequence modeling tasks
- Supports a wide range of architectures and pre-trained models
- Highly optimized for performance and scalability
Cons of fairseq
- Steeper learning curve due to its extensive features
- Less focus on parameter-efficient fine-tuning techniques
- May be overkill for simpler NLP tasks
Code Comparison
fairseq:
from fairseq.models.transformer import TransformerModel

# load a trained translation model and decode via the hub interface
model = TransformerModel.from_pretrained('/path/to/model', checkpoint_file='model.pt')
translated = model.translate('Hello world!')
PEFT:
from peft import get_peft_model, LoraConfig
peft_config = LoraConfig(task_type="SEQ_CLS", r=8, lora_alpha=32)
peft_model = get_peft_model(model, peft_config)
Key Differences
- fairseq is a comprehensive toolkit for various sequence modeling tasks, while PEFT focuses specifically on parameter-efficient fine-tuning techniques.
- fairseq offers a wider range of pre-trained models and architectures, whereas PEFT provides methods to efficiently adapt existing models.
- PEFT is designed to work seamlessly with Hugging Face's Transformers library, making it easier to integrate into existing workflows.
- fairseq is more suitable for large-scale projects and research, while PEFT is ideal for fine-tuning models with limited computational resources.
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Pros of gpt-neox
- Designed specifically for training large language models
- Includes optimizations for distributed training on multiple GPUs
- Provides a complete framework for model architecture, training, and inference
Cons of gpt-neox
- More complex setup and configuration compared to PEFT
- Less flexible for fine-tuning pre-trained models
- Requires more computational resources for training
Code Comparison
gpt-neox:
from megatron.neox_arguments import NeoXArgs
from megatron.global_vars import set_global_variables, get_args
from megatron.training import pretrain

args = NeoXArgs.from_ymls(["configs/your_config.yml"])  # takes a list of YAML config paths
set_global_variables(args)
pretrain(get_args())
PEFT:
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig
model = AutoModelForCausalLM.from_pretrained("base_model")
peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=32, lora_dropout=0.1)
peft_model = get_peft_model(model, peft_config)
gpt-neox is more suitable for training large language models from scratch, while PEFT is designed for efficient fine-tuning of pre-trained models. gpt-neox offers more control over the training process but requires more setup, while PEFT provides a simpler interface for quick adaptations of existing models.
Open Source Neural Machine Translation and (Large) Language Models in PyTorch
Pros of OpenNMT-py
- Comprehensive toolkit for neural machine translation
- Supports a wide range of architectures and features
- Well-documented with extensive examples and tutorials
Cons of OpenNMT-py
- Steeper learning curve for beginners
- Less focus on parameter-efficient fine-tuning techniques
- May require more computational resources for training
Code Comparison
OpenNMT-py:
import onmt

# illustrative sketch; options, fields, and losses come from OpenNMT-py's setup utilities
model = onmt.model_builder.build_model(model_opt, opt, fields, checkpoint)
trainer = onmt.Trainer(model, train_loss, valid_loss, optim, trunc_size)
trainer.train(train_iter, train_steps, valid_iter=valid_iter, valid_steps=valid_steps)
PEFT:
from peft import get_peft_model, LoraConfig, TaskType
peft_config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=32, lora_dropout=0.1)
model = get_peft_model(model, peft_config)
model.train()
While OpenNMT-py offers a complete solution for neural machine translation, PEFT focuses on efficient fine-tuning of large language models. OpenNMT-py provides more flexibility in model architecture, while PEFT excels in parameter-efficient adaptation of pre-trained models.
README
🤗 PEFT
State-of-the-art Parameter-Efficient Fine-Tuning (PEFT) methods
Fine-tuning large pretrained models is often prohibitively costly due to their scale. Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of large pretrained models to various downstream applications by only fine-tuning a small number of (extra) model parameters instead of all the model's parameters. This significantly decreases the computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to fully fine-tuned models.
PEFT is integrated with Transformers for easy model training and inference, Diffusers for conveniently managing different adapters, and Accelerate for distributed training and inference for really big models.
[!TIP] Visit the PEFT organization to read about the PEFT methods implemented in the library and to see notebooks demonstrating how to apply these methods to a variety of downstream tasks. Click the "Watch repos" button on the organization page to be notified of newly implemented methods and notebooks!
Check the PEFT Adapters API Reference section for a list of supported PEFT methods, and read the Adapters, Soft prompts, and IA3 conceptual guides to learn more about how these methods work.
Quickstart
Install PEFT from pip:
pip install peft
Prepare a model for training with a PEFT method such as LoRA by wrapping the base model and PEFT configuration with get_peft_model. For the bigscience/mt0-large model, you're only training 0.19% of the parameters!
from transformers import AutoModelForSeq2SeqLM
from peft import get_peft_config, get_peft_model, LoraConfig, TaskType
model_name_or_path = "bigscience/mt0-large"
tokenizer_name_or_path = "bigscience/mt0-large"
peft_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM, inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1
)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name_or_path)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
"trainable params: 2359296 || all params: 1231940608 || trainable%: 0.19151053100118282"
To load a PEFT model for inference:
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch
model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora").to("cuda")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")
model.eval()
inputs = tokenizer("Preheat the oven to 350 degrees and place the cookie dough", return_tensors="pt")
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=50)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
"Preheat the oven to 350 degrees and place the cookie dough in the center of the oven. In a large bowl, combine the flour, baking powder, baking soda, salt, and cinnamon. In a separate bowl, combine the egg yolks, sugar, and vanilla."
Why you should use PEFT
There are many benefits of using PEFT but the main one is the huge savings in compute and storage, making PEFT applicable to many different use cases.
High performance on consumer hardware
Consider the memory requirements for training the following models on the ought/raft/twitter_complaints dataset, using an A100 80GB GPU with more than 64GB of CPU RAM.
| Model | Full Finetuning | PEFT-LoRA PyTorch | PEFT-LoRA DeepSpeed with CPU Offloading |
|---|---|---|---|
| bigscience/T0_3B (3B params) | 47.14GB GPU / 2.96GB CPU | 14.4GB GPU / 2.96GB CPU | 9.8GB GPU / 17.8GB CPU |
| bigscience/mt0-xxl (12B params) | OOM GPU | 56GB GPU / 3GB CPU | 22GB GPU / 52GB CPU |
| bigscience/bloomz-7b1 (7B params) | OOM GPU | 32GB GPU / 3.8GB CPU | 18.1GB GPU / 35GB CPU |
With LoRA you can fine-tune a 12B parameter model that would otherwise have run out of memory on the 80GB GPU, and comfortably fit and train a 3B parameter model. The 3B parameter model's performance is comparable to a fully finetuned model at a fraction of the GPU memory.
| Submission Name | Accuracy |
|---|---|
| Human baseline (crowdsourced) | 0.897 |
| Flan-T5 | 0.892 |
| lora-t0-3b | 0.863 |
[!TIP] The bigscience/T0_3B model performance isn't optimized in the table above. You can squeeze even more performance out of it by playing around with the input instruction templates, LoRA hyperparameters, and other training related hyperparameters. The final checkpoint size of this model is just 19MB compared to 11GB of the full bigscience/T0_3B model. Learn more about the advantages of finetuning with PEFT in this blog post.
Quantization
Quantization is another method for reducing the memory requirements of a model by representing the data in a lower precision. It can be combined with PEFT methods to make it even easier to train and load LLMs for inference.
- Learn how to finetune meta-llama/Llama-2-7b-hf with QLoRA and the TRL library on a 16GB GPU in the Finetune LLMs on your own consumer hardware using tools from PyTorch and Hugging Face ecosystem blog post.
- Learn how to finetune an openai/whisper-large-v2 model for multilingual automatic speech recognition with LoRA and 8-bit quantization in this notebook (see this notebook instead for an example of streaming a dataset).
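Putting the two together, here is a minimal sketch of the QLoRA pattern described above: load the base model in 4-bit precision, prepare it for training, then attach LoRA adapters. The model name is just an example, and the quantization settings are illustrative (requires the bitsandbytes package):
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# load the base model in 4-bit precision
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", quantization_config=bnb_config)

# make the quantized model ready for training, then attach LoRA adapters
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=32))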
Save compute and storage
PEFT can help you save storage by avoiding full finetuning of models on each downstream task or dataset. In many cases, you're only finetuning a very small fraction of a model's parameters and each checkpoint is only a few MBs in size (instead of GBs). These smaller PEFT adapters demonstrate performance comparable to a fully finetuned model. If you have many datasets, you can save a lot of storage with a PEFT model and not have to worry about catastrophic forgetting or overfitting the backbone or base model.
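Because each adapter is tiny, one copy of the base model can serve many tasks by swapping adapters at load time. A sketch of the pattern (the adapter paths and names are hypothetical):
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
# attach one adapter, then load a second alongside it and switch between them
model = PeftModel.from_pretrained(base_model, "adapters/task_a", adapter_name="task_a")  # hypothetical path
model.load_adapter("adapters/task_b", adapter_name="task_b")                             # hypothetical path
model.set_adapter("task_b")  # activate the adapter for task B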
PEFT integrations
PEFT is widely supported across the Hugging Face ecosystem because of the massive efficiency it brings to training and inference.
Diffusers
The iterative diffusion process consumes a lot of memory which can make it difficult to train. PEFT can help reduce the memory requirements and reduce the storage size of the final model checkpoint. For example, consider the memory required for training a Stable Diffusion model with LoRA on an A100 80GB GPU with more than 64GB of CPU RAM. The final model checkpoint size is only 8.8MB!
| Model | Full Finetuning | PEFT-LoRA | PEFT-LoRA with Gradient Checkpointing |
|---|---|---|---|
| CompVis/stable-diffusion-v1-4 | 27.5GB GPU / 3.97GB CPU | 15.5GB GPU / 3.84GB CPU | 8.12GB GPU / 3.77GB CPU |
[!TIP] Take a look at the examples/lora_dreambooth/train_dreambooth.py training script to try training your own Stable Diffusion model with LoRA, and play around with the smangrul/peft-lora-sd-dreambooth Space which is running on a T4 instance. Learn more about the PEFT integration in Diffusers in this tutorial.
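On the inference side, Diffusers can load such a LoRA checkpoint directly into a pipeline. A sketch, with the checkpoint path as a placeholder for your own trained adapter:
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.load_lora_weights("path/to/lora_checkpoint")  # placeholder: your LoRA adapter
image = pipe("a photo of an astronaut riding a horse").images[0]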
Accelerate
Accelerate is a library for distributed training and inference on various setups and hardware (GPUs, TPUs, Apple Silicon, etc.). PEFT models work with Accelerate out of the box, making it convenient to train very large models or use them for inference on consumer hardware with limited resources.
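For example, a large PEFT model can be loaded for inference with its weights automatically spread across the available devices; a sketch using the device_map feature (which relies on Accelerate under the hood):
from peft import AutoPeftModelForCausalLM

# let Accelerate place model shards across available GPUs and CPU automatically
model = AutoPeftModelForCausalLM.from_pretrained(
    "ybelkada/opt-350m-lora", device_map="auto"
)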
TRL
PEFT can also be applied to training LLMs with RLHF components such as the ranker and policy. Get started by reading:
- Fine-tune a Mistral-7b model with Direct Preference Optimization with PEFT and the TRL library to learn more about the Direct Preference Optimization (DPO) method and how to apply it to a LLM.
- Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU with PEFT and the TRL library, and then try out the gpt2-sentiment_peft.ipynb notebook to optimize GPT2 to generate positive movie reviews.
- StackLLaMA: A hands-on guide to train LLaMA with RLHF with PEFT, and then try out the stack_llama/scripts for supervised finetuning, reward modeling, and RL finetuning.
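In practice, TRL's trainers accept a PEFT config directly. A minimal supervised fine-tuning sketch, assuming a prepared train_dataset; the model name is just an example, and the exact SFTTrainer signature varies across TRL versions:
from trl import SFTTrainer
from peft import LoraConfig

peft_config = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=32)
trainer = SFTTrainer(
    model="facebook/opt-350m",    # TRL can load the base model from a name
    train_dataset=train_dataset,  # placeholder: your prepared dataset
    peft_config=peft_config,      # TRL applies the adapter before training
)
trainer.train()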
Model support
Use this Space or check out the docs to find which models officially support a PEFT method out of the box. Even if a model isn't listed, you can manually configure the model config to enable PEFT for it. Read the New transformers architecture guide to learn how.
Contribute
If you would like to contribute to PEFT, please check out our contribution guide.
Citing 🤗 PEFT
To use 🤗 PEFT in your publication, please cite it by using the following BibTeX entry.
@Misc{peft,
  title = {PEFT: State-of-the-art Parameter-Efficient Fine-Tuning methods},
  author = {Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan},
  howpublished = {\url{https://github.com/huggingface/peft}},
  year = {2022}
}