stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
Top Related Projects
alpaca-lora: Instruct-tune LLaMA on consumer hardware
text-generation-webui: A Gradio web UI for Large Language Models.
Open-Assistant: OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
StableLM: Stability AI Language Models
Dolly: Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Quick Overview
Stanford Alpaca is an open-source project that aims to create a reproducible instruction-following language model based on the LLaMA 7B model. It provides a 52K-example dataset and a fine-tuning recipe for training a model that follows natural-language instructions; in a preliminary human evaluation, the authors found it behaves similarly to OpenAI's text-davinci-003.
Pros
- Open-source and freely available for research and non-commercial use
- Provides a reproducible approach to fine-tuning large language models
- Offers a high-quality dataset for instruction-following tasks
- Demonstrates good performance with relatively small model size (7B parameters)
Cons
- Requires access to the LLaMA base model, which has restricted availability
- Limited to non-commercial use due to licensing constraints
- May not perform as well as larger, more advanced models like GPT-3 or GPT-4
- Potential ethical concerns regarding the generation of synthetic data
Code Examples
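# Load Alpaca-7B from weights recovered with weight_diff.py (see "Recovering Alpaca Weights" below).
# The tatsu-lab/alpaca-7b Hub repo holds a weight diff, not directly usable weights.
from transformers import AutoTokenizer, AutoModelForCausalLM
model_path = "<path_to_store_recovered_weights>"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
# Generate a response to an instruction using the no-input Alpaca prompt
instruction = "Explain the concept of quantum entanglement in simple terms."
input_text = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)  # limits generated tokens only, not the prompt
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
# Fine-tune the model on custom data: a JSON list of {"instruction", "input", "output"} records
from datasets import load_dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling
dataset = load_dataset("json", data_files="your_custom_data.json")["train"]
if tokenizer.pad_token is None:
    # Add a dedicated pad token (so EOS is not masked out of the labels) and resize embeddings
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    model.resize_token_embeddings(len(tokenizer))
def tokenize(example):
    # Simplified: ignores the "input" field and computes the loss on prompt tokens too,
    # whereas the repo's train.py uses both prompt templates and masks the prompt out of the loss.
    prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{example['instruction']}\n\n### Response:"
    return tokenizer(prompt + example["output"] + tokenizer.eos_token, truncation=True, max_length=512)
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM: labels copied from input_ids
training_args = TrainingArguments(output_dir="./results", num_train_epochs=3, per_device_train_batch_size=4)
trainer = Trainer(model=model, args=training_args, train_dataset=dataset, data_collator=collator)
trainer.train()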
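Getting Started
1. Clone the repository:
git clone https://github.com/tatsu-lab/stanford_alpaca.git
cd stanford_alpaca
2. Install dependencies:
pip install -r requirements.txt
3. Download the LLaMA weights (requires approval from Meta AI) and convert them to Hugging Face format.
4. Fine-tune the model (a single-process run like this is only practical for small models; see Fine-tuning below for the multi-GPU FSDP command used for LLaMA-7B):
python train.py --model_name_or_path /path/to/hf_converted_llama --data_path ./alpaca_data.json --output_dir <your_output_dir>
5. Use the fine-tuned model as shown in the code examples above.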
Competitor Comparisons
alpaca-lora: Instruct-tune LLaMA on consumer hardware
Pros of alpaca-lora
- Uses LoRA for more efficient fine-tuning, requiring less computational resources
- Supports various model sizes (7B, 13B, 30B, 65B) and quantization options
- Includes scripts for inference and interactive chat
Cons of alpaca-lora
- May have slightly lower performance compared to full fine-tuning
- Requires additional steps to merge LoRA weights with the base model
Code Comparison
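alpaca-lora:
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM
model = LlamaForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16, device_map="auto")
# Attach the LoRA adapter, then fold its low-rank updates into the base weights
model = PeftModel.from_pretrained(
    model, lora_weights, torch_dtype=torch.float16
)
model = model.merge_and_unload()
stanford_alpaca:
import torch
from transformers import LlamaForCausalLM
# Load the fully fine-tuned checkpoint directly; there is no adapter or merge step
model = LlamaForCausalLM.from_pretrained(
    output_dir, torch_dtype=torch.float16, device_map="auto"
)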
The alpaca-lora code demonstrates the process of loading LoRA weights and merging them with the base model, while stanford_alpaca loads a fully fine-tuned model directly.
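For intuition about the merge step: a LoRA adapter stores two small matrices, B and A, whose product scaled by alpha / r is added to a frozen base weight W. After merging, inference runs on one dense matrix with no adapter overhead. The sketch below assumes that standard LoRA formulation; it is not peft's implementation, and it uses plain nested lists with toy 2x2 matrices.

```python
# Minimal sketch of merging a LoRA adapter into a base weight.
# Assumption: standard LoRA scaling alpha / r; matrices are plain nested lists.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def merge_lora(w, lora_b, lora_a, alpha, r):
    """Return W + (alpha / r) * B @ A, the dense weight after merging."""
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

# Toy example: a 2x2 identity weight with a rank-1 adapter
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]  # d_out x r
lora_a = [[0.5, 0.5]]    # r x d_in
merged = merge_lora(w, lora_b, lora_a, alpha=2, r=1)
print(merged)  # [[2.0, 1.0], [2.0, 3.0]]
```

Because the update is folded in once, the merged model has the same shape and inference cost as the base model, which is why merging precedes export in the alpaca-lora flow.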
text-generation-webui: A Gradio web UI for Large Language Models.
Pros of text-generation-webui
- User-friendly web interface for interacting with various language models
- Supports multiple models and architectures (e.g., GPT-J, LLaMA, OPT)
- Offers advanced features like chat, instruct mode, and notebook interface
Cons of text-generation-webui
- Requires more setup and dependencies compared to stanford_alpaca
- May have higher resource requirements due to its extensive features
- Less focused on a specific model or training approach
Code Comparison
text-generation-webui:
def generate_reply(
    question, history, generate_params, stopping_strings=None
):
    generate_params = generate_params.copy()
    generate_params['stopping_strings'] = stopping_strings
    return model.generate(question, history, generate_params)
stanford_alpaca:
def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
Open-Assistant: OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
Pros of Open-Assistant
- Larger community-driven project with more contributors and broader scope
- Focuses on creating a fully open-source assistant, including training data and model
- Supports multiple languages and aims for multilingual capabilities
Cons of Open-Assistant
- More complex project structure and potentially slower development due to its scale
- May require more computational resources for training and fine-tuning
- Less focused on a specific use case compared to Stanford Alpaca
Code Comparison
Open-Assistant:
from oasst_api import OAsstAPI
api = OAsstAPI()
response = api.generate_response("Tell me a joke")
print(response.text)
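Stanford Alpaca:
from transformers import AutoModelForCausalLM, AutoTokenizer
path = "<path_to_store_recovered_weights>"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)
prompt = "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nTell me a joke\n\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))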
Both projects aim to create open-source language models, but Open-Assistant has a broader scope and community-driven approach, while Stanford Alpaca focuses on fine-tuning existing models with specific techniques. Open-Assistant's code emphasizes API usage, while Stanford Alpaca's code highlights model loading and generation.
StableLM: Stability AI Language Models
Pros of StableLM
- Larger model sizes available (up to 65B parameters)
- More recent training data, potentially better performance on current topics
- Open-source license allows for commercial use and modification
Cons of StableLM
- Less focused on instruction-following compared to Stanford Alpaca
- May require more fine-tuning for specific tasks
- Larger models demand more computational resources
Code Comparison
Stanford Alpaca:
def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
StableLM:
def generate_prompt(prompt):
    return f"<|SYSTEM|>You are a helpful AI assistant.<|USER|>{prompt}<|ASSISTANT|>"
The code snippets show different approaches to prompt generation, with Stanford Alpaca using a more structured format for instructions and inputs, while StableLM uses a simpler system-user-assistant format.
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
Pros of Dolly
- Open-source model with a more permissive license (Apache 2.0)
- Includes training code and datasets, enabling full reproducibility
- Supports multiple model sizes and architectures
Cons of Dolly
- Smaller community and less active development
- Limited to English language support
- Fewer pre-trained models available
Code Comparison
Stanford Alpaca:
def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
Dolly:
def format_prompt(instruction, context=None, response=None):
    prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"
    if context:
        prompt = f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n"
    if response:
        prompt += f"{response}"
    return prompt
FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Pros of FastChat
- More comprehensive and feature-rich, offering a complete chatbot system
- Actively maintained with frequent updates and improvements
- Supports multiple models and architectures beyond just Alpaca
Cons of FastChat
- More complex setup and configuration required
- Potentially higher resource requirements due to additional features
- Steeper learning curve for beginners
Code Comparison
Stanford Alpaca:
def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
FastChat:
def get_conv_template(template_name):
    # conv_templates is FastChat's registry of conversation templates (e.g. "vicuna_v1.1");
    # return a copy so callers can append messages without mutating the registered template
    if template_name not in conv_templates:
        raise ValueError(f"Template {template_name} not found")
    return conv_templates[template_name].copy()
README
Stanford Alpaca: An Instruction-following LLaMA Model
This is the repo for the Stanford Alpaca project, which aims to build and share an instruction-following LLaMA model. The repo contains:
- The 52K data used for fine-tuning the model.
- The code for generating the data.
- The code for fine-tuning the model.
- The code for recovering Alpaca-7B weights from our released weight diff.
Note: We thank the community for feedback on Stanford-Alpaca and supporting our research. Our live demo is suspended until further notice.
Usage and License Notices: Alpaca is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes. The weight diff is also CC BY NC 4.0 (allowing only non-commercial use).
Overview
The current Alpaca model is fine-tuned from a 7B LLaMA model [1] on 52K instruction-following data generated by the techniques in the Self-Instruct [2] paper, with some modifications that we discuss in the next section.
In a preliminary human evaluation, we found that the Alpaca 7B model behaves similarly to the text-davinci-003 model on the Self-Instruct instruction-following evaluation suite [2].
Alpaca is still under development, and there are many limitations that have to be addressed. Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless. We thus encourage users to be cautious when interacting with Alpaca, and to report any concerning behavior to help improve the safety and ethical considerations of the model.
Our initial release contains the data generation procedure, dataset, and training recipe. We intend to release the model weights if we are given permission to do so by the creators of LLaMA. For now, we have chosen to host a live demo to help readers better understand the capabilities and limits of Alpaca, and to help us evaluate Alpaca's performance with a broader audience.
Please read our release blog post for more details about the model, our discussion of the potential harm and limitations of Alpaca models, and our thought process for releasing a reproducible model.
[1]: LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. https://arxiv.org/abs/2302.13971v1
[2]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560
Data Release
alpaca_data.json contains the 52K instruction-following examples we used for fine-tuning the Alpaca model. This JSON file is a list of dictionaries, and each dictionary contains the following fields:
- instruction: str, describes the task the model should perform. Each of the 52K instructions is unique.
- input: str, optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article. Around 40% of the examples have an input.
- output: str, the answer to the instruction as generated by text-davinci-003.
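To check a local copy of alpaca_data.json against this description (three string fields, unique instructions, roughly 40% with a non-empty input), a small loader is enough. The sketch below is ours, not part of the repo; load_alpaca_records and summarize are hypothetical helper names, and the toy records stand in for the real file.

```python
import json

REQUIRED_FIELDS = ("instruction", "input", "output")

def load_alpaca_records(path):
    """Load an Alpaca-style JSON file: a list of dicts with string instruction/input/output."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        missing = [k for k in REQUIRED_FIELDS if not isinstance(rec.get(k), str)]
        if missing:
            raise ValueError(f"record {i} is missing string fields: {missing}")
    return records

def summarize(records):
    """Count examples, the share with a non-empty input, and duplicate instructions."""
    n = len(records)
    with_input = sum(1 for r in records if r["input"].strip())
    unique_instructions = len({r["instruction"] for r in records})
    return {
        "examples": n,
        "input_fraction": with_input / n if n else 0.0,
        "duplicate_instructions": n - unique_instructions,
    }

# Toy records in the same shape as alpaca_data.json
toy = [
    {"instruction": "Give three tips for staying healthy.", "input": "", "output": "..."},
    {"instruction": "Summarize the following article.", "input": "Some article text.", "output": "..."},
]
print(summarize(toy))  # {'examples': 2, 'input_fraction': 0.5, 'duplicate_instructions': 0}
```

On the real file, summarize(load_alpaca_records("alpaca_data.json")) should report about 52K examples, an input fraction near 0.4, and zero duplicate instructions if the description above holds.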
We used the following prompts for fine-tuning the Alpaca model:
- for examples with a non-empty input field:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
- for examples with an empty input field:
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:
During inference (e.g., for the web demo), we use the user instruction with an empty input field (second option).
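The template choice can be expressed as a single function that picks the input or no-input prompt depending on whether the example's input field is empty. This is a sketch: the constant and function names are our own, and the "\n\n" separators between sections follow the prompt strings used in the code examples earlier on this page.

```python
# Hypothetical names; the template text follows the two prompts listed above.
PROMPT_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

def build_prompt(example):
    """Use the input template only when the example has a non-empty input field."""
    if example.get("input"):
        return PROMPT_INPUT.format_map(example)
    return PROMPT_NO_INPUT.format_map(example)

# Inference, as described above: the user instruction with an empty input field
print(build_prompt({"instruction": "List three primary colors.", "input": ""}))
```

For training, the target output would be appended after "### Response:"; at inference time the model continues from that point.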
Data Generation Process
Running the code
- Set the environment variable OPENAI_API_KEY to your OpenAI API key.
- Install the dependencies with pip install -r requirements.txt.
- Run python -m generate_instruction generate_instruction_following_data to generate the data.
We built on the data generation pipeline from self-instruct and made the following modifications:
- We used text-davinci-003 to generate the instruction data instead of davinci.
- We wrote a new prompt (prompt.txt) that explicitly stated the instruction-generation requirements to text-davinci-003. Note: there is a slight error in the prompt we used, and future users should incorporate the edit in https://github.com/tatsu-lab/stanford_alpaca/pull/24
- We adopted much more aggressive batch decoding, i.e., generating 20 instructions at once, which significantly reduced the cost of data generation.
- We simplified the data generation pipeline by discarding the distinction between classification and non-classification instructions.
- We only generated a single instance for each instruction, instead of 2 to 3 instances as in [2].
This produced an instruction-following dataset with 52K examples obtained at a much lower cost (less than $500). In a preliminary study, we also found our 52K generated data to be much more diverse than the data released by self-instruct. We plotted a figure in the style of Figure 2 in the self-instruct paper to demonstrate the diversity of our data: its inner circle represents the root verb of the instructions, and its outer circle represents the direct objects.
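Batch decoding means one text-davinci-003 completion carries many tasks that must be split apart and deduplicated (the released instructions are all unique). The sketch below assumes a hypothetical layout of numbered "Instruction / Input / Output" lines separated by "###"; the real layout is defined by the repo's prompt.txt and parsing code and may differ, and parse_batch is our own name.

```python
import re

# Hypothetical completion layout; adjust the pattern to the format prompt.txt actually requests.
TASK_RE = re.compile(
    r"^\s*\d+\.\s*Instruction:\s*(?P<instruction>.*?)\n"
    r"\s*\d+\.\s*Input:\s*(?P<input>.*?)\n"
    r"\s*\d+\.\s*Output:\s*(?P<output>.*)$",
    re.DOTALL,
)

def parse_batch(completion, seen):
    """Split one batch-decoded completion into task dicts, skipping duplicate instructions.

    `seen` is a set of instructions already collected; it is updated in place.
    """
    tasks = []
    for chunk in completion.split("###"):
        m = TASK_RE.match(chunk)
        if not m:
            continue  # malformed or truncated entry (e.g. cut off by the token limit)
        task = {k: v.strip() for k, v in m.groupdict().items()}
        if task["instruction"] and task["instruction"] not in seen:
            seen.add(task["instruction"])
            tasks.append(task)
    return tasks

sample = (
    "1. Instruction: Name a primary color.\n1. Input:\n1. Output: Red.\n###\n"
    "2. Instruction: Add 2 and 3.\n2. Input:\n2. Output: 5\n###\n"
    "3. Instruction: Name a primary color.\n3. Input:\n3. Output: Blue.\n###\n"
    "4. Instruction: A truncated final entry"
)
seen = set()
for task in parse_batch(sample, seen):
    print(task)
```

In this toy completion, the third task repeats an earlier instruction and the fourth is truncated, so only two tasks survive; requesting many tasks per call amortizes the long seed-task prompt across all of them.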
Fine-tuning
We fine-tune our models using standard Hugging Face training code. We fine-tune LLaMA-7B and LLaMA-13B with the following hyperparameters:
Hyperparameter | LLaMA-7B | LLaMA-13B |
---|---|---|
Batch size | 128 | 128 |
Learning rate | 2e-5 | 1e-5 |
Epochs | 3 | 5 |
Max length | 512 | 512 |
Weight decay | 0 | 0 |
To reproduce our fine-tuning runs for LLaMA, first install the requirements:
pip install -r requirements.txt
Below is a command that fine-tunes LLaMA-7B with our dataset on a machine with 4 A100 80G GPUs in FSDP full_shard mode. We were able to reproduce a model of similar quality to the one we hosted in our demo with the following command using Python 3.10. Replace <your_random_port> with a port of your own, <your_path_to_hf_converted_llama_ckpt_and_tokenizer> with the path to your converted checkpoint and tokenizer (following instructions in the PR), and <your_output_dir> with where you want to store your outputs.
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
--model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
--data_path ./alpaca_data.json \
--bf16 True \
--output_dir <your_output_dir> \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True
The same script also works for OPT fine-tuning. Here's an example for fine-tuning OPT-6.7B:
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
--model_name_or_path "facebook/opt-6.7b" \
--data_path ./alpaca_data.json \
--bf16 True \
--output_dir <your_output_dir> \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'OPTDecoderLayer' \
--tf32 True
Note that the given training script is meant to be simple and easy to use, and is not particularly optimized. To run on more GPUs, you may prefer to turn down gradient_accumulation_steps to keep a global batch size of 128. Global batch size has not been tested for optimality.
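The global batch size is the product of the number of GPUs, per_device_train_batch_size, and gradient_accumulation_steps; the 4-GPU commands above give 4 x 4 x 8 = 128. A small helper (hypothetical, not part of the repo) for picking the accumulation steps on a different GPU count:

```python
def grad_accum_steps(global_batch, num_gpus, per_device_batch):
    """Accumulation steps so that num_gpus * per_device_batch * steps == global_batch."""
    per_step = num_gpus * per_device_batch
    if global_batch % per_step:
        raise ValueError(
            f"{global_batch} is not divisible by {num_gpus} GPUs x {per_device_batch} per device"
        )
    return global_batch // per_step

# The 4-GPU command above, then the same global batch on 8 GPUs
print(grad_accum_steps(128, num_gpus=4, per_device_batch=4))  # 8
print(grad_accum_steps(128, num_gpus=8, per_device_batch=4))  # 4
```

The result is what you would pass as --gradient_accumulation_steps with the matching --nproc_per_node.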
Addressing OOM
Naively, fine-tuning a 7B model requires about 7 x 4 x 4 = 112 GB of VRAM. Commands given above enable parameter sharding, so no redundant model copy is stored on any GPU. If you'd like to further reduce the memory footprint, here are some options:
- Turn on CPU offload for FSDP with --fsdp "full_shard auto_wrap offload". This saves VRAM at the cost of longer runtime.
- In our experience, DeepSpeed stage-3 (with offload) can at times be more memory efficient than FSDP with offload. Here's an example of using DeepSpeed stage-3 with 4 GPUs and both parameter and optimizer offload:
pip install deepspeed
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
--model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
--data_path ./alpaca_data.json \
--bf16 True \
--output_dir <your_output_dir> \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 2000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--deepspeed "./configs/default_offload_opt_param.json" \
--tf32 True
- The DeepSpeed library also provides some helpful functions to estimate memory usage.
- LoRA freezes the base weights and fine-tunes low-rank updates to the query, key, and value projection matrices. This can reduce the total memory footprint from 112 GB to about 7 x 4 = 28 GB. We may release our re-implementation of this in the future, but for now the peft codebase can be a useful resource.
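The memory figures in this section follow simple arithmetic: 7B parameters x 4 bytes per fp32 value x 4 copies gives 112 GB, and counting only the weights gives 28 GB. The sketch reproduces those numbers; reading the four copies as weights, gradients, and Adam's two moment buffers is our interpretation, and activation memory is ignored, so real usage will be higher.

```python
def full_finetune_gb(params_billion, bytes_per_value=4, copies=4):
    """Naive full fine-tuning memory in GB: params x bytes per value x per-parameter copies.

    Assumption: the copies are fp32 weights, gradients, and two Adam moment buffers;
    activations and framework overhead are ignored.
    """
    return params_billion * bytes_per_value * copies

def lora_finetune_gb(params_billion, bytes_per_value=4):
    """LoRA estimate: only the frozen base weights; the small adapter state is ignored."""
    return params_billion * bytes_per_value

def per_gpu_gb(total_gb, num_gpus):
    """Per-GPU share when all state is sharded evenly (as with FSDP full_shard)."""
    return total_gb / num_gpus

print(full_finetune_gb(7))                 # 112
print(lora_finetune_gb(7))                 # 28
print(per_gpu_gb(full_finetune_gb(7), 4))  # 28.0
```

The last line shows why sharding across 4 x 80 GB GPUs makes the full fine-tune fit: no GPU holds a redundant copy, so each stores roughly a quarter of the 112 GB before activations.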
Recovering Alpaca Weights
The weight diff between Alpaca-7B and LLaMA-7B is hosted on the Hugging Face Hub (linked in step 2). To recover the original Alpaca-7B weights, follow these steps:
1. Convert Meta's released weights into Hugging Face format by following this guide: https://huggingface.co/docs/transformers/main/model_doc/llama
2. Make sure you have cloned the released weight diff to your local machine. The weight diff is located at: https://huggingface.co/tatsu-lab/alpaca-7b/tree/main
3. Run the following command with the correct paths, e.g.:
python weight_diff.py recover --path_raw <path_to_step_1_dir> --path_diff <path_to_step_2_dir> --path_tuned <path_to_store_recovered_weights>
Once step 3 completes, you should have a directory with the recovered weights, from which you can load the model as follows:
import transformers
alpaca_model = transformers.AutoModelForCausalLM.from_pretrained("<path_to_store_recovered_weights>")
alpaca_tokenizer = transformers.AutoTokenizer.from_pretrained("<path_to_store_recovered_weights>")
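Conceptually, a weight diff lets the fine-tuned model be shared without redistributing LLaMA itself: if the diff is an elementwise difference (tuned minus raw), recovery is a per-parameter addition. The sketch below assumes exactly that and uses plain Python lists in place of PyTorch tensors; make_diff and recover are illustrative names, and the real weight_diff.py may differ (for example, it may add integrity checks).

```python
def make_diff(raw_state, tuned_state):
    """Elementwise tuned - raw for every parameter (assumed diff format)."""
    return {
        name: [t - r for t, r in zip(tuned_state[name], raw_state[name])]
        for name in tuned_state
    }

def recover(raw_state, diff_state):
    """Return tuned weights as raw + diff, parameter by parameter."""
    if raw_state.keys() != diff_state.keys():
        raise KeyError("raw and diff checkpoints have different parameter names")
    recovered = {}
    for name, diff in diff_state.items():
        raw = raw_state[name]
        if len(raw) != len(diff):
            raise ValueError(f"shape mismatch for {name}")
        recovered[name] = [r + d for r, d in zip(raw, diff)]
    return recovered

raw = {"layer.weight": [0.5, -1.0, 2.0]}
tuned = {"layer.weight": [0.75, -0.5, 2.0]}
diff = make_diff(raw, tuned)
print(recover(raw, diff))  # {'layer.weight': [0.75, -0.5, 2.0]}
```

Without the raw LLaMA weights the diff alone is not a usable model, which is the point of distributing it this way.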
Authors
All grad students below contributed equally and the order is determined by random draw.
All advised by Tatsunori B. Hashimoto. Yann is also advised by Percy Liang and Xuechen is also advised by Carlos Guestrin.
Citation
Please cite the repo if you use the data or code in this repo.
@misc{alpaca,
author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
title = {Stanford Alpaca: An Instruction-following LLaMA model},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2].
Acknowledgements
We thank Yizhong Wang for his help in explaining the data generation pipeline in Self-Instruct and providing the code for the parse analysis plot. We thank Yifan Mai for helpful support, and members of the Stanford NLP Group as well as the Center for Research on Foundation Models (CRFM) for their helpful feedback.