tatsu-lab/stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Top Related Projects

  • alpaca-lora: Instruct-tune LLaMA on consumer hardware
  • text-generation-webui: A Gradio web UI for Large Language Models
  • Open-Assistant: a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so
  • StableLM: Stability AI Language Models
  • Dolly: Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
  • FastChat: An open platform for training, serving, and evaluating large language models; release repo for Vicuna and Chatbot Arena

Quick Overview

Stanford Alpaca is an open-source project that aims to create a reproducible instruction-following language model based on the LLaMA 7B model. It provides a dataset and fine-tuning process to train a model capable of following instructions in a conversational manner, similar to ChatGPT.

Pros

  • Open-source and freely available for research and non-commercial use
  • Provides a reproducible approach to fine-tuning large language models
  • Offers a high-quality dataset for instruction-following tasks
  • Demonstrates good performance with relatively small model size (7B parameters)

Cons

  • Requires access to the LLaMA base model, which has restricted availability
  • Limited to non-commercial use due to licensing constraints
  • May not perform as well as larger, more advanced models like GPT-3 or GPT-4
  • Potential ethical concerns regarding the generation of synthetic data

Code Examples

# Load the Alpaca model. Note: the official Alpaca weights are released only as
# a weight diff against LLaMA, so "tatsu-lab/alpaca-7b" is illustrative here;
# in practice, point these calls at a directory of recovered weights.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tatsu-lab/alpaca-7b")
model = AutoModelForCausalLM.from_pretrained("tatsu-lab/alpaca-7b")

# Generate a response to an instruction using the Alpaca prompt template
instruction = "Explain the concept of quantum entanglement in simple terms."
input_text = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

# Fine-tune the model on custom data. Sketch: assumes the dataset has already
# been tokenized into input_ids/labels using the Alpaca prompt format.
from datasets import load_dataset
from transformers import Trainer, TrainingArguments

dataset = load_dataset("your_custom_dataset")
training_args = TrainingArguments(output_dir="./results", num_train_epochs=3, per_device_train_batch_size=4)
trainer = Trainer(model=model, args=training_args, train_dataset=dataset["train"])
trainer.train()

Getting Started

  1. Clone the repository:

    git clone https://github.com/tatsu-lab/stanford_alpaca.git
    cd stanford_alpaca
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download the LLaMA model (requires approval from Meta AI)

  4. Fine-tune the model:

    python train.py --model_name_or_path /path/to/llama/model --data_path ./alpaca_data.json
    
  5. Use the fine-tuned model as shown in the code examples above.

Competitor Comparisons

alpaca-lora: Instruct-tune LLaMA on consumer hardware

Pros of alpaca-lora

  • Uses LoRA for more efficient fine-tuning, requiring less computational resources
  • Supports various model sizes (7B, 13B, 30B, 65B) and quantization options
  • Includes scripts for inference and interactive chat

Cons of alpaca-lora

  • May have slightly lower performance compared to full fine-tuning
  • Requires additional steps to merge LoRA weights with the base model

Code Comparison

alpaca-lora:

model = PeftModel.from_pretrained(
    model, lora_weights, torch_dtype=torch.float16
)
model = model.merge_and_unload()

stanford_alpaca:

model = LlamaForCausalLM.from_pretrained(
    checkpoint_dir, torch_dtype=torch.float16, device_map="auto"
)

The alpaca-lora code demonstrates the process of loading LoRA weights and merging them with the base model, while stanford_alpaca loads a fully fine-tuned model directly.

text-generation-webui: A Gradio web UI for Large Language Models

Pros of text-generation-webui

  • User-friendly web interface for interacting with various language models
  • Supports multiple models and architectures (e.g., GPT-J, LLaMA, OPT)
  • Offers advanced features like chat, instruct mode, and notebook interface

Cons of text-generation-webui

  • Requires more setup and dependencies compared to stanford_alpaca
  • May have higher resource requirements due to its extensive features
  • Less focused on a specific model or training approach

Code Comparison

text-generation-webui:

def generate_reply(
    question, history, generate_params, stopping_strings=None
):
    generate_params = generate_params.copy()
    generate_params['stopping_strings'] = stopping_strings
    return model.generate(question, history, generate_params)

stanford_alpaca:

def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

Open-Assistant: a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so

Pros of Open-Assistant

  • Larger community-driven project with more contributors and broader scope
  • Focuses on creating a fully open-source assistant, including training data and model
  • Supports multiple languages and aims for multilingual capabilities

Cons of Open-Assistant

  • More complex project structure and potentially slower development due to its scale
  • May require more computational resources for training and fine-tuning
  • Less focused on a specific use case compared to Stanford Alpaca

Code Comparison

Open-Assistant:

from oasst_api import OAsstAPI

api = OAsstAPI()
response = api.generate_response("Tell me a joke")
print(response.text)

Stanford Alpaca:

import transformers

# The official Alpaca weights are released only as a diff against LLaMA; point
# these paths at weights recovered per "Recovering Alpaca Weights" below.
model = transformers.AutoModelForCausalLM.from_pretrained("<path_to_recovered_weights>")
tokenizer = transformers.AutoTokenizer.from_pretrained("<path_to_recovered_weights>")
inputs = tokenizer("Tell me a joke", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))

Both projects aim to create open-source language models, but Open-Assistant has a broader scope and community-driven approach, while Stanford Alpaca focuses on fine-tuning existing models with specific techniques. Open-Assistant's code emphasizes API usage, while Stanford Alpaca's code highlights model loading and generation.

StableLM: Stability AI Language Models

Pros of StableLM

  • Larger model sizes announced (up to 65B parameters; the initial Alpha release ships 3B and 7B)
  • More recent training data, potentially better performance on current topics
  • Open-source license allows for commercial use and modification

Cons of StableLM

  • Less focused on instruction-following compared to Stanford Alpaca
  • May require more fine-tuning for specific tasks
  • Larger models demand more computational resources

Code Comparison

Stanford Alpaca:

def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

StableLM:

def generate_prompt(prompt):
    return f"<|SYSTEM|>You are a helpful AI assistant.<|USER|>{prompt}<|ASSISTANT|>"

The code snippets show different approaches to prompt generation, with Stanford Alpaca using a more structured format for instructions and inputs, while StableLM uses a simpler system-user-assistant format.

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

Pros of Dolly

  • Open-source model with a more permissive license (Apache 2.0)
  • Includes training code and datasets, enabling full reproducibility
  • Supports multiple model sizes and architectures

Cons of Dolly

  • Smaller community and less active development
  • Limited to English language support
  • Fewer pre-trained models available

Code Comparison

Stanford Alpaca:

def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

Dolly:

def format_prompt(instruction, context=None, response=None):
    prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:\n"
    if context:
        prompt = f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n### Response:\n"
    if response:
        prompt += f"{response}"
    return prompt

FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Pros of FastChat

  • More comprehensive and feature-rich, offering a complete chatbot system
  • Actively maintained with frequent updates and improvements
  • Supports multiple models and architectures beyond just Alpaca

Cons of FastChat

  • More complex setup and configuration required
  • Potentially higher resource requirements due to additional features
  • Steeper learning curve for beginners

Code Comparison

Stanford Alpaca:

def generate_prompt(instruction, input=None):
    if input:
        return f"Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    else:
        return f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"

FastChat:

def get_conv_template(template_name):
    if template_name == "vicuna_v1.1":
        return get_conv_template("vicuna").copy()
    elif template_name not in conv_templates:
        raise ValueError(f"Template {template_name} not found")
    return conv_templates[template_name].copy()

README

Stanford Alpaca: An Instruction-following LLaMA Model

This is the repo for the Stanford Alpaca project, which aims to build and share an instruction-following LLaMA model. The repo contains:

  • the 52K data used for fine-tuning the model,
  • the code for generating the data,
  • the code for fine-tuning the model,
  • the code for recovering Alpaca-7B weights from the released weight diff.

Note: We thank the community for feedback on Stanford-Alpaca and supporting our research. Our live demo is suspended until further notice.

Usage and License Notices: Alpaca is intended and licensed for research use only. The dataset is CC BY NC 4.0 (allowing only non-commercial use) and models trained using the dataset should not be used outside of research purposes. The weight diff is also CC BY NC 4.0 (allowing only non-commercial use).

Overview

The current Alpaca model is fine-tuned from a 7B LLaMA model [1] on 52K instruction-following data generated by the techniques in the Self-Instruct [2] paper, with some modifications that we discuss in the next section. In a preliminary human evaluation, we found that the Alpaca 7B model behaves similarly to the text-davinci-003 model on the Self-Instruct instruction-following evaluation suite [2].

Alpaca is still under development, and there are many limitations that have to be addressed. Importantly, we have not yet fine-tuned the Alpaca model to be safe and harmless. We thus encourage users to be cautious when interacting with Alpaca, and to report any concerning behavior to help improve the safety and ethical considerations of the model.

Our initial release contains the data generation procedure, dataset, and training recipe. We intend to release the model weights if we are given permission to do so by the creators of LLaMA. For now, we have chosen to host a live demo to help readers better understand the capabilities and limits of Alpaca, as well as a way to help us better evaluate Alpaca's performance on a broader audience.

Please read our release blog post for more details about the model, our discussion of the potential harm and limitations of Alpaca models, and our thought process for releasing a reproducible model.

[1]: LLaMA: Open and Efficient Foundation Language Models. Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. https://arxiv.org/abs/2302.13971v1

[2]: Self-Instruct: Aligning Language Model with Self Generated Instructions. Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi. https://arxiv.org/abs/2212.10560

Data Release

alpaca_data.json contains the 52K instruction-following examples we used for fine-tuning the Alpaca model. This JSON file is a list of dictionaries, each of which contains the following fields:

  • instruction: str, describes the task the model should perform. Each of the 52K instructions is unique.
  • input: str, optional context or input for the task. For example, when the instruction is "Summarize the following article", the input is the article. Around 40% of the examples have an input.
  • output: str, the answer to the instruction as generated by text-davinci-003.
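
As a quick sanity check, the file can be loaded and inspected with the standard library. A minimal sketch, assuming alpaca_data.json sits in the current directory:

import json

# Load the 52K instruction-following examples.
with open("alpaca_data.json") as f:
    data = json.load(f)

print(len(data))                 # roughly 52K entries
example = data[0]
print(example["instruction"])    # the task description
print(example["input"])          # empty string for ~60% of examples
print(example["output"])         # text-davinci-003's answer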

We used the following prompts for fine-tuning the Alpaca model:

  • for examples with a non-empty input field:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
  • for examples with an empty input field:
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
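
A small helper that applies these two templates to a record from alpaca_data.json. This is a minimal sketch; the repo's training code uses the same templates, but this exact function is illustrative:

def build_prompt(example: dict) -> str:
    """Format one record with the matching Alpaca prompt template."""
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n### Response:"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n### Response:"
    )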

During inference (e.g., for the web demo), we use the user instruction with an empty input field (the second template).

Data Generation Process

Running the code
  1. Set the environment variable OPENAI_API_KEY to your OpenAI API key.
  2. Install the dependencies with pip install -r requirements.txt.
  3. Run python -m generate_instruction generate_instruction_following_data to generate the data.

We built on the data generation pipeline from self-instruct and made the following modifications:

  • We used text-davinci-003 to generate the instruction data instead of davinci.
  • We wrote a new prompt (prompt.txt) that explicitly gave the requirement of instruction generation to text-davinci-003. Note: there is a slight error in the prompt we used, and future users should incorporate the edit in https://github.com/tatsu-lab/stanford_alpaca/pull/24
  • We adopted much more aggressive batch decoding, i.e., generating 20 instructions at once, which significantly reduced the cost of data generation (see the sketch after this list).
  • We simplified the data generation pipeline by discarding the difference between classification and non-classification instructions.
  • We only generated a single instance for each instruction, instead of the 2 to 3 instances in [2].
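
A minimal sketch of the batch-decoding idea above. The real prompt lives in prompt.txt; the seed construction and parsing here are illustrative assumptions, written against the legacy (<1.0) openai client that matches the text-davinci-003 era:

import os
import re
import openai  # legacy (<1.0) client interface

openai.api_key = os.environ["OPENAI_API_KEY"]

def generate_instruction_batch(seed_tasks: list[str], num_new: int = 20) -> list[str]:
    """Ask the model for num_new new instructions in a single completion."""
    prompt = "Come up with a list of diverse task instructions:\n"
    prompt += "".join(f"{i + 1}. {task}\n" for i, task in enumerate(seed_tasks))
    prompt += f"{len(seed_tasks) + 1}."  # let the model continue the numbered list
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=1024,
        temperature=1.0,
    )
    continuation = resp["choices"][0]["text"]
    # Split the continuation back into individual numbered instructions.
    parts = re.split(r"\n\d+\.", continuation)
    return [p.strip() for p in parts if p.strip()][:num_new]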

This produced an instruction-following dataset with 52K examples at a much lower cost (less than $500). In a preliminary study, we also found our 52K generated data to be much more diverse than the data released by self-instruct. We plot the figure below (in the style of Figure 2 of the self-instruct paper) to demonstrate the diversity of our data. The inner circle of the plot represents the root verb of the instructions, and the outer circle represents the direct objects.

Fine-tuning

We fine-tune our models using standard Hugging Face training code. We fine-tune LLaMA-7B and LLaMA-13B with the following hyperparameters:

| Hyperparameter | LLaMA-7B | LLaMA-13B |
|----------------|----------|-----------|
| Batch size     | 128      | 128       |
| Learning rate  | 2e-5     | 1e-5      |
| Epochs         | 3        | 5         |
| Max length     | 512      | 512       |
| Weight decay   | 0        | 0         |

To reproduce our fine-tuning runs for LLaMA, first install the requirements

pip install -r requirements.txt

Below is a command that fine-tunes LLaMA-7B with our dataset on a machine with 4 A100 80G GPUs in FSDP full_shard mode. We were able to reproduce a model of similar quality as the one we hosted in our demo with the following command using Python 3.10. Replace <your_random_port> with a port of your own, <your_path_to_hf_converted_llama_ckpt_and_tokenizer> with the path to your converted checkpoint and tokenizer (following instructions in the PR), and <your_output_dir> with where you want to store your outputs.

torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True

The same script also works for OPT fine-tuning. Here's an example for fine-tuning OPT-6.7B

torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path "facebook/opt-6.7b" \
    --data_path ./alpaca_data.json \
    --bf16 True \
    --output_dir <your_output_dir> \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'OPTDecoderLayer' \
    --tf32 True

Note that the given training script is meant to be simple and easy to use, and is not particularly optimized. To run on more GPUs, you may prefer to turn down gradient_accumulation_steps to keep the global batch size at 128 (e.g., with 8 GPUs, --gradient_accumulation_steps 4 gives 8 × 4 × 4 = 128). The global batch size has not been tested for optimality.

Addressing OOM

Naively, fine-tuning a 7B model requires about 7 x 4 x 4 = 112 GB of VRAM: 7B parameters times 4 bytes each, with roughly four copies held in memory (commonly the weights, gradients, and two Adam optimizer states). The commands given above enable parameter sharding, so no redundant model copy is stored on any GPU. If you'd like to further reduce the memory footprint, here are some options:

  • Turn on CPU offload for FSDP with --fsdp "full_shard auto_wrap offload". This saves VRAM at the cost of longer runtime.
  • In our experience, DeepSpeed stage-3 (with offload) can at times be more memory efficient than FSDP with offload. Here's an example to use DeepSpeed stage-3 with 4 GPUs with both parameter and optimizer offload:
    pip install deepspeed
    torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
        --model_name_or_path <your_path_to_hf_converted_llama_ckpt_and_tokenizer> \
        --data_path ./alpaca_data.json \
        --bf16 True \
        --output_dir <your_output_dir> \
        --num_train_epochs 3 \
        --per_device_train_batch_size 4 \
        --per_device_eval_batch_size 4 \
        --gradient_accumulation_steps 8 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 2000 \
        --save_total_limit 1 \
        --learning_rate 2e-5 \
        --weight_decay 0. \
        --warmup_ratio 0.03 \
        --deepspeed "./configs/default_offload_opt_param.json" \
        --tf32 True
    
    • The DeepSpeed library also provides some helpful functions to estimate memory usage.
  • LoRA fine-tunes low-rank slices of the query, key, and value embedding heads. This can reduce the total memory footprint from 112 GB to about 7 × 4 = 28 GB. We may release our re-implementation of this in the future, but for now the peft codebase can be a useful resource.
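
To make the arithmetic in this section easy to adapt to other model sizes, here is a minimal sketch. The factor of four copies is an assumption covering fp32 weights, gradients, and the two Adam optimizer states:

def naive_vram_gb(params_billions: float, bytes_per_value: int = 4, copies: int = 4) -> float:
    """Rough VRAM needed to fine-tune a model with no sharding or offload."""
    return params_billions * bytes_per_value * copies

print(naive_vram_gb(7))            # 112.0 GB, full fine-tuning of a 7B model
print(naive_vram_gb(7, copies=1))  # 28.0 GB, roughly the LoRA estimate above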

Recovering Alpaca Weights

The weight diff between Alpaca-7B and LLaMA-7B is hosted on the Hugging Face Hub (see step 2 below). To recover the original Alpaca-7B weights, follow these steps:

1. Convert Meta's released weights into huggingface format. Follow this guide:
    https://huggingface.co/docs/transformers/main/model_doc/llama
2. Make sure you cloned the released weight diff into your local machine. The weight diff is located at:
    https://huggingface.co/tatsu-lab/alpaca-7b/tree/main
3. Run the recovery script with the correct paths, e.g.:
    python weight_diff.py recover --path_raw <path_to_step_1_dir> --path_diff <path_to_step_2_dir> --path_tuned <path_to_store_recovered_weights>

Once step 3 completes, you should have a directory with the recovered weights, from which you can load the model as follows:

import transformers
alpaca_model = transformers.AutoModelForCausalLM.from_pretrained("<path_to_store_recovered_weights>")
alpaca_tokenizer = transformers.AutoTokenizer.from_pretrained("<path_to_store_recovered_weights>")
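
From there, generation follows the usual transformers pattern, using the empty-input prompt template from the Data Release section. A minimal sketch with illustrative decoding parameters:

prompt = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\nGive three tips for staying healthy.\n\n### Response:"
)
inputs = alpaca_tokenizer(prompt, return_tensors="pt")
outputs = alpaca_model.generate(**inputs, max_new_tokens=256)
print(alpaca_tokenizer.decode(outputs[0], skip_special_tokens=True))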

Authors

The following grad students contributed equally, with the order determined by random draw: Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, and Xuechen Li.

All advised by Tatsunori B. Hashimoto. Yann is also advised by Percy Liang and Xuechen is also advised by Carlos Guestrin.

Citation

Please cite the repo if you use the data or code in this repo.

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}

Naturally, you should also cite the original LLaMA paper [1] and the Self-Instruct paper [2].

Acknowledgements

We thank Yizhong Wang for his help in explaining the data generation pipeline in Self-Instruct and providing the code for the parse analysis plot. We thank Yifan Mai for helpful support, and members of the Stanford NLP Group as well as the Center for Research on Foundation Models (CRFM) for their helpful feedback.