openai/gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"

Top Related Projects

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

TensorFlow code and pre-trained models for BERT

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Large-scale pretraining for dialogue

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

Quick Overview

GPT-2 is a large-scale unsupervised language model developed by OpenAI. It is capable of generating coherent and contextually relevant text based on a given prompt. The model has been trained on a diverse range of internet text and can be fine-tuned for various natural language processing tasks.

Pros

  • Highly versatile and can be applied to a wide range of language tasks
  • Produces high-quality, coherent text that often appears human-like
  • Can be fine-tuned for specific applications with relatively small datasets
  • Open-source implementation allows for research and experimentation

Cons

  • Requires significant computational resources for training and inference
  • May generate biased or inappropriate content if not properly filtered
  • Can be used maliciously to create fake news or misleading information
  • Limited control over the specific content generated by the model

Code Examples

# Load pre-trained GPT-2 model
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
# Generate text based on a prompt
prompt = "Once upon a time"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=100, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
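# generate() defaults to greedy decoding; sampling usually yields more varied text.
# A minimal sketch of sampled generation (parameter values are illustrative, not tuned):
output = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,                       # sample from the distribution instead of taking the argmax
    top_k=50,                             # restrict sampling to the 50 most likely next tokens
    top_p=0.95,                           # nucleus sampling
    pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))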
# Fine-tune GPT-2 on a custom dataset
from transformers import TextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments

train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="path/to/train.txt",
    block_size=128
)

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments(
    output_dir="./gpt2-finetuned",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
)

trainer.train()
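
After training, the fine-tuned weights can be saved and reloaded like any other checkpoint. A minimal sketch (the directory simply reuses the output_dir configured above):

# Save the fine-tuned model and tokenizer next to the training checkpoints
trainer.save_model("./gpt2-finetuned")
tokenizer.save_pretrained("./gpt2-finetuned")

# Later, reload it exactly like the stock "gpt2" checkpoint
model = GPT2LMHeadModel.from_pretrained("./gpt2-finetuned")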

Getting Started

To get started with GPT-2 using the Hugging Face Transformers library:

  1. Install the required packages:
pip install transformers torch
  2. Load the pre-trained model and tokenizer:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
  3. Generate text:
prompt = "Hello, how are you?"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=50, num_return_sequences=1)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
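
For quick experiments, the same checkpoint can also be used through the higher-level pipeline API. A minimal sketch (generation settings are illustrative):

from transformers import pipeline

# The text-generation pipeline bundles the tokenizer and model loaded above
generator = pipeline("text-generation", model="gpt2")
result = generator("Hello, how are you?", max_length=50, num_return_sequences=1)
print(result[0]["generated_text"])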

Competitor Comparisons

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of Transformers

  • Supports a wide range of models beyond GPT-2, including BERT, RoBERTa, and T5
  • Offers more comprehensive documentation and examples
  • Provides easier integration with PyTorch and TensorFlow

Cons of Transformers

  • Larger codebase, potentially more complex for beginners
  • May have slower inference times for some models compared to GPT-2's optimized implementation

Code Comparison

GPT-2 (via the gpt-2-simple wrapper):

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, dataset="path/to/data.txt", model_name="124M")

Transformers:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
input_ids = tokenizer.encode("Hello, I'm a language model", return_tensors="pt")
outputs = model.generate(input_ids)

Both repositories provide implementations of the GPT-2 model, but Transformers offers a more versatile and extensive framework for working with various transformer-based models. While GPT-2 focuses solely on its namesake model, Transformers provides a unified API for multiple architectures, making it more suitable for diverse NLP tasks and experimentation.
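
To illustrate that unified API, here is a minimal sketch (the alternative checkpoint name is just an example of another causal language model on the Hugging Face Hub, not something either repository prescribes):

from transformers import AutoTokenizer, AutoModelForCausalLM

# The Auto* classes resolve the correct architecture from the checkpoint name
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Switching to another causal LM, e.g. "EleutherAI/gpt-neo-125M",
# only requires changing the checkpoint string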

TensorFlow code and pre-trained models for BERT

Pros of BERT

  • More focused on bidirectional context understanding
  • Better suited for tasks like question answering and sentiment analysis
  • Smaller model sizes than the larger GPT-2 variants, requiring fewer computational resources

Cons of BERT

  • Less effective for open-ended text generation tasks
  • Shorter context window (512 tokens) than GPT-2's 1,024-token window
  • May struggle with long-range dependencies in text

Code Comparison

BERT example:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

GPT-2 example:

from transformers import GPT2Tokenizer, GPT2LMHeadModel
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

Both repositories use the Transformers library, but BERT focuses on encoding input for various tasks, while GPT-2 is primarily used for text generation. BERT's bidirectional nature allows it to consider context from both directions, making it more suitable for certain NLP tasks. GPT-2, on the other hand, excels in generating coherent and contextually relevant text sequences.
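
As a rough illustration of that split (a minimal sketch using the Transformers API rather than code from either repository), BERT is typically used to produce contextual embeddings for a downstream task:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Encode a sentence and take the final-layer hidden states as contextual embeddings
inputs = tokenizer("BERT encodes text; GPT-2 generates it.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # (batch_size, sequence_length, hidden_size)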

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • Supports a wider range of sequence-to-sequence tasks beyond language modeling
  • More flexible and modular architecture for customization
  • Includes pre-trained models for various languages and tasks

Cons of fairseq

  • Steeper learning curve due to increased complexity
  • May require more computational resources for training and inference
  • Less focused on pure language modeling compared to GPT-2

Code Comparison

GPT-2 (via the gpt-2-simple wrapper):

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, dataset="path/to/data.txt", model_name="124M")

fairseq:

from fairseq.models.transformer import TransformerModel

# Load a trained translation checkpoint (path and file name are illustrative)
model = TransformerModel.from_pretrained('/path/to/checkpoints', checkpoint_file='model.pt')
translated = model.translate('Hello world!')

The code snippets demonstrate the different approaches:

  • GPT-2 focuses on simple fine-tuning and generation
  • fairseq showcases more complex model loading and translation capabilities
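
As a rough sketch of that pretrained-model workflow (based on examples in the fairseq documentation; the hub model name and tokenizer settings are assumptions about a particular release and may change), a translation model can also be loaded through torch.hub:

import torch

# Download and load a pretrained WMT'19 En-De transformer via torch.hub
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model',
                       tokenizer='moses', bpe='fastbpe')
print(en2de.translate('Hello world!'))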

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Pros of tensor2tensor

  • Broader scope: Supports a wide range of machine learning tasks beyond language modeling
  • More extensive documentation and examples
  • Active community development and regular updates

Cons of tensor2tensor

  • Higher complexity due to its broader scope
  • Steeper learning curve for beginners
  • May require more setup and configuration for specific tasks

Code Comparison

GPT-2 (via the gpt-2-simple wrapper):

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, dataset="path/to/data.txt", model_name="124M")

tensor2tensor:

from tensor2tensor import problems
from tensor2tensor.utils import trainer_lib

# Look up a registered problem and hyperparameter set; training itself is
# typically launched with the t2t-trainer command-line tool
problem = problems.problem("languagemodel_lm1b32k")
hparams = trainer_lib.create_hparams("transformer_base")

Both repositories offer powerful tools for natural language processing and machine learning tasks. GPT-2 focuses specifically on language modeling and generation, while tensor2tensor provides a more comprehensive framework for various machine learning applications. The choice between the two depends on the specific requirements of your project and your level of expertise in the field.

Large-scale pretraining for dialogue

Pros of DialoGPT

  • Specifically designed for conversational AI and dialogue generation
  • Includes pre-training on large-scale dialogue datasets
  • Offers better performance in multi-turn conversations

Cons of DialoGPT

  • More limited in general text generation tasks compared to GPT-2
  • Requires more computational resources for fine-tuning and inference
  • Less versatile for non-dialogue applications

Code Comparison

GPT-2 (via the gpt-2-simple wrapper):

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, dataset="path/to/data.txt", model_name="124M")

DialoGPT:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

Both GPT-2 and DialoGPT are powerful language models, but they serve different purposes. GPT-2 is a more general-purpose model for text generation, while DialoGPT is specifically tailored for dialogue tasks. The choice between them depends on the specific application and requirements of the project.
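
As a rough sketch of multi-turn usage (following the pattern from the DialoGPT model card; generation settings are illustrative), each user turn is appended with the end-of-sequence token and the model continues the conversation:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Encode the user's turn plus the end-of-sequence token
input_ids = tokenizer.encode("Does money buy happiness?" + tokenizer.eos_token, return_tensors="pt")

# Generate a response and decode only the newly produced tokens
reply_ids = model.generate(input_ids, max_length=200, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(reply_ids[:, input_ids.shape[-1]:][0], skip_special_tokens=True))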

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

Pros of GPT-Neo

  • Offers larger model sizes and improved performance
  • Provides more flexibility in training and fine-tuning
  • Includes additional features like sparse attention mechanisms

Cons of GPT-Neo

  • Requires more computational resources for training and inference
  • May have less extensive documentation and community support
  • Potentially more complex to implement and use for beginners

Code Comparison

GPT-2 (via the gpt-2-simple wrapper):

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name="124M")
sess = gpt2.start_tf_sess()
gpt2.finetune(sess, dataset="path/to/data.txt", model_name="124M")

GPT-Neo:

from transformers import GPTNeoForCausalLM, GPT2Tokenizer

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
input_ids = tokenizer.encode("Hello, I am", return_tensors="pt")
output = model.generate(input_ids, max_length=50, do_sample=True)
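# Decode the sampled tokens back into text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)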

Both repositories provide powerful language models, but GPT-Neo offers more advanced features and larger model sizes at the cost of increased complexity and resource requirements. GPT-2 may be more suitable for beginners or those with limited computational resources, while GPT-Neo is better suited for advanced users seeking state-of-the-art performance.

README

Status: Archive (code is provided as-is, no updates expected)

gpt-2

Code and models from the paper "Language Models are Unsupervised Multitask Learners".

You can read about GPT-2 and its staged release in our original blog post, 6 month follow-up post, and final post.

We have also released a dataset for researchers to study their behaviors.

* Note that our original parameter counts were wrong due to an error (in our previous blog posts and paper). Thus you may have seen small referred to as 117M and medium referred to as 345M.

Usage

This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.

For basic information, see our model card.

Some caveats

  • GPT-2 models' robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
  • The dataset our GPT-2 models were trained on contains many texts with biases and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
  • To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.

Work with us

Please let us know if you’re doing interesting research with or working on applications of GPT-2! We’re especially interested in hearing from and potentially working with those who are studying

  • Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
  • The extent of problematic content (e.g. bias) being baked into the models and effective mitigations

Development

See DEVELOPERS.md

Contributors

See CONTRIBUTORS.md

Citation

Please use the following bibtex entry:

@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}

Future work

We may release code for evaluating the models on various benchmarks.

We are still considering release of the larger models.

License

Modified MIT