Top Related Projects
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Code for the paper "Language Models are Unsupervised Multitask Learners"
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Large-scale pretraining for dialogue
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Quick Overview
graykode/gpt-2-Pytorch is a PyTorch implementation of OpenAI's GPT-2 language model. It provides a flexible and accessible way to work with GPT-2 in PyTorch, allowing researchers and developers to fine-tune and experiment with this powerful language model.
Pros
- Implements GPT-2 in PyTorch, making it easier for PyTorch users to work with the model
- Includes pre-trained model loading and fine-tuning capabilities
- Provides a simple interface for text generation
- Supports multiple GPT-2 model sizes (124M, 355M, 774M, 1558M)
Cons
- May not be as optimized or up-to-date as the official TensorFlow implementation
- Limited documentation and examples compared to more established libraries
- Might require more manual setup and configuration compared to higher-level libraries
- May not include the latest GPT-2 improvements or variants
Code Examples
- Loading a pre-trained GPT-2 model:
import torch
from GPT2.model import GPT2LMHeadModel
from GPT2.config import GPT2Config
from GPT2.utils import load_weight
model = GPT2LMHeadModel(GPT2Config())  # defaults match the small GPT-2 model (12 layers, 768 hidden units, 12 heads)
state_dict = torch.load('gpt2-pytorch_model.bin', map_location='cpu')
model = load_weight(model, state_dict)  # load_weight expects a state dict, not a file path
model.eval()
- Generating text:
from GPT2.sample import sample_sequence
from GPT2.encoder import get_encoder
enc = get_encoder()  # BPE encoder shipped with the repository
context_tokens = enc.encode("It was a bright cold day in April")
out = sample_sequence(model, length=100, context=context_tokens, batch_size=1,
                      temperature=0.7, top_k=40, device='cpu')
print(enc.decode(out[0, len(context_tokens):].tolist()))
- Fine-tuning the model (the repository ships no training script, so this is a minimal plain-PyTorch sketch; YourCustomDataset is a placeholder you implement yourself):
import torch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
loader = torch.utils.data.DataLoader(YourCustomDataset(), batch_size=4, shuffle=True)
model.train()
for epoch in range(3):
    for input_ids in loader:
        loss = model(input_ids, lm_labels=input_ids)  # pytorch-pretrained-BERT-style models return the LM loss when lm_labels is passed
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
Getting Started
- Clone the repository:
git clone https://github.com/graykode/gpt-2-Pytorch.git
cd gpt-2-Pytorch
- Install dependencies:
pip install -r requirements.txt
- Download the pre-trained weights:
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
- Run the text generator (the repository's demo script is main.py):
python main.py --text "It was a bright cold day in April, and the clocks were striking thirteen."
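To sanity-check the download, the weights can be loaded directly into the PyTorch model. A short sketch, assuming the GPT2/ package layout used in this repository (config.py, model.py, utils.py):
import torch
from GPT2.config import GPT2Config
from GPT2.model import GPT2LMHeadModel
from GPT2.utils import load_weight

# load the downloaded checkpoint into the default small-GPT-2 configuration
state_dict = torch.load('gpt2-pytorch_model.bin', map_location='cpu')
model = load_weight(GPT2LMHeadModel(GPT2Config()), state_dict)
model.eval()
print(sum(p.numel() for p in model.parameters()))  # total parameter count of the loaded model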
Competitor Comparisons
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Pros of transformers
- Supports a wide range of transformer models beyond just GPT-2
- Actively maintained with frequent updates and improvements
- Extensive documentation and community support
Cons of transformers
- Larger and more complex codebase, potentially harder to understand for beginners
- May have more dependencies and overhead for simple use cases
Code Comparison
gpt-2-Pytorch:
import torch
from GPT2.model import GPT2LMHeadModel
from GPT2.config import GPT2Config
from GPT2.utils import load_weight
model = load_weight(GPT2LMHeadModel(GPT2Config()), torch.load('gpt2-pytorch_model.bin'))
model.eval()
transformers:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
Summary
transformers offers a more comprehensive and well-maintained solution for working with transformer models, including GPT-2. It provides support for multiple architectures and has extensive documentation. However, its larger codebase may be overwhelming for simpler projects. gpt-2-Pytorch is more focused on GPT-2 specifically and may be easier to understand for beginners, but lacks the broader support and active development of transformers.
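For comparison, generating text with the transformers version of GPT-2 is only a few extra lines on top of the snippet above, using the library's documented generate API:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
# encode a prompt, sample a continuation, and decode it back to text
input_ids = tokenizer.encode("It was a bright cold day in April", return_tensors="pt")
output = model.generate(input_ids, max_length=50, do_sample=True, top_k=40, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))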
Code for the paper "Language Models are Unsupervised Multitask Learners"
Pros of gpt-2
- Official implementation from OpenAI, ensuring authenticity and accuracy
- Comprehensive documentation and explanations of the model architecture
- Regular updates and maintenance from the original authors
Cons of gpt-2
- Implemented in TensorFlow, which may not be preferred by PyTorch users
- Lacks some features and optimizations present in community implementations
- May require more setup and configuration for certain use cases
Code Comparison
gpt-2 (TensorFlow):
import tensorflow as tf
import model  # from the src/ directory of openai/gpt-2
hparams = model.default_hparams()
with tf.Session(graph=tf.Graph()) as sess:
    context = tf.placeholder(tf.int32, [1, None])
    output = model.model(hparams=hparams, X=context)
gpt-2-Pytorch (PyTorch):
import torch
from GPT2.model import GPT2LMHeadModel
from GPT2.config import GPT2Config
from GPT2.utils import load_weight
model = load_weight(GPT2LMHeadModel(GPT2Config()), torch.load('gpt2-pytorch_model.bin'))
input_ids = torch.tensor([[31, 51, 99]], dtype=torch.long)
outputs = model(input_ids)
The main differences are the underlying framework (TensorFlow vs. PyTorch) and the API structure. gpt-2-Pytorch provides a more PyTorch-native implementation, while gpt-2 uses TensorFlow conventions.
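To make the PyTorch side concrete, the outputs above are plain tensors that can be inspected directly; a small sketch of greedy next-token selection, assuming the forward pass returns (lm_logits, presents) as in the pytorch-pretrained-BERT code this port follows:
import torch

# continuing from the gpt-2-Pytorch snippet above; the (logits, presents) return shape is an assumption
lm_logits, past = outputs
next_token_id = int(torch.argmax(lm_logits[0, -1, :]))  # greedy pick of the most likely next token
print(next_token_id)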
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Pros of minGPT
- Simpler and more lightweight implementation, making it easier to understand and modify
- Includes a character-level language model example, which is useful for learning purposes
- More modular design, allowing for easier experimentation with different model configurations
Cons of minGPT
- Less feature-rich compared to gpt-2-Pytorch, which offers more pre-trained models and tokenization options
- May require more setup and configuration for advanced use cases
- Limited documentation compared to gpt-2-Pytorch
Code Comparison
minGPT:
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        self.drop = nn.Dropout(config.embd_pdrop)
        self.blocks = nn.Sequential(*[Block(config) for _ in range(config.n_layer)])
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
gpt-2-Pytorch:
class GPT2Model(nn.Module):
    def __init__(self, config):
        super(GPT2Model, self).__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(config.embd_pdrop)
        self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])
        self.ln_f = LayerNorm(config.n_embd, eps=config.layer_norm_epsilon)
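One reason minGPT is popular for learning is the character-level demo mentioned above; a minimal, dependency-free sketch of what character-level tokenization amounts to (variable names here are illustrative only):
# every distinct character becomes its own token id
text = "hello gpt"
stoi = {ch: i for i, ch in enumerate(sorted(set(text)))}   # char -> id
itos = {i: ch for ch, i in stoi.items()}                   # id -> char
ids = [stoi[ch] for ch in text]
print(ids)                                    # token ids; vocab size == len(stoi)
print("".join(itos[i] for i in ids))          # decodes back to "hello gpt"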
Large-scale pretraining for dialogue
Pros of DialoGPT
- Specifically designed for dialogue tasks, offering better performance in conversational contexts
- Includes pre-trained models and datasets for easier implementation of chatbots
- More extensive documentation and examples for dialogue-specific applications
Cons of DialoGPT
- Larger model size, requiring more computational resources
- Less flexible for general text generation tasks outside of dialogue
- More complex setup process compared to the simpler gpt-2-Pytorch implementation
Code Comparison
DialoGPT:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
gpt-2-Pytorch:
import torch
from GPT2.model import GPT2LMHeadModel
from GPT2.config import GPT2Config
from GPT2.utils import load_weight
model = load_weight(GPT2LMHeadModel(GPT2Config()), torch.load('gpt2-pytorch_model.bin'))
The code snippets show that DialoGPT relies on the Transformers library for one-line model loading, while gpt-2-Pytorch loads the downloaded weight file into its own PyTorch model classes.
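For reference, a single conversational turn with DialoGPT follows the pattern from its model card: the user message is terminated with the eos_token and the model generates the reply after it. A brief sketch:
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# encode the user turn, append the end-of-turn token, and generate the bot reply
user_ids = tokenizer.encode("Hello, how are you?" + tokenizer.eos_token, return_tensors="pt")
reply_ids = model.generate(user_ids, max_length=100, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(reply_ids[0, user_ids.shape[-1]:], skip_special_tokens=True))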
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
Pros of GPT-Neo
- Supports larger model sizes and more advanced architectures
- Implements more recent improvements in transformer models
- Offers better performance and scalability for training large language models
Cons of GPT-Neo
- More complex codebase, potentially harder to understand for beginners
- Requires more computational resources for training and inference
- Less straightforward to use for simple text generation tasks
Code Comparison
GPT-2-Pytorch:
class GPT2Model(nn.Module):
    def __init__(self, config):
        super(GPT2Model, self).__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(config.embd_pdrop)
GPT-Neo:
class GPTNeoModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
        self.layers = nn.ModuleList([GPTNeoBlock(config) for _ in range(config.num_layers)])
        self.final_layer_norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_epsilon)
The code comparison mostly reflects naming differences (hidden_size and num_layers versus n_embd and n_layer); both implementations embed tokens and stack a list of transformer blocks, while GPT-Neo's substantive differences, such as its larger released model sizes and alternating local/global attention, live inside its blocks rather than in this top-level structure.
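The released GPT-Neo checkpoints are easiest to try through the transformers pipeline; a brief sketch using the model name published by EleutherAI:
from transformers import pipeline

# text generation with a released GPT-Neo checkpoint (downloads several GB of weights)
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
print(generator("It was a bright cold day in April", max_length=50, do_sample=True, temperature=0.7))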
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- Supports a wider range of NLP tasks and models beyond GPT-2
- More actively maintained with frequent updates and contributions
- Offers extensive documentation and examples for various use cases
Cons of fairseq
- Steeper learning curve due to its broader scope and complexity
- May be overkill for projects focused solely on GPT-2 implementation
- Requires more computational resources for setup and training
Code Comparison
gpt-2-Pytorch:
class GPT2Model(nn.Module):
    def __init__(self, config):
        super(GPT2Model, self).__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(config.embd_pdrop)
fairseq:
class TransformerLanguageModel(FairseqLanguageModel):
    def __init__(self, decoder):
        super().__init__(decoder)

    @classmethod
    def build_model(cls, args, task):
        base_lm = FairseqLanguageModel.build_model(args, task)
        return cls(base_lm.decoder)
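fairseq also ships pretrained language models that can be pulled through torch.hub; a sketch based on the language-model examples in the fairseq repository (the model and tokenizer names are taken from those examples and may change between releases):
import torch

# load a pretrained fairseq transformer language model and sample a continuation
en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.en',
                       tokenizer='moses', bpe='fastbpe')
en_lm.eval()
print(en_lm.sample('It was a bright cold day in April',
                   sampling=True, sampling_topk=10, temperature=0.7))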
README
GPT2-Pytorch with Text-Generator
Better Language Models and Their Implications
Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper. (from the OpenAI blog)
This repository is a simple implementation of the GPT-2 text generator in PyTorch, with compressed code.
- The original repository is openai/gpt-2. You can also read the GPT-2 paper, "Language Models are Unsupervised Multitask Learners". To understand the concepts in more detail, I recommend the papers on the Transformer model.
- A good PyTorch implementation of GPT-2 that I referred to is huggingface/pytorch-pretrained-BERT; see the huggingface repository for more implementation details.
- Transformer (self-attention) paper: Attention Is All You Need (2017)
- First OpenAI GPT paper: Improving Language Understanding by Generative Pre-Training (2018)
- See the OpenAI blog about GPT-2 and the paper
Quick Start
- Download the GPT-2 pre-trained PyTorch model that huggingface/pytorch-pretrained-BERT has already converted. (Thanks for sharing! It solved my problem of transferring the TensorFlow checkpoint file to a PyTorch model!)
$ git clone https://github.com/graykode/gpt-2-Pytorch && cd gpt-2-Pytorch
# download huggingface's pytorch model
$ curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
# setup requirements; if using macOS, run the additional setup described below
$ pip install -r requirements.txt
- Now you can run it like this.
- Text from the book 1984 by George Orwell:
$ python main.py --text "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him."
- You can also quick-start in Google Colab.
Options
--text : sentence to begin with.
--quiet : do not print all of the extraneous stuff like the "================"
--nsamples : number of samples drawn per batch when the multinomial function is used
--unconditional : if true, generate unconditionally
--batch_size : batch size
--length : sentence length (< the context size)
--temperature : the sampling temperature of the distribution (default 0.7)
--top_k : keep only the top k largest elements of the logits along the vocabulary dimension (default 40)
See more detail about the temperature and top_k options here.
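The two sampling options work together in a simple way: the logits are divided by the temperature, everything outside the top_k largest logits is masked out, and the next token is drawn from the resulting distribution. A minimal, self-contained illustration (not the repository's exact code):
import torch

def sample_next_token(logits, temperature=0.7, top_k=40):
    logits = logits / temperature                                    # sharpen (or flatten) the distribution
    kth_value = torch.topk(logits, top_k).values[..., -1, None]
    logits = logits.masked_fill(logits < kth_value, float('-inf'))   # keep only the top k logits
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)                   # draw one token id

print(sample_next_token(torch.randn(50257)))                         # e.g. over a GPT-2-sized vocabulary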
Dependencies
- PyTorch 0.4.1+
- regex 2017.4.5
Mac OS Setup
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install torch tqdm
$ brew install libomp
$ export LC_ALL=en_US.UTF-8
$ export LANG=en_US.UTF-8
$ pip install -r requirements.txt
Author
- Tae Hwan Jung(Jeff Jung) @graykode
- Author Email : nlkey2022@gmail.com
License
- OpenAI/GPT-2 follows the MIT license; huggingface/pytorch-pretrained-BERT follows the Apache 2.0 license.
- This repository follows the MIT license, like the original GPT-2 repository.
Acknowledgement
Thanks to Jeff Wu (@WuTheFWasThat) and Thomas Wolf (@thomwolf) for allowing their code to be referenced.