Top Related Projects
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Code for the paper "Language Models are Unsupervised Multitask Learners"
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Large-scale pretraining for dialogue
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Quick Overview
graykode/gpt-2-Pytorch is a PyTorch implementation of OpenAI's GPT-2 language model. It provides a flexible and accessible way to work with GPT-2 in PyTorch, allowing researchers and developers to fine-tune and experiment with this powerful language model.
Pros
- Implements GPT-2 in PyTorch, making it easier for PyTorch users to work with the model
- Includes pre-trained model loading and fine-tuning capabilities
- Provides a simple interface for text generation
- Supports multiple GPT-2 model sizes (124M, 355M, 774M, 1558M)
Cons
- May not be as optimized or up-to-date as the official TensorFlow implementation
- Limited documentation and examples compared to more established libraries
- Might require more manual setup and configuration compared to higher-level libraries
- May not include the latest GPT-2 improvements or variants
Code Examples
- Loading a pre-trained GPT-2 model:
import torch
from GPT2.model import GPT2LMHeadModel
from GPT2.config import GPT2Config
from GPT2.utils import load_weight
model = GPT2LMHeadModel(GPT2Config())  # defaults match the small GPT-2 model (12 layers, 768 hidden units, 12 heads)
state_dict = torch.load('gpt2-pytorch_model.bin', map_location='cpu')
model = load_weight(model, state_dict)  # load_weight expects a state dict, not a file path
model.eval()
- Generating text:
from GPT2.sample import sample_sequence
from GPT2.encoder import get_encoder
enc = get_encoder()  # BPE encoder shipped with the repository
context_tokens = enc.encode("It was a bright cold day in April")
out = sample_sequence(model, length=100, context=context_tokens, batch_size=1,
                      temperature=0.7, top_k=40, device='cpu')
print(enc.decode(out[0, len(context_tokens):].tolist()))
- Fine-tuning the model (the repository ships no training script, so this is a minimal plain-PyTorch sketch; YourCustomDataset is a placeholder you implement yourself):
import torch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
loader = torch.utils.data.DataLoader(YourCustomDataset(), batch_size=4, shuffle=True)
model.train()
for epoch in range(3):
    for input_ids in loader:
        loss = model(input_ids, lm_labels=input_ids)  # pytorch-pretrained-BERT-style models return the LM loss when lm_labels is passed
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
Getting Started
- Clone the repository:
git clone https://github.com/graykode/gpt-2-Pytorch.git
cd gpt-2-Pytorch
- Install dependencies:
pip install -r requirements.txt
- Download the pre-trained weights:
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
- Run the text generator (the repository's demo script is main.py):
python main.py --text "It was a bright cold day in April, and the clocks were striking thirteen."
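To sanity-check the download, the weights can be loaded directly into the PyTorch model. A short sketch, assuming the GPT2/ package layout used in this repository (config.py, model.py, utils.py):
import torch
from GPT2.config import GPT2Config
from GPT2.model import GPT2LMHeadModel
from GPT2.utils import load_weight

# load the downloaded checkpoint into the default small-GPT-2 configuration
state_dict = torch.load('gpt2-pytorch_model.bin', map_location='cpu')
model = load_weight(GPT2LMHeadModel(GPT2Config()), state_dict)
model.eval()
print(sum(p.numel() for p in model.parameters()))  # total parameter count of the loaded model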
Competitor Comparisons
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Pros of transformers
- Supports a wide range of transformer models beyond just GPT-2
- Actively maintained with frequent updates and improvements
- Extensive documentation and community support
Cons of transformers
- Larger and more complex codebase, potentially harder to understand for beginners
- May have more dependencies and overhead for simple use cases
Code Comparison
gpt-2-Pytorch:
import torch
from GPT2.model import GPT2LMHeadModel
from GPT2.config import GPT2Config
from GPT2.utils import load_weight
model = load_weight(GPT2LMHeadModel(GPT2Config()), torch.load('gpt2-pytorch_model.bin'))
model.eval()
transformers:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
Summary
transformers offers a more comprehensive and well-maintained solution for working with transformer models, including GPT-2. It provides support for multiple architectures and has extensive documentation. However, its larger codebase may be overwhelming for simpler projects. gpt-2-Pytorch is more focused on GPT-2 specifically and may be easier to understand for beginners, but lacks the broader support and active development of transformers.
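For comparison, generating text with the transformers version of GPT-2 is only a few extra lines on top of the snippet above, using the library's documented generate API:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
# encode a prompt, sample a continuation, and decode it back to text
input_ids = tokenizer.encode("It was a bright cold day in April", return_tensors="pt")
output = model.generate(input_ids, max_length=50, do_sample=True, top_k=40, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))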
Code for the paper "Language Models are Unsupervised Multitask Learners"
Pros of gpt-2
- Official implementation from OpenAI, ensuring authenticity and accuracy
- Comprehensive documentation and explanations of the model architecture
- Regular updates and maintenance from the original authors
Cons of gpt-2
- Implemented in TensorFlow, which may not be preferred by PyTorch users
- Lacks some features and optimizations present in community implementations
- May require more setup and configuration for certain use cases
Code Comparison
gpt-2 (TensorFlow):
import tensorflow as tf
import model  # from the src/ directory of openai/gpt-2
hparams = model.default_hparams()
with tf.Session(graph=tf.Graph()) as sess:
    context = tf.placeholder(tf.int32, [1, None])
    output = model.model(hparams=hparams, X=context)
gpt-2-Pytorch (PyTorch):
import torch
from GPT2.model import GPT2LMHeadModel
from GPT2.config import GPT2Config
from GPT2.utils import load_weight
model = load_weight(GPT2LMHeadModel(GPT2Config()), torch.load('gpt2-pytorch_model.bin'))
input_ids = torch.tensor([[31, 51, 99]], dtype=torch.long)
outputs = model(input_ids)
The main differences are the underlying framework (TensorFlow vs. PyTorch) and the API structure. gpt-2-Pytorch provides a more PyTorch-native implementation, while gpt-2 uses TensorFlow conventions.
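To make the PyTorch side concrete, the outputs above are plain tensors that can be inspected directly; a small sketch of greedy next-token selection, assuming the forward pass returns (lm_logits, presents) as in the pytorch-pretrained-BERT code this port follows:
import torch

# continuing from the gpt-2-Pytorch snippet above; the (logits, presents) return shape is an assumption
lm_logits, past = outputs
next_token_id = int(torch.argmax(lm_logits[0, -1, :]))  # greedy pick of the most likely next token
print(next_token_id)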
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
Pros of minGPT
- Simpler and more lightweight implementation, making it easier to understand and modify
- Includes a character-level language model example, which is useful for learning purposes
- More modular design, allowing for easier experimentation with different model configurations
Cons of minGPT
- Less feature-rich compared to gpt-2-Pytorch, which offers more pre-trained models and tokenization options
- May require more setup and configuration for advanced use cases
- Limited documentation compared to gpt-2-Pytorch
Code Comparison
minGPT:
class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        self.drop = nn.Dropout(config.embd_pdrop)
        self.blocks = nn.Sequential(*[Block(config) for _ in range(config.n_layer)])
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
gpt-2-Pytorch:
class GPT2Model(nn.Module):
    def __init__(self, config):
        super(GPT2Model, self).__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(config.embd_pdrop)
        self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])
        self.ln_f = LayerNorm(config.n_embd, eps=config.layer_norm_epsilon)
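One reason minGPT is popular for learning is the character-level demo mentioned above; a minimal, dependency-free sketch of what character-level tokenization amounts to (variable names here are illustrative only):
# every distinct character becomes its own token id
text = "hello gpt"
stoi = {ch: i for i, ch in enumerate(sorted(set(text)))}   # char -> id
itos = {i: ch for ch, i in stoi.items()}                   # id -> char
ids = [stoi[ch] for ch in text]
print(ids)                                    # token ids; vocab size == len(stoi)
print("".join(itos[i] for i in ids))          # decodes back to "hello gpt"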
Large-scale pretraining for dialogue
Pros of DialoGPT
- Specifically designed for dialogue tasks, offering better performance in conversational contexts
- Includes pre-trained models and datasets for easier implementation of chatbots
- More extensive documentation and examples for dialogue-specific applications
Cons of DialoGPT
- Larger model size, requiring more computational resources
- Less flexible for general text generation tasks outside of dialogue
- More complex setup process compared to the simpler gpt-2-Pytorch implementation
Code Comparison
DialoGPT:
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
gpt-2-Pytorch:
import torch
from GPT2.model import GPT2LMHeadModel
from GPT2.config import GPT2Config
from GPT2.utils import load_weight
model = load_weight(GPT2LMHeadModel(GPT2Config()), torch.load('gpt2-pytorch_model.bin'))
The code snippets show that DialoGPT relies on the Transformers library for one-line model loading, while gpt-2-Pytorch loads the downloaded weight file into its own PyTorch model classes.
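For reference, a single conversational turn with DialoGPT follows the pattern from its model card: the user message is terminated with the eos_token and the model generates the reply after it. A brief sketch:
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# encode the user turn, append the end-of-turn token, and generate the bot reply
user_ids = tokenizer.encode("Hello, how are you?" + tokenizer.eos_token, return_tensors="pt")
reply_ids = model.generate(user_ids, max_length=100, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(reply_ids[0, user_ids.shape[-1]:], skip_special_tokens=True))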
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
Pros of GPT-Neo
- Supports larger model sizes and more advanced architectures
- Implements more recent improvements in transformer models
- Offers better performance and scalability for training large language models
Cons of GPT-Neo
- More complex codebase, potentially harder to understand for beginners
- Requires more computational resources for training and inference
- Less straightforward to use for simple text generation tasks
Code Comparison
GPT-2-Pytorch:
class GPT2Model(nn.Module):
    def __init__(self, config):
        super(GPT2Model, self).__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(config.embd_pdrop)
GPT-Neo:
class GPTNeoModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
        self.layers = nn.ModuleList([GPTNeoBlock(config) for _ in range(config.num_layers)])
        self.final_layer_norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_epsilon)
The code comparison mostly reflects naming differences (hidden_size and num_layers versus n_embd and n_layer); both implementations embed tokens and stack a list of transformer blocks, while GPT-Neo's substantive differences, such as its larger released model sizes and alternating local/global attention, live inside its blocks rather than in this top-level structure.
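The released GPT-Neo checkpoints are easiest to try through the transformers pipeline; a brief sketch using the model name published by EleutherAI:
from transformers import pipeline

# text generation with a released GPT-Neo checkpoint (downloads several GB of weights)
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
print(generator("It was a bright cold day in April", max_length=50, do_sample=True, temperature=0.7))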
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- Supports a wider range of NLP tasks and models beyond GPT-2
- More actively maintained with frequent updates and contributions
- Offers extensive documentation and examples for various use cases
Cons of fairseq
- Steeper learning curve due to its broader scope and complexity
- May be overkill for projects focused solely on GPT-2 implementation
- Requires more computational resources for setup and training
Code Comparison
gpt-2-Pytorch:
class GPT2Model(nn.Module):
    def __init__(self, config):
        super(GPT2Model, self).__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(config.embd_pdrop)
fairseq:
class TransformerLanguageModel(FairseqLanguageModel):
    def __init__(self, decoder):
        super().__init__(decoder)

    @classmethod
    def build_model(cls, args, task):
        base_lm = FairseqLanguageModel.build_model(args, task)
        return cls(base_lm.decoder)
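fairseq also ships pretrained language models that can be pulled through torch.hub; a sketch based on the language-model examples in the fairseq repository (the model and tokenizer names are taken from those examples and may change between releases):
import torch

# load a pretrained fairseq transformer language model and sample a continuation
en_lm = torch.hub.load('pytorch/fairseq', 'transformer_lm.wmt19.en',
                       tokenizer='moses', bpe='fastbpe')
en_lm.eval()
print(en_lm.sample('It was a bright cold day in April',
                   sampling=True, sampling_topk=10, temperature=0.7))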
README
GPT2-Pytorch with Text-Generator
Better Language Models and Their Implications
Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper. (from the OpenAI blog)
This repository is a simple implementation of the GPT-2 text generator in PyTorch, with compressed code.
- The original repository is openai/gpt-2. You can also read the GPT-2 paper, "Language Models are Unsupervised Multitask Learners". To understand the concepts in more detail, I recommend the papers on the Transformer model.
- A good PyTorch implementation of GPT-2 that I referred to is huggingface/pytorch-pretrained-BERT; see the huggingface repository for more implementation details.
- Transformer (self-attention) paper: Attention Is All You Need (2017)
- First OpenAI GPT paper: Improving Language Understanding by Generative Pre-Training (2018)
- See the OpenAI blog about GPT-2 and the paper
Quick Start
- Download the GPT-2 pre-trained PyTorch model that huggingface/pytorch-pretrained-BERT has already converted. (Thanks for sharing! It solved my problem of transferring the TensorFlow checkpoint file to a PyTorch model!)
$ git clone https://github.com/graykode/gpt-2-Pytorch && cd gpt-2-Pytorch
# download huggingface's pytorch model
$ curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
# setup requirements; if using macOS, run the additional setup described below
$ pip install -r requirements.txt
- Now you can run it like this.
- Text from the book 1984 by George Orwell:
$ python main.py --text "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him."
- You can also quick-start in Google Colab.
Options
--text : sentence to begin with.
--quiet : do not print all of the extraneous stuff like the "================"
--nsamples : number of samples drawn per batch when the multinomial function is used
--unconditional : if true, generate unconditionally
--batch_size : batch size
--length : sentence length (< the context size)
--temperature : the sampling temperature of the distribution (default 0.7)
--top_k : keep only the top k largest elements of the logits along the vocabulary dimension (default 40)
See more detail about the temperature and top_k options here.
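The two sampling options work together in a simple way: the logits are divided by the temperature, everything outside the top_k largest logits is masked out, and the next token is drawn from the resulting distribution. A minimal, self-contained illustration (not the repository's exact code):
import torch

def sample_next_token(logits, temperature=0.7, top_k=40):
    logits = logits / temperature                                    # sharpen (or flatten) the distribution
    kth_value = torch.topk(logits, top_k).values[..., -1, None]
    logits = logits.masked_fill(logits < kth_value, float('-inf'))   # keep only the top k logits
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)                   # draw one token id

print(sample_next_token(torch.randn(50257)))                         # e.g. over a GPT-2-sized vocabulary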
Dependencies
- PyTorch 0.4.1+
- regex 2017.4.5
Mac OS Setup
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install torch tqdm
$ brew install libomp
$ export LC_ALL=en_US.UTF-8
$ export LANG=en_US.UTF-8
$ pip install -r requirements.txt
Author
- Tae Hwan Jung(Jeff Jung) @graykode
- Author Email : nlkey2022@gmail.com
License
- OpenAI/GPT-2 follows the MIT license; huggingface/pytorch-pretrained-BERT follows the Apache 2.0 license.
- This repository follows the MIT license, like the original GPT-2 repository.
Acknowledgement
Thanks to Jeff Wu (@WuTheFWasThat) and Thomas Wolf (@thomwolf) for allowing their code to be referenced.