
graykode/gpt-2-Pytorch

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation


Top Related Projects

  • transformers: 🤗 Transformers, the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
  • gpt-2: Code for the paper "Language Models are Unsupervised Multitask Learners".
  • minGPT: A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training.
  • DialoGPT: Large-scale pretraining for dialogue.
  • GPT-Neo: An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.
  • fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Quick Overview

graykode/gpt-2-Pytorch is a PyTorch implementation of OpenAI's GPT-2 language model. It provides a flexible and accessible way to work with GPT-2 in PyTorch, allowing researchers and developers to fine-tune and experiment with this powerful language model.

Pros

  • Implements GPT-2 in PyTorch, making it easier for PyTorch users to work with the model
  • Includes pre-trained model loading and fine-tuning capabilities
  • Provides a simple interface for text generation
  • Supports multiple GPT-2 model sizes (124M, 355M, 774M, 1558M); see the sketch after this list
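
For reference, the published GPT-2 sizes differ only in depth, width, and attention-head count; all share a 50,257-token vocabulary and a 1024-token context. The mapping below is a hedged sketch (the GPT2_SIZES name and tuple layout are illustrative, not part of the repository):

# model size: (n_layer, n_embd, n_head); vocabulary size and context length are shared
GPT2_SIZES = {
    "124M":  (12, 768, 12),
    "355M":  (24, 1024, 16),
    "774M":  (36, 1280, 20),
    "1558M": (48, 1600, 25),
}

n_layer, n_embd, n_head = GPT2_SIZES["124M"]  # pick the size you want to instantiate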

Cons

  • May not be as optimized or up-to-date as the official TensorFlow implementation
  • Limited documentation and examples compared to more established libraries
  • Might require more manual setup and configuration compared to higher-level libraries
  • May not include the latest GPT-2 improvements or variants

Code Examples

The snippets below illustrate the typical workflow; exact module paths and signatures may differ slightly from the repository layout, so check the source before copying them verbatim.

  1. Loading a pre-trained GPT-2 model:

from gpt2.model import GPT2LMHeadModel
from gpt2.utils import load_weight

model = GPT2LMHeadModel(n_vocab=50257, n_layer=12, n_embd=768, n_head=12)
load_weight(model, 'gpt2-pytorch_model.bin')

  2. Generating text:

from gpt2.sample import sample_sequence

context_tokens = [50256]  # start-of-text token
generated = sample_sequence(model, length=100, context=context_tokens)
print(generated)

  3. Fine-tuning the model:

import torch
from gpt2.train import train

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
train_dataset = YourCustomDataset()  # implement your own dataset
train(model, optimizer, train_dataset, epochs=3, batch_size=4)
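
The train helper in the last snippet is illustrative rather than part of the upstream repository, which focuses on generation. A minimal sketch of what such a fine-tuning loop could look like, assuming the model's forward pass returns next-token logits of shape [batch, seq, vocab] and a hypothetical train_loader that yields batches of token IDs:

import torch
import torch.nn.functional as F

def fine_tune(model, train_loader, epochs=3, lr=1e-5, device="cpu"):
    # Standard causal-LM objective: position t predicts token t+1.
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for input_ids in train_loader:              # LongTensor of shape [batch, seq]
            input_ids = input_ids.to(device)
            outputs = model(input_ids)
            logits = outputs[0] if isinstance(outputs, tuple) else outputs  # assumed [batch, seq, vocab]
            loss = F.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),  # predictions for positions 0..n-2
                input_ids[:, 1:].reshape(-1),                 # targets are the next tokens
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: last batch loss {loss.item():.4f}")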

Getting Started

  1. Clone the repository:

    git clone https://github.com/graykode/gpt-2-Pytorch.git
    cd gpt-2-Pytorch
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download pre-trained weights:

    wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
    
  4. Run the text generator:

    python main.py --text "It was a bright cold day in April, and the clocks were striking thirteen."
    

Competitor Comparisons

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Pros of transformers

  • Supports a wide range of transformer models beyond just GPT-2
  • Actively maintained with frequent updates and improvements
  • Extensive documentation and community support

Cons of transformers

  • Larger and more complex codebase, potentially harder to understand for beginners
  • May have more dependencies and overhead for simple use cases

Code Comparison

gpt-2-Pytorch:

model = GPT2LMHeadModel(config)
model.load_state_dict(torch.load(pretrained_model_path))
model.eval()

transformers:

from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

Summary

transformers offers a more comprehensive and well-maintained solution for working with transformer models, including GPT-2. It provides support for multiple architectures and has extensive documentation. However, its larger codebase may be overwhelming for simpler projects. gpt-2-Pytorch is more focused on GPT-2 specifically and may be easier to understand for beginners, but lacks the broader support and active development of transformers.
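
For a concrete feel of the difference, generating a sample with transformers takes only a few lines; this is a minimal sketch (the prompt is arbitrary, and the sampling arguments mirror the temperature and top_k options exposed by gpt-2-Pytorch):

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

input_ids = tokenizer("It was a bright cold day in April", return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=60, do_sample=True, top_k=40, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))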


Code for the paper "Language Models are Unsupervised Multitask Learners"

Pros of gpt-2

  • Official implementation from OpenAI, ensuring authenticity and accuracy
  • Comprehensive documentation and explanations of the model architecture
  • Regular updates and maintenance from the original authors

Cons of gpt-2

  • Implemented in TensorFlow, which may not be preferred by PyTorch users
  • Lacks some features and optimizations present in community implementations
  • May require more setup and configuration for certain use cases

Code Comparison

gpt-2 (TensorFlow):

import tensorflow as tf
from gpt2 import model

hparams = model.default_hparams()
with tf.Session(graph=tf.Graph()) as sess:
    context = tf.placeholder(tf.int32, [1, None])
    output = model.model(hparams=hparams, X=context)

gpt-2-Pytorch (PyTorch):

import torch
from model import GPT2LMHeadModel

model = GPT2LMHeadModel(config)  # config holds the GPT-2 hyperparameters
model.load_state_dict(torch.load('gpt2-pytorch_model.bin'))
input_ids = torch.tensor([[31, 51, 99]], dtype=torch.long)
outputs = model(input_ids)

The main difference is the underlying framework (TensorFlow vs. PyTorch) and the API structure. gpt-2-Pytorch provides a more PyTorch-native implementation, while gpt-2 uses TensorFlow conventions.


A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Pros of minGPT

  • Simpler and more lightweight implementation, making it easier to understand and modify
  • Includes a character-level language model example, which is useful for learning purposes
  • More modular design, allowing for easier experimentation with different model configurations

Cons of minGPT

  • Less feature-rich compared to gpt-2-Pytorch, which offers more pre-trained models and tokenization options
  • May require more setup and configuration for advanced use cases
  • Limited documentation compared to gpt-2-Pytorch

Code Comparison

minGPT:

class GPT(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.tok_emb = nn.Embedding(config.vocab_size, config.n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, config.block_size, config.n_embd))
        self.drop = nn.Dropout(config.embd_pdrop)
        self.blocks = nn.Sequential(*[Block(config) for _ in range(config.n_layer)])
        self.ln_f = nn.LayerNorm(config.n_embd)
        self.head = nn.Linear(config.n_embd, config.vocab_size, bias=False)

gpt-2-Pytorch:

class GPT2Model(nn.Module):
    def __init__(self, config):
        super(GPT2Model, self).__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(config.embd_pdrop)
        self.h = nn.ModuleList([Block(config.n_ctx, config, scale=True) for _ in range(config.n_layer)])
        self.ln_f = LayerNorm(config.n_embd, eps=config.layer_norm_epsilon)

Large-scale pretraining for dialogue

Pros of DialoGPT

  • Specifically designed for dialogue tasks, offering better performance in conversational contexts
  • Includes pre-trained models and datasets for easier implementation of chatbots
  • More extensive documentation and examples for dialogue-specific applications

Cons of DialoGPT

  • Larger model size, requiring more computational resources
  • Less flexible for general text generation tasks outside of dialogue
  • More complex setup process compared to the simpler gpt-2-Pytorch implementation

Code Comparison

DialoGPT:

from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

gpt-2-Pytorch:

import torch
from model import GPT2LMHeadModel

model = GPT2LMHeadModel(config)  # config holds the GPT-2 hyperparameters
model.load_state_dict(torch.load('gpt2-pytorch_model.bin'))
model.eval()

The code snippets show that DialoGPT leans on the Transformers library for one-line model and tokenizer loading, while gpt-2-Pytorch loads the downloaded pre-trained weights into its own standalone PyTorch implementation.
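
As an illustration of the dialogue-oriented workflow, a single conversational turn with DialoGPT through transformers might look like the following minimal sketch, based on the public model card (the prompt text is arbitrary):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# encode the user turn and append the end-of-sequence token DialoGPT uses as a turn separator
input_ids = tokenizer.encode("Does money buy happiness?" + tokenizer.eos_token, return_tensors="pt")
reply_ids = model.generate(input_ids, max_length=100, pad_token_id=tokenizer.eos_token_id)

# decode only the newly generated tokens, i.e. the model's reply
print(tokenizer.decode(reply_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))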


An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

Pros of GPT-Neo

  • Supports larger model sizes and more advanced architectures
  • Implements more recent improvements in transformer models
  • Offers better performance and scalability for training large language models

Cons of GPT-Neo

  • More complex codebase, potentially harder to understand for beginners
  • Requires more computational resources for training and inference
  • Less straightforward to use for simple text generation tasks

Code Comparison

GPT-2-Pytorch:

class GPT2Model(nn.Module):
    def __init__(self, config):
        super(GPT2Model, self).__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(config.embd_pdrop)

GPT-Neo:

class GPTNeoModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
        self.layers = nn.ModuleList([GPTNeoBlock(config) for _ in range(config.num_layers)])
        self.final_layer_norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_epsilon)

The code comparison shows that both follow the same decoder-only layout: token embeddings, a stack of transformer blocks, and a final layer norm. The visible differences are mostly naming and configuration: GPT-Neo is driven by a richer config (hidden_size, num_layers) designed to scale to much larger models, while the GPT-2-Pytorch snippet additionally shows learned positional embeddings alongside the token embeddings.
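
As a side note, the trained GPT-Neo checkpoints released by EleutherAI can also be loaded through the transformers library, which is usually the simpler route for plain inference. A hedged sketch (the checkpoint name and sampling settings are illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-125M")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-125M")

input_ids = tokenizer("GPT-Neo is", return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=40, do_sample=True, top_k=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))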


Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • Supports a wider range of NLP tasks and models beyond GPT-2
  • More actively maintained with frequent updates and contributions
  • Offers extensive documentation and examples for various use cases

Cons of fairseq

  • Steeper learning curve due to its broader scope and complexity
  • May be overkill for projects focused solely on GPT-2 implementation
  • Requires more computational resources for setup and training

Code Comparison

gpt-2-Pytorch:

class GPT2Model(nn.Module):
    def __init__(self, config):
        super(GPT2Model, self).__init__()
        self.wte = nn.Embedding(config.vocab_size, config.n_embd)
        self.wpe = nn.Embedding(config.n_positions, config.n_embd)
        self.drop = nn.Dropout(config.embd_pdrop)

fairseq:

class TransformerLanguageModel(FairseqLanguageModel):
    def __init__(self, decoder):
        super().__init__(decoder)

    @classmethod
    def build_model(cls, args, task):
        base_lm = FairseqLanguageModel.build_model(args, task)
        return cls(base_lm.decoder)


README

GPT2-Pytorch with Text-Generator

Better Language Models and Their Implications

Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment in responsible disclosure, we are instead releasing a much smaller model for researchers to experiment with, as well as a technical paper. (from the OpenAI blog)

This repository is a simple implementation of the GPT-2 text generator in PyTorch, with compressed code.

Quick Start

  1. Download the GPT-2 pre-trained PyTorch model that huggingface/pytorch-pretrained-BERT has already converted (thanks for sharing; it solved the problem of transferring the TensorFlow checkpoint file to a PyTorch model):
$ git clone https://github.com/graykode/gpt-2-Pytorch && cd gpt-2-Pytorch
# download huggingface's pytorch model
$ curl --output gpt2-pytorch_model.bin https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-pytorch_model.bin
# install requirements; on macOS, run the additional setup described below
$ pip install -r requirements.txt
  2. Now you can run it like this:
  • Text from the book 1984 by George Orwell:
$ python main.py --text "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him."
  3. You can also quick-start in Google Colab.

Options

  • --text : sentence to begin generation with.
  • --quiet : suppress extraneous output such as the "================" separators.
  • --nsamples : number of samples drawn per batch when multinomial sampling is used.
  • --unconditional : if set, generate unconditionally (without a conditioning prompt).
  • --batch_size : batch size.
  • --length : length of the generated sequence (must be smaller than the context size).
  • --temperature : sampling temperature of the output distribution (default 0.7).
  • --top_k : sample only from the k largest logits of the output distribution (default 40).

See here for more detail on the temperature and top_k options.
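
To make the interaction of these two options concrete, here is a minimal sketch of temperature-scaled top-k sampling over next-token logits (illustrative only; the repository's own sampling code may differ in detail):

import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=0.7, top_k=40):
    # logits: [batch, vocab] scores for the next token
    logits = logits / temperature                        # sharpen (<1) or flatten (>1) the distribution
    topk_values, _ = torch.topk(logits, top_k, dim=-1)   # k largest logits per row
    cutoff = topk_values[..., -1, None]                  # smallest logit still inside the top k
    logits = logits.masked_fill(logits < cutoff, float("-inf"))
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)       # [batch, 1] sampled token ids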

Dependencies

  • PyTorch 0.4.1+
  • regex 2017.4.5

Mac OS Setup

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install torch tqdm
$ brew install libomp
$ export LC_ALL=en_US.UTF-8
$ export LANG=en_US.UTF-8
$ pip install -r requirements.txt

Author

Tae Hwan Jung (@graykode)

License

  • OpenAI/GPT-2 is released under the MIT license; huggingface/pytorch-pretrained-BERT is released under the Apache 2.0 license.
  • This repository follows the MIT license, in line with the original GPT-2 repository.

Acknowledgement

Thanks to Jeff Wu (@WuTheFWasThat) and Thomas Wolf (@thomwolf) for allowing their code to be referenced.