
salesforce/ctrl

Conditional Transformer Language Model for Controllable Generation


Quick Overview

CTRL (Conditional Transformer Language Model) is a 1.6 billion-parameter language model developed by Salesforce Research. It generates text conditioned on control codes that specify domain, subdomain, and task-specific behavior, which gives users more explicit control over the output than an unconditioned language model offers.

Pros

  • Offers fine-grained control over text generation through the use of control codes
  • Trained on a diverse range of internet sources, resulting in broad knowledge and versatility
  • Supports multiple languages and domains
  • Open-source implementation allows for further research and development

Cons

  • Requires significant computational resources for training and inference
  • May produce biased or inappropriate content if not properly filtered
  • Limited documentation and examples for implementation
  • Potential privacy concerns due to the model's training on internet data

Code Examples

# Load the CTRL model and tokenizer
from transformers import CTRLTokenizer, CTRLLMHeadModel

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLLMHeadModel.from_pretrained("ctrl")

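# Generate text; the prompt must begin with a control code (here, Links)
input_text = "Links How to make a delicious pizza:"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
# temperature only takes effect when sampling is enabled with do_sample=True
output = model.generate(input_ids, max_length=100, do_sample=True, temperature=0.7, repetition_penalty=1.2)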
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

# Fine-tune CTRL on custom data
import torch
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    save_steps=10_000,
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,  # assumed: your own dataset of (input_ids, attention_mask) tensor pairs
    data_collator=lambda data: {'input_ids': torch.stack([f[0] for f in data]),
                                'attention_mask': torch.stack([f[1] for f in data]),
                                'labels': torch.stack([f[0] for f in data])},
)

trainer.train()

Getting Started

To get started with CTRL, follow these steps:

  1. Install the required libraries:

    pip install transformers torch
    
  2. Load the model and tokenizer:

    from transformers import CTRLTokenizer, CTRLLMHeadModel
    
    tokenizer = CTRLTokenizer.from_pretrained("ctrl")
    model = CTRLLMHeadModel.from_pretrained("ctrl")
    
  3. Generate text using a control code:

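    # The prompt must begin with a control code (here, Links)
    input_text = "Links What are the benefits of exercise?"
    input_ids = tokenizer.encode(input_text, return_tensors="pt")
    # temperature only takes effect when sampling is enabled with do_sample=True
    output = model.generate(input_ids, max_length=100, do_sample=True, temperature=0.7, repetition_penalty=1.2)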
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    print(generated_text)
    

For more advanced usage and fine-tuning, refer to the Hugging Face Transformers documentation and the CTRL GitHub repository.

Competitor Comparisons

22,214

Code for the paper "Language Models are Unsupervised Multitask Learners"

Pros of GPT-2

  • More widely adopted and studied in the research community
  • Offers a simpler architecture, making it easier to understand and implement
  • Provides pre-trained models of various sizes, allowing for flexibility in deployment

Cons of GPT-2

  • Less control over the generated content compared to CTRL
  • May require more fine-tuning for specific tasks or domains
  • Limited in its ability to handle structured or conditional text generation

Code Comparison

GPT-2:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

CTRL:

import torch
from transformers import CTRLTokenizer, CTRLLMHeadModel

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLLMHeadModel.from_pretrained("ctrl")

Both repositories use the Hugging Face Transformers library for easy implementation. The main difference lies in the specific model and tokenizer classes used. GPT-2 uses the GPT2LMHeadModel and GPT2Tokenizer, while CTRL uses CTRLLMHeadModel and CTRLTokenizer. This reflects the architectural differences between the two models and their respective approaches to language generation.

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Pros of Transformers

  • Broader model support: Includes many popular architectures beyond CTRL
  • Active community: More frequent updates and contributions
  • Extensive documentation and examples for various NLP tasks

Cons of Transformers

  • Larger codebase: Can be more complex to navigate and understand
  • Potentially slower inference: Generalized architecture may sacrifice some speed

Code Comparison

CTRL (model initialization):

from transformers import CTRLLMHeadModel, CTRLTokenizer

tokenizer = CTRLTokenizer.from_pretrained('ctrl')
model = CTRLLMHeadModel.from_pretrained('ctrl')

Transformers (model initialization):

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

Both repositories provide powerful language models, but Transformers offers a more comprehensive toolkit for various NLP tasks. CTRL focuses specifically on the CTRL model, which may be advantageous for users primarily interested in that architecture. Transformers' broader scope comes with increased complexity but also greater flexibility and community support.

37,810

TensorFlow code and pre-trained models for BERT

Pros of BERT

  • More widely adopted and extensively studied in the NLP community
  • Offers pre-trained models for various languages and tasks
  • Smaller model size, making it more accessible for fine-tuning on limited hardware

Cons of BERT

  • Limited to bidirectional context understanding
  • Fixed input sequence length, which can be restrictive for certain tasks
  • Less suitable for open-ended text generation compared to CTRL

Code Comparison

BERT example:

from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

CTRL example:

from transformers import CTRLTokenizer, CTRLLMHeadModel
tokenizer = CTRLTokenizer.from_pretrained('ctrl')
# CTRLLMHeadModel adds the language-modeling head that generate() needs
model = CTRLLMHeadModel.from_pretrained('ctrl')
inputs = tokenizer("Links Hello, my dog is cute", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_length=50)

Both repositories provide powerful language models, but they serve different purposes. BERT excels in understanding context and is widely used for various NLP tasks, while CTRL offers more control over text generation and is better suited for open-ended content creation.

30,129

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Pros of fairseq

  • More comprehensive toolkit with support for various NLP tasks
  • Active development and frequent updates
  • Extensive documentation and examples

Cons of fairseq

  • Steeper learning curve due to its broader scope
  • May require more computational resources for some tasks

Code Comparison

CTRL:

from transformers import CTRLTokenizer, CTRLModel
tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLModel.from_pretrained("ctrl")

fairseq:

from fairseq.models.transformer import TransformerModel
model = TransformerModel.from_pretrained('path/to/checkpoint')

Key Differences

  • CTRL focuses on controllable text generation, while fairseq offers a wider range of NLP tasks
  • CTRL uses a single large model, whereas fairseq provides modular components for customization
  • fairseq has a more active community and frequent updates

Use Cases

  • CTRL: Ideal for projects requiring fine-grained control over text generation
  • fairseq: Better suited for diverse NLP tasks and research applications

Learning Resources

  • CTRL: Limited official documentation, relies more on community contributions
  • fairseq: Extensive official documentation, tutorials, and examples

Large-scale pretraining for dialogue

Pros of DialoGPT

  • Specifically designed for open-domain dialogue, making it more suitable for conversational AI applications
  • Trained on a large corpus of Reddit conversations, providing a diverse range of dialogue styles and topics
  • Offers pre-trained models of various sizes, allowing for flexibility in deployment based on resource constraints

Cons of DialoGPT

  • Limited to English language, whereas CTRL supports multiple languages
  • Lacks the fine-grained control over text generation that CTRL offers through its control codes
  • May produce less coherent responses in certain contexts due to its focus on casual online conversations

Code Comparison

DialoGPT:

from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

CTRL:

from transformers import CTRLTokenizer, CTRLModel
tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLModel.from_pretrained("ctrl")

Both repositories use the Hugging Face Transformers library for easy model loading and tokenization. The main difference lies in the specific model and tokenizer classes used, reflecting the unique architectures of each model.

8,204

An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

Pros of GPT-Neo

  • Open-source and more accessible for research and development
  • Supports training on custom datasets, allowing for specialized models
  • Implements more recent advancements in language model architecture

Cons of GPT-Neo

  • Generally smaller model sizes, potentially limiting performance on complex tasks
  • Less control over specific text generation attributes compared to CTRL
  • May require more computational resources for fine-tuning and deployment

Code Comparison

GPT-Neo:

from transformers import GPTNeoForCausalLM, GPT2Tokenizer

model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = GPT2Tokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")

CTRL:

import torch
from transformers import CTRLTokenizer, CTRLModel

tokenizer = CTRLTokenizer.from_pretrained("ctrl")
model = CTRLModel.from_pretrained("ctrl")

Both repositories provide powerful language models, but they differ in their approach and focus. GPT-Neo offers more flexibility for custom training and implementation of recent advancements, while CTRL provides more control over text generation attributes. The choice between them depends on specific project requirements and available resources.

README

CTRL - A Conditional Transformer Language Model for Controllable Generation

Authors: Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, and Richard Socher

Updates

Apr 20, 2020

We are adding a model card for CTRL! Please reach out if you have any questions about it.

Oct 31, 2019

Adding functionality to convert a model from TF to HuggingFace/Transformers in response to a request. To convert the checkpoint, install transformers via pip install transformers and run:

python -u convert_tf_to_huggingface_pytorch.py --tf <path_to_tensorflow_data_checkpoint> --pytorch <path_to_where_you_want_to_store_pytorch_checkpoint>

Then, to use this in HuggingFace:

# create folder and contents for HuggingFace/Transformers
mkdir custom_ctrl_model
cd custom_ctrl_model
mv <path_to_pytorch_checkpoint_from_above> .
wget -O config.json https://storage.googleapis.com/sf-ctrl/pytorch/ctrl-config.json
wget -O merges.txt https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-merges.txt
wget -O vocab.json https://raw.githubusercontent.com/salesforce/ctrl/master/ctrl-vocab.json

# run
python examples/run_generation.py  --model_type ctrl --model_name <path_to_custom_ctrl_model>/ --temperature 0 --repetition 1.2
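Before pointing run_generation.py at the folder, it can help to confirm that the files from the commands above are all in place. The following is a minimal sketch, not part of this repo: the helper name is hypothetical, and the weights file name pytorch_model.bin is an assumption (the conventional Transformers name), since the commands above do not state what the converted checkpoint is called.

```python
from pathlib import Path

# Files the commands above place in the folder. The weights file name is an
# assumption (Transformers' conventional name); adjust it to your checkpoint.
REQUIRED_FILES = ("config.json", "merges.txt", "vocab.json", "pytorch_model.bin")


def missing_model_files(model_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from model_dir."""
    folder = Path(model_dir)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files("custom_ctrl_model")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("custom_ctrl_model looks complete")
```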

Oct 21, 2019

CTRL is now in huggingface/transformers!

You can simply follow the installation instructions and run:

python examples/run_generation.py  --model_type ctrl --model_name ctrl --temperature 0 --repetition 1.2

Sep 25, 2019

Two more additions:

  1. We add the code to fine-tune the model on a custom dataset in the training_utils folder. Please refer to the README within the folder for details and example usage.

  2. You can get a 36-layer model from gs://sf-ctrl/seqlen256_36layers_v0.ckpt/; the generation of this model is markedly worse than the 48-layer (base) model but still quite coherent.

Sep 23, 2019

The repo now supports (experimental) inference on PyTorch; Colaboratory: https://colab.research.google.com/drive/1nDh3ayRPJGK5ciPO2D3TFkYZFqclBWHY. Simply install PyTorch via pip install torch and run python pytorch_generation.py with the same flags as the base generation.py script, with one exception: unlike the base version, the model_path here requires the path to the .data file and not just the ckpt folder (see the Colaboratory notebook for an example). The code will convert the weights from TensorFlow on the first run and then create a loadable checkpoint for easier subsequent loading. You still need TensorFlow installed for the first step.

Sep 19, 2019

You should now be able to run inference on K80/T4/P100/similar GPUs using the lower_memory branch. We quantized certain weights to fp16, which reduced memory usage. Simply clone the repo and git checkout lower_memory. Here is a Colaboratory link that demonstrates this functionality: https://colab.research.google.com/drive/1hVveBQShDru1Mjnhe4C21uQv4A2eH1tV

This functionality is being tested; please file GitHub issues if you see something aberrant. We still recommend using the full model if possible. Once the functionality has been sufficiently tested, we will update the repo and merge into master.

Two quick notes: (1) unlike the base version, the model_path here requires the path to the .data file and not just the ckpt folder (see the Colaboratory notebook for an example); (2) the first generation is slow because of overhead in setting up the model, but subsequent ones should be fast.
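To illustrate why storing weights in fp16 reduces memory, here is a small sketch using only the Python standard library. It is not the code from the lower_memory branch; the weight values are made up, and the helper name is hypothetical. It packs the same numbers as 32-bit and 16-bit floats and reports the size and the rounding error that the smaller format introduces.

```python
import struct


def pack_weights(weights, fmt):
    """Pack floats into bytes; fmt 'f' is float32, 'e' is float16."""
    return struct.pack(f"<{len(weights)}{fmt}", *weights)


weights = [0.1234567, -1.5, 3.1415927, 0.0009765]  # made-up values

as_fp32 = pack_weights(weights, "f")
as_fp16 = pack_weights(weights, "e")

# Round-trip through fp16 to see the precision that is given up.
restored = struct.unpack(f"<{len(weights)}e", as_fp16)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(len(as_fp32), "bytes in fp32")
print(len(as_fp16), "bytes in fp16")
print("largest rounding error:", max_error)
```

Each value takes half the space in fp16, at the cost of a small rounding error, which is consistent with the recommendation above to prefer the full model when memory allows.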

Introduction

Large-scale language models show promising text generation capabilities, but users cannot easily control this generation process. We release CTRL, a 1.6 billion-parameter conditional transformer language model, trained to condition on control codes that specify domain, subdomain, entities, relationships between entities, dates, and task-specific behavior. Control codes were derived from structure that naturally co-occurs with raw text, preserving the advantages of unsupervised learning while providing more explicit control over text generation.

Paper link: https://arxiv.org/abs/1909.05858

Blog link: https://blog.einstein.ai/introducing-a-conditional-transformer-language-model-for-controllable-generation/

The code currently supports two functionalities:

  1. Generating from a trained model. Two models are available for download, one with a sequence length of 256 and another with a sequence length of 512. They are trained with word-level vocabularies and, through a sliding-window approach, can generate well beyond their trained sequence lengths.
  2. Source attribution - given a prompt, prints the perplexity of the prompt conditional on each domain control code (see Section 5 of the paper).

Please refer to the argument flags for more details regarding the options available for either.
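The sliding-window idea mentioned in the first item can be sketched as follows. This is a toy illustration under stated assumptions, not the logic of generation.py: the function names are hypothetical and toy_next_token stands in for a real model. The point is only that the model sees at most the last `window` tokens at each step, so the output can grow past the trained sequence length.

```python
def generate_with_sliding_window(prompt_tokens, next_token, window, num_new_tokens):
    """Extend prompt_tokens, showing the model at most `window` trailing tokens."""
    tokens = list(prompt_tokens)
    for _ in range(num_new_tokens):
        context = tokens[-window:]  # only the most recent tokens fit
        tokens.append(next_token(context))
    return tokens


def toy_next_token(context):
    """Hypothetical stand-in for a language model: last token plus one."""
    assert len(context) <= 4, "context exceeded the window"
    return context[-1] + 1


output = generate_with_sliding_window(
    [0, 1, 2], toy_next_token, window=4, num_new_tokens=10
)
print(output)
```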

Table of Contents

  1. Citation
  2. License
  3. Questions for Deliberation
  4. Usage
  5. Sample Generations
  6. Sample Source Attributions
  7. FAQs
  8. Get Involved

Citation

@article{keskarCTRL2019,
  title={{CTRL - A Conditional Transformer Language Model for Controllable Generation}},
  author={Keskar, Nitish Shirish and McCann, Bryan and Varshney, Lav and Xiong, Caiming and Socher, Richard},
  journal={arXiv preprint arXiv:1909.05858},
  year={2019}
}

License

The code is released under the BSD-3 License (see LICENSE.txt for details), but we also ask that users respect the following:

This software should not be used to promote or profit from:

violence, hate, and division,

environmental destruction,

abuse of human rights, or

the destruction of people's physical and mental health.

We encourage users of this software to tell us about the applications in which they are putting it to use by emailing ctrl-monitoring@salesforce.com, and to use appropriate documentation when developing high-stakes applications of this model.

Questions for Deliberation

We consulted extended members of the AI community in the responsible publication of this model. In particular, a preview of a Partnership on AI (PAI) project relating to AI research publication norms was considered prior to the release of this work. While this PAI project is as-yet unpublished, it is informed by companies, organizations, and people differently affected by artificial intelligence and presents key considerations to evaluate before publishing potentially high-impact research.

The questions referenced from the early draft of the PAI project included:

  1. How do you envision your research being used in the world? Who will use it? How much expertise is required to use it?
  2. Who will use it?
  3. Why would they be motivated to replicate / productionize your work?
  4. How would a science fiction author turn your research into a dystopian story?
  5. What is the worst way someone could use your research finding, given no resource constraints?
  6. What are the historical patterns of misuse or application in this area? How can the research be made more robust against such misuse?
  7. Which populations or communities will this technology negatively affect, deployed in the scenarios you envision? Will some groups be disproportionately affected?

Usage

Here are the steps to get generating:

  1. Install the dependencies

This code relies on TensorFlow 1.14 and fastBPE.

TensorFlow can be installed via pip install tensorflow[-gpu]==1.14. fastBPE installation instructions can be found in the GitHub repository linked above. We highly recommend experimenting within a virtualenv or Docker image.

For inference on PyTorch, please see the update on Sep 23 at the top of this README. If you use PyTorch, you can skip Step 2.

  2. Patch the /usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/keras.py (or equivalent, if installed elsewhere) by running

patch -b <path_to_tensorflow_estimator_package>/python/estimator/keras.py estimator.patch

We highly recommend experimenting within a virtualenv or Docker image since the workflow involves patching a TensorFlow file to support some custom functionality. This step is not optional; skipping this step will cause errors (irrespective of device).

If you run into OOM issues because of GPU memory exhaustion, please use the lower_memory branch. See the (Sep 19, 2019) update at the top of this README for details.

  3. Get the model files from gs://sf-ctrl/seqlen256_v1.ckpt/ or gs://sf-ctrl/seqlen512_v1.ckpt/.

A 36-layer model is also available at gs://sf-ctrl/seqlen256_36layers_v0.ckpt/.

The model architecture is identical for both checkpoints. The former is trained with lower training sequence length (256) while the latter is trained with a larger one (512). We plan to update the models (with the appropriate version tags) as we continue to train them longer and on more data. Our current recommendation is to use the 256_v1 model unless you have a strong reason not to. If you have no preference for domain, Links is always a good first choice.

With gsutil installed, you can simply run gsutil -m cp -r gs://sf-ctrl/seqlen256_v1.ckpt/ . for copying the model checkpoint over.

Without gsutil, you can follow the route recommended at https://github.com/salesforce/ctrl/issues/7#issuecomment-531303214

  4. Run the generation script generation.py or the source attribution script source_attribution.py.

The generation.py script prompts the user to input text and then prints the continuation. The source_attribution.py script prompts the user to input text and then prints a sorted list of domains and the perplexity of the text conditional on each individual domain.

Generations

The generations and attributions computed below have been generated using the 256 sequence length model. Comparable results can be obtained from the 512 version of the model as well. We demonstrate only a few of the functionalities, especially the control codes. For a complete list of the control codes, and how to use them, please refer to the paper. Note that <GENERATION_BEGINS> is only included for demonstrative purposes and is not actually generated by the model.

  1. Links
Links In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

<GENERATION_BEGINS> 

The scientists were able to track down the unicorn population by analyzing their DNA. They found out that they are descended from an ancient species called the “Tapiri,” which is believed to have been extinct for over 10,000 years.

 According to scientists, this discovery proves that there may be other undiscovered creatures on our planet.

 “We believe these animals could still exist and we hope to find them one day,” said Dr. Carlos Jaramillo, director of the National Geographic Society’s Center for Biodiversity Research at the University of Arizona.

 Scientists also confirmed that the Unicorn Genome Project has already identified several genes associated with the animal’s unique horn structure.

 “The genome project will help us understand how the animal evolved its horns so it can better protect itself against predators like jaguars or pumas.” 
 
 Researchers say that if all goes well, the new genetic information should allow them to create a 3D model of the animal.

 “It would give us a much clearer picture about what makes the creature unique.”- Dr David Sillito, lead researcher ...

As a general note, you don't have to use greedy sampling. You can switch to top-k or nucleus sampling through the appropriate argument flags.
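The difference between these decoding choices can be sketched in a few lines. This is an illustration, not the code in generation.py: the function and its `nucleus` parameter are naming assumptions, the `temperature` and `topk` parameters merely mirror the flag names used in this README, and the logits are made up. A temperature of 0 is treated as greedy decoding.

```python
import math
import random


def sample_next(logits, temperature=0.0, topk=0, nucleus=0.0, rng=random):
    """Pick a token index from raw logits.

    temperature == 0 means greedy (argmax); otherwise logits are divided by
    the temperature, optionally restricted to the top-k tokens or to the
    smallest set whose probability mass reaches `nucleus`, then sampled.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Candidate tokens, most probable first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if topk > 0:
        order = order[:topk]
    if nucleus > 0:
        kept, mass = [], 0.0
        for i in order:
            kept.append(i)
            mass += probs[i]
            if mass >= nucleus:
                break
        order = kept

    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a four-token vocabulary
print(sample_next(logits))                                # greedy
print(sample_next(logits, temperature=0.2, topk=2))       # top-k
print(sample_next(logits, temperature=1.0, nucleus=0.9))  # nucleus
```

Lower temperatures concentrate probability on the highest-scoring tokens, so a low temperature combined with a small top-k stays close to greedy output while still allowing some variation.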

Running it with --temperature 0.2 and --topk 5 yields:

Links In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

 The scientists were able to track down and capture one of these unicorn herds as it roamed through the valley for several days. They then took DNA samples from each animal and compared them with other animals found throughout the region.

 Advertisement

 What they found is truly astounding.

 According to the scientists, there are approximately 50 different species of unicorn, all of which live on two continents — North America and South America...
  2. Wikipedia
Wikipedia Salesforce Inc. is <GENERATION_BEGINS> a software company that provides cloud computing and business management software. It was founded in 1999 by Marc Benioff as an online retailer of enterprise software, but it has since expanded into other areas such as e-commerce, customer relationship management, and digital media...
  3. Reviews
Reviews Rating: 1.0 <GENERATION_BEGINS> 
 I bought this for my son who is a huge fan of the show. He was so excited to get it and when he opened it, we were all very disappointed. The quality of the product is terrible. It looks like something you would buy at a dollar store.
Rating: 1.0
 This item arrived in poor condition. There are several scratches on the front cover as well as some other minor damage...
Reviews Rating: 4.0 <GENERATION_BEGINS> 
 I bought this for my husband and he loves it. He has a small wrist so it is hard to find watches that fit him well. This one fits perfectly.
 Rating: 4.0
 The watch looks great but the band was too big even though i had read other reviews about how large the bands were.
 Rating: 4.0
 It works well as an everyday watch. It's not very fancy or anything, just simple with no extra bells & whistles.
 
  4. Books
Books Weary with toil, I haste me to my bed, <GENERATION_BEGINS> 
 And sleep the slumber of a weary man.
 The morning sun is up, and shines on high;
 My bed is ready, but no friend appears.
 A little while ago he came in sight,
 But now he's gone, as if he'd never come.
 He left his hat upon the table there,
 His coat was hanging up behind the door;
 And, when at last I looked around, there stood
 No one beside me, nor within call.
 Oh, what shall I do? What can be done?
 I'm sure that some strange accident has happened.
 I'll go out into the street again,
 To seek for him, or call him back again.
 But, oh, how lonely it will seem
 When all are gone who once were near me!
  5. Reddit - r/running
Running Title: <GENERATION_BEGINS> I just ran my first 5k in under 30 minutes.
 Text: It was a pretty easy run, but it felt great.

 My goal is to run at least 10 miles per week for the next few months.

 I'm not sure if this qualifies as running or not, but it's definitely something that I've been wanting to do since high school.

 Thanks r/running!

 Edit: Thank you all so much for your kind words and advice.

 Edit 2: Wow, thanks guys!
  6. Reddit - r/fitness
Fitness Title: <GENERATION_BEGINS> How do you get your body to burn fat when you're not working out?
 Text: I know that the reason why people lose weight is because they are eating less and burning more calories than their bodies need but how does this work for someone who doesn't have a lot of time or energy to exercise?

 For example, if I'm at home doing nothing all day then I'll be able to eat whatever I want without feeling like I've eaten too much. But what happens after that?

 EDIT: Thanks everyone for the responses!
  7. Reddit - r/frugal
Saving Title: <GENERATION_BEGINS> How to get a free credit report
 Text: I have been trying for months now and it seems like no one is willing to give me one.

 Is there any way that you can just go online and fill out the forms yourself or do they require some sort of ID or something?

 Edit: Thanks everyone for your help!

 Edit 2: Thank you all so much for helping me!
  8. Reddit - r/personalfinance
Finance Title: <GENERATION_BEGINS> I have a question about my credit score.
 Text: So I'm currently in the process of applying for an auto loan and I've been told that if you don't pay your bills on time then they will not approve you.

 My questions are:

 1. How do I know when to start paying off debt?
 2. What is the best way to get out of debt without having to file bankruptcy?
 3. Is it possible to refinance my car loan?
 4. Should I just wait until after school starts so I can take advantage of lower interest rates?
 5. If so, how long should I wait?

 Thanks
  9. Questions
Questions Q: What is the capital of Australia? <GENERATION_BEGINS>
 A: Canberra
 Q: How many people live in Canberra?
 A: 650,000
  10. Translation
Translation English : This is a natural language processing model that aims to generate coherent text in a controllable manner. ; French : <GENERATION_BEGINS> 
Il s'agit d'un modèle de traitement du langage naturel qui vise à générer un texte cohérent et contrôlable.
Translation English : This is a natural language processing model that aims to generate coherent text in a controllable manner. ; German : <GENERATION_BEGINS> 
Es handelt sich um ein natürliches Textverarbeitungssystem, das auf eine einheitliche und kontrollierbare Erzeugung von Text abzielt.
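The prompts in the samples above follow a few recognizable templates: a control code first, then task-specific markers such as Rating:, Title:, Q:, or the source and target languages. The helpers below are a sketch that reproduces those templates as plain strings; the function names are hypothetical, and the formats are inferred from the samples in this README rather than taken from the repo's code.

```python
def review_prompt(rating):
    """Prompt for the Reviews control code with a star rating, e.g. 4.0."""
    return f"Reviews Rating: {rating:.1f}"


def translation_prompt(text, source="English", target="French"):
    """Prompt for the Translation control code, mirroring the sample above."""
    return f"Translation {source} : {text} ; {target} :"


def question_prompt(question):
    """Prompt for the Questions control code."""
    return f"Questions Q: {question}"


def subreddit_prompt(control_code, title=""):
    """Prompt for a subreddit control code such as Running or Fitness."""
    return f"{control_code} Title: {title}".rstrip()


print(review_prompt(4))
print(translation_prompt("This is a test."))
print(question_prompt("What is the capital of Australia?"))
print(subreddit_prompt("Running"))
```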

Source Attributions

  1. I lost 10 lbs! Feeling great!
PROMPT: I lost 10 lbs! Feeling great!
Diet ppl = 28.960714
Weight ppl = 29.223865
Fitness ppl = 36.162671
...
  2. My landlord is suing me for unpaid rent
PROMPT: My landlord is suing me for unpaid rent
Legal ppl = 21.210965
Finance ppl = 24.619064
Saving ppl = 27.923208
...
  3. And then I saw him, the man in the mirror.
PROMPT: And then I saw him, the man in the mirror.
Horror ppl = 17.919299
Scary ppl = 18.587843
Writing ppl = 23.154564
...
  4. Anarchism is an anti-authoritarian political philosophy that rejects hierarchies deemed unjust and advocates their replacement with self-managed, self-governed societies based on voluntary, cooperative institutions.
PROMPT: Anarchism is an anti-authoritarian political philosophy that rejects hierarchies deemed unjust and advocates their replacement with self-managed, self-governed societies based on voluntary, cooperative institutions.
Wikipedia ppl = 34.446701
News ppl = 34.484165
Links ppl = 35.460126
...
  5. I love God
PROMPT: I love God
Christianity ppl = 55.653985
Atheism ppl = 116.811038
Confessions ppl = 133.619834
...
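The ranking shown in these attributions can be reproduced in outline once per-token log-probabilities are available. The sketch below is not source_attribution.py: the function names are hypothetical and the log-probabilities are made-up numbers rather than model outputs. It computes perplexity as the exponential of the negative mean log-probability and sorts domains from lowest to highest perplexity.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean log-probability (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def rank_domains(log_probs_by_domain):
    """Return (domain, perplexity) pairs sorted from most to least likely."""
    scored = [(d, perplexity(lp)) for d, lp in log_probs_by_domain.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Made-up per-token log-probabilities for one prompt under three control codes.
example = {
    "Fitness": [-3.9, -3.4, -3.5],
    "Diet": [-3.5, -3.2, -3.4],
    "Legal": [-5.1, -4.8, -5.0],
}
for domain, ppl in rank_domains(example):
    print(f"{domain} ppl = {ppl:.6f}")
```

A lower perplexity means the prompt is less surprising under that control code, which is why the best-matching domain appears first.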

FAQs

(We hope to update this section frequently).

  1. Will you be releasing the training code and data?

The fine-tuning code has been released in the training_utils folder; please refer to the update on Sep 25 for details.

We will not be releasing the training data, but we will release tips and scripts related to data collection.

  2. Is a version of the model available in PyTorch?

Yes. The repo supports experimental inference on PyTorch (see the update on Sep 23), and CTRL is also available in huggingface/transformers (see the update on Oct 21).

  3. The code errors out.

Make sure that you have performed the patch as described above. If the error persists, please create a GitHub issue.

  4. The code generates nonsense irrespective of the prompt.

Make sure that you have (a) provided the right --model_dir and that the folder actually exists and has the checkpoint, (b) provided a valid control code (for example, a domain such as Links) as the first token, and (c) tried generating with a simple prompt such as Links I or Books From. If the error persists, please create a GitHub issue.
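Check (b) is easy to automate before a prompt reaches the model. The snippet below is a minimal sketch with a hypothetical helper name; its set of codes contains only those that appear in the samples of this README, so it is an illustrative subset and not the complete list from the paper.

```python
# Control codes that appear in the samples of this README; the full list is in
# the paper, so treat this set as an illustrative subset.
KNOWN_CODES = {
    "Links", "Wikipedia", "Reviews", "Books", "Running",
    "Fitness", "Saving", "Finance", "Questions", "Translation",
}


def starts_with_control_code(prompt, codes=KNOWN_CODES):
    """True if the first whitespace-separated token is a known control code."""
    words = prompt.split()
    return bool(words) and words[0] in codes


print(starts_with_control_code("Links I"))
print(starts_with_control_code("In a shocking finding"))
```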

Get Involved

Please create a GitHub issue if you have any questions, suggestions, requests or bug-reports. We welcome PRs!