textgenrnn
Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
Top Related Projects
An Open Source Machine Learning Framework for Everyone
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Code for the paper "Language Models are Unsupervised Multitask Learners"
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
Efficient, reusable RNNs and LSTMs for torch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Quick Overview
textgenrnn is a Python library for training and generating text using recurrent neural networks. It provides a simple interface for creating text generation models with customizable parameters, making it accessible for both beginners and advanced users. The library is built on top of Keras/TensorFlow and allows for easy fine-tuning of pre-trained models.
Pros
- Easy to use with minimal setup required
- Supports both character-level and word-level text generation
- Allows for fine-tuning of pre-trained models
- Provides options for customizing model architecture and training parameters
Cons
- Limited to recurrent neural networks, not supporting more recent architectures like Transformers
- May produce less coherent results compared to more advanced language models
- Requires significant computational resources for training large models
- Documentation could be more comprehensive for advanced use cases
Code Examples
- Basic text generation:
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.generate()
- Training a new model:
textgen = textgenrnn()
textgen.train_from_file('path/to/text/file.txt', num_epochs=10)
textgen.generate(5) # Generate 5 samples
- Fine-tuning a pre-trained model:
textgen = textgenrnn('textgenrnn_weights.hdf5')
textgen.train_from_file('new_data.txt', num_epochs=3)
textgen.generate(temperature=0.5) # Lower temperature gives more conservative output
- Customizing model parameters:
textgen = textgenrnn()
textgen.train_from_file('custom_data.txt', new_model=True, rnn_layers=3, rnn_size=128, rnn_bidirectional=True, num_epochs=5)
textgen.generate(prefix='Once upon a time')
Getting Started
To get started with textgenrnn, follow these steps:
- Install the library:
pip install textgenrnn
- Import and create a textgenrnn object:
from textgenrnn import textgenrnn
textgen = textgenrnn()
- Generate text:
textgen.generate()
For training on custom data:
textgen.train_from_file('path/to/your/text/file.txt', num_epochs=10)
textgen.generate(5) # Generate 5 samples
For more advanced usage and customization options, refer to the project's documentation on GitHub.
Competitor Comparisons
An Open Source Machine Learning Framework for Everyone
Pros of TensorFlow
- Comprehensive, industry-standard deep learning framework
- Extensive ecosystem with tools, libraries, and community support
- Supports multiple programming languages and platforms
Cons of TensorFlow
- Steeper learning curve for beginners
- Can be overkill for simple text generation tasks
- Requires more setup and configuration
Code Comparison
textgenrnn:
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.train_from_file('path/to/text.txt', num_epochs=1)
textgen.generate()
TensorFlow:
import tensorflow as tf
# vocab_size, embedding_dim, max_length, units, X_train and y_train are placeholders you must define.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.LSTM(units),
    tf.keras.layers.Dense(vocab_size, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy')  # expects one-hot encoded y_train
model.fit(X_train, y_train, epochs=1)
Summary
textgenrnn is a specialized library for text generation, offering simplicity and ease of use. TensorFlow, on the other hand, is a powerful and versatile deep learning framework that provides more control and flexibility but requires more expertise to use effectively. textgenrnn is suitable for quick text generation projects, while TensorFlow is better for complex, large-scale machine learning tasks.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Supports a wide range of state-of-the-art models and architectures
- Offers extensive documentation and community support
- Provides easy integration with PyTorch and TensorFlow
Cons of transformers
- Steeper learning curve for beginners
- Requires more computational resources for training and inference
- Can be overkill for simpler text generation tasks
Code comparison
textgenrnn:
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.train_from_file('input.txt', num_epochs=1)
textgen.generate()
transformers:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
input_ids = tokenizer.encode("Hello, I'm a language model,", return_tensors='pt')
output = model.generate(input_ids, max_length=50, num_return_sequences=1)
print(tokenizer.decode(output[0], skip_special_tokens=True))
Code for the paper "Language Models are Unsupervised Multitask Learners"
Pros of GPT-2
- More advanced language model with better text generation capabilities
- Supports a wider range of tasks, including translation and summarization
- Larger model with more parameters, leading to higher quality outputs
Cons of GPT-2
- Requires more computational resources and training time
- More complex to fine-tune and deploy
- Potential for misuse due to its powerful generation capabilities
Code Comparison
textgenrnn:
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.train_on_texts(texts, num_epochs=1)
textgen.generate()
GPT-2:
from transformers import GPT2LMHeadModel, GPT2Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
input_ids = tokenizer.encode("Hello, I'm a language model,", return_tensors="pt")
output = model.generate(input_ids, max_length=50, num_return_sequences=1)
Both repositories offer text generation capabilities, but GPT-2 is more powerful and versatile. textgenrnn is simpler to use and requires less computational resources, making it suitable for smaller projects or quick experiments. GPT-2, on the other hand, provides state-of-the-art language modeling but requires more expertise and resources to implement effectively.
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
Pros of char-rnn
- Simpler architecture, easier to understand and modify
- More flexible, allowing for experimentation with different network structures
- Can be used for tasks beyond text generation (e.g., music composition)
Cons of char-rnn
- Requires more technical knowledge to set up and use
- Less user-friendly for beginners or those unfamiliar with deep learning
- May require more manual tuning to achieve optimal results
Code Comparison
char-rnn (PyTorch-style sketch for illustration only; char-rnn itself is written in Lua/Torch):
model = CharRNN(vocab_size, n_hidden=128, n_layers=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
textgenrnn:
textgen = textgenrnn()
textgen.train_on_texts(texts, num_epochs=10)
textgen.generate(n=5, temperature=0.5)
Key Differences
- char-rnn provides a lower-level implementation, giving users more control over the model architecture and training process
- textgenrnn offers a higher-level API, making it easier for users to quickly generate text without deep understanding of the underlying model
- char-rnn is implemented in Lua/Torch, while textgenrnn uses Keras with a TensorFlow backend
- textgenrnn includes pre-trained models and weights, allowing for faster initial results
Efficient, reusable RNNs and LSTMs for torch
Pros of torch-rnn
- Written in Lua/Torch, offering potential performance benefits for certain use cases
- Provides more low-level control over the RNN architecture and training process
- Supports CUDA acceleration for faster training on GPUs
Cons of torch-rnn
- Less user-friendly, requiring more technical knowledge to set up and use
- Lacks some of the higher-level features and abstractions provided by textgenrnn
- Not actively maintained, with the last update several years ago
Code Comparison
torch-rnn:
local model = nn.Sequential()
model:add(nn.LookupTable(vocab_size, rnn_size))
model:add(nn.LSTM(rnn_size, rnn_size))
model:add(nn.Linear(rnn_size, vocab_size))
textgenrnn:
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.train_on_texts(texts, num_epochs=10)
textgen.generate()
The code snippets highlight the difference in complexity and abstraction level between the two libraries. torch-rnn requires manual model definition, while textgenrnn provides a high-level API for easy text generation.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Pros of PyTorch
- Comprehensive deep learning framework with extensive capabilities
- Large, active community and ecosystem of tools/libraries
- Flexible and intuitive for research and production
Cons of PyTorch
- Steeper learning curve for beginners
- More complex setup and configuration
- Requires more code for basic text generation tasks
Code Comparison
textgenrnn:
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.train_from_file('texts.txt', num_epochs=1)
textgen.generate()
PyTorch:
import torch
import torch.nn as nn
class TextGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Define model architecture (embedding, recurrent layers, output layer)
    def forward(self, x):
        # Implement forward pass
        raise NotImplementedError
Summary
textgenrnn is a specialized library for text generation, offering simplicity and ease of use for beginners. PyTorch, on the other hand, is a comprehensive deep learning framework that provides more flexibility and power but requires more expertise to use effectively. textgenrnn is ideal for quick text generation projects, while PyTorch is better suited for complex, customizable deep learning tasks across various domains.
README
textgenrnn
Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code, or quickly train on a text using a pretrained model.
textgenrnn is a Python 3 module on top of Keras/TensorFlow for creating char-rnns, with many cool features:
- A modern neural network architecture which utilizes new techniques such as attention-weighting and skip-embedding to accelerate training and improve model quality.
- Train on and generate text at either the character-level or word-level.
- Configure RNN size, the number of RNN layers, and whether to use bidirectional RNNs.
- Train on any generic input text file, including large files.
- Train models on a GPU and then use them to generate text with a CPU.
- Utilize a powerful CuDNN implementation of RNNs when trained on the GPU, which massively speeds up training time as opposed to typical LSTM implementations.
- Train the model using contextual labels, allowing it to learn faster and produce better results in some cases.
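The difference between character-level and word-level modes comes down to what counts as one token. The sketch below is a rough illustration only; the `tokenize` helper and its regular expression are hypothetical and are not textgenrnn's own tokenizer.

```python
import re

def tokenize(text, word_level=False):
    """Split text into the units a model would predict one at a time."""
    if word_level:
        # Words and punctuation marks become separate, lowercased tokens.
        return re.findall(r"\w+|[^\w\s]", text.lower())
    # Character level: every character, including spaces, is a token.
    return list(text)

title = "Show HN: textgenrnn"
char_tokens = tokenize(title)                   # one token per character
word_tokens = tokenize(title, word_level=True)  # one token per word or punctuation mark
```

A character-level model has a small vocabulary but needs many steps per sentence; a word-level model needs fewer steps but a much larger vocabulary.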
You can play with textgenrnn and train any text file with a GPU for free in this Colaboratory Notebook! Read this blog post or watch this video for more information!
Examples
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.generate()
[Spoiler] Anyone else find this post and their person that was a little more than I really like the Star Wars in the fire or health and posting a personal house of the 2016 Letter for the game in a report of my backyard.
The included model can easily be trained on new texts, and can generate appropriate text even after a single pass of the input data.
textgen.train_from_file('hacker_news_2000.txt', num_epochs=1)
textgen.generate()
Project State Project Firefox
The model weights are relatively small (2 MB on disk), and they can easily be saved and loaded into a new textgenrnn instance. As a result, you can play with models which have been trained on hundreds of passes through the data. (In fact, textgenrnn learns so well that you have to increase the temperature significantly for creative output!)
textgen_2 = textgenrnn('/weights/hacker_news.hdf5')
textgen_2.generate(3, temperature=1.0)
Why we got money “regular alter”
Urburg to Firefox acquires Nelf Multi Shamn
Kubernetes by Google’s Bern
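To see why a higher temperature yields more creative output, here is a minimal sketch of temperature sampling over a next-token distribution. The function names and the example probabilities are hypothetical, and this is one common formulation rather than a copy of textgenrnn's internal sampling code.

```python
import math
import random

def apply_temperature(probs, temperature=1.0):
    """Rescale a probability distribution by a sampling temperature."""
    eps = 1e-10  # avoids log(0) for zero-probability entries
    scaled = [math.log(p + eps) / temperature for p in probs]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, temperature=1.0, rng=random):
    """Draw one index from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(range(len(adjusted)), weights=adjusted, k=1)[0]

probs = [0.7, 0.2, 0.1]
cold = apply_temperature(probs, 0.2)  # sharper: strongly favors the likeliest token
hot = apply_temperature(probs, 2.0)   # flatter: unlikely tokens are picked more often
```

Low temperatures push the distribution toward the single most likely token, which is why a heavily trained model can sound repetitive until the temperature is raised.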
You can also train a new model, with support for word level embeddings and bidirectional RNN layers, by adding new_model=True to any train function.
Interactive Mode
It's also possible to get involved in how the output unfolds, step by step. Interactive mode suggests the top N options for the next char/word and lets you pick one.
When running textgenrnn in the terminal, pass interactive=True and top_n=N to generate. N defaults to 3.
from textgenrnn import textgenrnn
textgen = textgenrnn()
textgen.generate(interactive=True, top_n=5)
This can add a human touch to the output; it feels like you're the writer! (reference)
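Conceptually, each interactive step ranks the model's predictions and shows the best few. The sketch below illustrates that ranking under stated assumptions: the helper name, vocabulary, and probabilities are all hypothetical, and the real interactive loop in textgenrnn may differ in its details.

```python
def top_n_candidates(probs, vocab, n=3):
    """Return the n most probable (token, probability) pairs, best first."""
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

# Hypothetical next-character distribution over a tiny vocabulary.
vocab = ["a", "e", "t", " ", "s"]
probs = [0.10, 0.35, 0.25, 0.20, 0.10]

options = top_n_candidates(probs, vocab, n=3)
for number, (token, prob) in enumerate(options, start=1):
    print(f"{number}: {token!r} ({prob:.2f})")
```

The user's pick is then appended to the text so far, and the model predicts again from the extended sequence.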
Usage
textgenrnn can be installed from PyPI via pip:
pip3 install textgenrnn
For the latest textgenrnn, you must have a minimum TensorFlow version of 2.1.0.
You can view a demo of common features and model configuration options in this Jupyter Notebook.
- /datasets contains example datasets using Hacker News/Reddit data for training textgenrnn.
- /weights contains further-pretrained models on the aforementioned datasets which can be loaded into textgenrnn.
- /outputs contains examples of text generated from the above pretrained models.
Neural Network Architecture and Implementation
textgenrnn is based off of the char-rnn project by Andrej Karpathy with a few modern optimizations, such as the ability to work with very small text sequences.
The included pretrained model follows a neural network architecture inspired by DeepMoji. For the default model, textgenrnn takes in an input of up to 40 characters, converts each character to a 100-D character embedding vector, and feeds those into a 128-cell long short-term memory (LSTM) recurrent layer. Those outputs are then fed into another 128-cell LSTM. All three layers (the embeddings and both LSTMs) are then fed into an Attention layer, which weights the most important temporal features and averages them together. Because the embeddings and the first LSTM are skip-connected into the attention layer, model updates can backpropagate to them more easily, which helps prevent vanishing gradients. The attention output is mapped to a probability, for each of up to 394 different characters, that it is the next character in the sequence; the character set includes uppercase characters, lowercase characters, punctuation, and emoji. If you train a new model on a new dataset, all of the numeric parameters above can be configured.
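The "weight and average" step of the attention layer can be illustrated with a few lines of plain Python. This is a simplified sketch that assumes a single learned scoring vector and softmax weights over time; the function name and all numbers are hypothetical, and the actual layer operates on batched tensors.

```python
import math

def attention_weighted_average(timesteps, weights):
    """Collapse a sequence of feature vectors into one vector.

    timesteps: list of feature vectors, one per character position.
    weights:   scoring vector (hypothetical values; learned in a real model).
    """
    # Score each timestep with a dot product against the scoring vector.
    scores = [sum(x * w for x, w in zip(step, weights)) for step in timesteps]
    # A softmax over time turns the scores into attention weights.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    # The weighted sum over time yields a single fixed-size vector.
    dim = len(timesteps[0])
    pooled = [sum(a * step[i] for a, step in zip(attention, timesteps))
              for i in range(dim)]
    return pooled, attention

steps = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8]]
pooled, attention = attention_weighted_average(steps, weights=[2.0, -1.0])
```

Whatever the input length, the pooled vector has the same size as one timestep's features, which is what lets a final dense layer map it to next-character probabilities.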
Alternatively, if context labels are provided with each text document, the model can be trained in a contextual mode, where the model learns the text given the context so the recurrent layers learn the decontextualized language. The text-only path can piggy-back off the decontextualized layers; in all, this results in much faster training and better quantitative and qualitative model performance than just training the model given the text alone.
The model weights included with the package are trained on hundreds of thousands of text documents from Reddit submissions (via BigQuery), from a very diverse variety of subreddits. The network was also trained using the decontextual approach noted above in order to both improve training performance and mitigate authorial bias.
When fine-tuning the model on a new dataset of texts using textgenrnn, all layers are retrained. However, since the original pretrained network has a much more robust "knowledge" initially, the new textgenrnn trains faster and more accurately in the end, and can potentially learn new relationships not present in the original dataset (e.g. the pretrained character embeddings include the context for the character for all possible types of modern internet grammar).
Additionally, the retraining is done with a momentum-based optimizer and a linearly decaying learning rate, both of which prevent exploding gradients and make it much less likely that the model diverges after training for a long time.
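A linearly decaying learning rate can be sketched as a function of the epoch number. The function name and the base rate below are assumptions for illustration, not values confirmed by this README.

```python
def linear_decay_lr(epoch, num_epochs, base_lr=0.004):
    """Learning rate for a zero-indexed epoch under linear decay."""
    return base_lr * (1 - epoch / num_epochs)

# Rates for a hypothetical 4-epoch run: each epoch steps down by base_lr / num_epochs.
schedule = [linear_decay_lr(epoch, num_epochs=4) for epoch in range(4)]
```

Because the step size shrinks as training proceeds, late updates are small, which makes it less likely that a long run drifts away from a good solution.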
Notes
- You will not get quality generated text 100% of the time, even with a heavily-trained neural network. That's the primary reason viral blog posts/Twitter tweets utilizing NN text generation often generate lots of texts and curate/edit the best ones afterward.
- Results will vary greatly between datasets. Because the pretrained neural network is relatively small, it cannot store as much data as RNNs typically flaunted in blog posts. For best results, use a dataset with at least 2,000-5,000 documents. If a dataset is smaller, you'll need to train it for longer by setting num_epochs higher when calling a training method and/or training a new model from scratch. Even then, there is currently no good heuristic for determining a "good" model.
- A GPU is not required to retrain textgenrnn, but it will take much longer to train on a CPU. If you do use a GPU, I recommend increasing the batch_size parameter for better hardware utilization.
Future Plans for textgenrnn
- More formal documentation
- A web-based implementation using tensorflow.js (works especially well due to the network's small size)
- A way to visualize the attention-layer outputs to see how the network "learns."
- A mode to allow the model architecture to be used for chatbot conversations (may be released as a separate project)
- More depth toward context (positional context + allowing multiple context labels)
- A larger pretrained network which can accommodate longer character sequences and a more in-depth understanding of language, creating better generated sentences.
- Hierarchical softmax activation for word-level models (once Keras has good support for it).
- FP16 for superfast training on Volta/TPUs (once Keras has good support for it).
Articles/Projects using textgenrnn
Articles
- Lifehacker: How to Train Your Own Neural Network by Beth Skwarecki
- New York Times: Let Our Algorithm Choose Your Halloween Costume by Janelle Shane
- CNN Business: This quirky experiment highlights AI's biggest challenges by Rachel Metz
Projects
- Tweet Generator: Train a neural network optimized for generating tweets based off of any number of Twitter users.
- Hacker News Simulator: Twitter bot trained on 300,000+ Hacker News submissions using textgenrnn.
- SubredditRNN: Reddit Subreddit where all submitted content is from textgenrnn bots.
- Human-AI Collaborated Pizzas: Pizza recipes generated with textgenrnn and made in real life.
- Board Game Titles
- Video Game Discussion Forum Titles
- A.I Created Cakes
- AI Created Cookies
- AI Generated Songs
Tweets
- BuzzFeed YouTube Videos
- AWS Services
- Recipes + D&D Spells + Heavy Metal Names
- RPG Adventure Names
- The Onion + Cosmopolitan
- Google Conference Room Names
- Sith Lords
Maintainer/Creator
Max Woolf (@minimaxir)
Max's open-source projects are supported by his Patreon. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.
Credits
Andrej Karpathy for the original proposal of the char-rnn via the blog post The Unreasonable Effectiveness of Recurrent Neural Networks.
Daniel Grijalva for contributing an interactive mode.
License
MIT
Attention-layer code used from DeepMoji (MIT Licensed)