char-rnn-tensorflow
Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow
Top Related Projects
Models and examples built with TensorFlow
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
Quick Overview
The sherjilozair/char-rnn-tensorflow repository is a TensorFlow implementation of a character-level recurrent neural network (RNN) for text generation. It allows users to train a model on a corpus of text and then generate new text that mimics the style and content of the original.
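At its core, character-level modeling means assigning every distinct character in the corpus an integer id and training the network to predict the next character at each position. A minimal sketch of that preprocessing step, using plain NumPy and an illustrative corpus string (not part of the repository):
import numpy as np

# Illustrative corpus; in practice this is the contents of the training file (input.txt).
text = "hello world, hello char-rnn"

# Build the character vocabulary and the char <-> id mappings.
vocab = sorted(set(text))
char_to_ix = {c: i for i, c in enumerate(vocab)}
ix_to_char = {i: c for c, i in char_to_ix.items()}

# Encode the text as ids; targets are simply the inputs shifted by one character.
encoded = np.array([char_to_ix[c] for c in text])
inputs, targets = encoded[:-1], encoded[1:]
print(len(vocab), inputs[:5], targets[:5])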
Pros
- Flexible and Customizable: The project provides a modular and configurable architecture, allowing users to easily experiment with different model configurations, hyperparameters, and training data.
- Comprehensive Documentation: The repository includes detailed documentation, including instructions for training, generating text, and evaluating the model's performance.
- Active Development: The project is actively maintained, with regular updates and bug fixes.
- Supports Multiple Datasets: The project can be used with a variety of text datasets, including books, articles, and scripts.
Cons
- Limited to Character-level Modeling: The project is focused on character-level text generation, which may not be suitable for all use cases that require higher-level semantic understanding.
- Computational Complexity: Training large-scale character-level RNNs can be computationally intensive and may require significant hardware resources, especially for longer sequences.
- Potential for Biased or Offensive Output: As with any text generation model, the output of the char-rnn-tensorflow model may reflect biases or offensive content present in the training data.
- Lack of Automatic Evaluation Metrics: The project does not provide built-in support for automatic evaluation of the generated text, requiring users to manually assess the quality and coherence of the output.
Code Examples
Here are a few illustrative code examples showing what the sherjilozair/char-rnn-tensorflow workflow looks like. The CharRNN class and its methods below are a simplified sketch; in the repository itself, training and sampling are driven by the train.py and sample.py scripts:
- Training the Model:
import tensorflow as tf
from char_rnn_tensorflow.model import CharRNN

# vocab and data_loader are assumed to have been built from the training corpus beforehand.
model = CharRNN(
    num_classes=len(vocab),
    batch_size=64,
    num_steps=50,
    lstm_size=128,
    num_layers=2,
    learning_rate=0.002,
)

model.train(
    data_loader,
    num_epochs=50,
    save_every=1000,
    log_every=10,
    sample_every=1000,
    checkpoint_dir='./checkpoints',
)
This code sets up a CharRNN model and trains it on the provided data, saving checkpoints and generating sample text at regular intervals.
- Generating Text:
import tensorflow as tf
from char_rnn_tensorflow.model import CharRNN

# For sampling, the network is driven one character at a time, hence batch_size=1 and num_steps=1.
model = CharRNN(
    num_classes=len(vocab),
    batch_size=1,
    num_steps=1,
    lstm_size=128,
    num_layers=2,
    sampling=True,
)

model.load('./checkpoints/model.ckpt')
print(model.sample(100, vocab, prime='The '))
This code loads a pre-trained CharRNN model and uses it to generate 100 characters of text, primed with the phrase "The ".
- Evaluating the Model:
import tensorflow as tf
from char_rnn_tensorflow.model import CharRNN

# Evaluation reuses the training-shaped graph and scores held-out data with perplexity.
model = CharRNN(
    num_classes=len(vocab),
    batch_size=64,
    num_steps=50,
    lstm_size=128,
    num_layers=2,
)

model.load('./checkpoints/model.ckpt')
perplexity = model.perplexity(data_loader)
print(f'Perplexity: {perplexity:.2f}')
This code loads a pre-trained CharRNN model and evaluates its performance on the provided data using perplexity as the metric.
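If a model exposes only an average per-character cross-entropy loss rather than a perplexity method, the metric can be derived directly, since perplexity is just the exponential of the mean cross-entropy. A minimal sketch with illustrative loss values:
import numpy as np

# Per-character cross-entropy losses collected over a held-out set (illustrative values).
losses = np.array([1.21, 1.35, 1.18, 1.42])

# Perplexity = exp(mean cross-entropy); lower is better.
perplexity = np.exp(losses.mean())
print(f'Perplexity: {perplexity:.2f}')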
Getting Started
To get started with the sherjilozair/char-rnn-tensorflow project, follow these steps:
- Clone the repository:
git clone https://github.com/sherjilozair/char-rnn-tensorflow.git
- Install the required dependencies:
cd char-rnn-tensorflow
pip install -r requirements.txt
- Prepare your training data:
- The project expects the training text as a single plain-text file named input.txt inside a directory under data/ (see the Datasets section of the README below). For example:
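A minimal data-preparation sketch in Python, assuming a hypothetical corpus file my_corpus.txt in the working directory:
from pathlib import Path

# Hypothetical corpus file; the trainer looks for input.txt inside the directory passed via --data_dir.
corpus = Path('my_corpus.txt')
target = Path('data/my_corpus/input.txt')

target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(corpus.read_text(encoding='utf-8'), encoding='utf-8')
# Training can then be started with: python train.py --data_dir=./data/my_corpus/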
Competitor Comparisons
Models and examples built with TensorFlow
Pros of tensorflow/models
- Comprehensive collection of state-of-the-art models and examples for various machine learning tasks
- Active community with frequent updates and contributions
- Extensive documentation and tutorials for each model
Cons of tensorflow/models
- Larger codebase and more complex to navigate compared to a single-purpose project like char-rnn-tensorflow
- May require more setup and configuration to get a specific model running
Code Comparison
tensorflow/models (Transformer model):
def create_masks(inp, tar):
    # Encoder padding mask
    enc_padding_mask = create_padding_mask(inp)

    # Used in the 2nd attention block in the decoder.
    # This padding mask is used to mask the encoder outputs.
    dec_padding_mask = create_padding_mask(inp)

    # Used in the 1st attention block in the decoder.
    # It is used to pad and mask future tokens in the input received by
    # the decoder.
    look_ahead_mask = create_look_ahead_mask(tf.shape(tar)[1])
    dec_target_padding_mask = create_padding_mask(tar)
    combined_mask = tf.maximum(dec_target_padding_mask, look_ahead_mask)

    return enc_padding_mask, combined_mask, dec_padding_mask
sherjilozair/char-rnn-tensorflow (Character-level RNN):
def sample(self, n=200, prime='The '):
    """
    Generate sample text (n characters) from the model, seeded with prime.
    """
    states = self.sess.run(self.initial_state)
    txt = prime
    # Feed the priming characters through the network to warm up the hidden state.
    for c in prime:
        x = np.array([[self.char_to_ix[c]]])
        feed = {self.x: x, self.initial_state: states}
        preds, states = self.sess.run([self.proba, self.final_state], feed)
    # Sample n characters, feeding each prediction back in as the next input.
    for i in range(n):
        p = preds[0, -1]
        ix = np.random.choice(len(p), p=p)
        txt += self.ix_to_char[ix]
        x = np.array([[ix]])
        feed = {self.x: x, self.initial_state: states}
        preds, states = self.sess.run([self.proba, self.final_state], feed)
    return txt
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
Pros of char-rnn
- The original implementation by Andrej Karpathy, written in Lua on top of the Torch framework
- Provides a more comprehensive set of features, including support for different types of RNNs (LSTM, GRU) and optimization methods
- Includes more detailed documentation and examples
Cons of char-rnn
- Requires more setup and configuration compared to char-rnn-tensorflow
- May be more complex for beginners to understand and use
- Potentially slower performance due to the overhead of the Torch framework
Code Comparison
Here's a brief comparison of the code for training a character-level language model in both repositories:
char-rnn-tensorflow:
model = CharRNN(
    num_classes=len(vocab),
    batch_size=batch_size,
    num_steps=num_steps,
    lstm_size=lstm_size,
    num_layers=num_layers,
    learning_rate=learning_rate)

model.train(data_loader.train_data, data_loader.valid_data)
char-rnn:
local opt = {
    data_dir = 'data/tinyshakespeare',
    rnn_size = 128,
    num_layers = 2,
    batch_size = 50,
    seq_length = 50,
    num_epochs = 50,
    learning_rate = 2e-3,
    decay_rate = 0.97,
    dropout = 0.5,
}

local model = require 'model'(opt)
model:train()
The char-rnn-tensorflow code is more concise and straightforward, while the char-rnn code is more verbose but exposes more configuration options, giving it greater flexibility and room for customization.
TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
Pros of TensorFlow-Examples
- Covers a wide range of TensorFlow examples, from basic to advanced, making it a comprehensive resource for learning and experimentation.
- Provides clear and well-documented code, making it easier for beginners to understand and follow along.
- Includes examples for various machine learning tasks, such as classification, regression, and generative models.
Cons of TensorFlow-Examples
- May not be as focused or specialized as char-rnn-tensorflow, which is dedicated to a specific task (character-level language modeling).
- The examples may not be as up-to-date with the latest TensorFlow versions and best practices compared to a more focused project.
- The project may not have the same level of active maintenance and community support as a more specialized repository.
Code Comparison
Here's a brief comparison of the code structure between the two repositories:
char-rnn-tensorflow
def build_rnn_graph(num_classes, num_seqs=50, num_steps=50, rnn_size=128):
    """
    Builds the computation graph for the character-level language model.
    """
    # ...
    cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
    initial_state = cell.zero_state(num_seqs, tf.float32)
    # ...
TensorFlow-Examples
def conv_net(x_dict, n_classes, dropout, reuse, is_training):
    """
    Convolution Neural Network Model.
    """
    # Create the network
    with tf.variable_scope('ConvNet', reuse=reuse):
        # Convolution Layer
        conv1 = tf.layers.conv2d(x_dict['images'], 32, 5, activation=tf.nn.relu)
        # Max Pooling (down-sampling)
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
        # ...
The char-rnn-tensorflow code focuses on building a character-level language model using an LSTM-based recurrent neural network, while the TensorFlow-Examples code demonstrates a convolutional neural network for image classification.
README
char-rnn-tensorflow
Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow.
Inspired by Andrej Karpathy's char-rnn.
Requirements
Basic Usage
To train with default parameters on the tinyshakespeare corpus, run python train.py. To access all the parameters, use python train.py --help.
To sample from a checkpointed model, run python sample.py.
Sampling while training is still in progress (to check the latest checkpoint) works only on the CPU or on a separate GPU. To force CPU mode, use export CUDA_VISIBLE_DEVICES="" and unset CUDA_VISIBLE_DEVICES afterward (on Windows, set CUDA_VISIBLE_DEVICES="" and set CUDA_VISIBLE_DEVICES= respectively).
To continue training after an interruption, or to run for more epochs, run python train.py --init_from=save.
Datasets
You can use any plain text file as input. For example, you could download The Complete Sherlock Holmes as follows:
cd data
mkdir sherlock
cd sherlock
wget https://sherlock-holm.es/stories/plain-text/cnus.txt
mv cnus.txt input.txt
Then start training from the top-level directory using python train.py --data_dir=./data/sherlock/.
A quick tip to concatenate many small disparate .txt files into one large training file: ls *.txt | xargs -L 1 cat >> input.txt.
Tuning
Tuning your models is kind of a "dark art" at this point. In general:
- Start with as much clean input.txt as possible e.g. 50MiB
- Start by establishing a baseline using the default settings.
- Use tensorboard to compare all of your runs visually to aid in experimenting.
- Tweak --rnn_size up somewhat from 128 if you have a lot of input data.
- Tweak --num_layers from 2 to 3 but no higher unless you have experience.
- Tweak --seq_length up from 50 based on the length of a valid input string (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.). An LSTM cell will "remember" for durations longer than this sequence, but the effect falls off for longer character distances.
- Finally, once you've done all that, only then would I suggest adding some dropout. Start with --output_keep_prob 0.8 and maybe end up with both --input_keep_prob 0.8 --output_keep_prob 0.5, only after exhausting all the above values; a sketch of what these keep probabilities control appears after this list.
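A minimal sketch of what those two keep-probability flags control, assuming TensorFlow 1.x (which this project targets); the layer sizes are illustrative:
import tensorflow as tf  # assumes TensorFlow 1.x

def make_cell(rnn_size, input_keep_prob, output_keep_prob):
    # Each LSTM layer is wrapped so dropout is applied to its inputs and outputs.
    cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
    return tf.nn.rnn_cell.DropoutWrapper(
        cell,
        input_keep_prob=input_keep_prob,
        output_keep_prob=output_keep_prob)

# Roughly what --rnn_size 128 --num_layers 2 --input_keep_prob 0.8 --output_keep_prob 0.5 set up.
cells = [make_cell(128, 0.8, 0.5) for _ in range(2)]
multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)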
Tensorboard
To visualize training progress, model graphs, and internal state histograms, fire up Tensorboard and point it at your log_dir, e.g.:
$ tensorboard --logdir=./logs/
Then open a browser to http://localhost:6006 or the correct IP/Port specified.
Roadmap
- Add explanatory comments
- Expose more command-line arguments
- Compare accuracy and performance with char-rnn
- More Tensorboard instrumentation
Contributing
Please feel free to:
- Leave feedback in the issues
- Open a Pull Request
- Join the Gitter chat
- Share your success stories and data sets!