char-rnn-tensorflow
Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow
Top Related Projects
Models and examples built with TensorFlow
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
Quick Overview
The sherjilozair/char-rnn-tensorflow repository is a TensorFlow implementation of a character-level recurrent neural network (RNN) for text generation. It allows users to train a model on a corpus of text and then generate new text that mimics the style and content of the original.
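At its core, character-level modeling means assigning every distinct character in the corpus an integer id and training the network to predict the next character at each position. A minimal sketch of that preprocessing step, using plain NumPy and an illustrative corpus string (not part of the repository):
import numpy as np

# Illustrative corpus; in practice this is the contents of the training file (input.txt).
text = "hello world, hello char-rnn"

# Build the character vocabulary and the char <-> id mappings.
vocab = sorted(set(text))
char_to_ix = {c: i for i, c in enumerate(vocab)}
ix_to_char = {i: c for c, i in char_to_ix.items()}

# Encode the text as ids; targets are simply the inputs shifted by one character.
encoded = np.array([char_to_ix[c] for c in text])
inputs, targets = encoded[:-1], encoded[1:]
print(len(vocab), inputs[:5], targets[:5])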
Pros
- Flexible and Customizable: The project provides a modular and configurable architecture, allowing users to easily experiment with different model configurations, hyperparameters, and training data.
- Comprehensive Documentation: The repository includes detailed documentation, including instructions for training, generating text, and evaluating the model's performance.
- Active Development: The project is actively maintained, with regular updates and bug fixes.
- Supports Multiple Datasets: The project can be used with a variety of text datasets, including books, articles, and scripts.
Cons
- Limited to Character-level Modeling: The project is focused on character-level text generation, which may not be suitable for all use cases that require higher-level semantic understanding.
- Computational Complexity: Training large-scale character-level RNNs can be computationally intensive and may require significant hardware resources, especially for longer sequences.
- Potential for Biased or Offensive Output: As with any text generation model, the output of the char-rnn-tensorflow model may reflect biases or offensive content present in the training data.
- Lack of Automatic Evaluation Metrics: The project does not provide built-in support for automatic evaluation of the generated text, requiring users to manually assess the quality and coherence of the output.
Code Examples
Here are a few illustrative code examples showing what the sherjilozair/char-rnn-tensorflow workflow looks like. The CharRNN class and its methods below are a simplified sketch; in the repository itself, training and sampling are driven by the train.py and sample.py scripts:
- Training the Model:
import tensorflow as tf
from char_rnn_tensorflow.model import CharRNN

# vocab and data_loader are assumed to have been built from the training corpus beforehand.
model = CharRNN(
    num_classes=len(vocab),
    batch_size=64,
    num_steps=50,
    lstm_size=128,
    num_layers=2,
    learning_rate=0.002,
)

model.train(
    data_loader,
    num_epochs=50,
    save_every=1000,
    log_every=10,
    sample_every=1000,
    checkpoint_dir='./checkpoints',
)
This code sets up a CharRNN model and trains it on the provided data, saving checkpoints and generating sample text at regular intervals.
- Generating Text:
import tensorflow as tf
from char_rnn_tensorflow.model import CharRNN

# For sampling, the network is driven one character at a time, hence batch_size=1 and num_steps=1.
model = CharRNN(
    num_classes=len(vocab),
    batch_size=1,
    num_steps=1,
    lstm_size=128,
    num_layers=2,
    sampling=True,
)

model.load('./checkpoints/model.ckpt')
print(model.sample(100, vocab, prime='The '))
This code loads a pre-trained CharRNN model and uses it to generate 100 characters of text, primed with the phrase "The ".
- Evaluating the Model:
import tensorflow as tf
from char_rnn_tensorflow.model import CharRNN

# Evaluation reuses the training-shaped graph and scores held-out data with perplexity.
model = CharRNN(
    num_classes=len(vocab),
    batch_size=64,
    num_steps=50,
    lstm_size=128,
    num_layers=2,
)

model.load('./checkpoints/model.ckpt')
perplexity = model.perplexity(data_loader)
print(f'Perplexity: {perplexity:.2f}')
This code loads a pre-trained CharRNN model and evaluates its performance on the provided data using perplexity as the metric.
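If a model exposes only an average per-character cross-entropy loss rather than a perplexity method, the metric can be derived directly, since perplexity is just the exponential of the mean cross-entropy. A minimal sketch with illustrative loss values:
import numpy as np

# Per-character cross-entropy losses collected over a held-out set (illustrative values).
losses = np.array([1.21, 1.35, 1.18, 1.42])

# Perplexity = exp(mean cross-entropy); lower is better.
perplexity = np.exp(losses.mean())
print(f'Perplexity: {perplexity:.2f}')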
Getting Started
To get started with the sherjilozair/char-rnn-tensorflow project, follow these steps:
- Clone the repository:
git clone https://github.com/sherjilozair/char-rnn-tensorflow.git
- Install the required dependencies:
cd char-rnn-tensorflow
pip install -r requirements.txt
- Prepare your training data:
- The project expects the training text as a single plain-text file named input.txt inside a directory under data/ (see the Datasets section of the README below). For example:
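A minimal data-preparation sketch in Python, assuming a hypothetical corpus file my_corpus.txt in the working directory:
from pathlib import Path

# Hypothetical corpus file; the trainer looks for input.txt inside the directory passed via --data_dir.
corpus = Path('my_corpus.txt')
target = Path('data/my_corpus/input.txt')

target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(corpus.read_text(encoding='utf-8'), encoding='utf-8')
# Training can then be started with: python train.py --data_dir=./data/my_corpus/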
Competitor Comparisons
Models and examples built with TensorFlow
Pros of tensorflow/models
- Comprehensive collection of state-of-the-art models and examples for various machine learning tasks
- Active community with frequent updates and contributions
- Extensive documentation and tutorials for each model
Cons of tensorflow/models
- Larger codebase and more complex to navigate compared to a single-purpose project like char-rnn-tensorflow
- May require more setup and configuration to get a specific model running
Code Comparison
tensorflow/models (Transformer model):
def create_masks(inp, tar):
    # Encoder padding mask
    enc_padding_mask = create_padding_mask(inp)

    # Used in the 2nd attention block in the decoder.
    # This padding mask is used to mask the encoder outputs.
    dec_padding_mask = create_padding_mask(inp)

    # Used in the 1st attention block in the decoder.
    # It is used to pad and mask future tokens in the input received by
    # the decoder.
    look_ahead_mask = create_look_ahead_mask(tf.shape(tar)[1])
    dec_target_padding_mask = create_padding_mask(tar)
    combined_mask = tf.maximum(dec_target_padding_mask, look_ahead_mask)

    return enc_padding_mask, combined_mask, dec_padding_mask
sherjilozair/char-rnn-tensorflow (Character-level RNN):
def sample(self, n=200, prime='The '):
    """
    Generate sample text (n characters) from the model, seeded with prime.
    """
    states = self.sess.run(self.initial_state)
    txt = prime
    # Feed the priming characters through the network to warm up the hidden state.
    for c in prime:
        x = np.array([[self.char_to_ix[c]]])
        feed = {self.x: x, self.initial_state: states}
        preds, states = self.sess.run([self.proba, self.final_state], feed)
    # Sample n characters, feeding each prediction back in as the next input.
    for i in range(n):
        p = preds[0, -1]
        ix = np.random.choice(len(p), p=p)
        txt += self.ix_to_char[ix]
        x = np.array([[ix]])
        feed = {self.x: x, self.initial_state: states}
        preds, states = self.sess.run([self.proba, self.final_state], feed)
    return txt
Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
Pros of char-rnn
- The original implementation by Andrej Karpathy, written in Lua on top of the Torch framework
- Provides a more comprehensive set of features, including support for different types of RNNs (LSTM, GRU) and optimization methods
- Includes more detailed documentation and examples
Cons of char-rnn
- Requires more setup and configuration compared to char-rnn-tensorflow
- May be more complex for beginners to understand and use
- Potentially slower performance due to the overhead of the Torch framework
Code Comparison
Here's a brief comparison of the code for training a character-level language model in both repositories:
char-rnn-tensorflow:
model = CharRNN(
    num_classes=len(vocab),
    batch_size=batch_size,
    num_steps=num_steps,
    lstm_size=lstm_size,
    num_layers=num_layers,
    learning_rate=learning_rate)

model.train(data_loader.train_data, data_loader.valid_data)
char-rnn:
local opt = {
    data_dir = 'data/tinyshakespeare',
    rnn_size = 128,
    num_layers = 2,
    batch_size = 50,
    seq_length = 50,
    num_epochs = 50,
    learning_rate = 2e-3,
    decay_rate = 0.97,
    dropout = 0.5,
}

local model = require 'model'(opt)
model:train()
The char-rnn-tensorflow code is more concise and straightforward, while the char-rnn code is more verbose but exposes more configuration options, giving it greater flexibility and room for customization.
TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
Pros of TensorFlow-Examples
- Covers a wide range of TensorFlow examples, from basic to advanced, making it a comprehensive resource for learning and experimentation.
- Provides clear and well-documented code, making it easier for beginners to understand and follow along.
- Includes examples for various machine learning tasks, such as classification, regression, and generative models.
Cons of TensorFlow-Examples
- May not be as focused or specialized as char-rnn-tensorflow, which is dedicated to a specific task (character-level language modeling).
- The examples may not be as up-to-date with the latest TensorFlow versions and best practices compared to a more focused project.
- The project may not have the same level of active maintenance and community support as a more specialized repository.
Code Comparison
Here's a brief comparison of the code structure between the two repositories:
char-rnn-tensorflow
def build_rnn_graph(num_classes, num_seqs=50, num_steps=50, rnn_size=128):
    """
    Builds the computation graph for the character-level language model.
    """
    # ...
    cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
    initial_state = cell.zero_state(num_seqs, tf.float32)
    # ...
TensorFlow-Examples
def conv_net(x_dict, n_classes, dropout, reuse, is_training):
    """
    Convolution Neural Network Model.
    """
    # Create the network
    with tf.variable_scope('ConvNet', reuse=reuse):
        # Convolution Layer
        conv1 = tf.layers.conv2d(x_dict['images'], 32, 5, activation=tf.nn.relu)
        # Max Pooling (down-sampling)
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)
        # ...
The char-rnn-tensorflow code focuses on building a character-level language model using an LSTM-based recurrent neural network, while the TensorFlow-Examples code demonstrates a convolutional neural network for image classification.
README
char-rnn-tensorflow
Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow.
Inspired by Andrej Karpathy's char-rnn.
Requirements
Basic Usage
To train with default parameters on the tinyshakespeare corpus, run python train.py. To access all the parameters, use python train.py --help.
To sample from a checkpointed model, run python sample.py.
Sampling while training is still in progress (to check the latest checkpoint) works only on the CPU or on a separate GPU. To force CPU mode, use export CUDA_VISIBLE_DEVICES="" and unset CUDA_VISIBLE_DEVICES afterward (on Windows, set CUDA_VISIBLE_DEVICES="" and set CUDA_VISIBLE_DEVICES= respectively).
To continue training after an interruption, or to run for more epochs, run python train.py --init_from=save.
Datasets
You can use any plain text file as input. For example, you could download The Complete Sherlock Holmes as follows:
cd data
mkdir sherlock
cd sherlock
wget https://sherlock-holm.es/stories/plain-text/cnus.txt
mv cnus.txt input.txt
Then start training from the top-level directory using python train.py --data_dir=./data/sherlock/.
A quick tip to concatenate many small disparate .txt files into one large training file: ls *.txt | xargs -L 1 cat >> input.txt.
Tuning
Tuning your models is kind of a "dark art" at this point. In general:
- Start with as much clean input.txt as possible e.g. 50MiB
- Start by establishing a baseline using the default settings.
- Use tensorboard to compare all of your runs visually to aid in experimenting.
- Tweak --rnn_size up somewhat from 128 if you have a lot of input data.
- Tweak --num_layers from 2 to 3 but no higher unless you have experience.
- Tweak --seq_length up from 50 based on the length of a valid input string (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.). An LSTM cell will "remember" for durations longer than this sequence, but the effect falls off for longer character distances.
- Finally, once you've done all that, only then would I suggest adding some dropout. Start with --output_keep_prob 0.8 and maybe end up with both --input_keep_prob 0.8 --output_keep_prob 0.5, only after exhausting all the above values; a sketch of what these keep probabilities control appears after this list.
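A minimal sketch of what those two keep-probability flags control, assuming TensorFlow 1.x (which this project targets); the layer sizes are illustrative:
import tensorflow as tf  # assumes TensorFlow 1.x

def make_cell(rnn_size, input_keep_prob, output_keep_prob):
    # Each LSTM layer is wrapped so dropout is applied to its inputs and outputs.
    cell = tf.nn.rnn_cell.BasicLSTMCell(rnn_size)
    return tf.nn.rnn_cell.DropoutWrapper(
        cell,
        input_keep_prob=input_keep_prob,
        output_keep_prob=output_keep_prob)

# Roughly what --rnn_size 128 --num_layers 2 --input_keep_prob 0.8 --output_keep_prob 0.5 set up.
cells = [make_cell(128, 0.8, 0.5) for _ in range(2)]
multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)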
Tensorboard
To visualize training progress, model graphs, and internal state histograms, fire up Tensorboard and point it at your log_dir, e.g.:
$ tensorboard --logdir=./logs/
Then open a browser to http://localhost:6006 or the correct IP/Port specified.
Roadmap
- Add explanatory comments
- Expose more command-line arguments
- Compare accuracy and performance with char-rnn
- More Tensorboard instrumentation
Contributing
Please feel free to:
- Leave feedback in the issues
- Open a Pull Request
- Join the Gitter chat
- Share your success stories and data sets!