jcjohnson/torch-rnn

Efficient, reusable RNNs and LSTMs for torch

2,493 stars, 508 forks

Top Related Projects

  • char-rnn (11,562 stars): Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch
  • char-rnn-tensorflow: Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow
  • biaxial-rnn-music-composition: A recurrent neural network designed to generate classical music
  • tensorflow/models (76,949 stars): Models and examples built with TensorFlow
  • pytorch/examples (22,218 stars): A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Quick Overview

torch-rnn is a character-level language model implemented in Torch. It uses recurrent neural networks (RNNs) to generate text one character at a time, allowing for the creation of synthetic text that mimics the style and structure of the training data. This project provides a flexible and efficient implementation of character-level RNNs for text generation tasks.
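
At its core, character-level generation is a loop: feed the characters produced so far into the network, obtain a probability distribution over the next character, sample from it, and append the result. The sketch below illustrates only that loop; the next_char_probs function and the toy vocabulary are placeholders for a trained model, not part of the torch-rnn API:

require 'torch'

-- Placeholder: returns a probability distribution over the next character given
-- everything generated so far. In torch-rnn this role is played by a trained RNN/LSTM.
local function next_char_probs(history, vocab_size)
  return torch.ones(vocab_size):div(vocab_size)   -- uniform distribution for illustration
end

local idx_to_char = {'h', 'e', 'l', 'o', ' '}     -- toy vocabulary
local generated = {}
for t = 1, 20 do
  local probs = next_char_probs(generated, #idx_to_char)
  local idx = torch.multinomial(probs, 1)[1]      -- draw one character index
  table.insert(generated, idx_to_char[idx])
end
print(table.concat(generated))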

Pros

  • Efficient implementation using Torch, allowing for fast training and generation
  • Supports both CPU and GPU computation
  • Includes preprocessing scripts for preparing training data
  • Offers flexibility in model architecture and hyperparameters

Cons

  • Requires Torch7, which is no longer actively developed and far less widely used than modern frameworks such as PyTorch or TensorFlow
  • Limited documentation and examples for advanced usage
  • Not actively maintained (last commit was in 2017)
  • May require some technical expertise to set up and use effectively

Code Examples

  1. Training a model:
require 'torch'
require 'nn'
require 'optim'

local model = require 'model'
local loader = require 'data.DataLoader'

-- Training hyperparameters
local opt = {
    batch_size = 50,
    seq_length = 50,
    layers = 2,
    hidden_size = 128,
    dropout = 0,
    learning_rate = 2e-3,
    lr_decay_factor = 0.5,
    max_epochs = 50
}

-- Build the data loader and an LSTM sized to the vocabulary, then train
local train_loader = loader.create(opt.batch_size, opt.seq_length)
local rnn = model.LSTM(train_loader.vocab_size, opt.hidden_size, opt.layers, opt.dropout)

model.train(rnn, train_loader, opt)
  2. Generating text:
require 'torch'
require 'nn'

local model = require 'model'
local utils = require 'util.utils'

-- Load a trained checkpoint and sample 1000 characters from it
local rnn = torch.load('checkpoint.t7')
local sample = model.sample(rnn, 1000)
print(utils.decode_chars(sample))
  3. Preprocessing data:
python scripts/preprocess.py \
    --input_txt my_input.txt \
    --output_h5 my_data.h5 \
    --output_json my_data.json

Getting Started

  1. Install Torch and required dependencies:

    git clone https://github.com/torch/distro.git ~/torch --recursive
    cd ~/torch && bash install-deps && ./install.sh
    
  2. Clone the repository:

    git clone https://github.com/jcjohnson/torch-rnn.git
    cd torch-rnn
    
  3. Preprocess your data:

    python scripts/preprocess.py --input_txt my_input.txt --output_h5 my_data.h5 --output_json my_data.json
    
  4. Train the model:

    th train.lua -input_h5 my_data.h5 -input_json my_data.json
    
  5. Generate text:

    th sample.lua -checkpoint cv/checkpoint_10000.t7 -length 2000
    

Competitor Comparisons

char-rnn (11,562 stars)

Multi-layer Recurrent Neural Networks (LSTM, GRU, RNN) for character-level language models in Torch

Pros of char-rnn

  • Simpler implementation, making it easier to understand and modify
  • More widely used and recognized in the community
  • Has inspired widely used ports to other frameworks, including TensorFlow

Cons of char-rnn

  • Generally slower training and generation compared to torch-rnn
  • Less optimized for GPU usage
  • Lacks some advanced features present in torch-rnn

Code Comparison

char-rnn:

local LSTM = {}
function LSTM:forward(input, prev_c, prev_h)
  local i2h = self.i2h:forward(input)
  local h2h = self.h2h:forward(prev_h)
  local all_input_sums = i2h + h2h

torch-rnn:

local LSTM = torch.class('LSTM')
function LSTM:updateOutput(input)
  self.cell:forward{input, self.prevOutput, self.prevCell}
  self.output = self.cell.output
  return self.output

The code snippets show that torch-rnn uses a more modular and optimized approach, potentially contributing to its improved performance. char-rnn's implementation is more straightforward, which can be beneficial for learning and customization purposes.

char-rnn-tensorflow

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow

Pros of char-rnn-tensorflow

  • Uses TensorFlow, which offers better GPU acceleration and distributed computing capabilities
  • More actively maintained with recent updates and contributions
  • Supports both Python 2 and Python 3

Cons of char-rnn-tensorflow

  • May have a steeper learning curve for those unfamiliar with TensorFlow
  • Potentially slower training speed on CPU compared to Torch-based implementations

Code Comparison

torch-rnn:

local LSTM = torch.class('LSTM')

function LSTM:__init(input_dim, hidden_dim)
  self.hidden_dim = hidden_dim
  self.input_dim = input_dim
end

char-rnn-tensorflow:

class CharRNN(object):
    def __init__(self, args):
        self.args = args
        if args.model == 'rnn':
            cell_fn = rnn_cell.BasicRNNCell
        elif args.model == 'gru':
            cell_fn = rnn_cell.GRUCell

The code snippets show the initialization of the RNN models in both implementations. torch-rnn uses Lua and defines an LSTM class, while char-rnn-tensorflow uses Python and allows for different RNN cell types (RNN or GRU) based on user input.

biaxial-rnn-music-composition

A recurrent neural network designed to generate classical music.

Pros of biaxial-rnn-music-composition

  • Specifically designed for music composition, offering more tailored features for this task
  • Implements a biaxial LSTM model, which can capture both time and pitch relationships in music
  • Provides tools for working with MIDI files and music-specific data preprocessing

Cons of biaxial-rnn-music-composition

  • Less general-purpose than torch-rnn, focusing primarily on music generation
  • May require more domain-specific knowledge to use effectively
  • Has fewer stars and forks on GitHub, potentially indicating a smaller community

Code Comparison

biaxial-rnn-music-composition:

def get_piece(self):
    while len(self.current_sequence) < self.num_timesteps:
        self.current_sequence = np.concatenate((self.current_sequence,
            self.predict_and_sample(self.current_sequence)))
    return self.current_sequence

torch-rnn:

local model = torch.load(opt.model)
local sample = model:sample(opt)
print(sample)

The biaxial-rnn-music-composition code shows a method for generating a music piece, while torch-rnn demonstrates a simpler sampling process for text generation.

tensorflow/models (76,949 stars)

Models and examples built with TensorFlow

Pros of TensorFlow Models

  • Broader scope with implementations of various ML models and techniques
  • Actively maintained by Google and a large community
  • Extensive documentation and tutorials

Cons of TensorFlow Models

  • Steeper learning curve due to its comprehensive nature
  • Potentially overwhelming for beginners focused on specific tasks
  • Larger codebase to navigate

Code Comparison

torch-rnn:

local LSTM = torch.class('LSTM')

function LSTM:__init(input_dim, hidden_dim)
  self.input_dim = input_dim
  self.hidden_dim = hidden_dim
  -- Initialize parameters
end

TensorFlow Models:

class LSTM(tf.keras.Model):
  def __init__(self, hidden_dim):
    super(LSTM, self).__init__()
    self.lstm = tf.keras.layers.LSTM(hidden_dim)
    self.dense = tf.keras.layers.Dense(1)

  def call(self, inputs):
    x = self.lstm(inputs)
    return self.dense(x)

Summary

torch-rnn is a focused implementation for character-level language models using RNNs in Torch, while TensorFlow Models is a comprehensive repository containing various ML models and techniques implemented in TensorFlow. torch-rnn may be more suitable for specific RNN tasks, while TensorFlow Models offers a wider range of options but with a steeper learning curve.

pytorch/examples (22,218 stars)

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Pros of examples

  • Broader scope with multiple neural network architectures and tasks
  • Regularly updated with newer PyTorch features and best practices
  • More extensive documentation and community support

Cons of examples

  • Less focused on RNN-specific implementations
  • May require more setup and configuration for specific RNN tasks
  • Potentially overwhelming for beginners due to the variety of examples

Code Comparison

torch-rnn (Lua):

local LSTM = torch.class('LSTM')

function LSTM:__init(input_dim, hidden_dim)
  self.hidden_dim = hidden_dim
  self.input_dim = input_dim
end

examples (PyTorch):

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(LSTM, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size)

Summary

examples offers a more comprehensive set of neural network examples and benefits from active development within the PyTorch ecosystem. However, torch-rnn provides a more focused implementation for RNN tasks, which may be preferable for specific use cases. The code comparison highlights the transition from Lua-based Torch to Python-based PyTorch, reflecting the broader shift in the deep learning community.

README

torch-rnn

torch-rnn provides high-performance, reusable RNN and LSTM modules for torch7, and uses these modules for character-level language modeling similar to char-rnn.

You can find documentation for the RNN and LSTM modules here; they have no dependencies other than torch and nn, so they should be easy to integrate into existing projects.
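
As a rough illustration of what dropping one of these modules into an existing Torch project might look like, here is a minimal sketch. The nn.LSTM(input_dim, hidden_dim) constructor and the (N, T, D) mini-batch convention follow my reading of the module documentation, and the require path is an assumption that depends on where LSTM.lua sits on your Lua path:

require 'torch'
require 'nn'
require 'LSTM'   -- assumed require path: LSTM.lua from the torch-rnn repo on your Lua path

local N, T, D, H = 16, 50, 64, 128   -- batch size, sequence length, input dim, hidden dim

local lstm = nn.LSTM(D, H)           -- constructor as described in the module docs (assumption)
local x = torch.randn(N, T, D)       -- a mini-batch of input sequences
local h = lstm:forward(x)            -- expected output shape: (N, T, H)
print(h:size())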

Compared to char-rnn, torch-rnn is up to 1.9x faster and uses up to 7x less memory. For more details see the Benchmark section below.

Installation

Docker Images

Cristian Baldi has prepared Docker images for both CPU-only mode and GPU mode; you can find them here.

System setup

You'll need to install the header files for Python 2.7 and the HDF5 library. On Ubuntu you should be able to install like this:

sudo apt-get -y install python2.7-dev
sudo apt-get install libhdf5-dev

Python setup

The preprocessing script is written in Python 2.7; its dependencies are in the file requirements.txt. You can install these dependencies in a virtual environment like this:

virtualenv .env                  # Create the virtual environment
source .env/bin/activate         # Activate the virtual environment
pip install -r requirements.txt  # Install Python dependencies
# Work for a while ...
deactivate                       # Exit the virtual environment

Lua setup

The main modeling code is written in Lua using torch; you can find installation instructions here. You'll need the following Lua packages:

  • torch
  • nn
  • optim
  • lua-cjson
  • torch-hdf5

After installing torch, you can install / update these packages by running the following:

# Install most things using luarocks
luarocks install torch
luarocks install nn
luarocks install optim
luarocks install lua-cjson

# We need to install torch-hdf5 from GitHub
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec

CUDA support (Optional)

To enable GPU acceleration with CUDA, you'll need to install CUDA 6.5 or higher and the following Lua packages:

  • cutorch
  • cunn

You can install / update them by running:

luarocks install cutorch
luarocks install cunn

OpenCL support (Optional)

To enable GPU acceleration with OpenCL, you'll need to install the following Lua packages:

  • cltorch
  • clnn

You can install / update them by running:

luarocks install cltorch
luarocks install clnn

OSX Installation

Jeff Thompson has written a very detailed installation guide for OSX that you can find here.

Usage

To train a model and use it to generate new text, you'll need to follow three simple steps:

Step 1: Preprocess the data

You can use any text file for training models. Before training, you'll need to preprocess the data using the script scripts/preprocess.py; this will generate an HDF5 file and JSON file containing a preprocessed version of the data.

If you have training data stored in my_data.txt, you can run the script like this:

python scripts/preprocess.py \
  --input_txt my_data.txt \
  --output_h5 my_data.h5 \
  --output_json my_data.json

This will produce files my_data.h5 and my_data.json that will be passed to the training script.

There are a few more flags you can use to configure preprocessing; read about them here.
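
If you want to sanity-check the preprocessed files before training, you can open them from Torch with torch-hdf5 and lua-cjson. The sketch below is illustrative; the dataset path /train and the JSON key idx_to_token are assumptions about the preprocessing output rather than documented guarantees:

local hdf5 = require 'hdf5'
local cjson = require 'cjson'

-- Peek at the encoded training split (assumed to live under /train in the HDF5 file)
local f = hdf5.open('my_data.h5', 'r')
local train = f:read('/train'):all()
print('training characters:', train:nElement())
f:close()

-- Load the vocabulary mapping (the key name idx_to_token is an assumption)
local vocab = cjson.decode(io.open('my_data.json', 'r'):read('*a'))
local vocab_size = 0
for _ in pairs(vocab.idx_to_token) do vocab_size = vocab_size + 1 end
print('vocabulary size:', vocab_size)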

Step 2: Train the model

After preprocessing the data, you'll need to train the model using the train.lua script. This will be the slowest step. You can run the training script like this:

th train.lua -input_h5 my_data.h5 -input_json my_data.json

This will read the data stored in my_data.h5 and my_data.json, run for a while, and save checkpoints to files with names like cv/checkpoint_1000.t7.

You can change the RNN model type, hidden state size, and number of RNN layers like this:

th train.lua -input_h5 my_data.h5 -input_json my_data.json -model_type rnn -num_layers 3 -rnn_size 256

By default this will run in GPU mode using CUDA; to run in CPU-only mode, add the flag -gpu -1.

To run with OpenCL, add the flag -gpu_backend opencl.

There are many more flags you can use to configure training; read about them here.

Step 3: Sample from the model

After training a model, you can generate new text by sampling from it using the script sample.lua. Run it like this:

th sample.lua -checkpoint cv/checkpoint_10000.t7 -length 2000

This will load the trained checkpoint cv/checkpoint_10000.t7 from the previous step, sample 2000 characters from it, and print the results to the console.

By default the sampling script will run in GPU mode using CUDA; to run in CPU-only mode add the flag -gpu -1 and to run in OpenCL mode add the flag -gpu_backend opencl.

There are more flags you can use to configure sampling; read about them here.
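
If you would rather sample from your own Lua code than shell out to sample.lua, the script is essentially a thin wrapper around loading a checkpoint and calling the model's sample method. The sketch below reflects how sample.lua appears to work; the checkpoint.model field, the LanguageModel require, and the option names are assumptions:

require 'torch'
require 'nn'
require 'LanguageModel'   -- assumed: torch-rnn's model class must be loaded before deserializing
-- If the checkpoint was trained on a GPU, you may also need: require 'cutorch'; require 'cunn'

local checkpoint = torch.load('cv/checkpoint_10000.t7')
local model = checkpoint.model            -- assumed checkpoint field name
model:evaluate()                          -- disable dropout for sampling

local text = model:sample{length = 2000, temperature = 1.0}  -- assumed option names
print(text)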

Benchmarks

To benchmark torch-rnn against char-rnn, we use each to train LSTM language models for the tiny-shakespeare dataset with 1, 2 or 3 layers and with an RNN size of 64, 128, 256, or 512. For each we use a minibatch size of 50, a sequence length of 50, and no dropout. For each model size and for both implementations, we record the forward/backward times and GPU memory usage over the first 100 training iterations, and use these measurements to compute the mean time and memory usage.

All benchmarks were run on a machine with an Intel i7-4790k CPU, 32 GB main memory, and a Titan X GPU.

Below we show the forward/backward times for both implementations, as well as the mean speedup of torch-rnn over char-rnn. We see that torch-rnn is faster than char-rnn at all model sizes, with smaller models giving a larger speedup; for a single-layer LSTM with 128 hidden units, we achieve a 1.9x speedup; for larger models we achieve about a 1.4x speedup.

Below we show the GPU memory usage for both implementations, as well as the mean memory saving of torch-rnn over char-rnn. Again torch-rnn outperforms char-rnn at all model sizes, but here the savings become more significant for larger models: for models with 512 hidden units, we use 7x less memory than char-rnn.
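
The reported speedups and memory savings are ratios of per-iteration means over those 100 iterations. The toy sketch below shows the computation with made-up timing numbers standing in for the recorded measurements:

require 'torch'

-- Made-up per-iteration forward/backward times (ms) standing in for recorded measurements
local char_rnn_times  = torch.randn(100):abs():add(90)
local torch_rnn_times = torch.randn(100):abs():add(48)

-- Mean time per iteration for each implementation; their ratio is the reported speedup
local speedup = char_rnn_times:mean() / torch_rnn_times:mean()
print(string.format('mean speedup: %.2fx', speedup))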

TODOs

  • Get rid of Python / JSON / HDF5 dependencies?