
keras-rl

Deep Reinforcement Learning for Keras.

Top Related Projects

  • OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
  • Stable Baselines: a fork of OpenAI Baselines, implementations of reinforcement learning algorithms
  • TF-Agents: a reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning
  • PFRL: a PyTorch-based deep reinforcement learning library
  • garage: a toolkit for reproducible reinforcement learning research

Quick Overview

keras-rl is a deep reinforcement learning library that extends Keras, a popular deep learning framework. It provides implementations of various reinforcement learning algorithms, allowing users to easily train and evaluate RL agents on different environments. The library is designed to be modular and extensible, making it suitable for both research and practical applications.

Pros

  • Easy integration with Keras and TensorFlow
  • Implements a wide range of popular RL algorithms
  • Supports both discrete and continuous action spaces
  • Highly customizable and extensible architecture

Cons

  • Limited documentation and examples
  • Not actively maintained (last update was in 2019)
  • May not be compatible with the latest versions of Keras and TensorFlow (see the version-pinning note after this list)
  • Lacks some advanced RL techniques and algorithms
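
If you run into incompatibilities with recent TensorFlow releases, one workaround, assuming you are using the original keras-rl rather than the keras-rl2 fork, is to pin an older standalone Keras and a 1.x TensorFlow. The exact versions below are illustrative rather than officially documented:

pip install "tensorflow<2.0" "keras>=2.0.7,<2.4" keras-rl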

Code Examples

  1. Creating and training a DQN agent:
from keras.optimizers import Adam

from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

model = create_model(num_actions)  # any Keras model mapping observations to Q-values
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=num_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)  # env is a Gym environment
  2. Defining a custom environment:
from rl.core import Env

class CustomEnv(Env):
    def step(self, action):
        # Implement environment dynamics
        return next_state, reward, done, {}

    def reset(self):
        # Reset environment to initial state
        return initial_state

    def render(self, mode='human'):
        # Implement rendering logic
        pass
  3. Using a pre-trained agent:
from rl.agents import DQNAgent

# Rebuild the agent around the same model architecture, then load previously saved weights.
dqn = DQNAgent(model=model, nb_actions=num_actions, memory=memory, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.load_weights('dqn_weights.h5f')
dqn.test(env, nb_episodes=5, visualize=True)
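
The weights file loaded above would come from an earlier training run; saving after fit() follows the pattern used in keras-rl's bundled examples:

# Persist the learned weights so they can be reloaded later.
dqn.save_weights('dqn_weights.h5f', overwrite=True)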

Getting Started

To get started with keras-rl, follow these steps:

  1. Install the library:
pip install keras-rl
  2. Import required modules:
import gym
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
  3. Create an environment and model:
env = gym.make('CartPole-v0')
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(16),
    Activation('relu'),
    Dense(16),
    Activation('relu'),
    Dense(16),
    Activation('relu'),
    Dense(env.action_space.n),
    Activation('linear')
])
  4. Create and train an agent:
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=env.action_space.n, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)
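
  5. Evaluate the trained agent (optional; this mirrors the test step in the repository's dqn_cartpole example):
# Run a few evaluation episodes and render them to the screen.
dqn.test(env, nb_episodes=5, visualize=True)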

Competitor Comparisons

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Pros of baselines

  • More comprehensive set of RL algorithms implemented
  • Better documentation and examples
  • Active development and maintenance by OpenAI

Cons of baselines

  • Steeper learning curve for beginners
  • Less focus on integration with Keras

Code Comparison

baselines:

import gym
from baselines import deepq
from baselines.common.atari_wrappers import wrap_deepmind

env = wrap_deepmind(gym.make("PongNoFrameskip-v4"))
model = deepq.learn(env, network='cnn', total_timesteps=100000)

keras-rl:

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

model = create_model(input_shape, nb_actions)  # user-defined Keras model builder
memory = SequentialMemory(limit=50000, window_length=1)
policy = EpsGreedyQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, policy=policy)

Both repositories provide implementations of reinforcement learning algorithms, but baselines offers a wider range of algorithms and more extensive documentation. However, keras-rl may be more suitable for those already familiar with Keras and looking for a simpler integration. The code examples demonstrate the different approaches to implementing a DQN agent in each library.

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

Pros of stable-baselines

  • More comprehensive and actively maintained library
  • Supports a wider range of algorithms and environments
  • Better documentation and community support

Cons of stable-baselines

  • Steeper learning curve for beginners
  • Requires more computational resources for some algorithms

Code Comparison

stable-baselines (shown here with the stable-baselines3 API):

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)

keras-rl:

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))

memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=2)

The code comparison shows that stable-baselines offers a more concise and straightforward implementation, while keras-rl requires more manual setup and configuration.

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Pros of TF-Agents

  • More comprehensive and actively maintained
  • Better integration with TensorFlow ecosystem
  • Supports both TensorFlow and TensorFlow Probability

Cons of TF-Agents

  • Steeper learning curve for beginners
  • More complex setup and configuration
  • Potentially slower development for simple projects

Code Comparison

keras-rl:

from keras.optimizers import Adam

from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=BoltzmannQPolicy())
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

TF-Agents:

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import q_network
from tf_agents.replay_buffers import tf_uniform_replay_buffer

q_net = q_network.QNetwork(train_env.observation_spec(), train_env.action_spec())
agent = dqn_agent.DqnAgent(train_env.time_step_spec(), train_env.action_spec(), q_network=q_net,
                           optimizer=tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3))

PFRL: a PyTorch-based deep reinforcement learning library

Pros of pfrl

  • More comprehensive and up-to-date implementation of reinforcement learning algorithms
  • Better support for distributed training and parallel environments
  • More extensive documentation and examples

Cons of pfrl

  • Steeper learning curve due to its more complex architecture
  • Less integration with Keras, which may be a drawback for Keras users
  • Requires more setup and configuration compared to keras-rl

Code Comparison

keras-rl example:

from keras.optimizers import Adam

from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

model = create_model()  # user-defined Keras model
memory = SequentialMemory(limit=50000, window_length=1)
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, policy=BoltzmannQPolicy())
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)

pfrl example:

import torch
import pfrl

q_func = create_q_function()  # user-defined PyTorch Q-network
optimizer = torch.optim.Adam(q_func.parameters(), lr=1e-3)
replay_buffer = pfrl.replay_buffers.ReplayBuffer(capacity=10**6)
explorer = pfrl.explorers.LinearDecayEpsilonGreedy(start_epsilon=1.0, end_epsilon=0.1, decay_steps=50000,
                                                   random_action_func=env.action_space.sample)
agent = pfrl.agents.DoubleDQN(q_func, optimizer, replay_buffer, gamma=0.99, explorer=explorer)
pfrl.experiments.train_agent_with_evaluation(agent, env, steps=50000, eval_n_steps=None, eval_n_episodes=10,
                                             eval_interval=5000, outdir='results')

A toolkit for reproducible reinforcement learning research.

Pros of garage

  • More comprehensive and flexible framework for RL research
  • Supports a wider range of algorithms and environments
  • Better documentation and examples for advanced users

Cons of garage

  • Steeper learning curve for beginners
  • Less focus on simplicity and ease of use
  • Requires more setup and configuration

Code Comparison

keras-rl:

from keras.optimizers import Adam

from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=BoltzmannQPolicy())
dqn.compile(Adam(lr=1e-3), metrics=['mae'])

garage:

from garage import wrap_experiment
from garage.tf.algos import TRPO
from garage.tf.policies import GaussianMLPPolicy

@wrap_experiment
def trpo_cartpole(ctxt=None):
    # env and baseline are assumed to be constructed earlier in the experiment
    policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
    algo = TRPO(env=env, policy=policy, baseline=baseline, max_path_length=100)

README

Deep Reinforcement Learning for Keras

What is it?

keras-rl implements some state-of-the-art deep reinforcement learning algorithms in Python and seamlessly integrates with the deep learning library Keras.

Furthermore, keras-rl works with OpenAI Gym out of the box. This means that evaluating and playing around with different algorithms is easy.

Of course you can extend keras-rl according to your own needs. You can use built-in Keras callbacks and metrics or define your own. What's more, it is easy to implement your own environments and even algorithms by extending a few simple abstract classes. Documentation is available online.
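
One such extension point is rl.core.Processor, which sits between the agent and the environment and lets you transform observations, rewards, and actions. A minimal sketch (the casting and clipping choices here are illustrative, not prescribed by the library):

import numpy as np
from rl.core import Processor

class ClippedRewardProcessor(Processor):
    # Hypothetical processor: cast observations to float32 and clip rewards to [-1, 1]
    # before they reach the agent.
    def process_observation(self, observation):
        return np.asarray(observation, dtype='float32')

    def process_reward(self, reward):
        return np.clip(reward, -1.0, 1.0)

# Pass the processor to any agent, e.g. DQNAgent(..., processor=ClippedRewardProcessor()).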

What is included?

As of today, the following algorithms have been implemented:

  • Deep Q Learning (DQN) [1], [2]
  • Double DQN [3]
  • Deep Deterministic Policy Gradient (DDPG) [4]
  • Continuous DQN (CDQN or NAF) [6]
  • Cross-Entropy Method (CEM) [7], [8]
  • Dueling network DQN (Dueling DQN) [9]
  • Deep SARSA [10]
  • Asynchronous Advantage Actor-Critic (A3C) [5]
  • Proximal Policy Optimization Algorithms (PPO) [11]

You can find more information on each agent in the doc.
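
The continuous-action agents follow the same workflow as DQN but take separate actor and critic networks. A minimal DDPG sketch, loosely modeled on the repository's ddpg_pendulum example (env, nb_actions, and the actor, critic, and action_input models are assumed to be defined beforehand):

from keras.optimizers import Adam

from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

# Exploration noise for continuous actions and a standard replay memory.
memory = SequentialMemory(limit=100000, window_length=1)
random_process = OrnsteinUhlenbeckProcess(size=nb_actions, theta=.15, mu=0., sigma=.3)

agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic, critic_action_input=action_input,
                  memory=memory, nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,
                  random_process=random_process, gamma=.99, target_model_update=1e-3)
agent.compile(Adam(lr=1e-3, clipnorm=1.), metrics=['mae'])
agent.fit(env, nb_steps=50000, visualize=False, verbose=1)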

Installation

  • Install Keras-RL from PyPI (recommended):
pip install keras-rl
  • Install from Github source:
git clone https://github.com/keras-rl/keras-rl.git
cd keras-rl
python setup.py install

Examples

If you want to run the examples, you'll also have to install OpenAI Gym (pip install gym) and h5py (pip install h5py).

For the Atari example you will also need:

  • Pillow: pip install Pillow
  • gym[atari]: Atari module for gym. Use pip install gym[atari]

Once you have installed everything, you can try out a simple example:

python examples/dqn_cartpole.py

This is a very simple example and it should converge relatively quickly, so it's a great way to get started! It also visualizes the game during training, so you can watch it learn. How cool is that?

Some sample weights are available on keras-rl-weights.

If you have questions or problems, please file an issue or, even better, fix the problem yourself and submit a pull request!

External Projects

You're using Keras-RL on a project? Open a PR and share it!

Visualizing Training Metrics

To see graphs of your training progress and compare across runs, run pip install wandb and add the WandbLogger callback to your agent's fit() call:

from rl.callbacks import WandbLogger

...

agent.fit(env, nb_steps=50000, callbacks=[WandbLogger()])

For more info and options, see the W&B docs.

Citing

If you use keras-rl in your research, you can cite it as follows:

@misc{plappert2016kerasrl,
    author = {Matthias Plappert},
    title = {keras-rl},
    year = {2016},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/keras-rl/keras-rl}},
}

References

  1. Playing Atari with Deep Reinforcement Learning, Mnih et al., 2013
  2. Human-level control through deep reinforcement learning, Mnih et al., 2015
  3. Deep Reinforcement Learning with Double Q-learning, van Hasselt et al., 2015
  4. Continuous control with deep reinforcement learning, Lillicrap et al., 2015
  5. Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., 2016
  6. Continuous Deep Q-Learning with Model-based Acceleration, Gu et al., 2016
  7. Learning Tetris Using the Noisy Cross-Entropy Method, Szita et al., 2006
  8. Deep Reinforcement Learning (MLSS lecture notes), Schulman, 2016
  9. Dueling Network Architectures for Deep Reinforcement Learning, Wang et al., 2016
  10. Reinforcement learning: An introduction, Sutton and Barto, 2011
  11. Proximal Policy Optimization Algorithms, Schulman et al., 2017