Top Related Projects
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
A library of reinforcement learning components and agents
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.
Quick Overview
PFRL is a deep reinforcement learning library from Preferred Networks, built on top of PyTorch. It provides a comprehensive set of algorithms and utilities for implementing and experimenting with reinforcement learning tasks. PFRL is designed to be modular, extensible, and easy to use for both researchers and practitioners.
Pros
- Wide range of implemented algorithms, including DQN, A3C, PPO, and DDPG
- Built on PyTorch, allowing for easy integration with other deep learning projects
- Extensive documentation and examples for quick start and reference
- Supports both discrete and continuous action spaces
Cons
- Newer than some alternatives, so potentially less battle-tested
- Limited community support compared to more established libraries
- May have a steeper learning curve for beginners in reinforcement learning
- Some advanced features might require deeper understanding of the underlying concepts
Code Examples
- Creating a DQN agent:
import pfrl
import torch
import gym
env = gym.make('CartPole-v0')
q_func = pfrl.q_functions.FCStateQFunctionWithDiscreteAction(
    env.observation_space.low.size, env.action_space.n,
    n_hidden_layers=2, n_hidden_channels=50)
optimizer = torch.optim.Adam(q_func.parameters())
# ConstantEpsilonGreedy keeps epsilon fixed; pfrl.explorers.LinearDecayEpsilonGreedy
# is a drop-in alternative that anneals epsilon over training.
explorer = pfrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)
replay_buffer = pfrl.replay_buffers.ReplayBuffer(capacity=10**6)
agent = pfrl.agents.DQN(
    q_func, optimizer, replay_buffer, gamma=0.99,
    explorer=explorer, replay_start_size=500,
    target_update_interval=100)
- Training the agent:
pfrl.experiments.train_agent_with_evaluation(
    agent,
    env,
    steps=200000,
    eval_n_steps=None,
    eval_n_episodes=10,
    train_max_episode_len=200,
    eval_interval=10000,
    outdir='result')
- Using a trained agent:
obs = env.reset()
done = False
while not done:
    action = agent.act(obs)
    obs, reward, done, _ = env.step(action)
    env.render()
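The loop above still routes actions through the training-time explorer, so some of them are random. For evaluation, PFRL agents expose an eval_mode context manager that switches to greedy acting, and save/load for persisting a trained agent; a minimal sketch:
# Roll out greedily (no epsilon-greedy exploration) by entering eval mode
with agent.eval_mode():
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, _ = env.step(agent.act(obs))

# Persist the model and optimizer state to a directory and reload them later
agent.save('dqn_agent')
agent.load('dqn_agent')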
Getting Started
To get started with PFRL, follow these steps:
- Install PFRL:
pip install pfrl
- Import necessary modules:
import pfrl
import gym
import torch
- Create an environment and agent:
env = gym.make('CartPole-v0')
q_func = pfrl.q_functions.FCStateQFunctionWithDiscreteAction(
    env.observation_space.low.size, env.action_space.n,
    n_hidden_layers=2, n_hidden_channels=50)
optimizer = torch.optim.Adam(q_func.parameters())
explorer = pfrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)
agent = pfrl.agents.DQN(
    q_func, optimizer, pfrl.replay_buffers.ReplayBuffer(capacity=10000),
    gamma=0.99, explorer=explorer, replay_start_size=500)
- Train the agent:
pfrl.experiments.train_agent_with_evaluation(
    agent, env, steps=10000, eval_n_steps=None,
    eval_n_episodes=5, eval_interval=1000, outdir='result')
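To train on a GPU instead, pass the agent's gpu argument (a CUDA device index); a minimal variation of the construction above, assuming a CUDA device 0 is available:
agent = pfrl.agents.DQN(
    q_func, optimizer, pfrl.replay_buffers.ReplayBuffer(capacity=10000),
    gamma=0.99, explorer=explorer, replay_start_size=500,
    gpu=0)  # moves the Q-function and update computation to cuda:0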
Competitor Comparisons
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
Pros of Baselines
- Widely adopted and well-established in the RL community
- Extensive documentation and examples
- Supports a broader range of RL algorithms
Cons of Baselines
- Less actively maintained (last update in 2020)
- More complex codebase, potentially harder for beginners
- Lacks some modern RL techniques and optimizations
Code Comparison
PFRL example (PPO implementation):
def update(self, experiences, errors_out=None):
    states = self.batch_states([exp.state for exp in experiences], self.device, self.phi)
    actions = torch.tensor([exp.action for exp in experiences], device=self.device)
    rewards = torch.tensor([exp.reward for exp in experiences], device=self.device)
    next_states = self.batch_states([exp.next_state for exp in experiences], self.device, self.phi)
Baselines example (PPO implementation):
def update(obs, returns, masks, actions, values, neglogpacs, states=None):
    advs = returns - values
    advs = (advs - advs.mean()) / (advs.std() + 1e-8)
    td_map = {train_model.X: obs, A: actions, ADV: advs, R: returns, PG_LR: cur_lr}
    if states is not None:
        td_map[train_model.S] = states
Both repositories provide implementations of popular RL algorithms, but PFRL offers a more modern and actively maintained codebase with cleaner implementations. Baselines, while more established, has a wider range of algorithms but may be more challenging for newcomers to navigate.
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
Pros of stable-baselines
- More extensive documentation and tutorials
- Wider range of implemented algorithms
- Larger community and more frequent updates
Cons of stable-baselines
- Less focus on distributed training
- Potentially slower execution due to TensorFlow backend
Code Comparison
PFRL example:
import pfrl
# PPO takes a combined policy/value model (e.g. built with pfrl.nn.Branched)
# and its optimizer, rather than observation/action spaces.
agent = pfrl.agents.PPO(
    model, optimizer,
    gpu=0, update_interval=2048, minibatch_size=64)
stable-baselines3 example (the maintained successor to stable-baselines):
from stable_baselines3 import PPO
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
Both libraries offer similar functionality for implementing reinforcement learning algorithms. PFRL provides more flexibility in terms of customization and distributed training, while stable-baselines offers a more user-friendly API with extensive documentation. The choice between the two depends on specific project requirements and user preferences.
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Pros of agents
- Built on TensorFlow, offering seamless integration with the popular deep learning framework
- Comprehensive suite of reinforcement learning algorithms and tools
- Strong community support and regular updates from Google's TensorFlow team
Cons of agents
- Steeper learning curve for those not familiar with TensorFlow
- Can be more resource-intensive due to TensorFlow's overhead
- Less flexibility for customization compared to PFRL's modular design
Code Comparison
agents:
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import q_network
from tf_agents.environments import tf_py_environment
q_net = q_network.QNetwork(observation_spec, action_spec)
agent = dqn_agent.DqnAgent(
    time_step_spec, action_spec, q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))  # an optimizer is required
PFRL:
import pfrl
from pfrl import agents, explorers, replay_buffers
q_func = pfrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=50, n_hidden_layers=2)
explorer = explorers.ConstantEpsilonGreedy(epsilon=0.3, random_action_func=env.action_space.sample)
agent = agents.DQN(q_func, optimizer, replay_buffer, gamma, explorer)
A library of reinforcement learning components and agents
Pros of Acme
- More comprehensive and flexible framework for RL research
- Better support for distributed training and multi-agent scenarios
- Stronger integration with TensorFlow and JAX ecosystems
Cons of Acme
- Steeper learning curve due to higher complexity
- Less focus on practical applications compared to PFRL
- Potentially slower development cycle for simple RL tasks
Code Comparison
PFRL (PyTorch-based):
import torch
import pfrl
q_func = torch.nn.Sequential(
    torch.nn.Linear(obs_size, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, n_actions),
    pfrl.q_functions.DiscreteActionValueHead(),  # wraps raw Q-values as an ActionValue
)
optimizer = torch.optim.Adam(q_func.parameters())
explorer = pfrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.1, random_action_func=env.action_space.sample)
replay_buffer = pfrl.replay_buffers.ReplayBuffer(10**6)
agent = pfrl.agents.DQN(
    q_func, optimizer, replay_buffer, gamma=0.99, explorer=explorer)
Acme (TensorFlow-based):
import acme
import sonnet as snt
import tensorflow as tf
from acme import specs
from acme.agents.tf import dqn
# `environment` is assumed to be a dm_env-style environment created beforehand
environment_spec = specs.make_environment_spec(environment)
network = snt.Sequential([
    snt.Linear(64),
    tf.nn.relu,
    snt.Linear(environment_spec.actions.num_values),
])
agent = dqn.DQN(environment_spec, network)
Both frameworks offer similar functionality for implementing RL algorithms, but Acme provides a more modular and flexible approach, while PFRL focuses on simplicity and ease of use for PyTorch users.
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.
Pros of softlearning
- Focuses on soft actor-critic (SAC) algorithms, providing specialized implementations
- Includes a variety of environment wrappers for popular RL benchmarks
- Offers a modular design for easy customization of algorithm components
Cons of softlearning
- Limited to SAC-based algorithms, less versatile than PFRL's broader range
- Less actively maintained, with fewer recent updates compared to PFRL
- Smaller community and fewer resources available for support
Code Comparison
softlearning:
from softlearning.algorithms.sac import SAC
from softlearning.environments.utils import get_environment
env = get_environment('gym', 'HalfCheetah-v2')
algorithm = SAC(env, Q_lr=3e-4, policy_lr=3e-4, alpha_lr=3e-4)
algorithm.train()
PFRL:
import gym
from pfrl import experiments, replay_buffers
from pfrl.agents import SoftActorCritic
env = gym.make('HalfCheetah-v2')
# PFRL's SAC agent is constructed from a policy network, two Q-function networks,
# their optimizers, and a replay buffer rather than raw spaces; see PFRL's
# examples/mujoco/reproduction/soft_actor_critic for the full network definitions.
agent = SoftActorCritic(
    policy, q_func1, q_func2,
    policy_optimizer, q_func1_optimizer, q_func2_optimizer,
    replay_buffers.ReplayBuffer(10**6), gamma=0.99)
experiments.train_agent_with_evaluation(
    agent, env, steps=10**6, eval_n_steps=None, eval_n_episodes=10,
    eval_interval=5000, outdir='result')
Both libraries offer implementations of SAC, but PFRL provides a more comprehensive set of algorithms and tools for reinforcement learning research and development.
README
PFRL
PFRL is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement learning algorithms in Python using PyTorch.
Installation
PFRL is tested with Python 3.7.7. For other requirements, see requirements.txt.
PFRL can be installed via PyPI:
pip install pfrl
It can also be installed from the source code:
python setup.py install
Refer to Installation for more information on installation.
Getting started
You can try the PFRL Quickstart Guide first, or check the examples prepared for Atari 2600 and OpenAI Gym.
For more information, you can refer to PFRL's documentation.
Blog Posts
Algorithms
Algorithm | Discrete Action | Continuous Action | Recurrent Model | Batch Training | CPU Async Training | Pretrained models* |
---|---|---|---|---|---|---|
DQN (including DoubleDQN etc.) | ✓ | ✓ (NAF) | ✓ | ✓ | x | ✓ |
Categorical DQN | ✓ | x | ✓ | ✓ | x | x |
Rainbow | ✓ | x | ✓ | ✓ | x | ✓ |
IQN | ✓ | x | ✓ | ✓ | x | ✓ |
DDPG | x | ✓ | x | ✓ | x | ✓ |
A3C | ✓ | ✓ | ✓ | ✓ (A2C) | ✓ | ✓ |
ACER | ✓ | ✓ | ✓ | x | ✓ | x |
PPO | ✓ | ✓ | ✓ | ✓ | x | ✓ |
TRPO | ✓ | ✓ | ✓ | ✓ | x | ✓ |
TD3 | x | ✓ | x | ✓ | x | ✓ |
SAC | x | ✓ | x | ✓ | x | ✓ |
*Note on Pretrained models: PFRL provides pretrained models (sometimes called a 'model zoo') for our reproducibility scripts on Atari environments (DQN, IQN, Rainbow, and A3C) and Mujoco environments (DDPG, TRPO, PPO, TD3, SAC), for each benchmarked environment.
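As a rough sketch of how these pretrained models are used in PFRL's reproduction scripts, assuming pfrl.utils.download_model is available (its exact return value may vary between releases), the downloaded directory is passed to agent.load on an agent built with the matching network architecture:
from pfrl.utils import download_model

# Fetch the 'best' pretrained DQN weights for Breakout and load them into an
# already-constructed DQN agent with the matching Atari network architecture.
model_dir = download_model("DQN", "BreakoutNoFrameskip-v4", model_type="best")[0]
agent.load(model_dir)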
The following algorithms have been implemented in PFRL:
- A2C (Synchronous variant of A3C)
- examples: [atari (batched)]
- A3C (Asynchronous Advantage Actor-Critic)
- examples: [atari reproduction] [atari]
- ACER (Actor-Critic with Experience Replay)
- examples: [atari]
- Categorical DQN
- examples: [atari] [general gym]
- DQN (Deep Q-Network) (including Double DQN, Persistent Advantage Learning (PAL), Double PAL, Dynamic Policy Programming (DPP))
- DDPG (Deep Deterministic Policy Gradients) (including SVG(0))
- examples: [mujoco reproduction]
- IQN (Implicit Quantile Networks)
- examples: [atari reproduction]
- PPO (Proximal Policy Optimization)
- examples: [mujoco reproduction] [atari]
- Rainbow
- examples: [atari reproduction] [Slime volleyball]
- REINFORCE
- examples: [general gym]
- SAC (Soft Actor-Critic)
- examples: [mujoco reproduction] [Atlas walk]
- TRPO (Trust Region Policy Optimization) with GAE (Generalized Advantage Estimation)
- examples: [mujoco reproduction]
- TD3 (Twin Delayed Deep Deterministic policy gradient algorithm)
- examples: [mujoco reproduction]
The following useful techniques have also been implemented in PFRL:
- NoisyNet
- examples: [Rainbow] [DQN/DoubleDQN/PAL]
- Prioritized Experience Replay
- examples: [Rainbow] [DQN/DoubleDQN/PAL]
- Dueling Network
- examples: [Rainbow] [DQN/DoubleDQN/PAL]
- Normalized Advantage Function
- examples: [DQN] (for continuous-action envs only)
- Deep Recurrent Q-Network
- examples: [DQN] (sketched below)
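A minimal sketch of how the recurrent support is wired together, assuming pfrl.nn.RecurrentSequential, pfrl.q_functions.DiscreteActionValueHead, and pfrl.replay_buffers.EpisodicReplayBuffer behave as in PFRL's recurrent DQN example; obs_size, n_actions, and explorer are assumed to be defined as in the earlier examples:
import torch
import torch.nn as nn
import pfrl

# Recurrent Q-function: an LSTM over observation sequences followed by per-action Q-values
q_func = pfrl.nn.RecurrentSequential(
    nn.LSTM(input_size=obs_size, hidden_size=64),
    nn.Linear(64, n_actions),
    pfrl.q_functions.DiscreteActionValueHead(),
)
# Episodic replay buffer so that whole sequences can be replayed for recurrent updates
replay_buffer = pfrl.replay_buffers.EpisodicReplayBuffer(capacity=10**5)
agent = pfrl.agents.DQN(
    q_func, torch.optim.Adam(q_func.parameters()), replay_buffer,
    gamma=0.99, explorer=explorer, recurrent=True)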
Environments
Environments that support the subset of OpenAI Gym's interface (reset and step methods) can be used.
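For illustration, any object with those two methods will do; the following minimal sketch (the class name and dynamics are invented for this example) shows the expected return signature:
import numpy as np

class CoinFlipEnv:
    """Toy environment exposing only the reset/step subset of the Gym API."""

    def reset(self):
        self.t = 0
        return np.zeros(1, dtype=np.float32)  # initial observation

    def step(self, action):
        # Reward the agent for choosing action 1; end the episode after 10 steps.
        self.t += 1
        obs = np.random.rand(1).astype(np.float32)
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 10
        return obs, reward, done, {}  # observation, reward, done flag, info dict

# Explorers that need random actions can take any callable,
# e.g. random_action_func=lambda: np.random.randint(2).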
Contributing
Any kind of contribution to PFRL would be highly appreciated! If you are interested in contributing to PFRL, please read CONTRIBUTING.md.
License
PFRL is released under the MIT License.
Citations
To cite PFRL in publications, please cite our paper on ChainerRL, the library on which PFRL is based:
@article{JMLR:v22:20-376,
author = {Yasuhiro Fujita and Prabhat Nagarajan and Toshiki Kataoka and Takahiro Ishikawa},
title = {ChainerRL: A Deep Reinforcement Learning Library},
journal = {Journal of Machine Learning Research},
year = {2021},
volume = {22},
number = {77},
pages = {1-14},
url = {http://jmlr.org/papers/v22/20-376.html}
}