Top Related Projects
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
A library of reinforcement learning components and agents
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.
Quick Overview
PFRL is a deep reinforcement learning library from Preferred Networks, built on top of PyTorch. It provides a comprehensive set of algorithms and utilities for implementing and experimenting with reinforcement learning tasks. PFRL is designed to be modular, extensible, and easy to use for both researchers and practitioners.
Pros
- Wide range of implemented algorithms, including DQN, A3C, PPO, and DDPG
- Built on PyTorch, allowing for easy integration with other deep learning projects
- Extensive documentation and examples for quick start and reference
- Supports both discrete and continuous action spaces
Cons
- Newer than some alternatives, so potentially less battle-tested
- Limited community support compared to more established libraries
- May have a steeper learning curve for beginners in reinforcement learning
- Some advanced features might require deeper understanding of the underlying concepts
Code Examples
- Creating a DQN agent:
import pfrl
import torch
import gym
env = gym.make('CartPole-v0')
q_func = pfrl.q_functions.FCStateQFunctionWithDiscreteAction(
    env.observation_space.low.size, env.action_space.n,
    n_hidden_layers=2, n_hidden_channels=50)
optimizer = torch.optim.Adam(q_func.parameters())
# ConstantEpsilonGreedy keeps epsilon fixed; pfrl.explorers.LinearDecayEpsilonGreedy
# is a drop-in alternative that anneals epsilon over training.
explorer = pfrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)
replay_buffer = pfrl.replay_buffers.ReplayBuffer(capacity=10**6)
agent = pfrl.agents.DQN(
    q_func, optimizer, replay_buffer, gamma=0.99,
    explorer=explorer, replay_start_size=500,
    target_update_interval=100)
- Training the agent:
pfrl.experiments.train_agent_with_evaluation(
    agent,
    env,
    steps=200000,
    eval_n_steps=None,
    eval_n_episodes=10,
    train_max_episode_len=200,
    eval_interval=10000,
    outdir='result')
- Using a trained agent:
obs = env.reset()
done = False
while not done:
    action = agent.act(obs)
    obs, reward, done, _ = env.step(action)
    env.render()
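The loop above still routes actions through the training-time explorer, so some of them are random. For evaluation, PFRL agents expose an eval_mode context manager that switches to greedy acting, and save/load for persisting a trained agent; a minimal sketch:
# Roll out greedily (no epsilon-greedy exploration) by entering eval mode
with agent.eval_mode():
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, _ = env.step(agent.act(obs))

# Persist the model and optimizer state to a directory and reload them later
agent.save('dqn_agent')
agent.load('dqn_agent')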
Getting Started
To get started with PFRL, follow these steps:
- Install PFRL:
pip install pfrl
- Import necessary modules:
import pfrl
import gym
import torch
- Create an environment and agent:
env = gym.make('CartPole-v0')
q_func = pfrl.q_functions.FCStateQFunctionWithDiscreteAction(
    env.observation_space.low.size, env.action_space.n,
    n_hidden_layers=2, n_hidden_channels=50)
optimizer = torch.optim.Adam(q_func.parameters())
explorer = pfrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)
agent = pfrl.agents.DQN(
    q_func, optimizer, pfrl.replay_buffers.ReplayBuffer(capacity=10000),
    gamma=0.99, explorer=explorer, replay_start_size=500)
- Train the agent:
pfrl.experiments.train_agent_with_evaluation(
    agent, env, steps=10000, eval_n_steps=None,
    eval_n_episodes=5, eval_interval=1000, outdir='result')
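To train on a GPU instead, pass the agent's gpu argument (a CUDA device index); a minimal variation of the construction above, assuming a CUDA device 0 is available:
agent = pfrl.agents.DQN(
    q_func, optimizer, pfrl.replay_buffers.ReplayBuffer(capacity=10000),
    gamma=0.99, explorer=explorer, replay_start_size=500,
    gpu=0)  # moves the Q-function and update computation to cuda:0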
Competitor Comparisons
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
Pros of Baselines
- Widely adopted and well-established in the RL community
- Extensive documentation and examples
- Supports a broader range of RL algorithms
Cons of Baselines
- Less actively maintained (last update in 2020)
- More complex codebase, potentially harder for beginners
- Lacks some modern RL techniques and optimizations
Code Comparison
PFRL example (PPO implementation):
def update(self, experiences, errors_out=None):
    states = self.batch_states([exp.state for exp in experiences], self.device, self.phi)
    actions = torch.tensor([exp.action for exp in experiences], device=self.device)
    rewards = torch.tensor([exp.reward for exp in experiences], device=self.device)
    next_states = self.batch_states([exp.next_state for exp in experiences], self.device, self.phi)
Baselines example (PPO implementation):
def update(obs, returns, masks, actions, values, neglogpacs, states=None):
    advs = returns - values
    advs = (advs - advs.mean()) / (advs.std() + 1e-8)
    td_map = {train_model.X: obs, A: actions, ADV: advs, R: returns, PG_LR: cur_lr}
    if states is not None:
        td_map[train_model.S] = states
Both repositories provide implementations of popular RL algorithms, but PFRL offers a more modern and actively maintained codebase with cleaner implementations. Baselines, while more established, has a wider range of algorithms but may be more challenging for newcomers to navigate.
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
Pros of stable-baselines
- More extensive documentation and tutorials
- Wider range of implemented algorithms
- Larger community and more frequent updates
Cons of stable-baselines
- Less focus on distributed training
- Potentially slower execution due to TensorFlow backend
Code Comparison
PFRL example:
import pfrl
# PPO takes a combined policy/value model (e.g. built with pfrl.nn.Branched)
# and its optimizer, rather than observation/action spaces.
agent = pfrl.agents.PPO(
    model, optimizer,
    gpu=0, update_interval=2048, minibatch_size=64)
stable-baselines3 example (the maintained successor to stable-baselines):
from stable_baselines3 import PPO
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
Both libraries offer similar functionality for implementing reinforcement learning algorithms. PFRL provides more flexibility in terms of customization and distributed training, while stable-baselines offers a more user-friendly API with extensive documentation. The choice between the two depends on specific project requirements and user preferences.
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Pros of agents
- Built on TensorFlow, offering seamless integration with the popular deep learning framework
- Comprehensive suite of reinforcement learning algorithms and tools
- Strong community support and regular updates from Google's TensorFlow team
Cons of agents
- Steeper learning curve for those not familiar with TensorFlow
- Can be more resource-intensive due to TensorFlow's overhead
- Less flexibility for customization compared to PFRL's modular design
Code Comparison
agents:
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import q_network
from tf_agents.environments import tf_py_environment
q_net = q_network.QNetwork(observation_spec, action_spec)
agent = dqn_agent.DqnAgent(
    time_step_spec, action_spec, q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))  # an optimizer is required
PFRL:
import pfrl
from pfrl import agents, explorers, replay_buffers
q_func = pfrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=50, n_hidden_layers=2)
explorer = explorers.ConstantEpsilonGreedy(epsilon=0.3, random_action_func=env.action_space.sample)
agent = agents.DQN(q_func, optimizer, replay_buffer, gamma, explorer)
A library of reinforcement learning components and agents
Pros of Acme
- More comprehensive and flexible framework for RL research
- Better support for distributed training and multi-agent scenarios
- Stronger integration with TensorFlow and JAX ecosystems
Cons of Acme
- Steeper learning curve due to higher complexity
- Less focus on practical applications compared to PFRL
- Potentially slower development cycle for simple RL tasks
Code Comparison
PFRL (PyTorch-based):
import torch
import pfrl
q_func = torch.nn.Sequential(
    torch.nn.Linear(obs_size, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, n_actions),
    pfrl.q_functions.DiscreteActionValueHead(),  # wraps raw Q-values as an ActionValue
)
optimizer = torch.optim.Adam(q_func.parameters())
explorer = pfrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.1, random_action_func=env.action_space.sample)
replay_buffer = pfrl.replay_buffers.ReplayBuffer(10**6)
agent = pfrl.agents.DQN(
    q_func, optimizer, replay_buffer, gamma=0.99, explorer=explorer)
Acme (TensorFlow-based):
import acme
import sonnet as snt
import tensorflow as tf
from acme import specs
from acme.agents.tf import dqn
# `environment` is assumed to be a dm_env-style environment created beforehand
environment_spec = specs.make_environment_spec(environment)
network = snt.Sequential([
    snt.Linear(64),
    tf.nn.relu,
    snt.Linear(environment_spec.actions.num_values),
])
agent = dqn.DQN(environment_spec, network)
Both frameworks offer similar functionality for implementing RL algorithms, but Acme provides a more modular and flexible approach, while PFRL focuses on simplicity and ease of use for PyTorch users.
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.
Pros of softlearning
- Focuses on soft actor-critic (SAC) algorithms, providing specialized implementations
- Includes a variety of environment wrappers for popular RL benchmarks
- Offers a modular design for easy customization of algorithm components
Cons of softlearning
- Limited to SAC-based algorithms, less versatile than PFRL's broader range
- Less actively maintained, with fewer recent updates compared to PFRL
- Smaller community and fewer resources available for support
Code Comparison
softlearning:
from softlearning.algorithms.sac import SAC
from softlearning.environments.utils import get_environment
env = get_environment('gym', 'HalfCheetah-v2')
algorithm = SAC(env, Q_lr=3e-4, policy_lr=3e-4, alpha_lr=3e-4)
algorithm.train()
PFRL:
import gym
from pfrl import experiments, replay_buffers
from pfrl.agents import SoftActorCritic
env = gym.make('HalfCheetah-v2')
# PFRL's SAC agent is constructed from a policy network, two Q-function networks,
# their optimizers, and a replay buffer rather than raw spaces; see PFRL's
# examples/mujoco/reproduction/soft_actor_critic for the full network definitions.
agent = SoftActorCritic(
    policy, q_func1, q_func2,
    policy_optimizer, q_func1_optimizer, q_func2_optimizer,
    replay_buffers.ReplayBuffer(10**6), gamma=0.99)
experiments.train_agent_with_evaluation(
    agent, env, steps=10**6, eval_n_steps=None, eval_n_episodes=10,
    eval_interval=5000, outdir='result')
Both libraries offer implementations of SAC, but PFRL provides a more comprehensive set of algorithms and tools for reinforcement learning research and development.
README
PFRL
PFRL is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement learning algorithms in Python using PyTorch.
Installation
PFRL is tested with Python 3.7.7. For other requirements, see requirements.txt.
PFRL can be installed via PyPI:
pip install pfrl
It can also be installed from the source code:
python setup.py install
Refer to Installation for more information on installation.
Getting started
You can try the PFRL Quickstart Guide first, or check the examples prepared for Atari 2600 and OpenAI Gym.
For more information, you can refer to PFRL's documentation.
Blog Posts
Algorithms
Algorithm | Discrete Action | Continuous Action | Recurrent Model | Batch Training | CPU Async Training | Pretrained models* |
---|---|---|---|---|---|---|
DQN (including DoubleDQN etc.) | ✓ | ✓ (NAF) | ✓ | ✓ | x | ✓ |
Categorical DQN | ✓ | x | ✓ | ✓ | x | x |
Rainbow | ✓ | x | ✓ | ✓ | x | ✓ |
IQN | ✓ | x | ✓ | ✓ | x | ✓ |
DDPG | x | ✓ | x | ✓ | x | ✓ |
A3C | ✓ | ✓ | ✓ | ✓ (A2C) | ✓ | ✓ |
ACER | ✓ | ✓ | ✓ | x | ✓ | x |
PPO | ✓ | ✓ | ✓ | ✓ | x | ✓ |
TRPO | ✓ | ✓ | ✓ | ✓ | x | ✓ |
TD3 | x | ✓ | x | ✓ | x | ✓ |
SAC | x | ✓ | x | ✓ | x | ✓ |
*Note on Pretrained models: PFRL provides pretrained models (sometimes called a 'model zoo') for our reproducibility scripts on Atari environments (DQN, IQN, Rainbow, and A3C) and Mujoco environments (DDPG, TRPO, PPO, TD3, SAC), for each benchmarked environment.
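As a rough sketch of how these pretrained models are used in PFRL's reproduction scripts, assuming pfrl.utils.download_model is available (its exact return value may vary between releases), the downloaded directory is passed to agent.load on an agent built with the matching network architecture:
from pfrl.utils import download_model

# Fetch the 'best' pretrained DQN weights for Breakout and load them into an
# already-constructed DQN agent with the matching Atari network architecture.
model_dir = download_model("DQN", "BreakoutNoFrameskip-v4", model_type="best")[0]
agent.load(model_dir)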
The following algorithms have been implemented in PFRL:
- A2C (Synchronous variant of A3C)
- examples: [atari (batched)]
- A3C (Asynchronous Advantage Actor-Critic)
- examples: [atari reproduction] [atari]
- ACER (Actor-Critic with Experience Replay)
- examples: [atari]
- Categorical DQN
- examples: [atari] [general gym]
- DQN (Deep Q-Network) (including Double DQN, Persistent Advantage Learning (PAL), Double PAL, Dynamic Policy Programming (DPP))
- DDPG (Deep Deterministic Policy Gradients) (including SVG(0))
- examples: [mujoco reproduction]
- IQN (Implicit Quantile Networks)
- examples: [atari reproduction]
- PPO (Proximal Policy Optimization)
- examples: [mujoco reproduction] [atari]
- Rainbow
- examples: [atari reproduction] [Slime volleyball]
- REINFORCE
- examples: [general gym]
- SAC (Soft Actor-Critic)
- examples: [mujoco reproduction] [Atlas walk]
- TRPO (Trust Region Policy Optimization) with GAE (Generalized Advantage Estimation)
- examples: [mujoco reproduction]
- TD3 (Twin Delayed Deep Deterministic policy gradient algorithm)
- examples: [mujoco reproduction]
The following useful techniques have also been implemented in PFRL:
- NoisyNet
- examples: [Rainbow] [DQN/DoubleDQN/PAL]
- Prioritized Experience Replay
- examples: [Rainbow] [DQN/DoubleDQN/PAL]
- Dueling Network
- examples: [Rainbow] [DQN/DoubleDQN/PAL]
- Normalized Advantage Function
- examples: [DQN] (for continuous-action envs only)
- Deep Recurrent Q-Network
- examples: [DQN] (sketched below)
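A minimal sketch of how the recurrent support is wired together, assuming pfrl.nn.RecurrentSequential, pfrl.q_functions.DiscreteActionValueHead, and pfrl.replay_buffers.EpisodicReplayBuffer behave as in PFRL's recurrent DQN example; obs_size, n_actions, and explorer are assumed to be defined as in the earlier examples:
import torch
import torch.nn as nn
import pfrl

# Recurrent Q-function: an LSTM over observation sequences followed by per-action Q-values
q_func = pfrl.nn.RecurrentSequential(
    nn.LSTM(input_size=obs_size, hidden_size=64),
    nn.Linear(64, n_actions),
    pfrl.q_functions.DiscreteActionValueHead(),
)
# Episodic replay buffer so that whole sequences can be replayed for recurrent updates
replay_buffer = pfrl.replay_buffers.EpisodicReplayBuffer(capacity=10**5)
agent = pfrl.agents.DQN(
    q_func, torch.optim.Adam(q_func.parameters()), replay_buffer,
    gamma=0.99, explorer=explorer, recurrent=True)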
Environments
Environments that support the subset of OpenAI Gym's interface (reset and step methods) can be used.
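For illustration, any object with those two methods will do; the following minimal sketch (the class name and dynamics are invented for this example) shows the expected return signature:
import numpy as np

class CoinFlipEnv:
    """Toy environment exposing only the reset/step subset of the Gym API."""

    def reset(self):
        self.t = 0
        return np.zeros(1, dtype=np.float32)  # initial observation

    def step(self, action):
        # Reward the agent for choosing action 1; end the episode after 10 steps.
        self.t += 1
        obs = np.random.rand(1).astype(np.float32)
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 10
        return obs, reward, done, {}  # observation, reward, done flag, info dict

# Explorers that need random actions can take any callable,
# e.g. random_action_func=lambda: np.random.randint(2).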
Contributing
Any kind of contribution to PFRL would be highly appreciated! If you are interested in contributing to PFRL, please read CONTRIBUTING.md.
License
PFRL is released under the MIT License.
Citations
To cite PFRL in publications, please cite our paper on ChainerRL, the library on which PFRL is based:
@article{JMLR:v22:20-376,
author = {Yasuhiro Fujita and Prabhat Nagarajan and Toshiki Kataoka and Takahiro Ishikawa},
title = {ChainerRL: A Deep Reinforcement Learning Library},
journal = {Journal of Machine Learning Research},
year = {2021},
volume = {22},
number = {77},
pages = {1-14},
url = {http://jmlr.org/papers/v22/20-376.html}
}