pfnet/pfrl

PFRL: a PyTorch-based deep reinforcement learning library


Top Related Projects

  • OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
  • stable-baselines: a fork of OpenAI Baselines with implementations of reinforcement learning algorithms
  • TF-Agents: a reliable, scalable, and easy-to-use TensorFlow library for Contextual Bandits and Reinforcement Learning
  • Acme: a library of reinforcement learning components and agents
  • Softlearning: a reinforcement learning framework for training maximum entropy policies in continuous domains, including the official implementation of the Soft Actor-Critic algorithm

Quick Overview

PFRL is a deep reinforcement learning library from Preferred Networks, built on top of PyTorch. It provides a comprehensive set of algorithms and utilities for implementing and experimenting with reinforcement learning. PFRL is designed to be modular, extensible, and easy to use for both researchers and practitioners.

Pros

  • Wide range of implemented algorithms, including DQN, A3C, PPO, and DDPG
  • Built on PyTorch, allowing for easy integration with other deep learning projects
  • Extensive documentation and examples for quick start and reference
  • Supports both discrete and continuous action spaces

Cons

  • Relatively newer library compared to some alternatives, potentially less battle-tested
  • Limited community support compared to more established libraries
  • May have a steeper learning curve for beginners in reinforcement learning
  • Some advanced features might require deeper understanding of the underlying concepts

Code Examples

  1. Creating a DQN agent:
import numpy as np
import pfrl
import torch
import gym

env = gym.make('CartPole-v0')

# Fully connected Q-function over the flattened observation.
q_func = pfrl.q_functions.FCStateQFunctionWithDiscreteAction(
    env.observation_space.low.size, env.action_space.n,
    n_hidden_layers=2, n_hidden_channels=50)
optimizer = torch.optim.Adam(q_func.parameters())
# Epsilon-greedy exploration with a fixed epsilon of 0.3.
explorer = pfrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)
replay_buffer = pfrl.replay_buffers.ReplayBuffer(capacity=10**6)

agent = pfrl.agents.DQN(
    q_func, optimizer, replay_buffer, gamma=0.99,
    explorer=explorer, replay_start_size=500,
    target_update_interval=100,
    # Cast observations to float32, which PyTorch layers expect.
    phi=lambda x: x.astype(np.float32, copy=False))
  2. Training the agent:
pfrl.experiments.train_agent_with_evaluation(
    agent,
    env,
    steps=200000,               # total number of environment steps
    eval_n_steps=None,          # evaluate by episodes rather than steps
    eval_n_episodes=10,         # episodes per evaluation phase
    train_max_episode_len=200,  # CartPole-v0's episode length cap
    eval_interval=10000,        # evaluate every 10,000 steps
    outdir='result')            # directory for logs and saved models
  3. Using a trained agent:
# Act greedily, without exploration or training-side bookkeeping.
with agent.eval_mode():
    obs = env.reset()
    done = False
    while not done:
        action = agent.act(obs)
        obs, reward, done, _ = env.step(action)
        env.render()
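
Trained agents can also be persisted and restored with the agent's save and load methods; a minimal sketch (the directory name is arbitrary):

agent.save('dqn_agent')   # write model and optimizer state under ./dqn_agent
agent.load('dqn_agent')   # restore into an agent built with the same networks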

Getting Started

To get started with PFRL, follow these steps:

  1. Install PFRL:
pip install pfrl
  2. Import the necessary modules:
import numpy as np
import pfrl
import gym
import torch
  3. Create an environment and agent:
env = gym.make('CartPole-v0')
q_func = pfrl.q_functions.FCStateQFunctionWithDiscreteAction(
    env.observation_space.low.size, env.action_space.n,
    n_hidden_layers=2, n_hidden_channels=50)
optimizer = torch.optim.Adam(q_func.parameters())
explorer = pfrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)
agent = pfrl.agents.DQN(
    q_func, optimizer,
    replay_buffer=pfrl.replay_buffers.ReplayBuffer(10000),
    gamma=0.99, explorer=explorer, replay_start_size=500,
    phi=lambda x: x.astype(np.float32, copy=False))
  4. Train the agent:
pfrl.experiments.train_agent_with_evaluation(
    agent, env, steps=10000,
    eval_n_steps=None, eval_n_episodes=5,
    eval_interval=1000, outdir='result')
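
If a CUDA device is available, the agent can be placed on the GPU by passing the gpu argument to the constructor. A brief sketch: gpu=0 selects the first device, while the default gpu=None keeps everything on the CPU.

agent = pfrl.agents.DQN(
    q_func, optimizer,
    replay_buffer=pfrl.replay_buffers.ReplayBuffer(10000),
    gamma=0.99, explorer=explorer, gpu=0)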

Competitor Comparisons

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Pros of Baselines

  • Widely adopted and well-established in the RL community
  • Extensive documentation and examples
  • Supports a broader range of RL algorithms

Cons of Baselines

  • Less actively maintained (last update in 2020)
  • More complex codebase, potentially harder for beginners
  • Lacks some modern RL techniques and optimizations

Code Comparison

PFRL example (agent update logic):

def update(self, experiences, errors_out=None):
    states = self.batch_states([exp.state for exp in experiences], self.device, self.phi)
    actions = torch.tensor([exp.action for exp in experiences], device=self.device)
    rewards = torch.tensor([exp.reward for exp in experiences], device=self.device)
    next_states = self.batch_states([exp.next_state for exp in experiences], self.device, self.phi)

Baselines example (update logic):

def update(obs, returns, masks, actions, values, neglogpacs, states=None):
    advs = returns - values
    advs = (advs - advs.mean()) / (advs.std() + 1e-8)
    td_map = {train_model.X:obs, A:actions, ADV:advs, R:returns, PG_LR:cur_lr}
    if states is not None:
        td_map[train_model.S] = states

Both repositories provide implementations of popular RL algorithms, but PFRL offers a more modern and actively maintained codebase with cleaner implementations. Baselines, while more established, has a wider range of algorithms but may be more challenging for newcomers to navigate.

stable-baselines: a fork of OpenAI Baselines with implementations of reinforcement learning algorithms

Pros of stable-baselines

  • More extensive documentation and tutorials
  • Wider range of implemented algorithms
  • Larger community and more frequent updates

Cons of stable-baselines

  • Less focus on distributed training
  • Potentially slower execution due to TensorFlow backend

Code Comparison

PFRL example:

import pfrl

# PPO takes a combined policy/value model and its optimizer.
agent = pfrl.agents.PPO(
    model, optimizer,
    gpu=0, update_interval=2048, minibatch_size=64
)

stable-baselines example (the snippet uses stable-baselines3, the PyTorch-based successor):

from stable_baselines3 import PPO
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10000)

Both libraries offer similar functionality for implementing reinforcement learning algorithms. PFRL provides more flexibility in terms of customization and distributed training, while stable-baselines offers a more user-friendly API with extensive documentation. The choice between the two depends on specific project requirements and user preferences.
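
To illustrate the batch-training point, here is a hedged sketch using PFRL's vectorized-environment utilities (MultiprocessVectorEnv and train_agent_batch_with_evaluation); make_env and the parameter values are illustrative, and the agent must support batch training (e.g., PPO or DQN):

import functools
import gym
import pfrl

def make_env(seed):
    # Build one environment instance per worker process.
    env = gym.make('CartPole-v0')
    env.seed(seed)
    return env

# Run several environments in parallel processes.
vec_env = pfrl.envs.MultiprocessVectorEnv(
    [functools.partial(make_env, seed) for seed in range(4)])

pfrl.experiments.train_agent_batch_with_evaluation(
    agent=agent, env=vec_env, steps=100000,
    eval_n_steps=None, eval_n_episodes=10,
    eval_interval=10000, outdir='result')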

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Pros of agents

  • Built on TensorFlow, offering seamless integration with the popular deep learning framework
  • Comprehensive suite of reinforcement learning algorithms and tools
  • Strong community support and regular updates from Google's TensorFlow team

Cons of agents

  • Steeper learning curve for those not familiar with TensorFlow
  • Can be more resource-intensive due to TensorFlow's overhead
  • Less flexibility for customization compared to PFRL's modular design

Code Comparison

agents:

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.networks import q_network
from tf_agents.environments import tf_py_environment

q_net = q_network.QNetwork(observation_spec, action_spec)
# DqnAgent also requires an optimizer.
agent = dqn_agent.DqnAgent(
    time_step_spec, action_spec, q_network=q_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))

PFRL:

import pfrl
from pfrl import agents, explorers, replay_buffers

q_func = pfrl.q_functions.FCStateQFunctionWithDiscreteAction(
    obs_size, n_actions, n_hidden_channels=64, n_hidden_layers=2)
explorer = explorers.ConstantEpsilonGreedy(epsilon=0.3, random_action_func=env.action_space.sample)
agent = agents.DQN(q_func, optimizer, replay_buffer, gamma, explorer)

Acme: a library of reinforcement learning components and agents

Pros of Acme

  • More comprehensive and flexible framework for RL research
  • Better support for distributed training and multi-agent scenarios
  • Stronger integration with TensorFlow and JAX ecosystems

Cons of Acme

  • Steeper learning curve due to higher complexity
  • Less focus on practical applications compared to PFRL
  • Potentially slower development cycle for simple RL tasks

Code Comparison

PFRL (PyTorch-based):

import torch
import pfrl

# A plain torch network with PFRL's discrete-action head on top.
q_func = torch.nn.Sequential(
    torch.nn.Linear(obs_size, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, n_actions),
    pfrl.q_functions.DiscreteActionValueHead(),
)
optimizer = torch.optim.Adam(q_func.parameters())
explorer = pfrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.1, random_action_func=env.action_space.sample)
replay_buffer = pfrl.replay_buffers.ReplayBuffer(capacity=10 ** 5)
agent = pfrl.agents.DQN(
    q_func, optimizer, replay_buffer, gamma=0.99, explorer=explorer)

Acme (TensorFlow-based):

import acme
from acme import specs
from acme.agents.tf import dqn  # TF-based DQN agent
import sonnet as snt
import tensorflow as tf

environment_spec = specs.make_environment_spec(environment)
network = snt.Sequential([
    snt.Linear(64),
    tf.nn.relu,
    snt.Linear(environment_spec.actions.num_values),
])
agent = dqn.DQN(environment_spec, network)

Both frameworks offer similar functionality for implementing RL algorithms, but Acme provides a more modular and flexible approach, while PFRL focuses on simplicity and ease of use for PyTorch users.

Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Pros of softlearning

  • Focuses on soft actor-critic (SAC) algorithms, providing specialized implementations
  • Includes a variety of environment wrappers for popular RL benchmarks
  • Offers a modular design for easy customization of algorithm components

Cons of softlearning

  • Limited to SAC-based algorithms, less versatile than PFRL's broader range
  • Less actively maintained, with fewer recent updates compared to PFRL
  • Smaller community and fewer resources available for support

Code Comparison

softlearning:

from softlearning.algorithms.sac import SAC
from softlearning.environments.utils import get_environment

env = get_environment('gym', 'HalfCheetah-v2')
algorithm = SAC(env, Q_lr=3e-4, policy_lr=3e-4, alpha_lr=3e-4)
algorithm.train()

PFRL:

import gym
from pfrl import experiments, replay_buffers
from pfrl.agents import SoftActorCritic

env = gym.make('HalfCheetah-v2')
# SoftActorCritic expects explicit policy and twin Q-function networks plus
# their optimizers (built, e.g., as in PFRL's SAC reproduction script);
# the names below stand in for those components.
agent = SoftActorCritic(
    policy, q_func1, q_func2,
    policy_optimizer, q_func1_optimizer, q_func2_optimizer,
    replay_buffers.ReplayBuffer(10 ** 6),
    gamma=0.99)
experiments.train_agent_with_evaluation(
    agent, env, steps=10 ** 6,
    eval_n_steps=None, eval_n_episodes=10,
    eval_interval=5000, outdir='result')

Both libraries offer implementations of SAC, but PFRL provides a more comprehensive set of algorithms and tools for reinforcement learning research and development.

README

PFRL

PFRL is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement learning algorithms in Python using PyTorch.

(Demo animations: Boxing, Humanoid, Grasping, Atlas, SlimeVolley)

Installation

PFRL is tested with Python 3.7.7. For other requirements, see requirements.txt.

PFRL can be installed via PyPI:

pip install pfrl

It can also be installed from the source code:

python setup.py install
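
For a source install, the repository needs to be cloned first; a typical sequence (assuming the GitHub location implied by pfnet/pfrl):

git clone https://github.com/pfnet/pfrl.git
cd pfrl
python setup.py install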

Refer to Installation for more information on installation.

Getting started

You can try the PFRL Quickstart Guide first, or check out the examples provided for Atari 2600 and OpenAI Gym.

For more information, you can refer to PFRL's documentation.

Blog Posts

Algorithms

| Algorithm | Discrete Action | Continuous Action | Recurrent Model | Batch Training | CPU Async Training | Pretrained models* |
|---|---|---|---|---|---|---|
| DQN (including DoubleDQN etc.) | ✓ | ✓ (NAF) | ✓ | ✓ | x | ✓ |
| Categorical DQN | ✓ | x | ✓ | ✓ | x | x |
| Rainbow | ✓ | x | ✓ | ✓ | x | ✓ |
| IQN | ✓ | x | ✓ | ✓ | x | ✓ |
| DDPG | x | ✓ | x | ✓ | x | ✓ |
| A3C | ✓ | ✓ | ✓ | ✓ (A2C) | ✓ | ✓ |
| ACER | ✓ | ✓ | ✓ | x | ✓ | x |
| PPO | ✓ | ✓ | ✓ | ✓ | x | ✓ |
| TRPO | ✓ | ✓ | ✓ | ✓ | x | ✓ |
| TD3 | x | ✓ | x | ✓ | x | ✓ |
| SAC | x | ✓ | x | ✓ | x | ✓ |

*Note on Pretrained models: PFRL provides pretrained models (sometimes called a 'model zoo') for our reproducibility scripts on Atari environments (DQN, IQN, Rainbow, and A3C) and Mujoco environments (DDPG, TRPO, PPO, TD3, SAC), for each benchmarked environment.
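
A hedged sketch of using one of these pretrained models, following the pattern of PFRL's reproducibility scripts (the algorithm and environment names are illustrative, and download_model is assumed to return the local path as its first element):

from pfrl import utils

# Fetch the pretrained "best" checkpoint and load it into an agent that was
# constructed with the matching network architecture.
model_dir = utils.download_model("DQN", "BreakoutNoFrameskip-v4", model_type="best")[0]
agent.load(model_dir)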

The algorithms in the table above, along with a number of supporting techniques (such as the replay buffers and explorers used in the examples above), are implemented in PFRL; refer to PFRL's documentation for the complete lists.

Environments

Environments that support the subset of OpenAI Gym's interface (reset and step methods) can be used.
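
As a hedged illustration (the class and its bodies below are hypothetical, not part of PFRL), such an environment only needs to expose reset and step, plus the observation/action spaces that agents and explorers typically query:

import gym
import numpy as np

class ToyEnv:
    """Minimal environment exposing the Gym subset PFRL relies on."""
    observation_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
    action_space = gym.spaces.Discrete(2)

    def reset(self):
        self._t = 0
        return np.zeros(1, dtype=np.float32)

    def step(self, action):
        self._t += 1
        obs = np.random.uniform(-1.0, 1.0, size=1).astype(np.float32)
        reward = float(action)   # reward 1 for action 1, 0 otherwise
        done = self._t >= 10     # fixed-length episodes
        return obs, reward, done, {}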

Contributing

Any kind of contribution to PFRL would be highly appreciated! If you are interested in contributing to PFRL, please read CONTRIBUTING.md.

License

MIT License.

Citations

To cite PFRL in publications, please cite our paper on ChainerRL, the library on which PFRL is based:

@article{JMLR:v22:20-376,
  author  = {Yasuhiro Fujita and Prabhat Nagarajan and Toshiki Kataoka and Takahiro Ishikawa},
  title   = {ChainerRL: A Deep Reinforcement Learning Library},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {77},
  pages   = {1-14},
  url     = {http://jmlr.org/papers/v22/20-376.html}
}