spinningup
An educational resource to help anyone learn deep reinforcement learning.
Top Related Projects
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Stable Baselines: A fork of OpenAI Baselines, with implementations of reinforcement learning algorithms.
Stable Baselines3: PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
rlkit: Collection of reinforcement learning algorithms, implemented in PyTorch.
Dopamine: A research framework for fast prototyping of reinforcement learning algorithms.
Quick Overview
The OpenAI Spinning Up project is a deep reinforcement learning (RL) educational resource that provides high-quality implementations of RL algorithms, as well as tutorials and documentation to help users understand the foundations and applications of RL.
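In practice, training usually reduces to passing an environment constructor to one of the algorithm functions. The sketch below is a minimal example assuming the PyTorch PPO entry point (ppo_pytorch) exported by the package; the environment and hyperparameters are illustrative, not prescribed by the project:
import gym
from spinup import ppo_pytorch as ppo

# Train PPO on a simple Gym environment; progress and the trained agent are
# written by the built-in logger to the given output directory.
env_fn = lambda: gym.make('CartPole-v1')
ppo(env_fn=env_fn, ac_kwargs=dict(hidden_sizes=[64, 64]),
    steps_per_epoch=4000, epochs=50, logger_kwargs=dict(output_dir='ppo_cartpole'))
# Equivalently, the documented command-line launcher can be used:
#   python -m spinup.run ppo --env CartPole-v1 --exp_name ppo_cartpole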
Pros
- Comprehensive Implementations: The project provides well-documented and tested implementations of key deep RL algorithms, including VPG, TRPO, PPO, DDPG, TD3, and SAC.
- Educational Resources: The project includes detailed tutorials, documentation, and reference materials to help users understand the fundamentals of RL and how to apply it to real-world problems.
- Maintained by OpenAI: The repository is kept in maintenance mode by OpenAI, receiving bug fixes and minor updates, so the reference implementations remain usable.
- Community Support: The project has a large and active community of users and contributors, providing a wealth of support and resources for those new to RL.
Cons
- Limited Scope: The project focuses on a core set of model-free deep RL algorithms and does not cover more recent directions such as model-based RL, offline RL, or multi-agent systems.
- Steep Learning Curve: Reinforcement learning can be a challenging and complex topic, and the project may not be the best starting point for complete beginners to the field.
- Dependency on External Libraries: The project relies on several external libraries, such as TensorFlow and PyTorch, which may require additional setup and configuration.
- Limited Platform Support: The project is primarily focused on Python and may not provide support for other programming languages or platforms.
Code Examples
Here are a few code examples from the Spinning Up project:
- Proximal Policy Optimization (PPO) Implementation:
import spinup.algos.pytorch.ppo.core as core
from spinup.utils.mpi_pytorch import setup_pytorch_for_mpi, sync_params, mpi_avg_grads
from spinup.utils.logx import EpochLogger
from spinup.utils.mpi_tools import mpi_fork, mpi_avg, proc_id, mpi_statistics_scalar, num_procs
def ppo(env_fn, actor_critic=core.MLPActorCritic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, clip_ratio=0.2, pi_lr=3e-4,
        vf_lr=1e-3, train_pi_iters=80, train_v_iters=80, lam=0.97, max_ep_len=1000,
        target_kl=0.01, logger_kwargs=dict(), save_freq=10):
    """
    Proximal Policy Optimization (by clipping),
    with early stopping based on approximate KL
    """
    # ...
- Soft Actor-Critic (SAC) Implementation:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from spinup.utils.logx import EpochLogger
from spinup.utils.mpi_pytorch import setup_pytorch_for_mpi, sync_params, mpi_avg_grads
from spinup.utils.mpi_tools import mpi_fork, mpi_avg, proc_id, mpi_statistics_scalar, num_procs
import spinup.algos.pytorch.sac.core as core
def sac(env_fn, actor_critic=core.MLPActorCritic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=100, replay_size=int(1e6), gamma=0.99,
        polyak=0.995, lr=1e-3, alpha=0.2, batch_size=100, start_steps=10000,
        update_after=1000, update_every=50, num_test_episodes=10, max_ep_len=1000,
        logger_kwargs=dict(), save_freq=1):
    """
    Soft Actor-Critic (SAC)
    """
    # ...
- Vanilla Policy Gradient Implementation:
import numpy as np
import torch
import torch.nn as nn
import spinup.algos.pytorch.vpg.core as core
def vpg(env_fn, actor_critic=core.MLPActorCritic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, pi_lr=3e-4, vf_lr=1e-3,
        train_v_iters=80, lam=0.97, max_ep_len=1000, logger_kwargs=dict(), save_freq=10):
    # ...
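After any of these training functions has run, Spinning Up saves the trained agent to the logger's output directory, and the repo ships utilities for reloading and watching it. A brief sketch of that workflow, with a placeholder output path:
from spinup.utils.test_policy import load_policy_and_env, run_policy

# Reload a saved agent and roll it out in its environment for a few episodes.
# '/tmp/experiments/ppo_cartpole' is a placeholder for a real output directory.
env, get_action = load_policy_and_env('/tmp/experiments/ppo_cartpole')
run_policy(env, get_action, max_ep_len=1000, num_episodes=5, render=True)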
Competitor Comparisons
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Pros of Agents
- More comprehensive library with a wider range of algorithms and features
- Better integration with TensorFlow ecosystem and tools
- More active development and community support
Cons of Agents
- Steeper learning curve due to complexity and extensive features
- Potentially slower execution compared to SpinningUp's focused implementation
- Less beginner-friendly documentation and tutorials
Code Comparison
SpinningUp:
import gym
import spinup
spinup.ppo(env_fn=lambda: gym.make('CartPole-v1'),
           ac_kwargs=dict(hidden_sizes=[32, 32]))
Agents:
import tensorflow as tf
from tf_agents.agents.ppo import ppo_agent
agent = ppo_agent.PPOAgent(
    time_step_spec,
    action_spec,
    optimizer=tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3),
    actor_net=actor_net,
    value_net=value_net)
Summary
Agents offers a more comprehensive toolkit for reinforcement learning with better TensorFlow integration, while SpinningUp provides a simpler, more focused approach for beginners. The choice between them depends on the user's experience level and specific project requirements.
Stable Baselines: A fork of OpenAI Baselines, with implementations of reinforcement learning algorithms
Pros of Stable-baselines
- More comprehensive library with a wider range of algorithms
- Better documentation and more active community support
- Easier to use for practical applications and production environments
Cons of Stable-baselines
- Less focused on educational aspects compared to Spinning Up
- May be more complex for beginners to understand the underlying concepts
- Requires more setup and dependencies
Code Comparison
Spinning Up (PPO implementation):
def ppo(env_fn, actor_critic=core.mlp_actor_critic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, clip_ratio=0.2, pi_lr=3e-4,
        vf_lr=1e-3, train_pi_iters=80, train_v_iters=80, lam=0.97, max_ep_len=1000,
        target_kl=0.01, logger_kwargs=dict(), save_freq=10):
    # ... (implementation details)
Stable-baselines (PPO usage):
from stable_baselines import PPO2
model = PPO2("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10000)
env = model.get_env()
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
Stable Baselines3: PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms
Pros of stable-baselines3
- More comprehensive and actively maintained library with a wider range of algorithms
- Better documentation and user-friendly API for easier implementation
- Built on PyTorch, with thoroughly tested, production-oriented implementations
Cons of stable-baselines3
- Steeper learning curve for beginners due to its extensive features
- Less focus on educational aspects compared to SpinningUp's tutorial-style approach
Code Comparison
SpinningUp:
import gym
import spinup
spinup.ppo(env_fn=lambda: gym.make('CartPole-v1'),
           ac_kwargs=dict(hidden_sizes=[64, 64]), steps_per_epoch=5000,
           epochs=10)
stable-baselines3:
from stable_baselines3 import PPO
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=50000)
Both repositories provide implementations of reinforcement learning algorithms, but stable-baselines3 offers a more comprehensive and production-ready toolkit, while SpinningUp focuses on educational aspects and simplicity for beginners. The code comparison shows that stable-baselines3 has a more concise API for training models, while SpinningUp provides more explicit configuration options.
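To illustrate that concise API beyond training, stable-baselines3 also bundles model persistence and evaluation helpers. A short sketch; the file name is arbitrary:
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Train, evaluate, save, and reload a PPO agent on CartPole.
model = PPO("MlpPolicy", "CartPole-v1", verbose=0)
model.learn(total_timesteps=50000)
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
model.save("ppo_cartpole")                                  # arbitrary file name
model = PPO.load("ppo_cartpole", env=gym.make("CartPole-v1"))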
rlkit: Collection of reinforcement learning algorithms, implemented in PyTorch
Pros of rlkit
- More extensive algorithm implementations, including off-policy and model-based RL
- Greater flexibility and customization options for experiments
- Active development with frequent updates and contributions
Cons of rlkit
- Steeper learning curve due to more complex codebase
- Less comprehensive documentation compared to Spinning Up
- Requires more setup and configuration for running experiments
Code Comparison
rlkit example:
from rlkit.torch.sac.sac import SACTrainer
from rlkit.torch.networks import FlattenMlp
from rlkit.launchers.launcher_util import setup_logger
variant = dict(
    algorithm="SAC",
    version="normal",
    layer_size=256,
    replay_buffer_size=int(1E6),
    algorithm_kwargs=dict(
        num_epochs=3000,
        num_eval_steps_per_epoch=5000,
        num_trains_per_train_loop=1000,
        num_expl_steps_per_train_loop=1000,
        min_num_steps_before_training=1000,
        max_path_length=1000,
        batch_size=256,
    ),
    trainer_kwargs=dict(
        discount=0.99,
        soft_target_tau=5e-3,
        target_update_period=1,
        policy_lr=3E-4,
        qf_lr=3E-4,
        reward_scale=1,
        use_automatic_entropy_tuning=True,
    ),
)
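For context, the setup_logger helper imported above is typically called to register the variant and an output directory before training begins; a minimal sketch of that step (the experiment name is an arbitrary placeholder):
# Register the experiment with rlkit's logger; 'sac-halfcheetah' is a placeholder name.
setup_logger('sac-halfcheetah', variant=variant)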
Spinning Up example:
from spinup import sac_pytorch
import gym
env_fn = lambda: gym.make('HalfCheetah-v2')
ac_kwargs = dict(hidden_sizes=[256, 256])
sac_pytorch(env_fn=env_fn, ac_kwargs=ac_kwargs, steps_per_epoch=4000, epochs=100,
            replay_size=int(1e6), gamma=0.99, polyak=0.995, lr=1e-3, alpha=0.2,
            batch_size=100, start_steps=10000, update_after=1000, update_every=50,
            num_test_episodes=10, max_ep_len=1000,
            logger_kwargs=dict(output_dir='sac_pytorch_data'))
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Pros of Dopamine
- More comprehensive library with support for various RL algorithms
- Better integration with TensorFlow and other Google ML tools
- Extensive documentation and examples for different use cases
Cons of Dopamine
- Steeper learning curve for beginners compared to SpinningUp
- Less focus on educational aspects and more on research/production use
- May be overkill for simple RL projects or learning purposes
Code Comparison
SpinningUp (simple PPO implementation):
def ppo(env_fn, actor_critic=core.mlp_actor_critic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, clip_ratio=0.2, pi_lr=3e-4,
        vf_lr=1e-3, train_pi_iters=80, train_v_iters=80, lam=0.97, max_ep_len=1000,
        target_kl=0.01, logger_kwargs=dict(), save_freq=10):
    # ... (implementation details)
Dopamine (DQN agent initialization):
agent = dqn_agent.DQNAgent(
    sess,
    num_actions=environment.action_space.n,
    observation_shape=environment.observation_space.shape,
    observation_dtype=environment.observation_space.dtype,
    stack_size=config.stack_size,
    network=atari_lib.NatureDQNNetwork
)
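To put that initialization in context, Dopamine experiments are usually configured with gin files and driven by a Runner. A rough sketch assuming the standard gin-based entry point; both paths are placeholders:
from dopamine.discrete_domains import run_experiment

# Parse a gin configuration (path is a placeholder) and run a standard
# train/eval loop; create_runner builds the agent and environment internally.
run_experiment.load_gin_configs(['dopamine/agents/dqn/configs/dqn.gin'], [])
runner = run_experiment.create_runner('/tmp/dopamine_runs')
runner.run_experiment()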
README
Status: Maintenance (expect bug fixes and minor updates)
Welcome to Spinning Up in Deep RL!
This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL).
For the unfamiliar: reinforcement learning (RL) is a machine learning approach for teaching agents how to solve tasks by trial and error. Deep RL refers to the combination of RL with deep learning.
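As a rough illustration of that trial-and-error loop (not code from this repo; the environment and the random "policy" are placeholder choices using the classic Gym interface), an agent repeatedly observes a state, acts, and receives a reward:
import gym

# A minimal agent-environment interaction loop: observe, act, receive reward.
# Here the "agent" just samples random actions; an RL algorithm would instead
# learn to pick actions that maximize cumulative reward.
env = gym.make('CartPole-v1')
obs = env.reset()
for _ in range(200):
    action = env.action_space.sample()      # stand-in for a learned policy
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()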
This module contains a variety of helpful resources, including:
- a short introduction to RL terminology, kinds of algorithms, and basic theory,
- an essay about how to grow into an RL research role,
- a curated list of important papers organized by topic,
- a well-documented code repo of short, standalone implementations of key algorithms,
- and a few exercises to serve as warm-ups.
Get started at spinningup.openai.com!
Citing Spinning Up
If you reference or use Spinning Up in your research, please cite:
@article{SpinningUp2018,
    author = {Achiam, Joshua},
    title = {{Spinning Up in Deep Reinforcement Learning}},
    year = {2018}
}