spinningup
An educational resource to help anyone learn deep reinforcement learning.
Top Related Projects
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Stable Baselines: A fork of OpenAI Baselines, with implementations of reinforcement learning algorithms.
Stable Baselines3: PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
rlkit: Collection of reinforcement learning algorithms, implemented in PyTorch.
Dopamine: A research framework for fast prototyping of reinforcement learning algorithms.
Quick Overview
The OpenAI Spinning Up project is a deep reinforcement learning (RL) educational resource that provides high-quality implementations of RL algorithms, as well as tutorials and documentation to help users understand the foundations and applications of RL.
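In practice, training usually reduces to passing an environment constructor to one of the algorithm functions. The sketch below is a minimal example assuming the PyTorch PPO entry point (ppo_pytorch) exported by the package; the environment and hyperparameters are illustrative, not prescribed by the project:
import gym
from spinup import ppo_pytorch as ppo

# Train PPO on a simple Gym environment; progress and the trained agent are
# written by the built-in logger to the given output directory.
env_fn = lambda: gym.make('CartPole-v1')
ppo(env_fn=env_fn, ac_kwargs=dict(hidden_sizes=[64, 64]),
    steps_per_epoch=4000, epochs=50, logger_kwargs=dict(output_dir='ppo_cartpole'))
# Equivalently, the documented command-line launcher can be used:
#   python -m spinup.run ppo --env CartPole-v1 --exp_name ppo_cartpole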
Pros
- Comprehensive Implementations: The project provides well-documented and tested implementations of key deep RL algorithms, including VPG, TRPO, PPO, DDPG, TD3, and SAC.
- Educational Resources: The project includes detailed tutorials, documentation, and reference materials to help users understand the fundamentals of RL and how to apply it to real-world problems.
- Maintained by OpenAI: The repository is kept in maintenance mode by OpenAI, receiving bug fixes and minor updates, so the reference implementations remain usable.
- Community Support: The project has a large and active community of users and contributors, providing a wealth of support and resources for those new to RL.
Cons
- Limited Scope: The project focuses on a core set of model-free deep RL algorithms and does not cover more recent directions such as model-based RL, offline RL, or multi-agent systems.
- Steep Learning Curve: Reinforcement learning can be a challenging and complex topic, and the project may not be the best starting point for complete beginners to the field.
- Dependency on External Libraries: The project relies on several external libraries, such as TensorFlow and PyTorch, which may require additional setup and configuration.
- Limited Platform Support: The project is primarily focused on Python and may not provide support for other programming languages or platforms.
Code Examples
Here are a few code examples from the Spinning Up project:
- Proximal Policy Optimization (PPO) Implementation:
import spinup.algos.pytorch.ppo.core as core
from spinup.utils.mpi_pytorch import setup_pytorch_for_mpi, sync_params, mpi_avg_grads
from spinup.utils.logx import EpochLogger
from spinup.utils.mpi_tools import mpi_fork, mpi_avg, proc_id, mpi_statistics_scalar, num_procs
def ppo(env_fn, actor_critic=core.MLPActorCritic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, clip_ratio=0.2, pi_lr=3e-4,
        vf_lr=1e-3, train_pi_iters=80, train_v_iters=80, lam=0.97, max_ep_len=1000,
        target_kl=0.01, logger_kwargs=dict(), save_freq=10):
    """
    Proximal Policy Optimization (by clipping),
    with early stopping based on approximate KL
    """
    # ...
- Soft Actor-Critic (SAC) Implementation:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from spinup.utils.logx import EpochLogger
from spinup.utils.mpi_pytorch import setup_pytorch_for_mpi, sync_params, mpi_avg_grads
from spinup.utils.mpi_tools import mpi_fork, mpi_avg, proc_id, mpi_statistics_scalar, num_procs
import spinup.algos.pytorch.sac.core as core
def sac(env_fn, actor_critic=core.MLPActorCritic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=100, replay_size=int(1e6), gamma=0.99,
        polyak=0.995, lr=1e-3, alpha=0.2, batch_size=100, start_steps=10000,
        update_after=1000, update_every=50, num_test_episodes=10, max_ep_len=1000,
        logger_kwargs=dict(), save_freq=1):
    """
    Soft Actor-Critic (SAC)
    """
    # ...
- Vanilla Policy Gradient Implementation:
import numpy as np
import torch
import torch.nn as nn
import spinup.algos.pytorch.vpg.core as core
def vpg(env_fn, actor_critic=core.MLPActorCritic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, pi_lr=3e-4, vf_lr=1e-3,
        train_v_iters=80, lam=0.97, max_ep_len=1000, logger_kwargs=dict(), save_freq=10):
    # ...
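After any of these training functions has run, Spinning Up saves the trained agent to the logger's output directory, and the repo ships utilities for reloading and watching it. A brief sketch of that workflow, with a placeholder output path:
from spinup.utils.test_policy import load_policy_and_env, run_policy

# Reload a saved agent and roll it out in its environment for a few episodes.
# '/tmp/experiments/ppo_cartpole' is a placeholder for a real output directory.
env, get_action = load_policy_and_env('/tmp/experiments/ppo_cartpole')
run_policy(env, get_action, max_ep_len=1000, num_episodes=5, render=True)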
Competitor Comparisons
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
Pros of Agents
- More comprehensive library with a wider range of algorithms and features
- Better integration with TensorFlow ecosystem and tools
- More active development and community support
Cons of Agents
- Steeper learning curve due to complexity and extensive features
- Potentially slower execution compared to SpinningUp's focused implementation
- Less beginner-friendly documentation and tutorials
Code Comparison
SpinningUp:
import gym
import spinup
spinup.ppo(env_fn=lambda: gym.make('CartPole-v1'),
           ac_kwargs=dict(hidden_sizes=[32, 32]))
Agents:
import tensorflow as tf
from tf_agents.agents.ppo import ppo_agent
agent = ppo_agent.PPOAgent(
    time_step_spec,
    action_spec,
    optimizer=tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3),
    actor_net=actor_net,
    value_net=value_net)
Summary
Agents offers a more comprehensive toolkit for reinforcement learning with better TensorFlow integration, while SpinningUp provides a simpler, more focused approach for beginners. The choice between them depends on the user's experience level and specific project requirements.
Stable Baselines: A fork of OpenAI Baselines, with implementations of reinforcement learning algorithms
Pros of Stable-baselines
- More comprehensive library with a wider range of algorithms
- Better documentation and more active community support
- Easier to use for practical applications and production environments
Cons of Stable-baselines
- Less focused on educational aspects compared to Spinning Up
- May be more complex for beginners to understand the underlying concepts
- Requires more setup and dependencies
Code Comparison
Spinning Up (PPO implementation):
def ppo(env_fn, actor_critic=core.mlp_actor_critic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, clip_ratio=0.2, pi_lr=3e-4,
        vf_lr=1e-3, train_pi_iters=80, train_v_iters=80, lam=0.97, max_ep_len=1000,
        target_kl=0.01, logger_kwargs=dict(), save_freq=10):
    # ... (implementation details)
Stable-baselines (PPO usage):
from stable_baselines import PPO2
model = PPO2("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10000)
env = model.get_env()
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
Stable Baselines3: PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms
Pros of stable-baselines3
- More comprehensive and actively maintained library with a wider range of algorithms
- Better documentation and user-friendly API for easier implementation
- Built on PyTorch, with thoroughly tested, production-oriented implementations
Cons of stable-baselines3
- Steeper learning curve for beginners due to its extensive features
- Less focus on educational aspects compared to SpinningUp's tutorial-style approach
Code Comparison
SpinningUp:
import gym
import spinup
spinup.ppo(env_fn=lambda: gym.make('CartPole-v1'),
           ac_kwargs=dict(hidden_sizes=[64, 64]), steps_per_epoch=5000,
           epochs=10)
stable-baselines3:
from stable_baselines3 import PPO
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=50000)
Both repositories provide implementations of reinforcement learning algorithms, but stable-baselines3 offers a more comprehensive and production-ready toolkit, while SpinningUp focuses on educational aspects and simplicity for beginners. The code comparison shows that stable-baselines3 has a more concise API for training models, while SpinningUp provides more explicit configuration options.
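To illustrate that concise API beyond training, stable-baselines3 also bundles model persistence and evaluation helpers. A short sketch; the file name is arbitrary:
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Train, evaluate, save, and reload a PPO agent on CartPole.
model = PPO("MlpPolicy", "CartPole-v1", verbose=0)
model.learn(total_timesteps=50000)
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
model.save("ppo_cartpole")                                  # arbitrary file name
model = PPO.load("ppo_cartpole", env=gym.make("CartPole-v1"))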
rlkit: Collection of reinforcement learning algorithms, implemented in PyTorch
Pros of rlkit
- More extensive algorithm implementations, including off-policy and model-based RL
- Greater flexibility and customization options for experiments
- Active development with frequent updates and contributions
Cons of rlkit
- Steeper learning curve due to more complex codebase
- Less comprehensive documentation compared to Spinning Up
- Requires more setup and configuration for running experiments
Code Comparison
rlkit example:
from rlkit.torch.sac.sac import SACTrainer
from rlkit.torch.networks import FlattenMlp
from rlkit.launchers.launcher_util import setup_logger
variant = dict(
    algorithm="SAC",
    version="normal",
    layer_size=256,
    replay_buffer_size=int(1E6),
    algorithm_kwargs=dict(
        num_epochs=3000,
        num_eval_steps_per_epoch=5000,
        num_trains_per_train_loop=1000,
        num_expl_steps_per_train_loop=1000,
        min_num_steps_before_training=1000,
        max_path_length=1000,
        batch_size=256,
    ),
    trainer_kwargs=dict(
        discount=0.99,
        soft_target_tau=5e-3,
        target_update_period=1,
        policy_lr=3E-4,
        qf_lr=3E-4,
        reward_scale=1,
        use_automatic_entropy_tuning=True,
    ),
)
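For context, the setup_logger helper imported above is typically called to register the variant and an output directory before training begins; a minimal sketch of that step (the experiment name is an arbitrary placeholder):
# Register the experiment with rlkit's logger; 'sac-halfcheetah' is a placeholder name.
setup_logger('sac-halfcheetah', variant=variant)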
Spinning Up example:
from spinup import sac_pytorch
import gym
env_fn = lambda: gym.make('HalfCheetah-v2')
ac_kwargs = dict(hidden_sizes=[256, 256])
sac_pytorch(env_fn=env_fn, ac_kwargs=ac_kwargs, steps_per_epoch=4000, epochs=100,
            replay_size=int(1e6), gamma=0.99, polyak=0.995, lr=1e-3, alpha=0.2,
            batch_size=100, start_steps=10000, update_after=1000, update_every=50,
            num_test_episodes=10, max_ep_len=1000,
            logger_kwargs=dict(output_dir='sac_pytorch_data'))
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Pros of Dopamine
- More comprehensive library with support for various RL algorithms
- Better integration with TensorFlow and other Google ML tools
- Extensive documentation and examples for different use cases
Cons of Dopamine
- Steeper learning curve for beginners compared to SpinningUp
- Less focus on educational aspects and more on research/production use
- May be overkill for simple RL projects or learning purposes
Code Comparison
SpinningUp (simple PPO implementation):
def ppo(env_fn, actor_critic=core.mlp_actor_critic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, clip_ratio=0.2, pi_lr=3e-4,
        vf_lr=1e-3, train_pi_iters=80, train_v_iters=80, lam=0.97, max_ep_len=1000,
        target_kl=0.01, logger_kwargs=dict(), save_freq=10):
    # ... (implementation details)
Dopamine (DQN agent initialization):
agent = dqn_agent.DQNAgent(
    sess,
    num_actions=environment.action_space.n,
    observation_shape=environment.observation_space.shape,
    observation_dtype=environment.observation_space.dtype,
    stack_size=config.stack_size,
    network=atari_lib.NatureDQNNetwork
)
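To put that initialization in context, Dopamine experiments are usually configured with gin files and driven by a Runner. A rough sketch assuming the standard gin-based entry point; both paths are placeholders:
from dopamine.discrete_domains import run_experiment

# Parse a gin configuration (path is a placeholder) and run a standard
# train/eval loop; create_runner builds the agent and environment internally.
run_experiment.load_gin_configs(['dopamine/agents/dqn/configs/dqn.gin'], [])
runner = run_experiment.create_runner('/tmp/dopamine_runs')
runner.run_experiment()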
README
Status: Maintenance (expect bug fixes and minor updates)
Welcome to Spinning Up in Deep RL!
This is an educational resource produced by OpenAI that makes it easier to learn about deep reinforcement learning (deep RL).
For the unfamiliar: reinforcement learning (RL) is a machine learning approach for teaching agents how to solve tasks by trial and error. Deep RL refers to the combination of RL with deep learning.
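As a rough illustration of that trial-and-error loop (not code from this repo; the environment and the random "policy" are placeholder choices using the classic Gym interface), an agent repeatedly observes a state, acts, and receives a reward:
import gym

# A minimal agent-environment interaction loop: observe, act, receive reward.
# Here the "agent" just samples random actions; an RL algorithm would instead
# learn to pick actions that maximize cumulative reward.
env = gym.make('CartPole-v1')
obs = env.reset()
for _ in range(200):
    action = env.action_space.sample()      # stand-in for a learned policy
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()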
This module contains a variety of helpful resources, including:
- a short introduction to RL terminology, kinds of algorithms, and basic theory,
- an essay about how to grow into an RL research role,
- a curated list of important papers organized by topic,
- a well-documented code repo of short, standalone implementations of key algorithms,
- and a few exercises to serve as warm-ups.
Get started at spinningup.openai.com!
Citing Spinning Up
If you reference or use Spinning Up in your research, please cite:
@article{SpinningUp2018,
    author = {Achiam, Joshua},
    title = {{Spinning Up in Deep Reinforcement Learning}},
    year = {2018}
}