rail-berkeley/softlearning

Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official implementation of the Soft Actor-Critic algorithm.

Top Related Projects

  • Gym: A toolkit for developing and comparing reinforcement learning algorithms.
  • stable-baselines: A fork of OpenAI Baselines, implementations of reinforcement learning algorithms.
  • TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.
  • stable-baselines3: PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
  • garage: A toolkit for reproducible reinforcement learning research.
  • Gymnasium: An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym).

Quick Overview

Softlearning is an open-source reinforcement learning (RL) framework developed by the Berkeley Artificial Intelligence Research (BAIR) lab. It focuses on implementing and experimenting with soft actor-critic (SAC) algorithms and their variants, designed for continuous control tasks and robotics applications.

Pros

  • Implements state-of-the-art soft actor-critic algorithms
  • Supports multiple environment universes, including OpenAI Gym, DeepMind Control Suite (dm_control), and robosuite
  • Provides a flexible and modular codebase for easy customization
  • Includes pre-trained models and benchmarks for comparison

Cons

  • Limited documentation and tutorials for beginners
  • Primarily focused on SAC algorithms, which may not suit all RL problems
  • Requires specific dependencies and environment setups
  • May have a steeper learning curve compared to more general-purpose RL libraries

Code Examples

  1. Creating and training an SAC agent:
from softlearning.algorithms import SAC
from softlearning.environments import gym

env = gym.make('HalfCheetah-v2')
algorithm = SAC(
    env,
    Q_lr=3e-4,
    policy_lr=3e-4,
    alpha_lr=3e-4,
    reward_scale=1.0,
    target_entropy='auto',
)

algorithm.train(n_epochs=1000, batch_size=256)
  2. Evaluating a trained agent:
from softlearning.policies import TanhGaussianPolicy
from softlearning.environments import gym

env = gym.make('HalfCheetah-v2')
policy = TanhGaussianPolicy(env.observation_space, env.action_space)
policy.load_weights('path/to/saved/weights.h5')

total_reward = 0
observation = env.reset()
for _ in range(1000):
    action = policy.actions_np([observation])[0]
    observation, reward, done, _ = env.step(action)
    total_reward += reward
    if done:
        break

print(f"Total reward: {total_reward}")
  3. Customizing the replay buffer:
from softlearning.replay_pools import SimpleReplayPool

custom_replay_pool = SimpleReplayPool(
    environment=env,
    max_size=1e6,
    save_path='/path/to/save/replay/buffer',
    load_path='/path/to/load/replay/buffer'
)

algorithm.replay_pool = custom_replay_pool

Getting Started

To get started with Softlearning:

  1. Clone the repository:

    git clone https://github.com/rail-berkeley/softlearning.git
    cd softlearning
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Install MuJoCo (if using MuJoCo environments): Follow instructions at https://github.com/openai/mujoco-py

  4. Run an example experiment:

    python examples/development/main.py --algorithm SAC --universe gym --domain HalfCheetah --task v3
    

For more detailed instructions and advanced usage, refer to the project's README and documentation.

Competitor Comparisons

Gym: A toolkit for developing and comparing reinforcement learning algorithms.

Pros of Gym

  • Widely adopted and supported by the RL community
  • Extensive documentation and tutorials available
  • Large variety of pre-built environments for different RL tasks

Cons of Gym

  • Limited focus on continuous control tasks
  • Less emphasis on scalable and efficient RL algorithms
  • Fewer built-in tools for experiment management and visualization

Code Comparison

Gym:

import gym
env = gym.make('CartPole-v1')
observation = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)

Softlearning:

from softlearning.environments.gym import GymEnv
from softlearning.algorithms.sac import SAC
env = GymEnv('Swimmer-v2')
algorithm = SAC(env=env, Q_lr=3e-4, policy_lr=3e-4)
algorithm.train()

Softlearning focuses on implementing and evaluating deep RL algorithms, particularly for continuous control tasks. It provides a more specialized framework for advanced RL research, including implementations of algorithms like Soft Actor-Critic (SAC). Gym, on the other hand, offers a broader range of environments and is more accessible for beginners, but may require additional implementations for advanced algorithms and experiment management.

stable-baselines: A fork of OpenAI Baselines, implementations of reinforcement learning algorithms.

Pros of stable-baselines

  • Wider range of implemented algorithms (e.g., PPO, A2C, DDPG, SAC)
  • More extensive documentation and tutorials
  • Active community support and regular updates

Cons of stable-baselines

  • Less focus on soft actor-critic (SAC) variants
  • May be less optimized for specific robotics tasks
  • Potentially steeper learning curve for beginners

Code Comparison

softlearning:

from softlearning.environments.utils import get_environment
from softlearning.algorithms.sac import SAC

env = get_environment('gym', 'HalfCheetah-v2')
algorithm = SAC(env, Q_lr=3e-4, policy_lr=3e-4)
algorithm.train()

stable-baselines:

from stable_baselines import SAC
import gym

env = gym.make('HalfCheetah-v2')
model = SAC('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=1000000)

Both repositories provide implementations of reinforcement learning algorithms, but they have different focuses. softlearning specializes in soft actor-critic and its variants, particularly for robotics applications. stable-baselines offers a broader range of algorithms and is more general-purpose. The code examples show how to set up and train a SAC agent in each framework, highlighting the differences in API design and usage.

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Pros of TensorFlow Agents

  • Broader scope, covering a wide range of RL algorithms and environments
  • Better integration with TensorFlow ecosystem and tools
  • More active development and larger community support

Cons of TensorFlow Agents

  • Steeper learning curve due to its comprehensive nature
  • Less focused on specific soft actor-critic implementations
  • May be overkill for projects solely focused on soft actor-critic algorithms

Code Comparison

SoftLearning:

from softlearning.environments.utils import get_environment
from softlearning.algorithms.sac import SAC

env = get_environment('gym', 'HalfCheetah-v2')
algorithm = SAC(env)

TensorFlow Agents:

from tf_agents.environments import suite_gym
from tf_agents.agents.sac import sac_agent

env = suite_gym.load('HalfCheetah-v2')
agent = sac_agent.SacAgent(env.time_step_spec(), env.action_spec())

Both repositories provide implementations of soft actor-critic algorithms, but SoftLearning is more focused on this specific approach, while TensorFlow Agents offers a broader range of reinforcement learning tools and algorithms. SoftLearning may be more suitable for projects specifically targeting soft actor-critic methods, while TensorFlow Agents provides a more comprehensive toolkit for various RL applications within the TensorFlow ecosystem.

stable-baselines3: PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.

Pros of stable-baselines3

  • More comprehensive documentation and tutorials
  • Wider range of implemented algorithms
  • Active community and frequent updates

Cons of stable-baselines3

  • Less focus on specific soft actor-critic implementations
  • May have higher computational overhead for some algorithms

Code Comparison

softlearning:

from softlearning.environments.utils import get_environment
from softlearning.algorithms.sac import SAC

env = get_environment('gym', 'HalfCheetah-v2')
algorithm = SAC(env)

stable-baselines3:

from stable_baselines3 import SAC
import gym

env = gym.make('HalfCheetah-v2')
model = SAC('MlpPolicy', env, verbose=1)

Both repositories provide implementations of reinforcement learning algorithms, with a focus on soft actor-critic (SAC). softlearning is more specialized in SAC variants, while stable-baselines3 offers a broader range of algorithms. stable-baselines3 has more extensive documentation and a larger community, making it potentially easier for beginners. However, softlearning may be more suitable for researchers focusing specifically on SAC implementations. The code comparison shows that both libraries offer similar ease of use, with stable-baselines3 having a slightly more streamlined API.

garage: A toolkit for reproducible reinforcement learning research.

Pros of garage

  • Broader range of algorithms: Supports a wider variety of RL algorithms, including policy gradient methods, Q-learning, and more
  • Modular design: Offers a more flexible and extensible architecture, allowing easier integration of custom components
  • Better documentation: Provides more comprehensive documentation and examples for users

Cons of garage

  • Steeper learning curve: May be more complex for beginners due to its broader scope and flexibility
  • Less focus on soft actor-critic: While it supports SAC, it's not as specialized in this algorithm as softlearning

Code Comparison

softlearning:

from softlearning.environments.utils import get_environment
from softlearning.algorithms.sac import SAC

env = get_environment('gym', 'HalfCheetah-v2')
algorithm = SAC(env=env, Q_lr=3e-4, policy_lr=3e-4)

garage:

from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment import LocalRunner
from garage.tf.algos import SAC

@wrap_experiment
def sac_halfcheetah(ctxt=None):
    env = GymEnv('HalfCheetah-v2')
    algo = SAC(env_spec=env.spec, qf_lr=3e-4, policy_lr=3e-4)
    runner = LocalRunner(ctxt)
    runner.setup(algo, env)
    runner.train(n_epochs=1000, batch_size=256)

Gymnasium: An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym).

Pros of Gymnasium

  • Broader scope and wider range of environments for reinforcement learning
  • More active development and larger community support
  • Better documentation and tutorials for beginners

Cons of Gymnasium

  • May have a steeper learning curve for complex environments
  • Less focused on specific robotics applications compared to Softlearning

Code Comparison

Gymnasium:

import gymnasium as gym
env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

Softlearning:

from softlearning.environments.gym import GymEnv
env = GymEnv('CartPole-v1')
observation = env.reset()
for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)

The main differences are in the import statement and the step function return values. Gymnasium provides more detailed information with terminated and truncated flags, while Softlearning uses a single done flag.
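
If you need to run agent code written against the older Gym-style step() signature on a Gymnasium environment, the two conventions can be bridged with a small adapter. The sketch below is illustrative only; the OldStepAPI wrapper name is hypothetical and not part of either library.

# Minimal sketch: adapting Gymnasium's 5-tuple step() to the older 4-tuple
# (observation, reward, done, info) convention. `OldStepAPI` is a hypothetical
# helper, not part of Gymnasium or Softlearning.
import gymnasium as gym

class OldStepAPI:
    def __init__(self, env):
        self.env = env
        self.action_space = env.action_space
        self.observation_space = env.observation_space

    def reset(self, **kwargs):
        observation, info = self.env.reset(**kwargs)
        return observation  # drop `info` to match the old single-value reset

    def step(self, action):
        observation, reward, terminated, truncated, info = self.env.step(action)
        done = terminated or truncated  # collapse the two flags into one
        return observation, reward, done, info

env = OldStepAPI(gym.make("CartPole-v1"))
observation = env.reset(seed=42)
for _ in range(1000):
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        observation = env.reset()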

README

Softlearning

Softlearning is a deep reinforcement learning toolbox for training maximum entropy policies in continuous domains. The implementation is fairly thin and primarily optimized for our own development purposes. It utilizes the tf.keras modules for most of the model classes (e.g. policies and value functions). We use Ray for the experiment orchestration. Ray Tune and Autoscaler implement several neat features that enable us to seamlessly run the same experiment scripts that we use for local prototyping to launch large-scale experiments on any chosen cloud service (e.g. GCP or AWS), and intelligently parallelize and distribute training for effective resource allocation.

This implementation uses Tensorflow. For a PyTorch implementation of soft actor-critic, take a look at rlkit.
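
As a rough illustration of the tf.keras style mentioned above, a state-action value function can be expressed as a small Keras model. This is only a sketch of the general pattern, not Softlearning's actual implementation; the make_q_function helper and its arguments are hypothetical.

# Illustrative sketch of a tf.keras Q-function model; not the actual
# softlearning code. `make_q_function` and its arguments are hypothetical.
import tensorflow as tf

def make_q_function(observation_dim, action_dim, hidden_sizes=(256, 256)):
    observations = tf.keras.Input(shape=(observation_dim,), name="observations")
    actions = tf.keras.Input(shape=(action_dim,), name="actions")
    x = tf.keras.layers.Concatenate()([observations, actions])
    for size in hidden_sizes:
        x = tf.keras.layers.Dense(size, activation="relu")(x)
    q_values = tf.keras.layers.Dense(1, name="q_values")(x)
    return tf.keras.Model(inputs=[observations, actions], outputs=q_values)

q_function = make_q_function(observation_dim=17, action_dim=6)  # HalfCheetah-sized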

Getting Started

Prerequisites

The environment can be run either locally using Conda or inside a Docker container. For the Conda installation, you need to have Conda installed. For the Docker installation, you will need Docker and Docker Compose installed. Also, most of our environments currently require a MuJoCo license.

Conda Installation

  1. Download and install MuJoCo 1.50 and 2.00 from the MuJoCo website. We assume that the MuJoCo files are extracted to the default location (~/.mujoco/mjpro150 and ~/.mujoco/mujoco200_{platform}). Unfortunately, gym and dm_control expect different paths for the MuJoCo 2.00 installation, which is why you will need to have it available both as ~/.mujoco/mujoco200_{platform} and ~/.mujoco/mujoco200. The easiest way is to create a symlink with: ln -s ~/.mujoco/mujoco200_{platform} ~/.mujoco/mujoco200.

  2. Copy your MuJoCo license key (mjkey.txt) to ~/.mujoco/mjkey.txt.

  3. Clone softlearning

git clone https://github.com/rail-berkeley/softlearning.git ${SOFTLEARNING_PATH}
  4. Create and activate the conda environment, and install softlearning to enable the command-line interface.
cd ${SOFTLEARNING_PATH}
conda env create -f environment.yml
conda activate softlearning
pip install -e ${SOFTLEARNING_PATH}

The environment should now be ready to run. See the examples section below for how to train and simulate the agents.

Finally, to deactivate and remove the conda environment:

conda deactivate
conda remove --name softlearning --all

Docker Installation

docker-compose

To build the image and run the container:

export MJKEY="$(cat ~/.mujoco/mjkey.txt)" \
    && docker-compose \
        -f ./docker/docker-compose.dev.cpu.yml \
        up \
        -d \
        --force-recreate

You can access the container with the typical Docker exec-command, i.e.

docker exec -it softlearning bash

See the examples section below for how to train and simulate the agents.

Finally, to clean up the docker setup:

docker-compose \
    -f ./docker/docker-compose.dev.cpu.yml \
    down \
    --rmi all \
    --volumes

Examples

Training and simulating an agent

  1. To train the agent:
softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000  # Save the checkpoint to resume training later
  2. To simulate the resulting policy: first, find the absolute path that the checkpoint is saved to. By default (i.e. without specifying the log-dir argument to the previous script), the data is saved under ~/ray_results/<universe>/<domain>/<task>/<datetimestamp>-<exp-name>/<trial-id>/<checkpoint-id>. For example: ~/ray_results/gym/HalfCheetah/v3/2018-12-12T16-48-37-my-sac-experiment-1-0/mujoco-runner_0_seed=7585_2018-12-12_16-48-37xuadh9vd/checkpoint_1000/. The next command assumes that this path is stored in the ${SAC_CHECKPOINT_DIR} environment variable.
python -m examples.development.simulate_policy \
    ${SAC_CHECKPOINT_DIR} \
    --max-path-length 1000 \
    --num-rollouts 1 \
    --render-kwargs '{"mode": "human"}'

examples.development.main contains several different environments, and there are more example scripts available in the /examples folder. For more information about the agents and configurations, run the scripts with the --help flag: python ./examples/development/main.py --help

optional arguments:
  -h, --help            show this help message and exit
  --universe {robosuite,dm_control,gym}
  --domain DOMAIN
  --task TASK
  --checkpoint-replay-pool CHECKPOINT_REPLAY_POOL
                        Whether a checkpoint should also save the replay
                        pool. If set, takes precedence over
                        variant['run_params']['checkpoint_replay_pool']. Note
                        that the replay pool is saved (and constructed) piece
                        by piece so that each experience is saved only once.
  --algorithm ALGORITHM
  --policy {gaussian}
  --exp-name EXP_NAME
  --mode MODE
  --run-eagerly RUN_EAGERLY
                        Whether to run tensorflow in eager mode.
  --local-dir LOCAL_DIR
                        Destination local folder to save training results.
  --confirm-remote [CONFIRM_REMOTE]
                        Whether or not to query yes/no on remote run.
  --video-save-frequency VIDEO_SAVE_FREQUENCY
                        Save frequency for videos.
  --cpus CPUS           Cpus to allocate to ray process. Passed to `ray.init`.
  --gpus GPUS           Gpus to allocate to ray process. Passed to `ray.init`.
  --resources RESOURCES
                        Resources to allocate to ray process. Passed to
                        `ray.init`.
  --include-webui INCLUDE_WEBUI
                        Boolean flag indicating whether to start the web UI,
                        which is a Jupyter notebook. Passed to `ray.init`.
  --temp-dir TEMP_DIR   If provided, it will specify the root temporary
                        directory for the Ray process. Passed to `ray.init`.
  --resources-per-trial RESOURCES_PER_TRIAL
                        Resources to allocate for each trial. Passed to
                        `tune.run`.
  --trial-cpus TRIAL_CPUS
                        CPUs to allocate for each trial. Note: this is only
                        used for Ray's internal scheduling bookkeeping, and is
                        not an actual hard limit for CPUs. Passed to
                        `tune.run`.
  --trial-gpus TRIAL_GPUS
                        GPUs to allocate for each trial. Note: this is only
                        used for Ray's internal scheduling bookkeeping, and is
                        not an actual hard limit for GPUs. Passed to
                        `tune.run`.
  --trial-extra-cpus TRIAL_EXTRA_CPUS
                        Extra CPUs to reserve in case the trials need to
                        launch additional Ray actors that use CPUs.
  --trial-extra-gpus TRIAL_EXTRA_GPUS
                        Extra GPUs to reserve in case the trials need to
                        launch additional Ray actors that use GPUs.
  --num-samples NUM_SAMPLES
                        Number of times to repeat each trial. Passed to
                        `tune.run`.
  --upload-dir UPLOAD_DIR
                        Optional URI to sync training results to (e.g.
                        s3://<bucket> or gs://<bucket>). Passed to `tune.run`.
  --trial-name-template TRIAL_NAME_TEMPLATE
                        Optional string template for trial name. For example:
                        '{trial.trial_id}-seed={trial.config[run_params][seed]
                        }' Passed to `tune.run`.
  --checkpoint-frequency CHECKPOINT_FREQUENCY
                        How many training iterations between checkpoints. A
                        value of 0 (default) disables checkpointing. If set,
                        takes precedence over
                        variant['run_params']['checkpoint_frequency']. Passed
                        to `tune.run`.
  --checkpoint-at-end CHECKPOINT_AT_END
                        Whether to checkpoint at the end of the experiment. If
                        set, takes precedence over
                        variant['run_params']['checkpoint_at_end']. Passed to
                        `tune.run`.
  --max-failures MAX_FAILURES
                        Try to recover a trial from its last checkpoint at
                        least this many times. Only applies if checkpointing
                        is enabled. Passed to `tune.run`.
  --restore RESTORE     Path to checkpoint. Only makes sense to set if running
                        1 trial. Defaults to None. Passed to `tune.run`.
  --server-port SERVER_PORT
                        Port number for launching TuneServer. Passed to
                        `tune.run`.
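
As the help text notes, many of these flags are thin pass-throughs to Ray. The sketch below gives a rough idea of how they map onto ray.init and tune.run, assuming a Ray 1.x-style Tune API; it is not the actual launcher code in examples/development/main.py, and ExperimentRunner is a placeholder trainable.

# Rough, illustrative mapping of the CLI flags above onto Ray calls
# (assumes a Ray 1.x-style API; not the actual softlearning launcher).
import ray
from ray import tune

class ExperimentRunner(tune.Trainable):
    # Placeholder trial; the real launcher builds the SAC experiment from the
    # --algorithm / --universe / --domain / --task flags instead.
    def setup(self, config):
        self.epoch = 0

    def step(self):
        self.epoch += 1
        return {"episode_reward_mean": 0.0}

    def save_checkpoint(self, checkpoint_dir):
        return checkpoint_dir

    def load_checkpoint(self, checkpoint_path):
        pass

ray.init(num_cpus=8, num_gpus=0)                # --cpus / --gpus
tune.run(
    ExperimentRunner,
    num_samples=3,                              # --num-samples
    resources_per_trial={"cpu": 4, "gpu": 0},   # --trial-cpus / --trial-gpus
    checkpoint_freq=1000,                       # --checkpoint-frequency
    checkpoint_at_end=True,                     # --checkpoint-at-end
    max_failures=3,                             # --max-failures
    local_dir="~/ray_results",                  # --local-dir
    stop={"training_iteration": 10},            # keep the sketch short
)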

Resume training from a saved checkpoint

This feature is currently broken!

In order to resume training from a previous checkpoint, run the original example main script with an additional --restore flag. For example, the previous example can be resumed as follows:

softlearning run_example_local examples.development \
    --algorithm SAC \
    --universe gym \
    --domain HalfCheetah \
    --task v3 \
    --exp-name my-sac-experiment-1 \
    --checkpoint-frequency 1000 \
    --restore ${SAC_CHECKPOINT_PATH}

References

The algorithms are based on the following papers:

Soft Actor-Critic Algorithms and Applications.
Tuomas Haarnoja*, Aurick Zhou*, Kristian Hartikainen*, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, and Sergey Levine. arXiv preprint, 2018.
paper | videos

Latent Space Policies for Hierarchical Reinforcement Learning.
Tuomas Haarnoja*, Kristian Hartikainen*, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning (ICML), 2018.
paper | videos

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. International Conference on Machine Learning (ICML), 2018.
paper | videos

Composable Deep Reinforcement Learning for Robotic Manipulation.
Tuomas Haarnoja, Vitchyr Pong, Aurick Zhou, Murtaza Dalal, Pieter Abbeel, Sergey Levine. International Conference on Robotics and Automation (ICRA), 2018.
paper | videos

Reinforcement Learning with Deep Energy-Based Policies.
Tuomas Haarnoja*, Haoran Tang*, Pieter Abbeel, Sergey Levine. International Conference on Machine Learning (ICML), 2017.
paper | videos

If Softlearning helps you in your academic research, you are encouraged to cite our paper. Here is an example BibTeX entry:

@techreport{haarnoja2018sacapps,
  title={Soft Actor-Critic Algorithms and Applications},
  author={Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and George Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and Henry Zhu and Abhishek Gupta and Pieter Abbeel and Sergey Levine},
  journal={arXiv preprint arXiv:1812.05905},
  year={2018}
}