
google-deepmind/trfl

TensorFlow Reinforcement Learning

3,134 stars, 387 forks

Top Related Projects

  • Acme (3,468 stars): A library of reinforcement learning components and agents
  • OpenAI Baselines (15,630 stars): High-quality implementations of reinforcement learning algorithms
  • TF-Agents (2,774 stars): A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning
  • Stable Baselines: A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
  • Gymnasium: An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
  • Ignite (4,509 stars): High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently

Quick Overview

TRFL (pronounced "truffle") is a library built on top of TensorFlow that provides building blocks for reinforcement learning algorithms. It offers a collection of useful functions and classes that can be combined to implement various RL algorithms, making it easier for researchers and practitioners to experiment with and develop new RL techniques.
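
As an illustration of how these building blocks compose, the sketch below combines two TRFL ops, trfl.td_learning and trfl.discrete_policy_gradient, into a simple actor-critic style objective. It is only a rough sketch: the constant tensors stand in for the outputs of real policy and value networks, and the extra-field names follow TRFL's documented loss/extra convention.

import tensorflow as tf
import trfl

# Stand-in network outputs for a batch of 2 transitions and 3 discrete actions.
policy_logits = tf.constant([[1.0, 0.5, 0.2], [0.1, 0.9, 0.3]])
v_tm1 = tf.constant([0.5, 1.2])   # value estimates at the current step
v_t = tf.constant([0.8, 1.0])     # value estimates at the next step
actions = tf.constant([0, 2], dtype=tf.int32)
rewards = tf.constant([1.0, 0.0])
pcontinues = tf.constant([0.99, 0.99])

# TD learning yields a critic loss plus the TD errors, reused here as advantages.
critic_loss, td_extra = trfl.td_learning(v_tm1, rewards, pcontinues, v_t)
actor_loss = trfl.discrete_policy_gradient(policy_logits, actions, td_extra.td_error)

total_loss = tf.reduce_mean(actor_loss + critic_loss)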

Pros

  • Provides a wide range of RL-specific operations and loss functions
  • Built on top of TensorFlow, allowing for easy integration with existing TensorFlow projects
  • Offers flexibility in combining different components to create custom RL algorithms
  • Well-documented with clear examples and explanations

Cons

  • Requires a good understanding of reinforcement learning concepts
  • May have a steeper learning curve for those new to TensorFlow
  • Narrower in scope than comprehensive RL frameworks such as RLlib or TF-Agents, which provide complete agents rather than loss-level building blocks
  • Not actively maintained (last update was in 2020)

Code Examples

  1. Calculating a Q-learning loss:
import tensorflow as tf
import trfl

# Q-values at the current and next step, shape [batch_size, num_actions].
q_values = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
target_q_values = tf.constant([[1.1, 2.1, 3.1], [4.1, 5.1, 6.1]])

# Actions taken, rewards received and discounts, shape [batch_size].
actions = tf.constant([1, 2])
rewards = tf.constant([0.5, 1.0])
pcontinues = tf.constant([0.9, 0.8])

loss, _ = trfl.qlearning(q_values, actions, rewards, pcontinues, target_q_values)
  2. Computing a policy-gradient loss from logits:
import tensorflow as tf
import trfl

# Policy logits, shape [batch_size, num_actions]; actions and advantages, shape [batch_size].
policy_logits = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
actions = tf.constant([1, 2])
advantages = tf.constant([0.5, -0.3])

# Per-example policy-gradient loss, shape [batch_size].
loss = trfl.discrete_policy_gradient(policy_logits, actions, advantages)
  3. Calculating a SARSA loss:
import tensorflow as tf
import trfl

# Q-values at the current and next step, shape [batch_size, num_actions].
q_tm1 = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
q_t = tf.constant([[1.1, 2.1, 3.1], [4.1, 5.1, 6.1]])

# Actions at both steps, rewards and discounts, shape [batch_size].
a_tm1 = tf.constant([1, 2])
a_t = tf.constant([0, 1])
r_t = tf.constant([0.5, 1.0])
pcont_t = tf.constant([0.9, 0.8])

loss, _ = trfl.sarsa(q_tm1, a_tm1, r_t, pcont_t, q_t, a_t)

Getting Started

To get started with TRFL, follow these steps:

  1. Install TRFL using pip:

    pip install trfl
    
  2. Import TRFL in your Python script:

    import tensorflow as tf
    import trfl
    
  3. Use TRFL functions in your RL algorithm implementation (a minimal training sketch follows this list):

    # Example: Q-learning loss calculation (TF1-style graph mode).
    num_actions = 4  # size of the discrete action space

    q_values = tf.placeholder(tf.float32, [None, num_actions])
    actions = tf.placeholder(tf.int32, [None])
    rewards = tf.placeholder(tf.float32, [None])
    pcontinues = tf.placeholder(tf.float32, [None])
    target_q_values = tf.placeholder(tf.float32, [None, num_actions])

    loss, _ = trfl.qlearning(q_values, actions, rewards, pcontinues, target_q_values)
    
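
With the loss defined, training follows the usual TensorFlow pattern of the library's era: reduce the per-example loss to a scalar and minimize it with an optimizer. The lines below are a minimal TF1-style sketch; the placeholder names are the ones from step 3 and would be fed with real transition batches at run time.

reduced_loss = tf.reduce_mean(loss)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4)
train_op = optimizer.minimize(reduced_loss)

# In a tf.Session, feed batches of transitions and run the update:
# sess.run(train_op, feed_dict={q_values: ..., actions: ..., rewards: ..., pcontinues: ..., target_q_values: ...})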

Competitor Comparisons

Acme

A library of reinforcement learning components and agents

Pros of Acme

  • More comprehensive framework for RL research, offering a wider range of tools and components
  • Better support for distributed and parallel computing, enabling more efficient large-scale experiments
  • More active development and maintenance, with regular updates and contributions

Cons of Acme

  • Steeper learning curve due to its more complex architecture and broader scope
  • Potentially overkill for simpler RL projects or beginners in the field
  • May require more computational resources for full utilization of its features

Code Comparison

TRFL example (loss computation):

# q_max: the critic's Q-value for the action a_max output by the deterministic policy.
loss, _ = trfl.dpg(q_max, a_max, dqda_clipping=None, clip_norm=False)

Acme example (agent creation):

from acme.agents.tf import d4pg

agent = d4pg.D4PG(
    environment_spec=environment_spec,
    policy_network=policy_network,
    critic_network=critic_network,
    observation_network=observation_network,
)

Summary

Acme offers a more comprehensive and scalable framework for reinforcement learning research, with better support for distributed computing and a wider range of tools. However, it may be more complex and resource-intensive compared to TRFL. TRFL focuses on providing specific RL operations and loss functions, making it potentially easier to use for simpler projects or beginners. The choice between the two depends on the scale and complexity of the RL project at hand.

OpenAI Baselines

High-quality implementations of reinforcement learning algorithms

Pros of Baselines

  • Broader scope, covering various RL algorithms and environments
  • More extensive documentation and examples
  • Active community and frequent updates

Cons of Baselines

  • Less focus on specific RL components
  • Potentially steeper learning curve for beginners
  • May require more setup and configuration

Code Comparison

TRFL (Loss function example):

# q_max: the critic's Q-value for the action a_max output by the deterministic policy.
loss, _ = trfl.dpg(q_max, a_max, dqda_clipping=None, clip_norm=False)

Baselines (DQN implementation snippet):

act, train, update_target, debug = deepq.build_train(
    make_obs_ph=lambda name: U.BatchInput(env.observation_space.shape, name=name),
    q_func=model,
    num_actions=env.action_space.n,
    optimizer=tf.train.AdamOptimizer(learning_rate=1e-4),
)

Summary

TRFL focuses on providing modular RL components, while Baselines offers a more comprehensive suite of RL algorithms and tools. TRFL may be more suitable for researchers looking to experiment with specific RL elements, whereas Baselines is better suited for those seeking ready-to-use implementations of complete RL algorithms.

TF-Agents

A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning

Pros of agents

  • More comprehensive and actively maintained
  • Includes full RL algorithms, not just building blocks
  • Better documentation and examples

Cons of agents

  • Steeper learning curve due to more complex architecture
  • Potentially slower execution compared to trfl's focused approach

Code Comparison

trfl:

# q_max: the critic's Q-value for the action a_max output by the deterministic policy.
loss, _ = trfl.dpg(q_max, a_max)

agents:

agent = tf_agents.agents.DdpgAgent(
    time_step_spec,
    action_spec,
    actor_network=actor_net,
    critic_network=critic_net,
    actor_optimizer=tf.compat.v1.train.AdamOptimizer(),
    critic_optimizer=tf.compat.v1.train.AdamOptimizer()
)

trfl provides low-level building blocks for RL algorithms, while agents offers complete, ready-to-use agent implementations. trfl's approach is more flexible but requires more work to build full algorithms. agents is more suitable for those who want to quickly implement and experiment with established RL methods.

Both libraries are built on TensorFlow, but agents has a stronger focus on TensorFlow 2.x compatibility. trfl's development seems to have slowed down, while agents is actively maintained and updated.

Choose trfl for fine-grained control over RL components, or agents for a more comprehensive, production-ready RL toolkit.

Stable Baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

Pros of stable-baselines

  • More comprehensive and user-friendly documentation
  • Wider range of implemented algorithms (e.g., PPO, SAC, TD3)
  • Active community support and regular updates

Cons of stable-baselines

  • Less flexibility for customizing individual components
  • Higher-level API, which may limit fine-grained control
  • Potentially slower execution due to additional abstraction layers

Code Comparison

stable-baselines:

from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10000)

trfl:

import tensorflow as tf
import trfl

# network, state, reward, discount and next_q_values come from your own model and replay code.
q_values = network(state)
action = tf.argmax(q_values, axis=-1, output_type=tf.int32)
loss = trfl.qlearning(q_values, action, reward, discount, next_q_values).loss

Key Differences

  • stable-baselines offers a higher-level API, making it easier to get started
  • trfl provides more granular control over individual reinforcement learning components
  • stable-baselines includes pre-implemented algorithms, while trfl focuses on building blocks for custom implementations

Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Pros of Gymnasium

  • More active development and community support
  • Broader range of environments and tools for reinforcement learning
  • Better documentation and tutorials for beginners

Cons of Gymnasium

  • Less focused on specific reinforcement learning algorithms
  • May require additional libraries for advanced RL techniques

Code Comparison

Gymnasium example:

import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

TRFL example:

import tensorflow as tf
import trfl

q_tm1 = tf.constant([[1, 2, 3], [4, 5, 6]], dtype=tf.float32)
a_tm1 = tf.constant([0, 1], dtype=tf.int32)
r_t = tf.constant([1.0, 0.5])
pcont_t = tf.constant([0.9, 0.9])
q_t = tf.constant([[1, 2, 3], [4, 5, 6]], dtype=tf.float32)

loss, _ = trfl.qlearning(q_tm1, a_tm1, r_t, pcont_t, q_t)

Gymnasium provides a more general-purpose framework for reinforcement learning environments, while TRFL focuses on specific RL algorithms and TensorFlow integration. Gymnasium is better suited for beginners and those looking for a wide range of environments, while TRFL may be more appropriate for researchers working on specific RL techniques with TensorFlow.
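
The two can also be used together: Gymnasium supplies the environment and transitions, while TRFL turns a transition into a training loss. The snippet below is a rough sketch of that glue; the zero Q-value tensors stand in for the outputs of a real Q-network, and the 0.99 discount is an arbitrary choice.

import gymnasium as gym
import tensorflow as tf
import trfl

env = gym.make("CartPole-v1")
num_actions = env.action_space.n

obs, info = env.reset(seed=42)
action = env.action_space.sample()
next_obs, reward, terminated, truncated, info = env.step(action)

# Dummy Q-values standing in for q_network(obs) and q_network(next_obs).
q_tm1 = tf.zeros([1, num_actions])
q_t = tf.zeros([1, num_actions])

a_tm1 = tf.constant([action], dtype=tf.int32)
r_t = tf.constant([reward], dtype=tf.float32)
pcont_t = tf.constant([0.0 if terminated else 0.99], dtype=tf.float32)

loss, _ = trfl.qlearning(q_tm1, a_tm1, r_t, pcont_t, q_t)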

Ignite

High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently

Pros of Ignite

  • More general-purpose, supporting a wide range of deep learning tasks beyond reinforcement learning
  • Larger community and more frequent updates, leading to better support and documentation
  • Seamless integration with PyTorch ecosystem, making it easier to use with existing PyTorch projects

Cons of Ignite

  • Less specialized for reinforcement learning tasks compared to TRFL
  • May require more setup and configuration for specific RL algorithms
  • Steeper learning curve for users primarily focused on reinforcement learning

Code Comparison

TRFL (Reinforcement Learning specific):

import tensorflow as tf
import trfl

q_tm1 = tf.constant([[1, 2, 3], [4, 5, 6]], dtype=tf.float32)
a_tm1 = tf.constant([0, 1], dtype=tf.int32)
r_t = tf.constant([1.0, 0.5])
pcont_t = tf.constant([0.9, 0.9])
q_t = tf.constant([[1, 2, 3], [4, 5, 6]], dtype=tf.float32)

ql_loss, _ = trfl.qlearning(q_tm1, a_tm1, r_t, pcont_t, q_t)

Ignite (General-purpose training loop):

from ignite.engine import Engine, Events

def train_step(engine, batch):
    # Training logic here
    return loss

trainer = Engine(train_step)
trainer.run(data_loader, max_epochs=10)


README

TRFL

TRFL (pronounced "truffle") is a library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Learning agents.

Installation

TRFL can be installed from pip with the following command: pip install trfl

TRFL will work with both the CPU and GPU versions of TensorFlow, but to allow for that it does not list TensorFlow as a requirement, so you need to install TensorFlow and TensorFlow Probability separately if you haven't already done so.

Usage Example

import tensorflow as tf
import trfl

# Q-values for the previous and next timesteps, shape [batch_size, num_actions].
q_tm1 = tf.get_variable(
    "q_tm1", initializer=[[1., 1., 0.], [1., 2., 0.]], dtype=tf.float32)
q_t = tf.get_variable(
    "q_t", initializer=[[0., 1., 0.], [1., 2., 0.]], dtype=tf.float32)

# Action indices, discounts and rewards, shape [batch_size].
a_tm1 = tf.constant([0, 1], dtype=tf.int32)
r_t = tf.constant([1, 1], dtype=tf.float32)
pcont_t = tf.constant([0, 1], dtype=tf.float32)  # the discount factor

# Q-learning loss, and auxiliary data.
loss, q_learning = trfl.qlearning(q_tm1, a_tm1, r_t, pcont_t, q_t)

loss is the tensor representing the loss. For Q-learning, it is half the squared difference between the predicted Q-values and the TD targets, shape [batch_size]. Extra information is in the q_learning namedtuple, including q_learning.td_error and q_learning.target.

The loss tensor can be differentiated to derive the corresponding RL update.

reduced_loss = tf.reduce_mean(loss)
optimizer = tf.train.AdamOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(reduced_loss)

All loss functions in the package return both a loss tensor and a namedtuple with extra information, using the above convention, but different functions may have different extra fields. Check the documentation of each function below for more information.
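
In practice this means every loss op unpacks the same way, and the extra namedtuple exposes whatever fields that op documents. For example, with the Q-learning op from the usage example above (td_error and target are the fields mentioned earlier in this README):

loss, extra = trfl.qlearning(q_tm1, a_tm1, r_t, pcont_t, q_t)
td_error = extra.td_error  # per-element temporal-difference errors
target = extra.target      # the TD targets the predictions are regressed towards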

Documentation

Check out the full documentation page in the repository's docs/ directory.