On Choosing a Deep Reinforcement Learning Library

By Thomas Simonini

As deep reinforcement learning continues to be one of the most hyped approaches to AGI (Artificial General Intelligence), more and more libraries are being developed. But choosing the best one for your needs can be a daunting task.

In recent years, we’ve seen an acceleration of innovation in deep reinforcement learning: AlphaGo beating the champion of the game Go in 2016, OpenAI’s PPO in 2017, the resurgence of curiosity-driven agents in 2018 with Uber AI’s Go-Explore and OpenAI’s RND, and finally OpenAI Five beating the best Dota 2 players in the world.

[Figure: benchmarking deep reinforcement learning libraries]

Deep Reinforcement Learning Library Criteria

Each library below gets an overall score out of 5, with each criterion rated on a five-point ✅/❌ scale: state-of-the-art RL methods, ease of getting started, ease of plugging in your own environment, ease of understanding and modifying the code, community and updates, Tensorboard support, and other features (such as vectorized environments).

KerasRL (2.3/5)

State of the art RL methods ✅ ✅ ❌ ❌ ❌

Easy to start ✅ ✅ ✅ ✅ ❌

The code is full of comments, which helps you understand even the most obscure functions.

The code is really easy to read and demonstrates a good separation between agents, policies, and memory. There is documentation, but it remains incomplete: explanations of the definable parameters are missing.
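To see that separation in action, here is a minimal sketch of a DQN agent on CartPole built from Keras-RL's documented DQNAgent, BoltzmannQPolicy, and SequentialMemory classes (the network size and hyperparameters are illustrative):

import gym
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

env = gym.make('CartPole-v1')
nb_actions = env.action_space.n

# Small Q-network; Keras-RL feeds a window of observations, hence the extra dimension
model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(16, activation='relu'),
    Dense(nb_actions, activation='linear'),
])

# Agent, policy, and memory are separate, swappable objects
dqn = DQNAgent(model=model, nb_actions=nb_actions,
               memory=SequentialMemory(limit=50000, window_length=1),
               policy=BoltzmannQPolicy(),
               nb_steps_warmup=100, target_model_update=1e-2)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
dqn.fit(env, nb_steps=10000, visualize=False, verbose=1)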

Easy to plug in your own environment ✅ ❌ ❌ ❌ ❌

Easy to modify the agents ✅ ✅ ✅ ✅ ✅

Community and updates ✅ ❌ ❌ ❌ ❌

Tensorboard support ❌ ❌ ❌ ❌ ❌

Other features ✅ ✅ ✅ ❌ ❌

Tensorforce (4.1/5)


State of the art RL methods ✅ ✅ ✅ ✅ ❌

Easy to start ✅ ✅ ✅ ❌ ❌

Easy to plug in your own environment ✅ ✅ ✅ ✅ ✅
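That perfect score comes from Tensorforce's environment abstraction: any Gym environment (or your own Environment subclass) sits behind the same interface. A rough sketch, assuming Tensorforce's Agent.create/Environment.create API with illustrative hyperparameters:

from tensorforce import Agent, Environment

# Wrap a Gym env behind Tensorforce's common Environment interface
environment = Environment.create(environment='gym', level='CartPole-v1',
                                 max_episode_timesteps=500)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

# Explicit act/observe loop for one episode
states = environment.reset()
terminal = False
while not terminal:
    actions = agent.act(states=states)
    states, terminal, reward = environment.execute(actions=actions)
    agent.observe(terminal=terminal, reward=reward)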

Easy to understand the code and modify it ✅ ✅ ✅ ✅ ❌

Community and updates ✅ ✅ ✅ ❌ ❌

Tensorboard support ✅ ✅ ✅ ✅ ✅

OpenAI Baselines (2.2/5)


State of the art RL methods ✅ ✅ ✅ ✅ ❌

Easy to start ✅ ✅ ❌ ❌ ❌

python -m baselines.run --alg=ppo2 --env=BipedalWalker-v2 --num_timesteps=2500 --num_env=1 --save_path=./models/bipedalwalker --save_video_interval=100
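One command trains PPO2 on BipedalWalker-v2 for 2,500 timesteps in a single environment, saves the model to ./models/bipedalwalker, and records a video every 100 steps; anything the flags don't cover, however, means digging into the code.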

Easy to plug in your own environment ✅ ❌ ❌ ❌ ❌

Easy to understand the code and modify it ✅ ❌ ❌ ❌ ❌

Community and updates ✅ ✅ ✅ ❌ ❌

Tensorboard support ✅ ❌ ❌ ❌ ❌

Stable Baselines (4.6/5)


State of the art RL methods ✅ ✅ ✅ ✅ ✅

Easy to start ✅ ✅ ✅ ✅ ✅

from stable_baselines import PPO2

model = PPO2('MlpPolicy', 'CartPole-v1').learn(10000)
Stable Baselines RL training in one line

Easy to plug in your own environment ✅ ✅ ✅ ❌ ❌
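Plugging in your own environment means subclassing gym.Env; Stable Baselines then wraps it in a vectorized env for you. A sketch with a hypothetical toy environment:

import numpy as np
import gym
from gym import spaces
from stable_baselines import PPO2

class SimpleCorridorEnv(gym.Env):
    # Hypothetical toy env: walk right along a corridor to reach the goal
    def __init__(self, size=10):
        super(SimpleCorridorEnv, self).__init__()
        self.size = size
        self.pos = 0
        self.action_space = spaces.Discrete(2)  # 0 = left, 1 = right
        self.observation_space = spaces.Box(low=0, high=size,
                                            shape=(1,), dtype=np.float32)

    def reset(self):
        self.pos = 0
        return np.array([self.pos], dtype=np.float32)

    def step(self, action):
        self.pos = int(np.clip(self.pos + (1 if action == 1 else -1), 0, self.size))
        done = self.pos == self.size
        reward = 1.0 if done else -0.01
        return np.array([self.pos], dtype=np.float32), reward, done, {}

model = PPO2('MlpPolicy', SimpleCorridorEnv()).learn(10000)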

Easy to understand the code and modify it ✅ ✅ ✅ ❌ ❌

Community and updates ✅ ✅ ✅ ✅ ✅

Tensorboard support ✅ ✅ ✅ ✅ ✅
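Tensorboard logging is built in: pass a log directory when creating the model (the directory name here is illustrative):

from stable_baselines import PPO2

model = PPO2('MlpPolicy', 'CartPole-v1', tensorboard_log='./ppo2_cartpole_tb/')
model.learn(total_timesteps=10000)
# Then inspect the run with: tensorboard --logdir ./ppo2_cartpole_tb/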

Other features (vec env…) ✅ ✅ ✅ ✅ ✅
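Vectorized environments, for instance, let one model collect experience from several env copies in parallel. A sketch with SubprocVecEnv (the worker count is illustrative):

import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import SubprocVecEnv

# Four copies of CartPole in separate processes
# (on spawn-based platforms, keep this under an `if __name__ == '__main__':` guard)
env = SubprocVecEnv([lambda: gym.make('CartPole-v1') for _ in range(4)])
model = PPO2('MlpPolicy', env).learn(25000)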

TF Agents (4.3/5)

State of the art RL methods ✅ ✅ ✅ ✅ ✅

Easy to start ✅ ✅ ✅ ❌ ❌

Easy to plug in your own environment ✅ ✅ ✅ ❌ ❌
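A minimal sketch in the spirit of the TF-Agents DQN tutorial: a Gym env is loaded through suite_gym, wrapped as a TF environment, and paired with a QNetwork (the layer sizes are illustrative):

import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import q_network

# Load a Gym env and wrap it as a TF environment
train_env = tf_py_environment.TFPyEnvironment(suite_gym.load('CartPole-v1'))

q_net = q_network.QNetwork(train_env.observation_spec(),
                           train_env.action_spec(),
                           fc_layer_params=(64, 64))

agent = dqn_agent.DqnAgent(train_env.time_step_spec(),
                           train_env.action_spec(),
                           q_network=q_net,
                           optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3))
agent.initialize()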

Easy to understand the code and modify it ✅ ✅ ✅ ✅ ❌

Community and updates ✅ ✅ ✅ ✅ ✅

Tensorboard support ✅ ✅ ✅ ✅ ✅

Part 2: Implement an Agent That Learns to Walk With BipedalWalker-v2

BipedalWalker-v2 With TF Agents
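BipedalWalker-v2 has a continuous action space, so a DQN like the sketch above won't apply; the environment loads the same way and pairs with an actor-critic agent such as SAC or PPO. A loading sketch:

from tf_agents.environments import suite_gym, tf_py_environment

env = tf_py_environment.TFPyEnvironment(suite_gym.load('BipedalWalker-v2'))
print(env.action_spec())  # continuous: shape (4,), bounded in [-1, 1]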

BipedalWalker-v2 With Stable Baselines
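With Stable Baselines the whole experiment fits in a few lines; a sketch where the timestep budget, paths, and hyperparameters are illustrative:

import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

env = DummyVecEnv([lambda: gym.make('BipedalWalker-v2')])
model = PPO2('MlpPolicy', env, tensorboard_log='./bipedalwalker_tb/')
model.learn(total_timesteps=2000000)
model.save('ppo2_bipedalwalker')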

[Figure: the BipedalWalker documentation. Look at this beautiful documentation 😍]
[Figure: Tensorboard visualization of the training results]
[GIF: an agent that learns to walk, trained with the two best deep reinforcement learning libraries]
