Stable-Baselines3 Docs — Reliable Reinforcement Learning Implementations
Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It is the next major version of Stable Baselines.
RL Baselines3 Zoo provides a collection of pre-trained agents, along with scripts for training and evaluating agents, tuning hyperparameters, plotting results, and recording videos.
Main Features
- Unified structure for all algorithms
- PEP8 compliant (unified code style)
- Documented functions and classes
- Tests, high code coverage and type hints
- Clean code
- Tensorboard support
- The performance of each algorithm was tested (see the Results section on each algorithm's page)
Getting Started
Stable-Baselines3 (SB3) uses vectorized environments (VecEnv) internally. Please read the associated section to learn more about its features and differences compared to a single Gym environment.
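As a minimal sketch (the environment id and number of environments here are arbitrary choices), the make_vec_env helper creates several environments behind a single VecEnv interface:

from stable_baselines3.common.env_util import make_vec_env

# Create 4 CartPole environments running in a single process (DummyVecEnv by default)
vec_env = make_vec_env("CartPole-v1", n_envs=4)
obs = vec_env.reset()  # batched observations: one row per environment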
Most of the library tries to follow a sklearn-like syntax for the Reinforcement Learning algorithms.
Here is a quick example of how to train and run A2C on a CartPole environment:
import gymnasium as gym

from stable_baselines3 import A2C

env = gym.make("CartPole-v1", render_mode="rgb_array")

model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

vec_env = model.get_env()
obs = vec_env.reset()
for i in range(1000):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
    vec_env.render("human")
    # VecEnv resets automatically
    # if done:
    #     obs = vec_env.reset()
You can find explanations about the logger output and names in the Logger section.
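As a hedged illustration (the output folder and formats below are arbitrary choices), the logger can also be configured explicitly before training:

from stable_baselines3 import A2C
from stable_baselines3.common.logger import configure

model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
# Write logs to stdout and to a CSV file in ./sb3_log/ (illustrative path)
new_logger = configure("./sb3_log", ["stdout", "csv"])
model.set_logger(new_logger)
model.learn(total_timesteps=10_000)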
Or just train a model in one line if the environment is registered in Gymnasium and the policy is registered:
from stable_baselines3 import A2C

model = A2C("MlpPolicy", "CartPole-v1").learn(10_000)
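After training, evaluate_policy gives a quick sanity check of the agent; this is a sketch, and n_eval_episodes=10 is an arbitrary choice:

from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

model = A2C("MlpPolicy", "CartPole-v1").learn(10_000)
# Average the episodic return over 10 evaluation episodes
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")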
Installation
For Windows users, we recommend using Anaconda for easier installation of Python packages and required libraries. You need an environment with Python version 3.8 or above.
For a quick start you can move straight to installing Stable-Baselines3 in the next step.
Trying to create Atari environments may result in vague errors related to missing DLL files and modules. This is an issue with the atari-py package. See this discussion for more information.
Stable Release
To install Stable Baselines3 with pip, execute:
pip install stable-baselines3[extra]
Some shells such as Zsh require quotation marks around brackets, i.e. pip install 'stable-baselines3[extra]'. More information.
This includes optional dependencies like Tensorboard, OpenCV or ale-py to train on Atari games. If you do not need those, you can use:
pip install stable-baselines3
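To check that the installation succeeded, a minimal sanity check (assuming the package was installed into the currently active environment):

import stable_baselines3

# Print the installed version
print(stable_baselines3.__version__)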
If you need to work with OpenCV on a machine without an X-server (for instance inside a Docker image), you will need to install opencv-python-headless, see issue #298.
Bleeding-edge version
pip install git+https://github.com/DLR-RM/stable-baselines3

With extras:

pip install "stable_baselines3[extra,tests,docs] @ git+https://github.com/DLR-RM/stable-baselines3"
Development version
To contribute to Stable-Baselines3, install it in editable mode, with support for running tests and building the documentation:
git clone https://github.com/DLR-RM/stable-baselines3 && cd stable-baselines3
pip install -e .[docs,tests,extra]
Using Docker Images
If you are looking for Docker images with stable-baselines3 already installed, we recommend using the images from RL Baselines3 Zoo.
Otherwise, the following images contain all the dependencies for stable-baselines3, but not the stable-baselines3 package itself. They are made for development.
Use Built Images
docker pull stablebaselines/stable-baselines3
docker pull stablebaselines/stable-baselines3-cpu
Build the Docker Images
Build GPU image (with nvidia-docker):

make docker-gpu

Build CPU image:

make docker-cpu

Note: if you are using a proxy, you need to pass extra params during the build and do some tweaks:
--network=host --build-arg HTTP_PROXY=http://your.proxy.fr:8080/ --build-arg http_proxy=http://your.proxy.fr:8080/ --build-arg HTTPS_PROXY=https://your.proxy.fr:8080/ --build-arg https_proxy=https://your.proxy.fr:8080/
Run the images (CPU/GPU)
Run the nvidia-docker GPU image:
docker run -it --runtime=nvidia --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/home/mamba/stable-baselines3,type=bind stablebaselines/stable-baselines3 bash -c 'cd /home/mamba/stable-baselines3/ && pytest tests/'
Or, with the shell file:

./scripts/run_docker_gpu.sh pytest tests/

Run the docker CPU image:

docker run -it --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/home/mamba/stable-baselines3,type=bind stablebaselines/stable-baselines3-cpu bash -c 'cd /home/mamba/stable-baselines3/ && pytest tests/'

Or, with the shell file:

./scripts/run_docker_cpu.sh pytest tests/
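To verify that PyTorch detects the GPU inside the container, a quick hedged check (PyTorch is part of the image dependencies):

import torch

# Should print True in a correctly configured GPU container
print(torch.cuda.is_available())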
Explanation of the docker command:
- docker run -it creates an instance of an image (a container) and runs it interactively (so Ctrl+C will work)
- --rm removes the container once it exits/stops (otherwise, you will have to use docker rm)
- --network host means no network isolation is used, which allows using tensorboard/visdom on the host machine
- --ipc=host uses the host system's IPC namespace. An IPC (POSIX/SysV IPC) namespace provides separation of named shared memory segments, semaphores and message queues.
- --name test explicitly gives the name test to the container (otherwise it will be assigned a random name)
- --mount src="$(pwd)",... gives the container access to the local directory (the pwd command); it will be mapped to /home/mamba/stable-baselines3, so all the logs created in this folder inside the container will be kept
- bash -c '...' runs a command inside the docker image, here the tests (pytest tests/)
© Copyright 2022, Stable Baselines3. Revision ba77dd7c .