Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It is the next major version of Stable Baselines.

RL Baselines3 Zoo provides a collection of pre-trained agents, scripts for training, evaluating agents, tuning hyperparameters, plotting results and recording videos.

Main Features

  • Unified structure for all algorithms
  • PEP8 compliant (unified code style)
  • Documented functions and classes
  • Tests, high code coverage and type hints
  • Clean code
  • TensorBoard support
  • The performance of each algorithm was tested (see the Results section on each algorithm's page)

Contents

  • Installation
    • Prerequisites
    • Bleeding-edge version
    • Development version
    • Using Docker Images
    • General advice when using Reinforcement Learning
    • Which algorithm should I use?
    • Tips and Tricks when creating a custom environment
    • Tips and Tricks when implementing an RL algorithm
    • Reproducibility
    • Try it online with Colab Notebooks!
    • Basic Usage: Training, Saving, Loading
    • Multiprocessing: Unleashing the Power of Vectorized Environments
    • Multiprocessing with off-policy algorithms
    • Dict Observations
    • Callbacks: Monitoring Training
    • Callbacks: Evaluate Agent Performance
    • Atari Games
    • PyBullet: Normalizing input features
    • Hindsight Experience Replay (HER)
    • Learning Rate Schedule
    • Advanced Saving and Loading
    • Accessing and modifying model parameters
    • SB3 and ProcgenEnv
    • SB3 with EnvPool or Isaac Gym
    • Record a Video
    • Bonus: Make a GIF of a Trained Agent
    • VecEnv API vs Gym API
    • Vectorized Environments Wrappers
    • VecEnv
    • DummyVecEnv
    • SubprocVecEnv
    • Wrappers
    • SB3 Policy
    • Default Network Architecture
    • Custom Network Architecture
    • Custom Feature Extractor
    • Multiple Inputs and Dictionary Observations
    • On-Policy Algorithms
    • Off-Policy Algorithms
    • Custom Callback
    • Event Callback
    • Callback Collection
    • Basic Usage
    • Logging More Values
    • Logging Images
    • Logging Figures/Plots
    • Logging Videos
    • Logging Hyperparameters
    • Directly Accessing The Summary Writer
    • Weights & Biases
    • Hugging Face 🤗
    • MLflow
    • Installation
    • Train an Agent
    • Enjoy a Trained Agent
    • Hyperparameter Optimization
    • Colab Notebook: Try it Online!
    • Why create this repository?
    • Overview
    • How to migrate?
    • Breaking Changes
    • New Features (SB3 vs SB2)
    • How and why?
    • Anomaly detection with PyTorch
    • Numpy parameters
    • VecCheckNan Wrapper
    • RL Model hyperparameters
    • Missing values from datasets
    • Algorithms Structure
    • Where to start?
    • Pre-Processing
    • Policy Structure
    • Probability distributions
    • State-Dependent Exploration
    • Misc
    • Zip-archive
    • Background
    • Export to ONNX
    • Trace/Export to C++
    • Export to tensorflowjs / ONNX-JS
    • Export to TFLite / Coral (Edge TPU)
    • Manual export


    Getting Started

    Stable-Baselines3 (SB3) uses vectorized environments (VecEnv) internally. Please read the associated section to learn more about their features and how they differ from a single Gym environment.

    Most of the library tries to follow a sklearn-like syntax for the Reinforcement Learning algorithms.

    Here is a quick example of how to train and run A2C on a CartPole environment:

    import gymnasium as gym

    from stable_baselines3 import A2C

    env = gym.make("CartPole-v1", render_mode="rgb_array")

    model = A2C("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=10_000)

    vec_env = model.get_env()
    obs = vec_env.reset()
    for i in range(1000):
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, done, info = vec_env.step(action)
        vec_env.render("human")
        # VecEnv resets automatically
        # if done:
        #   obs = vec_env.reset()

    You can find explanations about the logger output and names in the Logger section.

    Or just train a model with a one-liner if the environment is registered in Gymnasium and the policy is registered:

    from stable_baselines3 import A2C

    model = A2C("MlpPolicy", "CartPole-v1").learn(10000)

    Installation

    Prerequisites

    We recommend that Windows users install Anaconda, which makes it easier to install Python packages and the required libraries. You need an environment with Python version 3.6 or above.

    For a quick start you can move straight to installing Stable-Baselines3 in the next step.

    On Windows, trying to create Atari environments may result in vague errors related to missing DLL files and modules. This is an issue with the atari-py package. See this discussion for more information.

    Stable Release

    To install Stable Baselines3 with pip, execute:

    pip install stable-baselines3[extra] 

    Some shells, such as Zsh, require quotation marks around brackets, e.g. pip install 'stable-baselines3[extra]'. More information.

    This includes optional dependencies like TensorBoard, OpenCV or ale-py (to train on Atari games). If you do not need those, you can use:

    pip install stable-baselines3
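To see which of the optional dependencies ended up installed, a small standard-library sketch can probe their import names. The mapping from PyPI package to module name (tensorboard, cv2, ale_py) is an assumption made for illustration; opencv-python-headless provides the same cv2 module, so it is reported as installed too.

```python
import importlib.util
from typing import Dict

# Assumed mapping: package installed by the [extra] option -> module you import
OPTIONAL_MODULES = {
    "tensorboard": "tensorboard",
    "opencv-python": "cv2",
    "ale-py": "ale_py",
}


def check_optional_dependencies() -> Dict[str, bool]:
    """Map each optional package to whether its module can be found."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for package, available in check_optional_dependencies().items():
        print(f"{package:15} {'installed' if available else 'missing'}")
```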

    If you need to work with OpenCV on a machine without an X server (for instance inside a Docker image), you will need to install opencv-python-headless instead (see issue #298).

    Bleeding-edge version

    pip install git+
    pip install "stable_baselines3[extra,tests,docs] @ git+" 

    Development version

    To contribute to Stable-Baselines3, install it in editable mode with support for running tests and building the documentation:

    git clone && cd stable-baselines3
    pip install -e .[docs,tests,extra]
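With an editable install (pip install -e), Python should import the package from your clone rather than from a copy in site-packages. A minimal way to check this, using only the standard library (the helper name package_location is hypothetical):

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_location(name: str) -> Optional[Path]:
    """Return the directory a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None  # not installed (or a namespace package without an origin)
    # spec.origin points at the package's __init__.py
    return Path(spec.origin).resolve().parent


if __name__ == "__main__":
    location = package_location("stable_baselines3")
    print(location if location else "stable_baselines3 is not installed")
```

If the printed path is not inside your stable-baselines3 clone, another copy of the package is shadowing the editable install.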

    Using Docker Images

    If you are looking for Docker images with Stable-Baselines3 already installed, we recommend using the images from RL Baselines3 Zoo.

    Otherwise, the following images contain all the dependencies for stable-baselines3 but not the stable-baselines3 package itself. They are intended for development.

    Use Built Images

    docker pull stablebaselines/stable-baselines3
    docker pull stablebaselines/stable-baselines3-cpu

    Build the Docker Images

    Build GPU image (with nvidia-docker):

    Note: if you are using a proxy, you need to pass extra parameters during the build and make some tweaks:

    --network=host --build-arg HTTP_PROXY= --build-arg http_proxy= --build-arg HTTPS_PROXY= --build-arg https_proxy=
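The empty values after each = are where your proxy URL goes. As a hypothetical convenience (not part of the SB3 tooling), the sketch below fills them from the proxy variables already set in your shell, accepting either the upper- or lower-case spelling, and prints the flags to append to your docker build command.

```python
import os
import shlex
from typing import List, Mapping, Optional

# The four proxy variables passed as build arguments above
PROXY_VARS = ("HTTP_PROXY", "http_proxy", "HTTPS_PROXY", "https_proxy")


def proxy_build_args(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the extra `docker build` flags, filling each proxy value from the environment."""
    env = os.environ if environ is None else environ
    args = ["--network=host"]
    for var in PROXY_VARS:
        # Accept either spelling, e.g. use HTTP_PROXY when http_proxy is unset
        value = env.get(var) or env.get(var.swapcase(), "")
        args += ["--build-arg", f"{var}={value}"]
    return args


if __name__ == "__main__":
    print(shlex.join(proxy_build_args()))
```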

    Run the images (CPU/GPU)

    Run the nvidia-docker GPU image

    docker run -it --runtime=nvidia --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/home/mamba/stable-baselines3,type=bind stablebaselines/stable-baselines3 bash -c 'cd /home/mamba/stable-baselines3/ && pytest tests/' 
    ./scripts/ pytest tests/
    Run the docker CPU image

    docker run -it --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/home/mamba/stable-baselines3,type=bind stablebaselines/stable-baselines3-cpu bash -c 'cd /home/mamba/stable-baselines3/ && pytest tests/' 
    ./scripts/ pytest tests/

    Explanation of the docker command:

    • docker run -it creates an instance of an image (a container) and runs it interactively (so Ctrl+C will work)
    • --rm removes the container once it exits/stops (otherwise, you will have to remove it yourself with docker rm)
    • --network host disables network isolation, which allows using TensorBoard/visdom on the host machine
    • --ipc=host uses the host system's IPC namespace. The IPC (POSIX/SysV IPC) namespace provides separation of named shared memory segments, semaphores and message queues.
    • --name test explicitly names the container test; otherwise it is assigned a random name
    • --mount src=... gives the container access to the local directory (the output of the pwd command), mapped to /home/mamba/stable-baselines3, so all the logs created in that folder inside the container are kept
    • bash -c '...' runs a command inside the container, here the tests (pytest tests/)
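If you prefer to launch the test container from Python, for example from a CI script, the sketch below rebuilds the CPU and GPU docker run commands above as argument lists suitable for subprocess.run(). The function name docker_test_command is hypothetical, and the current working directory stands in for $(pwd). The -it flags require an interactive terminal, so drop them when running from a non-interactive script.

```python
import os
import shlex
from typing import List

WORKDIR = "/home/mamba/stable-baselines3"


def docker_test_command(gpu: bool, src: str = "") -> List[str]:
    """Rebuild the `docker run ... pytest tests/` command as an argument list."""
    src = src or os.getcwd()  # the shell version uses "$(pwd)"
    image = "stablebaselines/stable-baselines3" if gpu else "stablebaselines/stable-baselines3-cpu"
    cmd = ["docker", "run", "-it"]
    if gpu:
        cmd.append("--runtime=nvidia")  # only the GPU image uses the NVIDIA runtime
    cmd += [
        "--rm",  # remove the container when it stops
        "--network", "host",  # no network isolation
        "--ipc=host",  # share the host IPC namespace
        "--name", "test",  # fixed container name instead of a random one
        "--mount", f"src={src},target={WORKDIR},type=bind",  # map the local directory
        image,
        "bash", "-c", f"cd {WORKDIR}/ && pytest tests/",  # command run inside the container
    ]
    return cmd


if __name__ == "__main__":
    # Print instead of executing; pass the list to subprocess.run() to launch it
    print(shlex.join(docker_test_command(gpu=False)))
```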

