Stable-Baselines3 Docs — Reliable Reinforcement Learning Implementations
Stable Baselines3 (SB3) is a set of reliable implementations of reinforcement learning algorithms in PyTorch. It is the next major version of Stable Baselines.
RL Baselines3 Zoo provides a collection of pre-trained agents, along with scripts for training and evaluating agents, tuning hyperparameters, plotting results, and recording videos.
Main Features
- Unified structure for all algorithms
- PEP8 compliant (unified code style)
- Documented functions and classes
- Tests, high code coverage and type hints
- Clean code
- Tensorboard support
- The performance of each algorithm was tested (see the Results section on each algorithm's page)
Getting Started
Stable-Baselines3 (SB3) uses vectorized environments (VecEnv) internally. Please read the associated section to learn more about its features and differences compared to a single Gym environment.
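As a minimal sketch (the environment id and number of environments here are arbitrary choices), the make_vec_env helper creates several environments behind a single VecEnv interface:

from stable_baselines3.common.env_util import make_vec_env

# Create 4 CartPole environments running in a single process (DummyVecEnv by default)
vec_env = make_vec_env("CartPole-v1", n_envs=4)
obs = vec_env.reset()  # batched observations: one row per environment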
Most of the library tries to follow a sklearn-like syntax for the Reinforcement Learning algorithms.
Here is a quick example of how to train and run A2C on a CartPole environment:
import gymnasium as gym

from stable_baselines3 import A2C

env = gym.make("CartPole-v1", render_mode="rgb_array")

model = A2C("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

vec_env = model.get_env()
obs = vec_env.reset()
for i in range(1000):
    action, _state = model.predict(obs, deterministic=True)
    obs, reward, done, info = vec_env.step(action)
    vec_env.render("human")
    # VecEnv resets automatically
    # if done:
    #     obs = vec_env.reset()
You can find explanations about the logger output and names in the Logger section.
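As a hedged illustration (the output folder and formats below are arbitrary choices), the logger can also be configured explicitly before training:

from stable_baselines3 import A2C
from stable_baselines3.common.logger import configure

model = A2C("MlpPolicy", "CartPole-v1", verbose=1)
# Write logs to stdout and to a CSV file in ./sb3_log/ (illustrative path)
new_logger = configure("./sb3_log", ["stdout", "csv"])
model.set_logger(new_logger)
model.learn(total_timesteps=10_000)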
Or just train a model in one line if the environment is registered in Gymnasium and the policy is registered:
from stable_baselines3 import A2C

model = A2C("MlpPolicy", "CartPole-v1").learn(10_000)
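After training, evaluate_policy gives a quick sanity check of the agent; this is a sketch, and n_eval_episodes=10 is an arbitrary choice:

from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy

model = A2C("MlpPolicy", "CartPole-v1").learn(10_000)
# Average the episodic return over 10 evaluation episodes
mean_reward, std_reward = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")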
Installation
For Windows users, we recommend using Anaconda for easier installation of Python packages and required libraries. You need an environment with Python version 3.8 or above.
For a quick start you can move straight to installing Stable-Baselines3 in the next step.
Trying to create Atari environments may result in vague errors related to missing DLL files and modules. This is an issue with the atari-py package. See this discussion for more information.
Stable Release
To install Stable Baselines3 with pip, execute:
pip install stable-baselines3[extra]
Some shells such as Zsh require quotation marks around brackets, i.e. pip install 'stable-baselines3[extra]'. More information.
This includes optional dependencies like Tensorboard, OpenCV or ale-py to train on Atari games. If you do not need those, you can use:
pip install stable-baselines3
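To check that the installation succeeded, a minimal sanity check (assuming the package was installed into the currently active environment):

import stable_baselines3

# Print the installed version
print(stable_baselines3.__version__)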
If you need to work with OpenCV on a machine without an X-server (for instance inside a Docker image), you will need to install opencv-python-headless, see issue #298.
Bleeding-edge version
pip install git+https://github.com/DLR-RM/stable-baselines3

With extras:

pip install "stable_baselines3[extra,tests,docs] @ git+https://github.com/DLR-RM/stable-baselines3"
Development version
To contribute to Stable-Baselines3, install it in editable mode, with support for running tests and building the documentation:
git clone https://github.com/DLR-RM/stable-baselines3 && cd stable-baselines3
pip install -e .[docs,tests,extra]
Using Docker Images
If you are looking for Docker images with stable-baselines3 already installed, we recommend using the images from RL Baselines3 Zoo.
Otherwise, the following images contain all the dependencies for stable-baselines3, but not the stable-baselines3 package itself. They are made for development.
Use Built Images
docker pull stablebaselines/stable-baselines3
docker pull stablebaselines/stable-baselines3-cpu
Build the Docker Images
Build GPU image (with nvidia-docker):

make docker-gpu

Build CPU image:

make docker-cpu

Note: if you are using a proxy, you need to pass extra params during the build and do some tweaks:
--network=host --build-arg HTTP_PROXY=http://your.proxy.fr:8080/ --build-arg http_proxy=http://your.proxy.fr:8080/ --build-arg HTTPS_PROXY=https://your.proxy.fr:8080/ --build-arg https_proxy=https://your.proxy.fr:8080/
Run the images (CPU/GPU)
Run the nvidia-docker GPU image:
docker run -it --runtime=nvidia --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/home/mamba/stable-baselines3,type=bind stablebaselines/stable-baselines3 bash -c 'cd /home/mamba/stable-baselines3/ && pytest tests/'
Or, with the shell file:

./scripts/run_docker_gpu.sh pytest tests/

Run the docker CPU image:

docker run -it --rm --network host --ipc=host --name test --mount src="$(pwd)",target=/home/mamba/stable-baselines3,type=bind stablebaselines/stable-baselines3-cpu bash -c 'cd /home/mamba/stable-baselines3/ && pytest tests/'

Or, with the shell file:

./scripts/run_docker_cpu.sh pytest tests/
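To verify that PyTorch detects the GPU inside the container, a quick hedged check (PyTorch is part of the image dependencies):

import torch

# Should print True in a correctly configured GPU container
print(torch.cuda.is_available())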
Explanation of the docker command:
- docker run -it creates an instance of an image (a container) and runs it interactively (so Ctrl+C will work)
- --rm removes the container once it exits/stops (otherwise, you will have to use docker rm)
- --network host means no network isolation is used, which allows using tensorboard/visdom on the host machine
- --ipc=host uses the host system's IPC namespace. An IPC (POSIX/SysV IPC) namespace provides separation of named shared memory segments, semaphores and message queues.
- --name test explicitly gives the name test to the container (otherwise it will be assigned a random name)
- --mount src="$(pwd)",... gives the container access to the local directory (the pwd command); it will be mapped to /home/mamba/stable-baselines3, so all the logs created in this folder inside the container will be kept
- bash -c '...' runs a command inside the docker image, here the tests (pytest tests/)
© Copyright 2022, Stable Baselines3. Revision ba77dd7c .