Border is a reinforcement learning library in Rust.
Border is currently under development.
To run the examples, install Python (>=3.7) and Gym, for which the library provides a wrapper using PyO3.
The following command runs a random controller (policy) for 5 episodes in CartPole-v0:
```bash
$ cargo run --example random_cartpole
```
It renders the environment during the episodes and writes a CSV file to `examples/model`, containing the sequences of observations and rewards over the episodes.
```bash
$ head -n3 examples/model/random_cartpole_eval.csv
0,0,1.0,-0.012616985477507114,0.19292789697647095,0.04204097390174866,-0.2809212803840637
0,1,1.0,-0.008758427575230598,-0.0027677505277097225,0.036422546952962875,0.024719225242733955
0,2,1.0,-0.008813782595098019,-0.1983925849199295,0.036916933953762054,0.3286677300930023
```
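The CSV is easy to post-process for quick sanity checks. The sketch below sums the reward per episode using only the standard library; the column layout it assumes (episode index, step, reward, then the observation values) is inferred from the output above rather than a documented format:

```rust
use std::collections::BTreeMap;
use std::fs::File;
use std::io::{BufRead, BufReader};

/// Prints the total reward per episode, assuming columns: episode, step,
/// reward, observation values (4 dimensions for CartPole).
fn main() -> std::io::Result<()> {
    let file = File::open("examples/model/random_cartpole_eval.csv")?;
    let mut totals: BTreeMap<usize, f64> = BTreeMap::new();

    for line in BufReader::new(file).lines() {
        let line = line?;
        let fields: Vec<&str> = line.split(',').collect();
        if fields.len() < 3 {
            continue;
        }
        let episode: usize = fields[0].parse().unwrap_or(0);
        let reward: f64 = fields[2].parse().unwrap_or(0.0);
        *totals.entry(episode).or_insert(0.0) += reward;
    }

    for (episode, total) in totals {
        println!("episode {}: total reward = {}", episode, total);
    }
    Ok(())
}
```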
The following command trains a DQN agent:
```bash
$ cargo run --example dqn_cartpole
```
After training, the trained agent runs for 5 episodes. The parameters of the trained Q-network (and the target network) are saved in `examples/model/dqn_cartpole`.
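The Q-network and target network play the usual DQN roles: the target network is a slowly updated copy of the Q-network used to compute bootstrap targets. A minimal, framework-free sketch of that target computation (a conceptual illustration, not Border's actual implementation) looks like this:

```rust
/// Conceptual DQN target: y = r + gamma * max_a' Q_target(s', a').
/// `q_target_next` holds Q_target(s', a') for each action; this is
/// not Border's API, just the formula written out.
fn dqn_target(reward: f64, done: bool, gamma: f64, q_target_next: &[f64]) -> f64 {
    let max_next = q_target_next
        .iter()
        .cloned()
        .fold(f64::NEG_INFINITY, f64::max);
    if done {
        reward
    } else {
        reward + gamma * max_next
    }
}

fn main() {
    // Example: reward 1.0, non-terminal transition, discount 0.99, two actions.
    let y = dqn_target(1.0, false, 0.99, &[0.7, 1.2]);
    println!("target = {:.3}", y); // 1.0 + 0.99 * 1.2 = 2.188
}
```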
The following command trains a SAC agent on Pendulum-v0, which has a continuous action space:
```bash
$ cargo run --example sac_pendulum
```
The code defines an action filter that doubles the torque in the environment.
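The idea of such a filter is simply to transform the agent's action before it reaches the environment. The trait and type names in the sketch below are hypothetical and only illustrate that idea; they are not Border's actual filter API:

```rust
/// Hypothetical action-filter trait; Border's real API may differ.
trait ActFilter {
    fn filt(&self, act: Vec<f64>) -> Vec<f64>;
}

/// Doubles the torque command before it is sent to Pendulum-v0.
struct DoubleTorque;

impl ActFilter for DoubleTorque {
    fn filt(&self, act: Vec<f64>) -> Vec<f64> {
        act.into_iter().map(|a| 2.0 * a).collect()
    }
}

fn main() {
    let filter = DoubleTorque;
    // Pendulum-v0 has a single continuous action (the torque).
    assert_eq!(filter.filt(vec![0.5]), vec![1.0]);
}
```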
The following command trains a DQN agent on PongNoFrameskip-v4:
```bash
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4
```
During training, the program saves the model parameters whenever the evaluation reward reaches a new maximum. The agent can be trained on other Atari games (e.g., `SeaquestNoFrameskip-v4`) by replacing the environment name in the above command.
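The "save on best evaluation reward" behavior boils down to tracking the best score seen so far. A minimal sketch of that bookkeeping (Border handles this inside its own training loop; the types here are only illustrative) is:

```rust
/// Tracks the best evaluation reward and reports when the model should be saved.
struct BestModelTracker {
    best: f64,
}

impl BestModelTracker {
    fn new() -> Self {
        Self { best: f64::NEG_INFINITY }
    }

    /// Returns true when `eval_reward` sets a new maximum.
    fn update(&mut self, eval_reward: f64) -> bool {
        if eval_reward > self.best {
            self.best = eval_reward;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut tracker = BestModelTracker::new();
    for reward in [-21.0, -18.5, -19.0, -15.0] {
        if tracker.update(reward) {
            println!("new best ({reward}): save model parameters here");
        }
    }
}
```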
For Pong, you can download a pretrained agent from my Google Drive and see how it plays with the following command:
```bash
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_atari -- PongNoFrameskip-v4 --play-gdrive
```
The pretrained agent will be saved locally in `$HOME/.border/model`.
(The code might be broken due to recent changes. It will be fixed in the future. The description below is for an older version.)
The following command trains a DQN agent in a vectorized environment of Pong:
```bash
$ PYTHONPATH=$REPO/examples cargo run --release --example dqn_pong_vecenv
```
The code demonstrates how to use vectorized environments, in which 4 environments run synchronously. Training took about 11 hours for 2M steps (8M transition samples) on a `g3s.xlarge` EC2 instance. The hyperparameter values, tuned specifically for Pong rather than for Atari games in general, are adapted from the book Deep Reinforcement Learning Hands-On. The learning curve is shown below.
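Conceptually, a synchronous vectorized environment just steps each underlying environment once per call and stacks the results. The sketch below uses a hypothetical `Env` trait to illustrate this pattern; Border's actual vectorized-environment types differ:

```rust
/// Hypothetical single-environment interface; Border's real traits differ.
trait Env {
    fn step(&mut self, act: u8) -> (Vec<f32>, f32, bool);
    fn reset(&mut self) -> Vec<f32>;
}

/// Steps several environments in lockstep and returns batched results.
struct VecEnv<E: Env> {
    envs: Vec<E>,
}

impl<E: Env> VecEnv<E> {
    fn step(&mut self, acts: &[u8]) -> (Vec<Vec<f32>>, Vec<f32>, Vec<bool>) {
        let mut obs = Vec::new();
        let mut rewards = Vec::new();
        let mut dones = Vec::new();
        for (env, &act) in self.envs.iter_mut().zip(acts) {
            let (o, r, d) = env.step(act);
            // Environments that finish are reset so all workers stay in sync.
            obs.push(if d { env.reset() } else { o });
            rewards.push(r);
            dones.push(d);
        }
        (obs, rewards, dones)
    }
}
```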
After the training, you can see how the agent plays:
```bash
$ PYTHONPATH=$REPO/examples cargo run --example dqn_pong_eval
```
Border is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0).