# Rust Reinforcement Learning Framework

A modular and extensible reinforcement learning framework written in Rust, supporting algorithms such as Q-Learning, SARSA, MCTS, AlphaZero, and MuZero. The project allows benchmarking across different environments like CartPole and MountainCar with a unified interface.

---

## 📘 Description of the Algorithms and Theory

This project implements the following reinforcement learning (RL) algorithms (for more information, see the RL book: http://incompleteideas.net/book/the-book.html or the MuZero paper: https://arxiv.org/abs/1911.08265):

- **Q-Learning**: A model-free, off-policy RL algorithm that learns the value of state-action pairs (Q-values) using the Bellman equation. It chooses actions greedily based on the learned Q-values.
- **SARSA**: An on-policy variant of Q-Learning. It updates its Q-values using the action actually taken, rather than the maximum possible next action; Q-Learning, in contrast, always assumes the agent will act optimally. A minimal sketch of both update rules appears after this list.
- **Monte Carlo Tree Search (MCTS)**: A search-based planning algorithm that evaluates actions in large state spaces by building a tree through selection, expansion, simulation, and backpropagation, guided by simulated rollouts. It is a planning algorithm rather than a learning algorithm.
- **AlphaZero**: A deep RL algorithm that integrates MCTS with a neural-network value and policy model, trained from self-play and improved over time with gradient-based learning. The original AlphaZero uses a shared network with a value head and a policy head; the policy head is used inside the MCTS to prioritize promising nodes without needing to explore them. In my experiments this did not lead to good results, so only the value head is used here.
- **MuZero**: An extension of AlphaZero that does not require a model of the environment's dynamics. Instead, it learns a latent-space representation and an internal transition model from raw observations, combining planning with model learning.
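The following is a minimal sketch of the two tabular update rules, not the code used in the `qlearning`/`sarsa` modules; the names (`QTable`, `q_learning_update`, `sarsa_update`) and the integer encoding of states and actions are illustrative assumptions.

```rust
use std::collections::HashMap;

/// Tabular Q-values keyed by (state, action). States and actions are plain
/// integers here; a tabular method on CartPole would typically discretize the
/// continuous observation first.
type QTable = HashMap<(usize, usize), f64>;

/// One Q-Learning update: bootstrap from the best next action (off-policy).
fn q_learning_update(
    q: &mut QTable,
    state: usize,
    action: usize,
    reward: f64,
    next_state: usize,
    actions: &[usize],
    alpha: f64, // learning rate
    gamma: f64, // discount factor
) {
    let max_next = actions
        .iter()
        .map(|&a| *q.get(&(next_state, a)).unwrap_or(&0.0))
        .fold(f64::NEG_INFINITY, f64::max);
    let entry = q.entry((state, action)).or_insert(0.0);
    *entry += alpha * (reward + gamma * max_next - *entry);
}

/// One SARSA update: bootstrap from the action actually taken next (on-policy).
fn sarsa_update(
    q: &mut QTable,
    state: usize,
    action: usize,
    reward: f64,
    next_state: usize,
    next_action: usize,
    alpha: f64,
    gamma: f64,
) {
    let next_q = *q.get(&(next_state, next_action)).unwrap_or(&0.0);
    let entry = q.entry((state, action)).or_insert(0.0);
    *entry += alpha * (reward + gamma * next_q - *entry);
}
```

The only difference between the two is the bootstrap target: Q-Learning backs up the value of the best next action, while SARSA backs up the value of the next action the policy actually takes.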
---

## 🧩 Description of the Implementation

The project is structured with modularity and scalability in mind. Major modules include:

- `environment`: Trait-based interface for RL environments (`CartPole`, `MountainCar`) that handles step logic and state transitions.
- `agent`: Abstractions and concrete implementations for all agent types.
- `logger`: Handles training logs, reward tracking, data export, and plotting.
- `rl_agent`: Shared agent behavior such as training, benchmarking, and evaluation.
- `qlearning`, `sarsa`, `mcts`, `alphazero`, `muzero`: Algorithm-specific logic and data structures.
- `main.rs`: Entry point handling CLI arguments, initializing environments/agents, and managing training and evaluation.

Each agent conforms to a common `Agent` trait, ensuring interchangeable benchmarking and training pipelines (see the sketch at the end of this README).

---

## Environments

The framework currently works best on `CartPole`. `MountainCar` is also supported, but the algorithms usually fail to learn a meaningful policy on it. The `CartPole` environment is a classic control problem where the goal is to balance a pole on a cart by moving the cart left or right.

---

## Results

Results were obtained by running each agent until convergence on the `CartPole` environment. Only runs where the reward improved during training were included. The average rewards were calculated over 10 runs for each agent.

| Agent     | Avg Reward (10 runs) |
| --------- | -------------------- |
| QLearning | 1011.2               |
| SARSA     | 953.8                |
| MCTS      | 921.3                |
| AlphaZero | 1192.0               |
| MuZero    | 923.0                |

## Example plots

---

## ⚙️ Installation and Startup Instructions

**Prerequisites:**

- Rust (edition 2021 or newer)

**Example running the application:**

```bash
cargo run -r -p mcrs2 -- sarsa --episodes 2000
```

**CLI help:**

```bash
cargo run -r -p mcrs2 -- --help
```
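## Extending the framework

As mentioned in the implementation section, agents and environments communicate through traits, so new components can be benchmarked without touching the training code. The sketch below illustrates what such an interface can look like; the trait and method names are illustrative assumptions, not this crate's actual API.

```rust
/// Illustrative sketch of a trait-based RL interface (not this crate's real API).
trait Environment {
    type State;
    type Action;

    /// Reset the environment and return the initial state.
    fn reset(&mut self) -> Self::State;
    /// Apply an action and return (next_state, reward, done).
    fn step(&mut self, action: &Self::Action) -> (Self::State, f64, bool);
}

trait Agent<E: Environment> {
    /// Choose an action for the current state (e.g. epsilon-greedily).
    fn act(&mut self, state: &E::State) -> E::Action;
    /// Update the agent from one observed transition.
    fn learn(
        &mut self,
        state: &E::State,
        action: &E::Action,
        reward: f64,
        next_state: &E::State,
        done: bool,
    );
}

/// Generic episode loop: works for any (agent, environment) pair,
/// which is what makes the benchmarking pipeline interchangeable.
fn train_episode<E: Environment, A: Agent<E>>(env: &mut E, agent: &mut A) -> f64 {
    let mut state = env.reset();
    let mut total_reward = 0.0;
    loop {
        let action = agent.act(&state);
        let (next_state, reward, done) = env.step(&action);
        agent.learn(&state, &action, reward, &next_state, done);
        total_reward += reward;
        state = next_state;
        if done {
            return total_reward;
        }
    }
}
```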