# Sudoku RL

Solving Sudoku puzzles with reinforcement learning.

Methods:
- Deep Q-Learning
- ~~Deep Policy Gradient~~ (dropped)

Datasets:
- https://www.kaggle.com/datasets/radcliffe/3-million-sudoku-puzzles-with-ratings
  - General training and validation dataset of Sudoku puzzles
  - The minimum number of clues in the dataset is 19, and the maximum is 31
  - Difficulty is calculated from the average search depth over 10 solver attempts
  - 43% of puzzles have a zero rating, meaning they are solvable by scanning
- https://www.sudokuwiki.org/sudoku.htm
  - Test dataset of handmade puzzles
  - Each puzzle requires a specific strategy to solve
  - Some puzzles are unsolvable even with Extreme strategies

## Setup

We use Python 3.9.

Virtual environment, if needed:
```bash
python3 -m venv .venv
source .venv/bin/activate
```

Dependencies:
```bash
pip install -r requirements.txt
```

## Running

For inference, run:
```bash
PYTHONPATH=. python deep_q_learning/inference.py
```

You can supply your own puzzle via the `--puzzle` and `--solution` options, e.g.
```bash
PYTHONPATH=. python deep_q_learning/inference.py --puzzle 3..967..1.4.3.2.8..2.....7..7.....9....873...5...1...3..47.51..9.5...2.78..621..4 --solution 358967421741352689629184375173546892492873516586219743264795138915438267837621954
```

For evaluation, run:
```bash
PYTHONPATH=. python deep_q_learning/evaluate.py
```

## Training

First, download the dataset from Kaggle and split it with `split.py`.

For the simple Q-learning algorithm, run:
```bash
PYTHONPATH=. python q_learning/main.py
```

For our implementation, run:
```bash
PYTHONPATH=. python deep_q_learning/main.py
```

For the stable-baselines3 implementation, run:
```bash
PYTHONPATH=. python deep_q_learning/baseline.py
```
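
The `--puzzle` and `--solution` values in the inference example are 81-character strings read row by row, with `.` marking an empty cell. Before passing your own puzzle, it can help to check that both strings are well formed and agree with each other. The sketch below is one way to do that. It is not part of this repository: the helper names (`parse_grid`, `is_valid_solution`, `matches_clues`) are hypothetical, and accepting `0` as an empty cell is an assumption that the scripts here may not share.

```python
from typing import List

Grid = List[List[int]]

# Example strings copied from the inference command in the Running section.
PUZZLE = "3..967..1.4.3.2.8..2.....7..7.....9....873...5...1...3..47.51..9.5...2.78..621..4"
SOLUTION = "358967421741352689629184375173546892492873516586219743264795138915438267837621954"


def parse_grid(text: str) -> Grid:
    """Convert an 81-character string into a 9x9 grid of ints (0 = empty)."""
    if len(text) != 81:
        raise ValueError(f"expected 81 characters, got {len(text)}")
    cells = []
    for ch in text:
        if ch in ".0":  # assumption: '0' is also treated as an empty cell
            cells.append(0)
        elif ch in "123456789":
            cells.append(int(ch))
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return [cells[r * 9:(r + 1) * 9] for r in range(9)]


def is_valid_solution(grid: Grid) -> bool:
    """Check that every row, column, and 3x3 box contains 1..9 exactly once."""
    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[br + r][bc + c] for r in range(3) for c in range(3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    return all(group == digits for group in rows + cols + boxes)


def matches_clues(puzzle: Grid, solution: Grid) -> bool:
    """Check that the solution keeps every given clue of the puzzle."""
    return all(
        puzzle[r][c] in (0, solution[r][c])
        for r in range(9)
        for c in range(9)
    )


puzzle_grid = parse_grid(PUZZLE)
solution_grid = parse_grid(SOLUTION)
print(is_valid_solution(solution_grid), matches_clues(puzzle_grid, solution_grid))
```

If either check fails for your own strings, the most likely causes are a miscounted row or a clue that was changed while typing the solution.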