# Sudoku RL

Solving Sudoku puzzles with reinforcement learning.

Methods:
- Deep Q-Learning
- ~~Deep Policy Gradient~~ (dropped)

Dataset:
- https://www.kaggle.com/datasets/radcliffe/3-million-sudoku-puzzles-with-ratings
  - General training and validation dataset with sudoku puzzles
  - The minimum number of clues in the dataset is 19, and the maximum is 31
  - Difficulty is calculated based on average search depth from 10 solver attempts
  - 43% of puzzles have a zero rating, solvable by scanning
- https://www.sudokuwiki.org/sudoku.htm
  - Test dataset with handmade puzzles
  - Each puzzle requires knowing a specific strategy to solve it
  - Some puzzles are unsolvable even with Extreme strategies

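The dataset figures above (clue range, share of zero-rated puzzles) can be re-checked with a few lines of Python. This is a minimal sketch, not part of the repository: it assumes the CSV has `puzzle` and `difficulty` columns and that clues are the digits 1-9 in an 81-character string. `summarize` and `summarize_csv` are hypothetical helper names.

```python
import csv
from typing import Dict, Iterable, Mapping


def count_clues(puzzle: str) -> int:
    """Count the given digits (1-9) in a puzzle string; blanks are ignored."""
    return sum(ch in "123456789" for ch in puzzle)


def summarize(rows: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Return min/max clue count and the share of zero-rated puzzles.

    Assumes each row has 'puzzle' and 'difficulty' keys (an assumption
    about the CSV header, not confirmed by this README).
    """
    clue_counts = []
    zero_rated = 0
    for row in rows:
        clue_counts.append(count_clues(row["puzzle"]))
        if float(row["difficulty"]) == 0.0:
            zero_rated += 1
    if not clue_counts:
        raise ValueError("no rows to summarize")
    return {
        "min_clues": min(clue_counts),
        "max_clues": max(clue_counts),
        "zero_share": zero_rated / len(clue_counts),
    }


def summarize_csv(path: str) -> Dict[str, float]:
    """Stream a CSV file from disk and summarize it row by row."""
    with open(path, newline="") as f:
        return summarize(csv.DictReader(f))
```

Call `summarize_csv` with the path to the downloaded CSV. Only the per-puzzle clue counts are kept in memory, not the full rows.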
## Setup
We use Python 3.9.

Create a virtual environment, if needed:
```bash
python3 -m venv .venv
source .venv/bin/activate
```

Install the dependencies:
```bash
pip install -r requirements.txt
```

## Running

For inference run
```bash 
PYTHONPATH=. python deep_q_learning/inference.py
```
You can set your puzzle via `--puzzle` and `--solution` options, e.g.
```bash
PYTHONPATH=. python deep_q_learning/inference.py --puzzle 3..967..1.4.3.2.8..2.....7..7.....9....873...5...1...3..47.51..9.5...2.78..621..4 --solution 358967421741352689629184375173546892492873516586219743264795138915438267837621954
```
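Both options take 81-character strings in row-major order, with `.` marking an empty cell in the puzzle. Before passing your own pair, you may want to check that the solution is a valid grid and agrees with the puzzle's clues. The following is a minimal sketch of such a check. The helper names are hypothetical, and whether `inference.py` performs any validation itself is not stated here.

```python
from typing import List

Grid = List[List[int]]

# The example pair from the command above.
PUZZLE = "3..967..1.4.3.2.8..2.....7..7.....9....873...5...1...3..47.51..9.5...2.78..621..4"
SOLUTION = "358967421741352689629184375173546892492873516586219743264795138915438267837621954"


def parse_grid(text: str) -> Grid:
    """Turn an 81-character string into a 9x9 grid; '.' (or '0') becomes 0."""
    if len(text) != 81:
        raise ValueError(f"expected 81 characters, got {len(text)}")
    return [
        [0 if ch == "." else int(ch) for ch in text[r * 9:(r + 1) * 9]]
        for r in range(9)
    ]


def is_complete_and_valid(grid: Grid) -> bool:
    """True if every row, column, and 3x3 box contains the digits 1-9 once."""
    digits = set(range(1, 10))
    cols = [[grid[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [grid[br + r][bc + c] for r in range(3) for c in range(3)]
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(set(unit) == digits for unit in grid + cols + boxes)


def matches_clues(puzzle: Grid, solution: Grid) -> bool:
    """True if the solution keeps every given digit of the puzzle."""
    return all(
        p == 0 or p == s
        for prow, srow in zip(puzzle, solution)
        for p, s in zip(prow, srow)
    )


puzzle, solution = parse_grid(PUZZLE), parse_grid(SOLUTION)
print(is_complete_and_valid(solution) and matches_clues(puzzle, solution))
```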

For evaluation run
```bash 
PYTHONPATH=. python deep_q_learning/evaluate.py
```


## Training

First, download the dataset from Kaggle and split it with `split.py`.

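For orientation only, a seeded train/validation split can be sketched as below. This is not the repository's `split.py`, whose behavior and options are not described here. The function name, the `val_fraction` default, and the seed are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_rows(
    rows: Sequence[str], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle rows with a fixed seed and return (train, validation).

    Hypothetical helper: the real split.py may split differently.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```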
For the simple Q-learning algorithm, run
```bash 
PYTHONPATH=. python q_learning/main.py
```

For our Deep Q-Learning implementation, run
```bash 
PYTHONPATH=. python deep_q_learning/main.py
```

For the stable-baselines3 implementation, run
```bash 
PYTHONPATH=. python deep_q_learning/baseline.py
```