As of the 1st of October 2025 I'm adopting a new scoring and grading system for Sudoku and its variants. This is a potentially controversial topic, so I thought a page to explain it would be worthwhile. For roughly four fifths of puzzles the grade will be the same. This is not a new ball park, and taken over the whole stock the difference is small. But on a case-by-case basis I think the new grade is more justified. What I'm hoping for is greater alignment with a human solver's gut feel for a puzzle.
The Old Scheme
When I first thought about making puzzles I considered two aspects as being important. The strategies necessary to logically solve a puzzle and how many opportunities there were to spot a pattern or find a solution. The former is relatively easy to appreciate - there is a full spectrum of puzzle difficulty if you go beyond guessing. The second aspect is more subtle. Some puzzles collapse quite quickly. Others you have to chip away at candidate by candidate until you can break through, often multiple times.
In the old scheme I considered "rounds", which were based on finding solutions to cells. A new round started when you solved one or more cells. I created a weighting factor based on how strung out the solve path was. Each strategy used scored a number of points. I had points for candidates eliminated and extra points if these led to a solved cell. The sum of these points was multiplied by the weighting factor.
I also introduced a number of 'heuristics' – filtering rules which removed puzzles from the stock pool or promoted or demoted a puzzle by a grade. The most important one was recognising easy puzzles with a very hard bottleneck. The scoring system might give a puzzle a "tough" grade but it is trivial except for one or two 'diabolical' strategies in the middle. These are frustrating for puzzle solvers and we want to avoid them. Some puzzles might rack up a score with a dozen or so Pointing Pairs, say, but these are 'moderate' strategies.
Candidate Density
For the most part the old scheme was relatively successful, although in the early years Fish strategies were over-scored and boosted the difficulty rating. A Sword-Fish could eliminate a lot of candidates in one go, for example. I dialled back the scoring for these some years ago. The other problem was that very hard puzzles that needed a lot of chains under-scored. This didn't matter too much, as most such puzzles ended up in the 'Extreme' category and weren't used in newspapers. These are interesting puzzles as they are the coal face for Sudoku theory and some people love to solve them.
The weighting factor based on 'rounds' has become difficult to maintain with so many puzzle variants. My recent insight in assessing the grading system is to look at candidate density - the total number of candidates on the board at any one step. More unsolved cells means more candidates to search for a pattern. The harder puzzles have more candidates per cell than simpler ones. This seems an ideal measure of the board.
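To make the measure concrete, here is a minimal Python sketch of counting the candidates on a board. The board representation (a list of 81 sets of possible digits) and the choice to count only unsolved cells are assumptions made for illustration, not the solver's actual code.

```python
def count_candidates(board):
    """Total candidates over all unsolved cells.

    `board` is a hypothetical representation: a list of 81 sets, each
    holding the digits still possible for that cell. A set of size 1 is
    treated as a solved cell and contributes nothing (an assumption).
    """
    return sum(len(cell) for cell in board if len(cell) > 1)


# Example: 30 solved cells, 51 unsolved cells with 4 candidates each.
board = [{5}] * 30 + [{1, 2, 3, 4}] * 51
total = count_candidates(board)  # 51 * 4 = 204
```

Fewer solved cells, or more candidates per unsolved cell, both push the count up, which is the behaviour the measure relies on.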
This table is a typical solve path with the number of candidates from start to end.
| Step | Candidates C | C / 727 * 5 | C / 727 * 20 |
|---|---|---|---|
| 1 | 210 | 1.4 | 5.8 |
On a 9x9 board there are 727 total candidate slots. For Killer Sudoku all 727 are in play at the start. I create two factors by multiplying the fraction C/727 by 5 and by 20. The first factor is applied to all Naked and Hidden Singles. The second factor is applied to all other strategies. We assume that Singles are common and easy and don't warrant much of a contribution to the total score.
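As a rough illustration, the two factors can be computed like this. The function and constant names are hypothetical; only the 727 slots and the multipliers 5 and 20 come from the text.

```python
TOTAL_SLOTS = 727        # total candidate slots quoted in the text
SINGLES_MULTIPLIER = 5   # applied to Naked and Hidden Singles
OTHER_MULTIPLIER = 20    # applied to every other strategy


def step_factor(candidates, is_single):
    """Density factor for one solve step with `candidates` on the board."""
    multiplier = SINGLES_MULTIPLIER if is_single else OTHER_MULTIPLIER
    return candidates / TOTAL_SLOTS * multiplier


# The table row above: 210 candidates on the board.
print(round(step_factor(210, is_single=True), 1))   # 1.4
print(round(step_factor(210, is_single=False), 1))  # 5.8
```

The same step is worth four times as much when it needs anything harder than a Single, and every step is worth less as the board empties.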
In the old scheme points were awarded for eliminations, plus extra points for solving a cell. I have come round to the idea that if you have found a pattern then it doesn't really matter how many candidates are removed. Dropping this notion is one of the big changes in the new scheme and it should avoid fruitful strategies inflating a grade.
Now the sum of the scores for each step is the puzzle score.
For vanilla Sudoku this gives a score anywhere between 20 and 12,000+. To reduce any fixation on too many decimal places I normalize the score to a number roughly between 1 and 10 with a log function.
Log5 (score) * 2
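A minimal sketch of the summing and normalizing steps follows. It assumes that a step's score is simply its density factor; the real scheme may weight steps further, and the `steps` list format is hypothetical.

```python
import math

TOTAL_SLOTS = 727  # candidate slots quoted in the text


def puzzle_score(steps):
    """Sum the per-step scores.

    `steps` is a hypothetical list of (candidates_on_board, is_single)
    pairs. Assumption: a step's score is just its density factor.
    """
    return sum(c / TOTAL_SLOTS * (5 if is_single else 20)
               for c, is_single in steps)


def normalized_grade(score):
    """Compress a raw score with 2 * log base 5."""
    return 2 * math.log(score, 5)


print(round(normalized_grade(20), 2))      # 3.72
print(round(normalized_grade(12000), 2))   # 11.67
```

Because the scale is logarithmic, multiplying the raw score by 5 adds 2 to the normalized number.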
Currently the division of the spectrum of puzzles into the named grades is as follows - and might change in the near future. Most randomly produced puzzles will be easy with extremes being the fewest.
| Puzzle | Kids | Gentle | Moderate | Tough | Diabolical | Extreme |
|---|---|---|---|---|---|---|
| Sudoku | < 3 | 3 to < 4 | 4 to < 6 | 6 to < 7 | 7 to < 9 | 9+ |
| Sudoku X | < 3 | 3 to < 4.5 | 4.5 to < 7 | 7 to < 8 | 8 to < 10 | 10+ |
| Jigsaw Sudoku | < 2 | 2 to < 7 | 7 to < 8.4 | 8.4 to < 9.1 | 9.1 to < 11 | 11+ |
| Killer Sudoku | x | < 6.4 | 6.4 to < 6.6 | 6.6 to < 7 | 7 to < 8 | 8+ |
| KenKen 6x6 | x | < 6.4 | 6.4 to < 6.6 | 6.6 to < 7 | 7 to < 8 | 8+ |
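The lookup from normalized score to named grade can be sketched as below, using only the Sudoku row of the table. The function name is hypothetical, and the sketch ignores the promote and demote heuristics, so it is an illustration of the bands rather than the full grading process.

```python
import bisect

# Lower bounds for each grade, copied from the Sudoku row of the table.
SUDOKU_BOUNDS = [3, 4, 6, 7, 9]
GRADES = ["Kids", "Gentle", "Moderate", "Tough", "Diabolical", "Extreme"]


def sudoku_grade(normalized):
    """Map a normalized score to a named grade (vanilla Sudoku only)."""
    # bisect_right counts how many lower bounds the score has reached,
    # so a score of exactly 3 falls in "Gentle" ("3 to < 4").
    return GRADES[bisect.bisect_right(SUDOKU_BOUNDS, normalized)]


print(sudoku_grade(2.5))   # Kids
print(sudoku_grade(6.0))   # Tough
print(sudoku_grade(9.3))   # Extreme
```

Other variants would need their own list of bounds, and Killer Sudoku and KenKen have no Kids band at all.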
Figure: The current distribution of normal Sudoku in my stock table.
I have grouped strategies on the solver into 'tough', 'diabolical' and 'extreme', but the use of a particular strategy does not automatically place a puzzle in the grade that strategy is grouped under. I do try to filter away puzzles that score oddly, and in the majority of cases the hardest strategy will be in the same group as the grade, but not always. I believe the new scheme pushes the spectrum further towards matching.
Comments and suggestions are, as ever, appreciated!
Many thanks to Andy Potvin for our discussions on this topic.