A number of times in feedbacks I've discussed the difference, or utility, of a brute force approach to solving Sudoku versus logical strategies. By brute force I mean any back-tracking method that can fill a puzzle board and derive all possible solutions. The Solution Count on the solver is an example. The rest of this site is devoted to logical strategies.
I made a decision early on when exploring Sudoku that I was interested in logical strategies since they told me how to solve a puzzle using steps that showed why each cell could be solved in sequence just from the logical consequences of having clues. This felt much more satisfying than guessing, trial and error and simply placing numbers by intuition. However, I agree that intuition is one way a human can beat a computer: it is not to be dismissed.
Brute force is the domain of computers and people quickly developed optimal - or near optimal - ways of demolishing a puzzle by back-tracking. This is a proof by exhaustion since numbers are inserted in sequence, 1 to 9, top-left to bottom right until all ways of satisfying the rules are shown. The advantage is that you can also count the number of solutions, so partial or faulty Sudoku puzzles do not phase a brute-force algorithm. Which is why I employ one to check puzzles in the solver.
I don't pretend to know all Sudoku strategies and those I get stuck on are always solved by clever people in the Weekly Unsolvables providing me with new clues. This is why Sudoku is so fascinating. However, there is another aspect to the two solving methods that pertains to programmers, especially optimisers.
The diagram on the right is a crude impression of how brute-force stacks up against logic when the number of clues is taken into account. Simply put, brute force is much quicker when there are many clues - not surprising - whereas it is known that clue density does not - in general - affect the grade or difficulty of a puzzle. You can have very easy low clue puzzles and very hard high clue ones - although the bias to 'hard' is usually with fewer clues.
In a low-clue puzzle there is more to crunch through for a logical solver, but it can collapse quickly. Some strategies require an awful lot of searching to identify patterns, or are not yet optimised, so it depends on if they are needed. But generally the time to solve is flat against clue density.
However, brute force quickly escalates in time cost when clues are low. There are two factors. Density, just discussed, but also the orientation of the puzzle. My Solution Count gives a number which is the number of recursive calls made. However, since the brute force always starts in the top left, if the clues are stacked towards the top left, it will find the solutions more quickly. You can test this by rotating the puzzle and trying Solution Count.
For very extreme puzzles like this Jigsaw, I have never run the Solution Count to completion. It could take hours, or years? I don't know.
If your aim is to crunch through a set of puzzles in record time, then a mixture of both approaches will be optimal. Using just the basic strategies to start a puzzle and then back-tracking if/when this fails does work well. But I prefer to keep aside failures to inspire new logical strategies since they generate new knowledge.
A Brute Force Algorithm
The key is to express the rows, columns and boxes as bit arrays, which can be done in C or C++. At this point I'd like to credit G.M. Boynton who posted an algorithm way back in June 2005 (although I can't remember the language and if I ported it to bit arrays). We don't need to scan along a row or column to test if a number is present, we just need to know IF a number is present anywhere in the row, column or box.
unsigned short r; // bit array to signal presence of 1 to 9 on a row
unsigned short c; // bit array to signal presence of 1 to 9 on a columns
unsigned short b; // bit array to signal presence of 1 to 9 in a box
unsigned short vals;// clues and solutions of each cell
As well as being called in the main function to insert the next number, this function can be used to set up the initial values from the clues. We shift 1 by 'n' bits for whatever row, column and box it is a member of.
void put_number_back( sudoku *s, int y, int x, int n )
s->r[y] += (1 << n);
s->c[x] += (1 << n);
s->b[s->boxnum[y][x]-1] += (1 << n);
s->vals[y][x] = n+1;
Imagine you are starting in A1 which is empty and you want to insert 3. If there is a 3 clue already down in J1 then the flag s->c (first column) will contain a 1 at the third bit (1 shifted 2 bits). You can skip placing a 3 there and go onto 4.
bool bRecSlv( Sudoku s, int depth, unsigned int *solutions )
register unsigned short x,y, n;
for(x=0;x<9;x++) // For each cell
if( s.vals[y][x] == 0 ) // that is still empty
for(n=0;n<9;n++) // For each possible number:
// if any bit mask is set, skip n
if( s.r[y] & (1 << n) ) continue;
if( s.c[x] & (1 << n) ) continue;
if( s.b[s.boxnum[y][x]-1] & (1 << n) ) continue;
// Valid to place n in this cell
put_number_back( &s, y, x, n );
// and call this function recursively!
if( bRecSlv(s,depth+1,solutions) )
return true; // If solution found AND no more solutions required,
// then exit recursive function
// take_off_number( &s, y, x, n ); // no need to zero n
// No empty cells, must be a solution. Return true if no more required
Note: We pass the whole "sudoku" object into the function, not a pointer to it. When we recurse a copy is made. When the branching is finished and we back out, the previous states are already in memory and in order.
Article created on 27-November-2019. Views: 10328 This page was last modified on 27-November-2019. All text is copyright and for personal use only but may be reproduced with the permission of the author.
Copyright Andrew Stuart @ Syndicated Puzzles, 2007-2023
... by: jobowoThe brute force algorithm suggested here can be, I believe, improved a little by an initial scan similar to the one that a paper and pencil solver will attempt, arriving at a list of empty cells and the only possible n values for those cells. Sorting by n and beginning with the cell(s) with the least n, one then begins the trial and error process (saving, as suggested, the current values so that one can return to this point if that value 'fails') and try with the next value.
With my humble programming skills, and using PHP (scarcely the fastest language), my solver can solve even the unsolveables here in less than 0.2 seconds.
But the actual point of my post here is the curious fact is that there appears to be little relationship between the "human" rating (which is presumably related to the intelligent algorithms offered here) and the brute force rating measured as time to solve. Quite a few puzzles rated "moderate" on the web take far longer to solve with this algorithm than those rated "unsolveable" or "hardest". Arto's scores in the middle of the pack of about 100 puzzles that I solve, in total, in about 4 seconds.
Some explanations are evident. If there are two cells to begin the operation that have n=2, I simply grab whichever SORT offers me first; I do something similar with the values: if 1,3,5 are possible values I pick them in numerical order.