There seem to be two main ways of solving sudoku puzzles by computer: enlightened trial and error, and logical strategies where every step can be proven, much like the technique human solvers use. These comments are in respect to the second method.
The basis for many sudoku solver programs is to eliminate candidates from a 3-dimensional array, 9x9x9. Two of the axes are the familiar sudoku grid, column by row, with the additional 9 boxes overlaid on the columns and rows. The third axis is eligible digit value. Values in the candidate array are 1 for a valid candidate, 0 for an invalid one.
You'll have to visualize the 9 potential digit possibilities strung out in the third dimension. The solution grid as pictured, is separate from the candidate array. When a cell in the solution grid is solved, all the digits for that column and row in the candidate array are set to zero.
The most basic first step in any solution is to examine every set in every dimension (column, row, box, cell) to look for single occurrences (Singles). These sets are 1x9, one dimensional arrays:
When these sets are in the X or Y axis, they are columns, rows and boxes. The boxes can be transformed, if desired, into a 1x9 array by concatenating the three 1x3 segments. A set in the z axis is composed of digits. In all cases, a unique value of 1-9 is to be determined. When there is only one digit left in a set, all other digit candidates in that cell are invalid and can be removed - a "hidden" single. When there is only one cell left in a set, the only remaining digit is assigned - a "naked" single. This closely mimics manual techniques.
The next step has been to look for double, triple and quad (DTQ) structures as we do manually, in the hope that they might provide some candidate deletions. A further basic technique is to find X-wing, Swordfish or Jellyfish (XSJ) patterns for the same reasons. Excellent descriptions of these configurations can be found here.
I argue here that these patterns in DTQ and XSJ are identical, just in different planes of the candidate matrix. Furthermore, since we are solving by computer, there should be no excitement about finding a quad or a jellyfish, but only about finding candidate deletions as a result of one of these structures or patterns. Just as there are four sets in which to look for singles, so are there four planes in which to look for what I've chosen to call Multiples - what have been called doubles, triples and quads, both hidden and naked and X-wing, swordfish and jellyfish, both vertical and horizontal orientation. A diagram for these planes would look like this:
The planes are digit x cell #, from a particular column, row or box (after concatenation of the 3 fragments of the box). These are the planes that have produced the DTQ structures. The fourth plane is column x row, which has produced the XSJ structures for a particular digit. In order to convince you that these structures are identical, I find it best to start with a generalized diagram about the latter plane for a single digit (swordfish) from sudokuwiki:
The concept embodied here is critical for understanding my multiples concept. What is displayed is the idea that to find a swordfish for a given digit, three columns must be identified that in total intersect three of the rows. The same holds for 2 columns (x-wing) and 4 columns (jellyfish). As indicated in the diagram, the occurrence of more of the same digit in the same row - but not in the same column - can be deleted. Let�s now point out that this diagram deals with columns and rows.
If we were to change those axes to digits (as the columns) and cell # (as the rows), we�d have the perfect definition for a hidden triple. That is, 3 digits (columns) occupy only 3 cells (rows) in total, therefore other digits in those cells can be deleted.
There is just one more facet of this concept: orientation. We speak of hidden and naked in the DTQ family and vertical and horizontal in the XSJ family. The latter is easy to understand. It�s also not a great stretch to see that, after finding this horizontal orientation swordfish, if we were to transpose the array above (swap axes, a trivial exercise), we could examine for vertical orientation patterns using the same code as we used for finding the horizontal one - because we�ve swapped horizontal and vertical in the array. The same holds for the difference between hidden and naked DTQs, although it�s not so obvious. If you consider that �hidden� patterns depend on x digits in only x cells and then that naked patterns depend on x cells containing only x digits, it becomes a little easier.
[Note: Box/Line interaction is handled separately.]
I conclude that a single algorithm can be constructed so that it can search each plane described above for deletes that may be justified by one or more multiples and I've written elsewhere about what I've termed a Related Sets algorithm.
The Related Sets algorithm uses a work matrix such as this one which would find a hidden double - two columns (1,3) with two values that only intersect two rows in total (1,6), allowing a deletion of digit 2 in the first cell of the set:
The 9x9 input data arrays come from the 4 sources described above: columns, rows and boxes and digits. With the exception of the box data, these are simple slices from the array. In all cases, the data can be mapped back to the candidate co-ordinates for justified deletion. An audit trail is available that can produce the traditional structure names with exact details.
The routine would look in the array for structures based on qty 2, 3 and 4, in both orientations and eliminate the need for maintaining much existing code.