Recently, I have been working on a method to fingerprint Sudoku puzzles. Although canonicalization can find puzzles that are mathematically equivalent, it is a slow process, not suited for scanning large collections of puzzles. Instead, I am taking a fingerprint of a puzzle, which does not identify puzzles with 100% certainty, but it can be used as a triage before using the time-consuming canonicalization routines.

Fingerprinting only examines the candidate space in the initial grid:

For each digit:

- Count the number of eliminated candidates

- Count the number of surplus eliminations for all candidates (when a cell can see multiple peers with the same digit and/or another digit is placed in the cell as a given)

Sort these counts in descending order

Also count the number of cells with 1 candidate, 2 candidates, ..., 9 candidates left.

I've used Gordon's collection of minimum Sudokus and it contains 474 twins and 4 triplets. (2 or 3 puzzles with the same fingerprint)

Here are 2 puzzles with the fingerprint 57:17,49:6,48:7,48:7,48:7,45:10,43:12,33:3,30:6,1:1:7:13:22:15:5:0:0.

- Code: Select all
`5 . .|. 3 .|. 1 . 6 . .|. 3 .|. 1 .`

. . 9|8 . .|. . . . . 5|9 . .|. . .

. . .|. . .|. . . . . .|. . .|. . .

-----+-----+----- -----+-----+-----

1 . .|. . .|. 3 . 1 . .|. . .|. 3 .

. . .|2 . 8|. . . . . .|2 . 9|. . .

. . .|6 . .|. . . . . .|7 . .|. . .

-----+-----+----- -----+-----+-----

6 4 .|. 7 .|. . . 7 4 .|. 8 .|. . .

. . .|. . .|9 . 2 . . .|. . .|2 . 5

7 . .|. . .|8 . . 8 . .|. . .|9 . .

The distribution of the candidates is the same, but after relabeling the second puzzle, the true similarities are revealed:

- Code: Select all
`5 . .|. 3 .|. 1 . 5 . .|. 3 .|. 1 .`

. . 9|8 . .|. . . . . 9|8 . .|. . .

. . .|. . .|. . . . . .|. . .|. . .

-----+-----+----- -----+-----+-----

1 . .|. . .|. 3 . 1 . .|. . .|. 3 .

. . .|2 . 8|. . . . . .|2 . 8|. . .

. . .|6 . .|. . . . . .|6 . .|. . .

-----+-----+----- -----+-----+-----

6 4 .|. 7 .|. . . 6 4 .|. 7 .|. . .

. . .|. . .|9 . 2 . . .|. . .|2 . 9

7 . .|. . .|8 . . 7 . .|. . .|8 . .

The only difference is that 2 and 9 are swapped in row 7.

The solutions are also strikingly similar:

- Code: Select all
`5 2 6|7 3 9|4 1 8 5 2 6|7 3 9|4 1 8`

4 1 9|8 6 5|7 2 3 4 1 9|8 6 5|3 2 7

8 3 7|1 2 4|6 9 5 8 3 7|1 2 4|5 9 6

-----+-----+----- -----+-----+-----

1 8 4|5 9 7|2 3 6 1 8 4|5 9 7|6 3 2

9 6 3|2 1 8|5 4 7 9 6 3|2 1 8|7 4 5

2 7 5|6 4 3|1 8 9 2 7 5|6 4 3|9 8 1

-----+-----+----- -----+-----+-----

6 4 8|9 7 2|3 5 1 6 4 8|9 7 2|1 5 3

3 5 1|4 8 6|9 7 2 3 5 1|4 8 6|2 7 9

7 9 2|3 5 1|8 6 4 7 9 2|3 5 1|8 6 4

Only r2345678c7 and r2345678c9 have been swapped.

The solving paths are different because the second puzzle has an extra hidden pair (2,8) in row 1.

It would be interesting to study twins a little further. What would be the upper and lower boundaries of changed digits in the solution when we swap 2 givens? Would it be possible to find a twin that not only has the same fingerprint, but also has exactly the same solving path?

What is the maximum difference between 2 puzzles with the same fingerprint, in terms of swapped digits and/or different placements?

I will try to find some answers from the remaining twins in Gordon's collection.

Ruud