One of the side effects of the puzzle minimization algorithms discussed here is the possibility of generating all puzzles for a subgrid, or even (theoretically) for an entire solution grid.
The algorithm works on a fixed solution grid and accepts a set of forced non-givens. If this set is empty, the entire grid is enumerated; if it is non-empty, the remaining givens are minimized so that all minimal puzzles avoiding the forced non-givens are generated.
The algorithm is based on the concept that every unavoidable set within the grid must be hit by at least one clue (Gary McGuire), and further on the concept that generation of duplicate puzzles can be avoided by marking the cells that have already been processed within the given context as "dead clues". AFAIK the "dead clues" concept originates from this post by Ocean in 2006.
Since the algorithm is a depth-first search, it starts producing results almost immediately.
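For illustration, here is a minimal sketch of the idea in Python. It is not gridchecker's code: the solver is a naive backtracker, all names (SOLUTION, enumerate_puzzles, ...) are mine, and a full unrestricted run with it would be hopelessly slow. It only shows the shape of the recursion and of the "dead clues" bookkeeping.
Code: Select all
# Pure-Python sketch, illustrative only; not gridchecker's code.

SOLUTION = ("534678912672195348198342567"
            "859761423426853791713924856"
            "961537284287419635345286179")   # any valid solution grid

def _peers(c):
    r, col = divmod(c, 9)
    return {i for i in range(81) if i != c and
            (i // 9 == r or i % 9 == col or
             (i // 27 == c // 27 and i % 9 // 3 == col // 3))}

PEERS = [_peers(c) for c in range(81)]

def _solve(grid, sols, limit):
    # Naive backtracker; collects up to `limit` complete solutions.
    if len(sols) >= limit:
        return
    if 0 not in grid:
        sols.append(tuple(grid))
        return
    c = grid.index(0)
    used = {grid[p] for p in PEERS[c]}
    for v in range(1, 10):
        if v not in used:
            grid[c] = v
            _solve(grid, sols, limit)
            grid[c] = 0

def solutions_of(givens, limit=2):
    grid = [int(SOLUTION[i]) if i in givens else 0 for i in range(81)]
    sols = []
    _solve(grid, sols, limit)
    return sols

def is_minimal(givens):
    # Minimal = removing any single given re-opens a second solution.
    return all(len(solutions_of(givens - {g})) >= 2 for g in givens)

def enumerate_puzzles(givens, dead, forced_non_givens, out):
    sols = solutions_of(givens)
    if len(sols) == 1:                 # unique solution reached
        if is_minimal(givens):
            out.append(frozenset(givens))
        return                         # supersets cannot be minimal
    # Any second solution marks a set of cells that every valid puzzle
    # must hit with a clue (otherwise that solution would survive).
    other = next(s for s in sols if "".join(map(str, s)) != SOLUTION)
    branch_cells = [c for c in range(81) if str(other[c]) != SOLUTION[c]]
    marked = []
    for c in branch_cells:
        if c in dead or c in forced_non_givens:
            continue
        enumerate_puzzles(givens | {c}, dead, forced_non_givens, out)
        dead.add(c)       # dead clue: siblings must not reuse this cell
        marked.append(c)
    for c in marked:      # restore the context when backtracking
        dead.discard(c)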
The exercise
I used my recent implementation of this algorithm in gridchecker.
I used two source grids: the one with the largest known number of hardest puzzles, and one of blue's "average" grids, which is average in another sense.
For the first grid I generated all minimal puzzles for some sets of forced non-givens. For this exercise I used the puzzles within 44 fixed givens (h44, i.e. 81-44=37 forced non-givens), and the starting part of all puzzles (h81, with no restrictions except the given unique solution grid and minimality).
For the second grid I generated the starting part of all puzzles (am81).
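In terms of the sketch above, the runs differ only in the set of forced non-givens passed to the enumerator. The 37 cell indices below are placeholders, not the ones actually used:
Code: Select all
out = []

# h81 / am81 style: no forced non-givens, enumerate everything
enumerate_puzzles(set(), set(), set(), out)

# h44 style: 81-44=37 cells forced empty, puzzles drawn from the other 44
forced_non_givens = set(range(44, 81))   # placeholder choice of 37 cells
enumerate_puzzles(set(), set(), forced_non_givens, out)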
Then I compared the distributions to those from Denis Berthier (DB).
Results
Code: Select all
Clues    am81      h81     h44        am81%        h81%        h44%        DB%
   20                                 0.0000000   0.0000000   0.0000000   0.000000
   21                                 0.0000000   0.0000000   0.0000000   0.000034
   22                                 0.0000000   0.0000000   0.0000000   0.003400
   23                          9      0.0000000   0.0000000   0.0030945   0.149000
   24                 74    1546      0.0000000   0.0021854   0.5315692   2.280000
   25               5834   21715      0.0000000   0.1722955   7.4663815  13.420000
   26      31     109274   71286      0.0112368   3.2271888  24.5106365  31.940000
   27    2609     674117  115864      0.9457010  19.9086958  39.8381224  32.740000
   28   36372    1358764   68328     13.1839930  40.1283740  23.4935720  15.480000
   29  105191     939847   11389     38.1292591  27.7564993   3.9159392   3.560000
   30   96402     259322     694     34.9434537   7.6585560   0.2386216   0.410000
   31   31231      35700       6     11.3205017   1.0543280   0.0020630   0.022000
   32    3757       2980              1.3618240   0.0880083   0.0000000   0.000000
   33     273        130              0.0989561   0.0038393   0.0000000   0.000000
   34      14          1              0.0050747   0.0000295   0.0000000   0.000000
Total  275880    3386043  290837    100.0000000 100.0000000 100.0000000 100.004434
[Attachment: distrib.PNG — chart "The bias"]
Comments
The h44 sample is the closest to DB. The possible origins of the remaining bias are the selection of an untypical grid and/or the selection of the forced non-givens. This sample differs from the others in that it is a complete enumeration and is therefore absolutely insensitive to the method of its generation.
The h81 sample has over 3 million puzzles, but they are generated at the very beginning of the process, where many of the branches in the search tree are in their first iteration. The implemented "dead clues" scheme first tries a cell and, after the whole enumeration of its subtree, marks the cell as "dead" and continues with the next one. So the possible duplicates are generated at the early stages and skipped at the later stages. This is where the bias comes from. Technically an inverse implementation is possible; then the duplicates would be skipped at the early stages and generated at the later ones, as sketched below.
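In loop form (with explore standing for the recursive step of the sketch above), the two orders would look like this:
Code: Select all
# As implemented: explore a cell first, then mark it dead for its siblings,
# so duplicates are generated early and skipped late.
for c in branch_cells:
    explore(givens | {c}, dead)
    dead.add(c)

# Inverse variant: pre-mark all candidates dead, then resurrect them one by
# one; each branch forbids the later cells instead of the earlier ones, so
# duplicates would be skipped early and generated late.
dead.update(branch_cells)
for c in branch_cells:
    dead.discard(c)
    explore(givens | {c}, dead)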
The am81 sample is about ten times smaller and comes from a different grid; it confirms the bias.
The observed side effect could be used to generate, from scratch, minimal puzzles with a relatively high number of clues.