## Counting minimal puzzles: subsets, supersets, etc.

Everything about Sudoku that doesn't fit in one of the other sections
The relative error is defined as sd(X) / (E(X) * sqrt(n)) with sd being the standard deviation, E being the expected value and n being the number of samples (in this case: random grids).

Edit: You can use the relative error (RE) to get confidence intervals (95% confidence: [E(X) - 1.96*RE*E(X), E(X) + 1.96*RE*E(X)]) for the actual value of X.
Code: Select all
`1st computation: [  0,  619] (E(X) cannot be negative)2nd computation: [219, 1357]`

So the confidence intervalls do overlap!
Afmob

Posts: 130
Joined: 28 June 2011

### Re: Counting minimal puzzles: subsets, supersets, etc.

Thanks again,
this was very instructive for me.
eleven

Posts: 2381
Joined: 10 February 2008

As promised, here is some data that shows how these improvements affect the superset method. I used 10k random grids and let the program run for 15 minutes. The table below shows how many samples were analyzed.
Code: Select all
`+-------------------+--------+---------+---------+----------+-----------+| Subset size       |     24 |      25 |      26 |       27 |        28 |+-------------------+--------+---------+---------+----------+-----------+| JSolve            |   8689 |  189283 | 1898371 |  7906853 |  16690185 || JSolve     + UA4  |  42457 |  391940 | 2364523 |  8792517 |  17255544 || ZhouSolver        |  22000 |  392631 | 2992618 | 13542390 |  33135535 || ZhouSolver + UA4  |  74531 |  683705 | 4082289 | 16809766 |  36026738 || ZhouSolver + UA46 | 126795 |  835978 | 4476687 | 18253441 |  39053223 || Simple MinCheck   | 152823 | 1074178 | 6580190 | 37648477 | 146587568 |+-------------------+--------+---------+---------+----------+-----------+`

You can clearly see that the UA4 test is more efficient for smaller starting subsets. Furthermore the speedup für 26 clue subsets is larger than I initially remembered.

One can also use these numbers to compute the optimal subset size to get the most minimal puzzle of a certain size.
Code: Select all
`+---------------------+----+----+----+----+----+----+----+----+----+----+----+----+----+| Minimal puzzle size | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 |+---------------------+----+----+----+----+----+----+----+----+----+----+----+----+----+| JSolve              | 25 | 25 | 25 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 27 || JSolve     + UA4    | 24 | 24 | 24 | 25 | 25 | 25 | 25 | 26 | 26 | 26 | 26 | 26 | 26 || ZhouSolver          | 25 | 25 | 25 | 25 | 25 | 26 | 26 | 26 | 26 | 26 | 26 | 27 | 27 || ZhouSolver + UA4    | 24 | 24 | 24 | 25 | 25 | 25 | 25 | 26 | 26 | 26 | 26 | 26 | 27 || ZhouSolver + UA46   | 24 | 24 | 24 | 24 | 24 | 25 | 25 | 25 | 26 | 26 | 26 | 26 | 27 || Simple MinCheck     | 24 | 24 | 24 | 24 | 24 | 25 | 25 | 26 | 27 | 27 | 27 | 27 | 27 |+---------------------+----+----+----+----+----+----+----+----+----+----+----+----+----+`

Note that more minimal puzzles do not necessarily imply smaller relative error due to possible higher deviation.

Edit (September 1st): I've added an UA6 test which also favours smaller starting subsets.
Edit (September 4th): Added optimized minimality check (see following post) which heavily favours larger starting subsets.
Last edited by Afmob on Sun Sep 13, 2015 1:51 pm, edited 4 times in total.
Afmob

Posts: 130
Joined: 28 June 2011

### Simple minimality check

I further improved my implementation of the superset method by adding a fast check for (weak) minimality, so this might also be of interest out of this thread.

The idea is to check whether a clue is implied by the other clues via naked or hidden singles. This is easy to implement and it works wonders as you can see in the updated tables in my former post, especially for larger subsets. Note that this does not suffice to validate whether a puzzle is minimal but it can be used to detect non-minimality.

So how good is it? I tested it on 500,000 random subsets of size 26 of which only 28,210 where minimal. Using just ZhouSolver it takes 18.98 seconds. But if I call "Simple MinCheck" before ZhouSolver the program only needs 6.95 seconds. The new algorithm identifies 398,655 non-minimal puzzles which equals about 84.5 % of all non-minimal subsets. Sadly the new algorithm has only little impact on the UA46 test and the main UA algorithm which is used to add clues for the superset method.

The following code uses tables from JSolve but it's hopefully easy enough to understand. Also the code is still under development, so there's room for improvement.
Simple MinCheck: Show
Code: Select all
`bool JClass ::simpleMinCheck(int * num) {  int a, c, cell;  int mask;  CONST Cell * pC;  memset(board,0,sizeof(board));  memset(board2,0,sizeof(board2));  for (cell=0; cell<CELLS; cell++) if (num[cell]) {    mask = DigitToMask(num[cell]); //1 << (digit-1)    board[cell] |= mask; //Exception for clues to detect naked singles    board2[cell] = MaskAllDigits; //Exception for clues to detect hidden singles    pC = NeighborList(cell);    for (c=0; c<NEIGHBORS; c++) {      a = *pC++;      board2[a] |= (board[a] & mask); //Cell sees at least two clues with the same value      board[a] |= mask; //Candidate is not possible    }  }  //Hidden single  for (c=0; c<HOUSES; c++) { //Which candidates are not possible in each house?    mask = MaskAllDigits;    for (cell=0; cell<DIGITS; cell++) mask &= board2[HouseCellList(c,cell)];    if (mask) return false;  }  //Naked single  for (cell=0; cell<CELLS; cell++) if (board[cell] == MaskAllDigits) return false;  return true;}`

Edit: The algorithm can also detect invalid (as in having no solution) puzzles but not all of them.
Afmob

Posts: 130
Joined: 28 June 2011

### New superset computation

Using the aforementioned new algorithms I did another and even longer computation to further reduce the relative error.

Superset results:
Code: Select all
`Computation time: 31x30 days250,989,600,000 samples 14,124,252,278 minimal 26 hits  7,278,464,335 minimal 26 hits passing UA46 test    378,487,813 valid minimal puzzles+----+------------+--------------+----------------+| Cl |      Count |  E(nr/grid)  | E(rel err)*100 |+----+------------+--------------+----------------+| 26 |     330167 |    1.489e+15 |      1.740e-01 || 27 |    9127717 |    1.525e+15 |      4.487e-02 || 28 |   60443872 |    7.213e+14 |      2.198e-02 || 29 |  134325151 |    1.658e+14 |      1.697e-02 || 30 |  118275522 |    1.947e+13 |      1.871e-02 || 31 |   46184785 |    1.226e+12 |      2.834e-02 || 32 |    8839894 |    4.400e+10 |      5.878e-02 || 33 |     907140 |    9.578e+08 |      1.653e-01 || 34 |      51963 |    1.291e+07 |      6.212e-01 || 35 |       1572 |    1.004e+05 |      3.209e+00 || 36 |         30 |    5.324e+02 |      2.000e+01 |+----+------------+--------------+----------------+`

There are some good and bad news. The new algorithms definitely helped to speed up the superset method and the estimated number of minimal 35 clue puzzles seems to be quite stable when comparing it to the previous computations. The number of minimal 36 clue Sudokus seems to be less than expected but the relative error is not small enough to make any valid conclusions.
The bad news is that with all the optimizations I've done in my implementation of the superset method, the random grid generator stayed the same and it has now become a bottleneck since roughly 39% of the computation time is spent with generating random sudoku solutions. Looking quickly over the code Red Ed provided I've found some ways to possibly speed it up but I haven't found the time to do so.

I will start another computation using 25 clue subsets to lessen the impact of the grid generator. I'll make another post when the computation has finished (probably end of December) and the results are at least as good as these ones.
Afmob

Posts: 130
Joined: 28 June 2011

### 25 clue superset computation

Here are the results of the 25 clue superset computation:
Code: Select all
`Computation time: 31x30 days89,341,600,000 samples11,157,487,676 minimal 25 hits 7,794,693,371 minimal 25 hits passing UA46 test 1,821,801,421 valid minimal puzzles+----+------------+--------------+----------------+| Cl |      Count |  E(nr/grid)  | E(rel err)*100 |+----+------------+--------------+----------------+| 25 |     105878 |    6.229e+14 |      3.073e-01 || 26 |    6556754 |    1.484e+15 |      5.705e-02 || 27 |   91005892 |    1.525e+15 |      2.130e-02 || 28 |  401703804 |    7.215e+14 |      1.300e-02 || 29 |  669679520 |    1.659e+14 |      1.140e-02 || 30 |  471675542 |    1.947e+13 |      1.332e-02 || 31 |  153524306 |    1.227e+12 |      2.034e-02 || 32 |   25172860 |    4.400e+10 |      4.176e-02 || 33 |    2257277 |    9.566e+08 |      1.168e-01 || 34 |     116152 |    1.303e+07 |      4.425e-01 || 35 |       3388 |    1.086e+05 |      2.327e+00 || 36 |         48 |    4.701e+02 |      1.667e+01 |+----+------------+--------------+----------------+`

The results match the previous calculations with the generator only taking up about 8.6% of the computation time.
Afmob

Posts: 130
Joined: 28 June 2011

Previous