Counting minimal puzzles: subsets, supersets, etc.

Everything about Sudoku that doesn't fit in one of the other sections

Counting 17 & 18 minimal puzzles

Postby Afmob » Fri Jul 26, 2013 12:55 pm

coloin wrote:I believe dobrichev's gridchecker program could check several random grids to confirm this figure.

That is a good idea! If blue or dobrichev won't try it until next week I'll give it a shot. Though I won't expect any wonders concerning the relative error since it will probably take several hours on average to analyze a grid but it should be faster (as in getting more 17 and 18 minimals) than the subset method.
Afmob
 
Posts: 132
Joined: 28 June 2011

Re: Counting minimal puzzles: subsets, supersets, etc.

Postby dobrichev » Fri Jul 26, 2013 6:18 pm

Hi,

This is the exact (but never verified) count of the minimal puzzles for the grid with 20 17-clue puzzles

[edit: former list with wrong result]
Hidden Text: Show
Code: Select all
wc -l *.puzzles.txt
        20 20.17.puzzles.txt
      1402 20.18.puzzles.txt
     79947 20.19.puzzles.txt
  15034293 20.20.puzzles.txt


[edit: below is the repaired list]
Code: Select all
wc -l *.minimals.txt
        20 20.17.minimals.txt
       294 20.18.minimals.txt
     32748 20.19.minimals.txt
  12141180 20.20.minimals.txt


The grid 20x17
Code: Select all
594231786786945132123768954965173248378492561241856397432619875619587423857324619
Last edited by dobrichev on Wed Jul 31, 2013 9:19 pm, edited 1 time in total.
dobrichev
2016 Supporter
 
Posts: 1862
Joined: 24 May 2010

Re: 20x17 grid

Postby blue » Fri Jul 26, 2013 7:13 pm

Hi dobrichev,

I can confirm the number of 17s (20), and the number of 18's (1402).
I'm not set up to count anything larger.

Regards,
Blue.
blue
 
Posts: 1037
Joined: 11 March 2013

Subset & superset results

Postby Afmob » Mon Jul 29, 2013 9:00 am

The computation has just finished and you can find the results in this post.

Here are the eight 18 clue minimals I've found:
Code: Select all
.6......94...5............8.......1..29..6.........73....2.9...3......4.5.1.7....
......5....1.6....7.....42.8.........2.43.....4......6.3.........6.9...7.....72..
..5..1....6....3.8............6...9....3....71.92...4.73.......8....9.1..........
.94.6.....7............35.1............4.....1....82..6...2......96...4.......89.
....4......5.....1..7..3....4.197......5......3....6........8.......893...16.....
..7..13..............9....83........58............62.....5...8.7.6....95..1....4.
...9..3..9........2...4..7.....7...........4..5.6..9.....5..2..4.2....1.7....8...
9......2.......3.....8.34......7..16....95....8.......6..41..9.....2.....3.......

And the one 36 clue minimal:
Code: Select all
.759.13..93.7...6...1.43.79........2.2.13..56.5.2...3..47..9...59..1..47.1247..9.

I wrote: I expect to find some 36 clue minimal puzzles and get an estimate for the number of 18 clue minimal puzzles with the relative error being smaller than 25 %.

Well, I only got one of those rights and only partially. :mrgreen: I was quite surprised that the number of 18 clue minimal Sudoku is that low but eleven apparently was not. Do you have a link to your estimates? On the other hand we now got a pretty good estimate on the number of 19, 34 and 35 minimals.

So what's next? I'll give eleven's suggestion a try and compute the exact number of 17 and 18 clue minimals for some unbiased random grids though the sample size will be substantially lower. As for the superset method, I'll still believe that there might be a another necessary condition besides minimality of the subset. If we can find it and it can be checked fast enough it might be worth a try to repeat the computation.
Afmob
 
Posts: 132
Joined: 28 June 2011

Re: Counting minimal puzzles: subsets, supersets, etc.

Postby coloin » Mon Jul 29, 2013 9:48 am

I think the exercise would be to see how accurate your estimates would be for the grid with 20-17s.

The current number of different 17-puzzles is 49152.

The estimate of the number of 18-puzzles came from work by anon17 who generated ~800 million 18-puzzles all in a closed group within {-1+1} of each other. The remaining ? 800 million estimated other puzzles were in smaller groups without the single large one. His discussions were lost in the crash.

C
coloin
 
Posts: 2493
Joined: 05 May 2005
Location: Devon

Re: Subset & superset results

Postby eleven » Mon Jul 29, 2013 3:11 pm

Afmob wrote:I was quite surprised that the number of 18 clue minimal Sudoku is that low but eleven apparently was not. Do you have a link to your estimates?

My estimates came from a graphical extra/interpolation of the values we had at that time. See the table and graph at the end of [corrected:] this post, showing the logarithm of the (estimated) minimal puzzle counts.
I did the same with your updated values. These are the old and new curves (i had raised my estimate for 39's to about 1000, because with dobrichevs and blue's findings there were more than 300 known ones).
[Edit: I had used a wrong plot file for the old estimates. This one matches the numbers in the link above (with 3 for 39's)]
Attachments
minimal_counts2.png
minimal_counts2.png (70.32 KiB) Viewed 1406 times
eleven
 
Posts: 3149
Joined: 10 February 2008

Re: Counting minimal puzzles: subsets, supersets, etc.

Postby dobrichev » Mon Jul 29, 2013 9:37 pm

I am curious whether we can gain more precise results at the lower end by breaking the fundamental rule for "givens".
Every given is a constrain in form "this cell can't contain values other than V". What if a "weak" puzzle starts directly from pencilmarks with possibly entirely missing predefined to a single value cells?
From this perspective the regular puzzles contain some redundant constrains. With some luck we can map the characteristics of a regular puzzle to the weakened ones, say by examining the known area and determining some proportions based on the number of givens and the number of the redundant constrains.
For the famous 16-given pseudo-puzzle with 2 solutions, we need only one more "weak" constrain in any of the 18 empty cells that share 2 possible values. This, without taking into account the other redundancies, we can claim the existence of at least 18 different puzzles with "16 1/9" givens. So it is possible to collect statistical data beyond the 17 givens limit and somehow map it back to the ordinary puzzles.
It is trivial to adjust the solver for such weakened rule. Problems arise with the very extended space. For example, the complexity of {-1,+1} transformation for a regular puzzle with N givens is (N removals) * ((81 - N + 1) * 9 insertions). For weakened puzzle it requires 9 * 9 = 81 times more operations. Even identification of the redundant constrains within a regular puzzle expands it to millions of valid "weakened" puzzles, and I am stuck there. Bottom-up subset/superset generation with the help of UA sets requires additional UA classification by the secondary value in the permutation, etc. New tools are required and maybe the creators of the composite puzzles with shared areas already have them.

The ordinary sudoku puzzles are really very over-determined.
Challenge: Is there at least one valid puzzle having at least one given which can't be represented as pencilmarks with more than just one possibility keeping the puzzle valid?
dobrichev
2016 Supporter
 
Posts: 1862
Joined: 24 May 2010

Postby Afmob » Tue Jul 30, 2013 4:46 am

dobrichev, check this thread.

coloin, here the results for the 20 17s.

Code: Select all
Computation time: 18 hours

6,427,631,739 samples
   19,033,333 valid 28 hits
  275,780,992 valid minimal puzzles

+----+------------+--------------+------------+
| Cl |      Count |  E(nr/grid)  | E(rel err) |
+----+------------+--------------+------------+
| 17 |         37 |    3.443e+01 |     30.46% |
| 18 |         66 |    3.573e+02 |     20.66% |
| 19 |        918 |    3.131e+04 |      6.62% |
| 20 |      51748 |    1.216e+07 |      0.85% |
| 21 |    1894616 |    3.395e+09 |      0.19% |
| 22 |   23394440 |    3.593e+11 |      0.08% |
| 23 |   91163379 |    1.377e+13 |      0.05% |
| 24 |  111298725 |    1.950e+14 |      0.04% |
| 25 |   42677944 |    1.065e+15 |      0.04% |
| 26 |    5114880 |    2.383e+15 |      0.07% |
| 27 |     182721 |    2.341e+15 |      0.28% |
| 28 |       1518 |    1.050e+15 |      2.57% |
+----+------------+--------------+------------+


They are quite different from the numbers given above. I would appreciate it, if someone (blue?) could confirm those results with the subset method.
Afmob
 
Posts: 132
Joined: 28 June 2011

Postby dobrichev » Tue Jul 30, 2013 6:42 am

Afmob wrote:dobrichev, check this thread.

Thank you.
I read it (w/o referenced articles yet). 96 weak constraints (= 10 2/3 givens = 633 initial pencilmark candidates) result in unique but easy to solve puzzle. The other end - maximal non-redundant initial weak constraints - is not discussed. The period from Oct 12 2006 till Oct 27 2006 when the posts are written suggests the absence of massive search 8-)
Still no clue whether the "weak givens" distribution could be mapped to regular puzzles distribution.
dobrichev
2016 Supporter
 
Posts: 1862
Joined: 24 May 2010

Re: Counting minimal puzzles: subsets, supersets, etc.

Postby coloin » Tue Jul 30, 2013 9:36 pm

Afmob wrote:here the results for the 20 17s.....
well thanks for trying ! ..... i was hoping, as i am sure you were, that the results would concur ! :(

No one seems to be surprised that some grids would appear to have up to x 10 more 25-puzzles than others
blue wrote:
Code: Select all
1.2712e+015 412673859837295641956841732183462975725938164694157328348729516561384297279516483
1.2564e+014 724865931156392478398741265267489153541273689839156742472518396615937824983624517

Looking at your data the grid with 20 17-puzzles is approaching double the total 25-28-puzzles than average ...
7e15 vv 4e15 :roll:

C
coloin
 
Posts: 2493
Joined: 05 May 2005
Location: Devon

20x17 confirmation

Postby blue » Wed Jul 31, 2013 10:29 am

Afmob wrote:coloin, here the results for the 20 17s.

Code: Select all
Computation time: 18 hours

6,427,631,739 samples
   19,033,333 valid 28 hits
  275,780,992 valid minimal puzzles

+----+------------+--------------+------------+
| Cl |      Count |  E(nr/grid)  | E(rel err) |
+----+------------+--------------+------------+
| 17 |         37 |    3.443e+01 |     30.46% |
| 18 |         66 |    3.573e+02 |     20.66% |
| 19 |        918 |    3.131e+04 |      6.62% |
| 20 |      51748 |    1.216e+07 |      0.85% |
| 21 |    1894616 |    3.395e+09 |      0.19% |
| 22 |   23394440 |    3.593e+11 |      0.08% |
| 23 |   91163379 |    1.377e+13 |      0.05% |
| 24 |  111298725 |    1.950e+14 |      0.04% |
| 25 |   42677944 |    1.065e+15 |      0.04% |
| 26 |    5114880 |    2.383e+15 |      0.07% |
| 27 |     182721 |    2.341e+15 |      0.28% |
| 28 |       1518 |    1.050e+15 |      2.57% |
+----+------------+--------------+------------+


They are quite different from the numbers given above. I would appreciate it, if someone (blue?) could confirm those results with the subset method.

Done.

Code: Select all
6530227646 samples
19338712 size 28 hits

+----+-----------+--------------+------------+
| Cl |     Count |  E(nr/grid)  | E(rel err) |
+----+-----------+--------------+------------+
| 17 |        24 |  2.1983e+001 |   39.97%   |
| 18 |        56 |  2.9844e+002 |   26.96%   |
| 19 |       963 |  3.2332e+004 |    6.60%   |
| 20 |     52345 |  1.2107e+007 |    0.83%   |
| 21 |   1915653 |  3.3784e+009 |    0.19%   |
| 22 |  23739797 |  3.5886e+011 |    0.08%   |
| 23 |  92581883 |  1.3762e+013 |    0.05%   |
| 24 | 113101341 |  1.9502e+014 |    0.03%   |
| 25 |  43352478 |  1.0652e+015 |    0.04%   |
| 26 |   5190856 |  2.3808e+015 |    0.07%   |
| 27 |    184443 |  2.3264e+015 |    0.28%   |
| 28 |      1613 |  1.0986e+015 |    2.49%   |
+----+-----------+--------------+------------+

[ I don't think this was a concern, but one of dobrichev or I, probably should have mentioned that the 1402 18's that we counted, contained only 294 minimal puzzles. ]

Regards,
Blue.
blue
 
Posts: 1037
Joined: 11 March 2013

Re: Counting minimal puzzles: subsets, supersets, etc.

Postby Afmob » Wed Jul 31, 2013 10:53 am

Thank you, blue! I was really wondering why the results differed that much and was worrying that my subset algorithm was messed up. I made a second run (with equal computation time) where the 17s estimate was 10, so together with the first run, the average would be about 22.

coloin, here the minimal estimates for the PT grid which has less than average number of minimals:

Code: Select all
Computation time: 18 hours

8,488,983,601 samples
    5,744,321 valid 31 hits
  154,640,658 valid minimal puzzles

+----+------------+--------------+------------+
| Cl |      Count |  E(nr/grid)  | E(rel err) |
+----+------------+--------------+------------+
| 20 |          1 |    6.531e+00 |    100.00% |
| 21 |       1517 |    5.494e+04 |      3.64% |
| 22 |     172965 |    3.759e+07 |      0.54% |
| 23 |    4339802 |    6.182e+09 |      0.19% |
| 24 |   29832702 |    3.081e+11 |      0.10% |
| 25 |   63450613 |    5.336e+12 |      0.07% |
| 26 |   44899164 |    3.524e+13 |      0.06% |
| 27 |   10981492 |    9.482e+13 |      0.09% |
| 28 |     935445 |    1.090e+14 |      0.20% |
| 29 |      26725 |    5.503e+13 |      0.86% |
| 30 |        232 |    1.242e+13 |      7.26% |
+----+------------+--------------+------------+

I'll start a larger computation and change the table if I find some 20 or 31 clue minimals.
Last edited by Afmob on Thu Aug 01, 2013 6:04 am, edited 1 time in total.
Afmob
 
Posts: 132
Joined: 28 June 2011

Re: Subset & Superset method

Postby denis_berthier » Wed Jul 31, 2013 5:27 pm

Hi Afmob,

First of all, congratulations for the interesting results about the extreme numbers of clues. As I noticed long ago, high values are out of reach of the controlled-bias generator. This may not be true of low values, if sufficiently many clues are eliminated from the start (this was the first optimisation I found, and it may be pursued further for small numbers of clues).


Afmob wrote:In contrast to my former computations I do not only use one small sample size of 100k grids but using Red Ed's unbiased generator some samples (100k for subset method, 10k for superset method) are produced with each sample being used 10 times. Afterwards, new samples are generated. Therefore the relative errors are slightly underestimated but only by a small margin.

hopefully "small", but not calculated.


1) Mean number of minimal puzzles per complete grid

Afmob wrote:Subset results:
Code: Select all
Computation time: 15x240 hours

601,942,000,000 samples
  1,039,057,869 valid 28 hits
 13,676,903,000 valid minimal puzzles

+----+------------+--------------+----------------+
| Cl |      Count |  E(nr/grid)  | E(rel err)*100 |
+----+------------+--------------+----------------+
| 18 |          8 |    4.625e-01 |      3.536e+01 |
| 19 |       6092 |    2.219e+03 |      1.518e+00 |
| 20 |    1287655 |    3.231e+06 |      1.368e-01 |
| 21 |   67409525 |    1.290e+09 |      2.925e-02 |
| 22 |  984980363 |    1.615e+11 |      1.146e-02 |
| 23 | 4280735451 |    6.903e+12 |      6.447e-03 |
| 24 | 5689364460 |    1.064e+14 |      4.798e-03 |
| 25 | 2343546179 |    6.247e+14 |      5.106e-03 |
| 26 |  298326641 |    1.484e+15 |      9.373e-03 |
| 27 |   11149692 |    1.526e+15 |      3.568e-02 |
| 28 |      96934 |    7.163e+14 |      3.212e-01 |
+----+------------+--------------+----------------+

Supset results:
Code: Select all

21,310,000,000 samples
 1,199,199,372 minimal 26 hits
    32,151,592 valid minimal puzzles

+----+------------+--------------+----------------+
| Cl |      Count |  E(nr/grid)  | E(rel err)*100 |
+----+------------+--------------+----------------+
| 26 |      28055 |    1.491e+15 |      5.970e-01 |
| 27 |     775928 |    1.527e+15 |      1.540e-01 |
| 28 |    5131589 |    7.213e+14 |      7.545e-02 |
| 29 |   11419771 |    1.660e+14 |      5.831e-02 |
| 30 |   10035639 |    1.946e+13 |      6.439e-02 |
| 31 |    3929868 |    1.229e+12 |      9.756e-02 |
| 32 |     749034 |    4.391e+10 |      2.019e-01 |
| 33 |      77082 |    9.586e+08 |      5.705e-01 |
| 34 |       4476 |    1.310e+07 |      2.192e+00 |
| 35 |        149 |    1.121e+05 |      1.172e+01 |
| 36 |          1 |    2.090e+02 |      1.000e+02 |
+----+------------+--------------+----------------+


Taking the best of the superset/subset (which is which? you call them both subset) results for each number of clues, we get:

Total mean number of minimal puzzles per grid =
(+ 4.625e-01 2.219e+03 3.231e+06 1.290e+09 1.615e+11 6.903e+12 1.064e+14 6.247e+14 1.484e+15 1.526e+15 7.213e+14 1.660e+14 1.946e+13 1.229e+12 4.391e+10 9.586e+08 1.310e+07 1.121e+05 2.090e+02)
= 4.6562e+15
Very close to my own estimate, 4 years ago (4.6553e+15, p. 43 of the pdf here http://forum.enjoysudoku.com/the-real-distribution-of-minimal-puzzles-t30127-7.html)

Code: Select all
Error =
 (+
   (* 4.625e-01 3.536e+01)
   (* 2.219e+03  3.536e+01)
   (* 3.231e+06 1.368e-01)
   (* 1.290e+09 2.925e-02)
   (* 1.615e+11 1.146e-02)
   (* 6.903e+12 6.447e-03)
   (* 1.064e+14  4.798e-03)
   (* 6.247e+14 5.106e-03)
   (* 1.484e+15 9.373e-03)
   (* 1.526e+15 3.568e-02)
   (* 7.213e+14 7.545e-02)
   (* 1.660e+14 5.831e-02)
   (* 1.946e+13 6.439e-02)
   (* 1.229e+12 9.756e-02)
   (* 4.391e+10 2.019e-01)
   (* 9.586e+08 5.705e-01)
   (* 1.310e+07 2.192e+00)
   (* 1.121e+05 1.172e+01)
   (* 2.090e+02 1.000e+02)
)
= 137587748084193.0, which gives an interval [4.6548 4.6576]e+15


relative error: 0.0295 %

My estimate had relative error 0.065%
So, yours is apparently half of mine. But this is without counting the unknown error due to your relatively small sample of complete grids - I used all the complete grids, several times, and I had therefore no such error.


Your computation time: 15x240 hours = 5 months. If we divide by 4 (in order to allow a double sampling error), this gives 1.25 CPU-months.
I don't have exact figures for the time it took me to generate my 5.9 million uncorrelated puzzles with the controlled-bias generator (which, at the start, was not optimised - the last version was 200 times faster than the first). In any case, the count was in CPU-months, not in years. So, the benefit (if any, i.e. if not spoiled by sampling errors of the complete grids) is not orders of magnitude.

2) Estimating other random variables
Now, suppose you want to compute the mean SER of minimal puzzles, with the same relative precision as with the controlled-bias generator. You will have to use (at least) 1/4 of your puzzles, i.e. 13,676,903,000/4, i.e. more than 3 billion, when I need only about 6 million - 500 times less.
The same problem will appear every time you want to estimate the mean of any random variable.
(Indeed, these figures can be lowered, in both methods, by concentrating on the main part of the distribution. But this doesn't change the computation time ratio.)
denis_berthier
2010 Supporter
 
Posts: 4189
Joined: 19 June 2007
Location: Paris

Re: 20x17 confirmation

Postby dobrichev » Wed Jul 31, 2013 7:12 pm

blue wrote:[ I don't think this was a concern, but one of dobrichev or I, probably should have mentioned that the 1402 18's that we counted, contained only 294 minimal puzzles. ]

Yes, this is my failure. There are non-minimal puzzles in my lists, also repeated puzzles, also minimal but with other size. Now I am filtering them. Will post the results when ready.
dobrichev
2016 Supporter
 
Posts: 1862
Joined: 24 May 2010

Comments on my computations

Postby Afmob » Wed Jul 31, 2013 7:41 pm

denis_berthier wrote:Taking the best of the superset/subset (which is which? you call them both subset) results for each number of clues, we get:

The second table shows the results of the superset method which I falsely labeled supset. I've corrected this mistake. Note that I did not call it subset as you can see in the quote in your post.

As for the results, one can often find situations where one method is better than the other and vice versa. For example if I look at the relative error for the number of 22 clue minimals you got a relative error of 1.23% and blue got about 0.05% which is a major difference. Note that I choose blue's results since they have no sample error and the computation time is not as large is mine was. My current interest is in improving the superset method and getting better results for the number of low clue and high clue minimals.

dobrichev wrote:Will post the results when ready.
I'm looking forward to it. :)
Afmob
 
Posts: 132
Joined: 28 June 2011

PreviousNext

Return to General