How many minimal sudokus has an average grid ?

Everything about Sudoku that doesn't fit in one of the other sections

How many minimal sudokus has an average grid ?

Postby ravel » Mon Aug 28, 2006 10:07 pm

There was a question somewhere in this forum, how many sudokus there are at all. I only could give the vague answer "quadrillions". I would like to know it more exactly. When i remember right, Coloin already suggested to generate a lot of sudokus to a given grid and count the duplicates to be able to estimate the number of sudokus in it. We know that different grids will have a different number of minimal sudokus.
So i have 3 questions.
Is there already known more about, how much minimal sudokus a random or special grids has ?
Are there other suggestions, how the number can be estimated ?
Are there free programs available, that can generate random sudokus to given grids from a file (or randomly created grids) and write them to a file including duplicates (or just count them) ? If it works under linux, i could run it for 2 weeks, since i will be on holidays the next 2 weeks.
When we make such a search for a number of random grids, we should be able to give a good statistically based answer and get a feeling how much the number varies for different grids.
Remark: I am also curious what the "structures" thread will bring the next days, so i will wait patiently that people engaged there have time:)
ravel
 
Posts: 998
Joined: 21 February 2006

Postby ravel » Wed Aug 30, 2006 1:05 pm

My first try brought a surprising high number of minimal puzzles to be expected in a grid. Whereas i had thought about 10-100 millions, it now seems, that there are more than 10^15 (quadrillions).
With gsf's solver i started to generate (all) minimal puzzles to 2 grids (one is an arbitrary one, the other the solution grid of Ocean #1/M21/D21, the current hardest). The program starts from left to right dropping all clues that are not needed for a unique solution. Then following minimal sudokus always keep as much as possible givens (from left to right) in common with the last solution.
So far i have more than 50000 sudokus for each grid.
What i assumed, is confirmed in the samples so far: When the next empty cell (read from left) is set to a number, about the same amount of puzzles can be expected (with cells left of it unchanged). You can see below that puzzles with double number in many cases have exactly one new number changed (marked in the puzzle above with ^).


Code: Select all
       1 ...............163.......97..2...........47...1.5.83.6.4.^36.25.5...7..9.392..6..
    1000 ...............163.......97..2...........47...1.5.83.6^4.93.8.52.6.1...9.39..5...
    2000 ...............163.......97..2...........47...1.5^83.67...368.525...74...39..5...
    4000 ...............163.......97..2...........47...1.52.346.41.36.25.5.81...98.9.4..7.
    8000 ...............163.......97..2...........47...1^5283...4193.8.5.5...74..83..4.6.1
   16000 ...............163.......97..2...........47..^17.283.67..9.6.252.6.1...98.9.4..7.
   32000 ...............163.......97..2...........47..9..52..4674.936.252.68.7..9.3.2.....
   55531 ...............163.......97..2...........47..9.75.8.467.193.82.25...7..9.3...56..

       1 ................35...59.1................42.8.6.2..94..47^.8..1.8.156...36.7..89
    1000 ................35...59.1................42.8.6.2..94.^479.8..12983..6...36.7.5..
    2000 ................35...59.1................42.8.6.2..94.5....8.2.2.83.5.74.36.7.5.9
    4000 ................35...59.1................42.8.6.2.^94.5.796...1..83..6..1...72.89
    8000 ................35...59.1................42.8.6.2^7.4..4796.32...8.1..74.364..5..
   16000 ................35...59.1................42.8.6^25.94.5....8..12.83..6.41.64.2.8.
   32000 ................35...59.1................42.8^61..7.4.54...8.2.29..1.6...364..5..
   59861 ................35...59.1................42.88..257.4354.96..2.2983.56...364...8.

So in the first list 36 positions are still to place and in the second 37, which leads to an expected number of 2^36*64000 or 2^37*64000, i.e. more than 10^15 sudokus.

Remark: More than 10000 of the first puzzles of the second grid can possibly be solved almost identically: Fill out all numbers under the /-diagonal and then solve it with a quad plus BUG+1.
ravel
 
Posts: 998
Joined: 21 February 2006

Postby Red Ed » Wed Aug 30, 2006 2:50 pm

How does your method perform on the 2x2 case, for which we know the number of minimal puzzles?
Red Ed
 
Posts: 633
Joined: 06 June 2005

Postby ravel » Wed Aug 30, 2006 3:33 pm

I cannot try, because gsf's program only handles 3x3 sudokus.
ravel
 
Posts: 998
Joined: 21 February 2006

Postby ravel » Tue Sep 19, 2006 2:40 pm

After more than 2 weeks i had to stop gsf's program now, generating minimal sudokus for 2 given grids. It found about 200000 sudokus each varying only the last 39 cells with only 7 and 6 resp. fixed givens in the first 42 cells ! The trend that switching to the next rightmost new given always gives about the double number of minimal sudokus, still held, as the following table shows, though any estimation based on it must be very vague. Listed are the number of free unchanged cells read from left (the search algorithm varies the rigthmost cells first to find the next minimal sudoku), the number of minimal sudokus found so far, the current sudoku containing the new position and the estimated number of sudokus in that grid calculated as the number of found sudokus multiplied with 2 power the number of free unchanged cells.
So i still expect between 10^15 and 10^17 sudokus in the two grids. If this would be representative, it would mean a total number of about 10^25 essentially different (non isophorphic) sudokus.
I have no good idea in the moment, how better estimations could be achieved or useful lower/upper bounds could be calculated. Maybe coloins recent approach is promising (hope i understood it right): Calculate all say 20-clues for a grid, and millions of random minimal grids and then estimate the overall number from the statistics.
Code: Select all
          ...............163.......97..2...........47...1.5.83.6.4..36.25.5...7..9.392..6..  (first generated)
40   3516 ...............163.......97..2...........47...1.5.8346.4..368.5.5...7..9.392..6.1  4*10^15
39   3975 ...............163.......97..2...........47...1.52.346.41.36.25.5...7..9.392..6..  2*10^15
38  10023 ...............163.......97..2...........47...17..83.6.4..36.25.5...7..9.3.2..6..  3*10^15
37  22785 ...............163.......97..2...........47..9..5.8.4..41.36.25.5...7..9.3.2..6..  3*10^15
36  86763 ...............163.......97..2...........47.2.1.5.83...4..36.25.5...7..9.392..6..  6*10^15
35 122537 ...............163.......97..2...........471..1.5.83.6.4..36.252....7..9.39..56..  4*10^15
   192326 ...............163.......97..2...........471.917.283..74..368..25..1.4...3..4....  (last generated)


          ................35...59.1................42.8.6.2..94..47..8..1..8.156...36.7..89  (first generated)
40   5979 ................35...59.1................42.8.6.2.7.4..47..8..1..8.156...36.7..89  6*10^15
39  14715 ................35...59.1................42.8.6.25.94..47..8..1..8.1.6...3..7.5.9  8*10^15
38  22191 ................35...59.1................42.8.61...94..47..8..1.98.1.6...3..725.9  6*10^15
37  54118 ................35...59.1................42.88..2.7.43.47..8..1..8.156...36.7..89  7*10^15
36 113784 ................35...59.1................4218.6.2..94..47..8.....8.156...36.7..89  8*10^15
   236401 ................35...59.1................4218861...9.3547..8....9.31.6..1...7..89  (last generated)

Note: Essentially different/not isomorphic does not mean, that also the solution paths essentially differ. I tried about 30 of the puzzles of the second set including the first and the last in the Sudoku Explainer and all of them were solved with the same quad and turbot fish (!). So it seems that more than 200000 essentially different sudokus are so similar for a human solver that he would have a problem to distinguish them.
ravel
 
Posts: 998
Joined: 21 February 2006

Postby coloin » Thu Sep 21, 2006 2:08 pm

Well...it is a tricky one.
My first and second attempts at this were unsuccesful but my third attempt may be valid - although Ive made many assumptions and approximations
First Attempt
-the reverse of the "birthday" problem in mathematics - coincidentally was an old research topic of r.e.s
The formula k*=SQRT(2*N) .....rearranged gives N ~ 1/2 * k^2
Where N is the estimated number of grids and k is the number of grids generated when a significant number of duplicates emerge......
This was all very well but at the time I struggled to be able to recognise duplicates from more than a 64000 sample size.
Since then no grid has generated a single duplicate in one million grids, which makes me confident that we have more than 5^10^11 minimal puzzles per grid. [no marks really]
Second Attempt
My second method was to compute [using checker] the number of puzzles of size 20 in a particular grid
Accepting the potential for errors in this calculation
Code: Select all
         Est.no.of 20-clue puzzles   Incid. of puzzles per 10^6   Number of puzzles estimated
Grid
74-2          3000                                   1                      3*10^9
MC            638                                    5                      1.3*10^7

We know this is wrong
There are at least 60,000 33s in the MC grid ! - none appeared by random generation !
Both these estimations are way too low and....... must be flawed - either as a bug or skewed by a probability quirk. [I think the latter]
Third Attempt
I think this is similar to your method .....onlyI have found the average number of puzzles in a 34 clue minimal sudoku.....and made an approximation as to how many more puzzles each additional clue generates.
Take a random 25 clue puzzle ......add 9 non minimal random clues [from the solution grid][one clue in each box]
I think this is the same one as ravel used. [with 9 more clues]
Code: Select all
+---+---+---+
|...|...|.8.|
|...|7..|163|
|.2.|...|.97|
+---+---+---+
|..2|3..|...|
|...|..4|7.2|
|91.|5.8|3.6|
+---+---+---+
|74.|.36|.25|
|.5.|8.7|..9|
|.39|2..|6.1|
+---+---+---+  ravel 34


Count up almost all the puzzles......This can be done.....

The program that I use finds it easier to come up with smaller puzzless..not sure why.

Here are the results of the numberof minimal puzzles found per grids generated.
Code: Select all
puzzles generated     10000      100000    500000   2000000   
ravel32                                                98
ravel33                                               233
ravel34                1259       1305      1305     1305
ravel35                           4507               4563
ravel36                                             10482*                                               
25clueplus9-1           886        910       910     
25clueplus9-2          5591      11577     12104     
25clueplus9-3          3038       4068      4080       
25clueplus9-4           885        894       894     
25clueplus9-5          5456      11407     11955     

average                                     5208
                             
So starting from 34 clues how many more puzzles do you get by adding an additional clue ?

 
In the ravel 32                  98 total
In the ravel 33  135  new       233 total      factor 135/98    = 1.37
In the ravel 34  1072 new      1305 total      factor 1072/233  = 4.6
In the ravel 35  3258 new      4563 total      factor 3258/1305 = 2.5
In the ravel 36  5919 new     10482 total      factor 5919/4563 = 1.2 *

* most probably thereare more puzzles than 10482

If we ignore the effect of the unavoidable clues...
Assuming the average puzzles size is 25 clues. [it is]
Assuming the average puzzle count in a 34 cell non minimal puzzle is around 5000
Also assuming I have got these estimates correct !
Code: Select all
 In a 35 clue grid a clue is present in 25 ot of 35 ~ 75% therefore a 35 clue grid will 100/25 = 4.0 times more puzzles than a 34
In a 40 clue grid a clue is present in 25 ot of 40 ~ 62%  therefore a 40 clue grid will 100/38 = 2.6 times more puzzles than a 39
In a 45 clue grid a clue is present in 25 ot of 45 ~ 55%  therefore a 45 clue grid will 100/45 = 2.2 times more puzzles than a 44
In a 50 clue grid a clue is present in 25 ot of 50 ~ 50%  therefore a 50 clue grid will 100/50 = 2.0 times more puzzles than a 49
In a 60 clue grid a clue is present in 25 ot of 60 ~ 41%  therefore a 60 clue grid will 100/59 = 1.7 times more puzzles than a 59
In a 70 clue grid a clue is present in 25 ot of 70 ~ 35%  therefore a 70 clue grid will 100/65 = 1.5 times more puzzles than a 69
In a full grid a clue is in 25 out of 81           ~ 30%  therefore a 81 clue grid will 100/70 = 1.4 times more puzzles than a 80


Approximating ! adding 47 clues !
      [35-36][37-38][39-42][43-45] [46-55] [56-65]  [66-75] [76-81]
5000 * [4^2]*[3^2]*[2.5^4]*[2.2^3]*[2^10]*[1.7^10]*[1.5^10]*[1.4^6]   = 5000*16*9*39*10*1024*201*57*7

  =   

23,060,356,300,800,000 ~ 2^16


Thats OK then

C
coloin
 
Posts: 2515
Joined: 05 May 2005
Location: Devon

Postby ravel » Thu Sep 21, 2006 4:21 pm

Thanks Coloin,

without having it read accurately it seems, that your attempt - though somewhat similar to mine - is different enough (using additional premises like the distribution of random sudokus concerning the numbers of clues and another way of extrapolation) to be looked as independant.
So i am happy for now to see about the same result.
ravel
 
Posts: 998
Joined: 21 February 2006

Postby coloin » Fri Sep 22, 2006 11:08 am

From page 15 of the Sudoku Maths thread.....I have just found this snippet from Ocean and dukoso, which is relevant to this thread....
Ocean wrote:Surprisingly I have not been able to find discussions of these two open problems:


1. What is the number of UNIQUE VALID SUBGRIDS?
2. What is the number of unique valid subgrids with N clues?

(Where a "Unique Valid Subgrid" means a puzzle with one and exactly one solution).




Some trivial numbers:

The total number of subgrids with zero, one or many solutions, is 10^81 - covering all from 0 to 81 clues.

For each valid grid, we can find 2^81 (=2.4e24) subgrids where that particular grid is a solution (mostly among other solutions).

For each valid grid, we can find "81 over N" subgrids with N clues, where that particular grid is among the solutions.


(81 over 15) = 8.1e+15
(81 over 16) = 3.3e+16
(81 over 17) = 1.2e+17
(81 over 18) = 4.5e+17
(81 over 19) = 1.5e+18
(81 over 20) = 4.6e+18
(81 over 21) = 1.3e+19
(81 over 22) = 3.7e+19
(81 over 23) = 9.5e+19
(81 over 24) = 2.3e+20
(81 over 25) = 5.2e+20


Since there are 6670903752021072936960 valid sudokugrids (from 5472730538 equivalence classes), we have the upper bounds:

the number of valid subgrids is less than 2^81 * 6670903752021072936960 (=1.61e46).

(and also: the number of essentially different valid subgrids is less than 2^81 * 5472730538 [=1.3e34]).

The number of unique valid subgrids is probably a lot lower than these upper bounds, but still a huge number.

...............



dukuso wrote:sudokugrids:1e22 from 1e10 classes
sudokus:1e47 from 1e35 classes (1-solution)
minimal sudokus:1e37 from 1e25 classes (no superfluous clues)


Ocean wrote:Thank you for the numbers!

It immediately surprised me that the number of sudokus "equals" my rough upper bounds, but the explanation is simple: the vast majority of sudokus have 30-50 clues, and in this range the fraction of puzzles with more than one solution is very small. For small number of clues, the opposite is true (the fraction with exactly one solution is small).

I don't quite see how you get the numbers for minimal sudokus, but they seem "reasonable". Good to know there are plenty enough


Dukoso never explained how he got his answer....

1e37/1e22 = 1e15...


Absolute counts for number of distinct minimal puzzles generated from 6 different 34 clue sub-grids :
Code: Select all
              seed     10000      100000    500000   
puzzle           
ravel34                1259       1305      1305     
25clueplus9-1           886        910       910     
25clueplus9-2          5591      11577     12104     
25clueplus9-3          3038       4068      4080       
25clueplus9-4           885        894       894     
25clueplus9-5          5456      11407     11955 


The number of grids found at the 10000 is proportionate to the final number.......so I could certainly analyse a few more different subgrids and confirm the average of around 5000

I will also confirm the increments following the addition of a single clue.

C
coloin
 
Posts: 2515
Joined: 05 May 2005
Location: Devon

Postby ravel » Sat Sep 23, 2006 12:46 pm

Coloin,

very interesting, dukusos estimation (however he achieved it), which only differs by a factor of 10 to ours.

I have read your post now more carefully and the tricky way, how you get your estimation, seems plausible for me.

Here is another idea, how to verify the estimations. I found an old statistic by dukuso about the distribution of a million randomly generated sudokus, number 4 is for a random grid:

Code: Select all
clues , 1)     2)      3)      4)   
----------------------------------
17,     0       0       0       0
18,     0       0       0       0
19,     0     4.3       0       5
20,    59     182       0     254
21,  2428    6051      85    8268
22, 33966   61826    1775   80869
23,170727  227480   21648  273518
24,342620  352289  116766  364111
25,298349  248568  286836  209158
26,122691   86061  329853   56006
27, 25237   15908  185028    7284
28,  2733    1547   50469     505
29,   205      74    7040      22
30,   7.6     8.6     486       0
31,     0       0      12       0
32,     0       0     2.4       0
-------------------------------------
aver.24.38   24.10   25.72   23.88

(Maybe you have a better one ?)

According to it, 36% - about a 3rd - of the minimal puzzles in an average grid have 24 clues.
There are 24 out of 81, about 2e21 (hope its correct, calculated manually) 24-clue sudokus in a grid. If we assume, that there are about 2e16 minimal sudokus and a third of them has 24 clues, then 1 out of 300000 random 24-clues should be minimal (3 in a million). It should not be very time consuming to verify this.
ravel
 
Posts: 998
Joined: 21 February 2006

Postby Ocean » Sat Sep 23, 2006 4:08 pm

Interesting discussions and statistics. Have a few comments though.
ravel wrote:According to it, 36% - about a 3rd - of the minimal puzzles in an average grid have 24 clues.

Not so sure about this. The only thing we can say is that dukuso's method finds 36% minimal 24s in an average grid. The method used appears to be (strongly or weakly?) biased towards the lower number of clues.

Why? - the explanation is quite simple, as the example below shows:

#
100000002020030040000105000006000300030010070008000900000608000070040020300000008 #G21
000000002020030040000105000006000300030010070008000900000608000070000020300000008 #M19
100000002020030040000105000006000300030010070008000900000608000070040020000000008 #M20
#


The non-minimal 21 contains two minimals (one 19 and one 20). Dukosu's method would find 67% minimal 19s and 33% minimal 20s in such grids (compared to the actual 50-50).

Note also coloin's observation, which is the same fenomenon:

coloin wrote:The program that I use finds it easier to come up with smaller puzzless..not sure why.

For verification (or disproof):
1. Count the number of minimal 33s, 32s, ... etc. in the grid ravel34 (sums up to 1305?).
2. Find 'random' minimals in a set of 13050 samples (say) of ravel34, one minimal per sample. (Or better use 13050 different scrambled versions of ravel34, in order not to get confused by 'duplicate' finds).
3. Compare the two results and decide if there is a bias or not.

Thought it was worth to mention, but I have not checked how much influence the bias has on calculations or statistics - whether it is significant or negligible.

#
[Edit/Added:]
Actual stats for grid ravel34. It contains only 24s, 25s, 26s, 27s and 28s, and is probably not too typical. The grid contains 1305 different minimal sudokus. The 'random' column is the number of puzzles found in 13050 grids divided by 10. If there was no bias, the two coulumns should be roughly equal.

Code: Select all
Distribution of minimal puzzles in grid 'ravel34':
000000080000700163020000097002300000000004702910508306740036025050807009039200601

clues,    actual           random       bias factor
-------------------------------------------------------
24,   104 ( 7.97%)     284.1 (21.78%)     2.73
25,   691 (52.95%)     733.3 (56.19%)     1.06
26,   444 (34.02%)     257.3 (19.72%)     0.58
27,    63 ( 4.83%)      29.3 ( 2.23%)     0.46
28,     3 ( 0.23%)       1.1 ( 0.08%)     0.37
--------------------------------------------------------
aver. 25.36             25.03

The bias is quite clear. The fraction of 24s (the smallest occuring) is overestimized by a factor 2.7, the fraction of 28s is underestimated by a factor 0.37, while the average is not much affected in this case.
Last edited by Ocean on Sun Sep 24, 2006 6:27 pm, edited 1 time in total.
Ocean
 
Posts: 442
Joined: 29 August 2005

Postby ravel » Sat Sep 23, 2006 5:33 pm

Good observations, Ocean,

we cannot be sure, that dukusos method really gives the correct distribution.
Maybe Coloin knows, how he did it it. When i understood it right, he started with the full grid and eliminated clues. When he stopped with the first minimal grid, i think, the distribution should be correct (?) [edit: i think, i got it now - no, because more 20-clues would lead to the 19 than to the 20]. When he continued to try to find a unique sudoku with less clues, then not.

But maybe other statistics are already available (Coloin said, the average sudoku would have 25 clues, not 24).
ravel
 
Posts: 998
Joined: 21 February 2006

Postby ravel » Sat Sep 23, 2006 7:36 pm

Think, we could do it this way:
When we generate say a million 25-clue sudokus we e.g. could find, that 10 are minimal, 550000 have multiple solutions and the rest is not minimal. If we do it also for other clue numbers, we should be able to make an estimation for the total number then (though i am no expert in statistics).
ravel
 
Posts: 998
Joined: 21 February 2006

Postby coloin » Mon Sep 25, 2006 8:25 pm

Thanks Ocean for explaining that.........dukuso's program does remove clues untill multiple solutions are obtained and minimality is achieved.

I estimated an extra clue to 25 to allow for this discrepancy - it showed up particularly in generating the minimal puzzles in other subgrids.The larger puzzles were the last ones to be found as a distinct puzzle amongst the many duplicates.

If you can effectivly randomize the placements of 25 clues from a full grid then ravel's latest method might confirm our results.
I have a hunch that our special grids such as the SF and SFB have more puzzles !!!!! This would be reflected by this method. [More non minimals , ? less minimals, less multiple solutions]

I wonder what a more "accurate" distribution of puzzle stats is now.

C
coloin
 
Posts: 2515
Joined: 05 May 2005
Location: Devon

Postby ravel » Tue Sep 26, 2006 12:19 pm

My last idea does not seem to be very fruitful (and i had a logical mistake, because most random grids are both not unique and not minimal). In 25 mio random sudokus with 24-28 givens i only found 12 minimal ones, which is too few to be representative (though it confirms the earlier estimation of e16 minimals in this grid):
(the search for minimals in the 28-clues is not finished)

Code: Select all
For 5 mio random sudokus each:
clues, unique(all minimals), minimal, all possible, estimated minimals
 28:    7050(?>20000)           0(?)     4.4e21
 27:    2186(14256)             3        2.3e21        1.4e16
 26:     473(1627)              1        1.1e21          2e14
 25:     127(206)               8        5.2e20          9e14
 24:      10(16)                0        2.3e20         <5e13

Btw, gsf, i dont know, how to stop calculating the minimals after one is found, -m1 does not work.
And is there a possibility to specify a column offset in the input file ?
ravel
 
Posts: 998
Joined: 21 February 2006

Postby gsf » Tue Sep 26, 2006 3:05 pm

ravel wrote:Btw, gsf, i dont know, how to stop calculating the minimals after one is found, -m1 does not work.
And is there a possibility to specify a column offset in the input file ?

what was the full command line where -m1 misbehaved?

by column offset you mean a one puzzle per line file where the puzzle
is in the Nth space separated column? -cN
if the separator character is , then -cN,
if you'd like to label the columns for -f%(name)f then -cN,name-1,name-2,...

you can put this magic string as the first line in a puzzle data file
Code: Select all
#!sudoku -c5,

which means one puzzle per line, comma-separated fields, puzzle in 5th field
and the sudoku command will pick out the puzzles when it reads the file
without specifying the -c command line option

I have been using this for some puzzle collection flat-file databases:
Code: Select all
-c5,SYMMETRY,STEPS,CLUES,MINIMAL,PUZZLE,SUBMITTER,DATE
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Next

Return to General