The New Sudoku Players' Forum

by **ravel** » Mon Aug 28, 2006 10:07 pm

There was a question somewhere in this forum, how many sudokus there are at all. I only could give the vague answer "quadrillions". I would like to know it more exactly. When i remember right, Coloin already suggested to generate a lot of sudokus to a given grid and count the duplicates to be able to estimate the number of sudokus in it. We know that different grids will have a different number of minimal sudokus.
So i have 3 questions.
Is there already known more about, how much minimal sudokus a random or special grids has ?
Are there other suggestions, how the number can be estimated ?
Are there free programs available, that can generate random sudokus to given grids from a file (or randomly created grids) and write them to a file including duplicates (or just count them) ? If it works under linux, i could run it for 2 weeks, since i will be on holidays the next 2 weeks.
When we make such a search for a number of random grids, we should be able to give a good statistically based answer and get a feeling how much the number varies for different grids.
Remark: I am also curious what the "structures" thread will bring the next days, so i will wait patiently that people engaged there have time

by **ravel** » Wed Aug 30, 2006 1:05 pm

My first try brought a surprising high number of minimal puzzles to be expected in a grid. Whereas i had thought about 10-100 millions, it now seems, that there are more than 10^15 (quadrillions).
With gsf's solver i started to generate (all) minimal puzzles to 2 grids (one is an arbitrary one, the other the solution grid of Ocean #1/M21/D21, the current hardest). The program starts from left to right dropping all clues that are not needed for a unique solution. Then following minimal sudokus always keep as much as possible givens (from left to right) in common with the last solution.
So far i have more than 50000 sudokus for each grid.
What i assumed, is confirmed in the samples so far: When the next empty cell (read from left) is set to a number, about the same amount of puzzles can be expected (with cells left of it unchanged). You can see below that puzzles with double number in many cases have exactly one new number changed (marked in the puzzle above with ^).

Code: Select all: 1 ...............163.......97..2...........47...1.5.83.6.4.^36.25.5...7..9.392..6.. 1000 ...............163.......97..2...........47...1.5.83.6^4.93.8.52.6.1...9.39..5... 2000 ...............163.......97..2...........47...1.5^83.67...368.525...74...39..5... 4000 ...............163.......97..2...........47...1.52.346.41.36.25.5.81...98.9.4..7. 8000 ...............163.......97..2...........47...1^5283...4193.8.5.5...74..83..4.6.1 16000 ...............163.......97..2...........47..^17.283.67..9.6.252.6.1...98.9.4..7. 32000 ...............163.......97..2...........47..9..52..4674.936.252.68.7..9.3.2..... 55531 ...............163.......97..2...........47..9.75.8.467.193.82.25...7..9.3...56.. 1 ................35...59.1................42.8.6.2..94..47^.8..1.8.156...36.7..89 1000 ................35...59.1................42.8.6.2..94.^479.8..12983..6...36.7.5.. 2000 ................35...59.1................42.8.6.2..94.5....8.2.2.83.5.74.36.7.5.9 4000 ................35...59.1................42.8.6.2.^94.5.796...1..83..6..1...72.89 8000 ................35...59.1................42.8.6.2^7.4..4796.32...8.1..74.364..5.. 16000 ................35...59.1................42.8.6^25.94.5....8..12.83..6.41.64.2.8. 32000 ................35...59.1................42.8^61..7.4.54...8.2.29..1.6...364..5.. 59861 ................35...59.1................42.88..257.4354.96..2.2983.56...364...8.

So in the first list 36 positions are still to place and in the second 37, which leads to an expected number of 2^36*64000 or 2^37*64000, i.e. more than 10^15 sudokus.

Remark: More than 10000 of the first puzzles of the second grid can possibly be solved almost identically: Fill out all numbers under the /-diagonal and then solve it with a quad plus BUG+1.

by **Red Ed** » Wed Aug 30, 2006 2:50 pm

How does your method perform on the 2x2 case, for which we know the number of minimal puzzles?

by **ravel** » Wed Aug 30, 2006 3:33 pm

I cannot try, because gsf's program only handles 3x3 sudokus.

by **ravel** » Tue Sep 19, 2006 2:40 pm

After more than 2 weeks i had to stop gsf's program now, generating minimal sudokus for 2 given grids. It found about 200000 sudokus each varying only the last 39 cells with only 7 and 6 resp. fixed givens in the first 42 cells ! The trend that switching to the next rightmost new given always gives about the double number of minimal sudokus, still held, as the following table shows, though any estimation based on it must be very vague. Listed are the number of free unchanged cells read from left (the search algorithm varies the rigthmost cells first to find the next minimal sudoku), the number of minimal sudokus found so far, the current sudoku containing the new position and the estimated number of sudokus in that grid calculated as the number of found sudokus multiplied with 2 power the number of free unchanged cells.
So i still expect between 10^15 and 10^17 sudokus in the two grids. If this would be representative, it would mean a total number of about 10^25 essentially different (non isophorphic) sudokus.
I have no good idea in the moment, how better estimations could be achieved or useful lower/upper bounds could be calculated. Maybe coloins recent approach is promising (hope i understood it right): Calculate all say 20-clues for a grid, and millions of random minimal grids and then estimate the overall number from the statistics.

Code: Select all: ...............163.......97..2...........47...1.5.83.6.4..36.25.5...7..9.392..6.. (first generated) 40 3516 ...............163.......97..2...........47...1.5.8346.4..368.5.5...7..9.392..6.1 4*10^15 39 3975 ...............163.......97..2...........47...1.52.346.41.36.25.5...7..9.392..6.. 2*10^15 38 10023 ...............163.......97..2...........47...17..83.6.4..36.25.5...7..9.3.2..6.. 3*10^15 37 22785 ...............163.......97..2...........47..9..5.8.4..41.36.25.5...7..9.3.2..6.. 3*10^15 36 86763 ...............163.......97..2...........47.2.1.5.83...4..36.25.5...7..9.392..6.. 6*10^15 35 122537 ...............163.......97..2...........471..1.5.83.6.4..36.252....7..9.39..56.. 4*10^15 192326 ...............163.......97..2...........471.917.283..74..368..25..1.4...3..4.... (last generated) ................35...59.1................42.8.6.2..94..47..8..1..8.156...36.7..89 (first generated) 40 5979 ................35...59.1................42.8.6.2.7.4..47..8..1..8.156...36.7..89 6*10^15 39 14715 ................35...59.1................42.8.6.25.94..47..8..1..8.1.6...3..7.5.9 8*10^15 38 22191 ................35...59.1................42.8.61...94..47..8..1.98.1.6...3..725.9 6*10^15 37 54118 ................35...59.1................42.88..2.7.43.47..8..1..8.156...36.7..89 7*10^15 36 113784 ................35...59.1................4218.6.2..94..47..8.....8.156...36.7..89 8*10^15 236401 ................35...59.1................4218861...9.3547..8....9.31.6..1...7..89 (last generated)

Note: Essentially different/not isomorphic does not mean, that also the solution paths essentially differ. I tried about 30 of the puzzles of the second set including the first and the last in the Sudoku Explainer and all of them were solved with the same quad and turbot fish (!). So it seems that more than 200000 essentially different sudokus are so similar for a human solver that he would have a problem to distinguish them.

by **coloin** » Thu Sep 21, 2006 2:08 pm

Well...it is a tricky one.
My first and second attempts at this were unsuccesful but my third attempt may be valid - although Ive made many assumptions and approximations
First Attempt
-the reverse of the "birthday" problem in mathematics - coincidentally was an old research topic of r.e.s
The formula k*=SQRT(2*N) .....rearranged gives N ~ 1/2 * k^2
Where N is the estimated number of grids and k is the number of grids generated when a significant number of duplicates emerge......
This was all very well but at the time I struggled to be able to recognise duplicates from more than a 64000 sample size.
Since then no grid has generated a single duplicate in one million grids, which makes me confident that we have more than 5^10^11 minimal puzzles per grid. [no marks really]
Second Attempt
My second method was to compute [using checker] the number of puzzles of size 20 in a particular grid
Accepting the potential for errors in this calculation

Code: Select all: Est.no.of 20-clue puzzles Incid. of puzzles per 10^6 Number of puzzles estimated Grid 74-2 3000 1 3*10^9 MC 638 5 1.3*10^7

We know this is wrong
There are at least 60,000 33s in the MC grid ! - none appeared by random generation !
Both these estimations are way too low and....... must be flawed - either as a bug or skewed by a probability quirk. [I think the latter]
Third Attempt
I think this is similar to your method .....onlyI have found the average number of puzzles in a 34 clue minimal sudoku.....and made an approximation as to how many more puzzles each additional clue generates.
Take a random 25 clue puzzle ......add 9 non minimal random clues [from the solution grid][one clue in each box]
I think this is the same one as ravel used. [with 9 more clues]

Code: Select all: +---+---+---+ |...|...|.8.| |...|7..|163| |.2.|...|.97| +---+---+---+ |..2|3..|...| |...|..4|7.2| |91.|5.8|3.6| +---+---+---+ |74.|.36|.25| |.5.|8.7|..9| |.39|2..|6.1| +---+---+---+ ravel 34

Count up almost all the puzzles......This can be done.....

The program that I use finds it easier to come up with smaller puzzless..not sure why.

Here are the results of the numberof minimal puzzles found per grids generated.

Code: Select all: puzzles generated 10000 100000 500000 2000000 ravel32 98 ravel33 233 ravel34 1259 1305 1305 1305 ravel35 4507 4563 ravel36 10482* 25clueplus9-1 886 910 910 25clueplus9-2 5591 11577 12104 25clueplus9-3 3038 4068 4080 25clueplus9-4 885 894 894 25clueplus9-5 5456 11407 11955 average 5208 So starting from 34 clues how many more puzzles do you get by adding an additional clue ? In the ravel 32 98 total In the ravel 33 135 new 233 total factor 135/98 = 1.37 In the ravel 34 1072 new 1305 total factor 1072/233 = 4.6 In the ravel 35 3258 new 4563 total factor 3258/1305 = 2.5 In the ravel 36 5919 new 10482 total factor 5919/4563 = 1.2 * * most probably thereare more puzzles than 10482

If we ignore the effect of the unavoidable clues...
Assuming the average puzzles size is 25 clues. [it is]
Assuming the average puzzle count in a 34 cell non minimal puzzle is around 5000
Also assuming I have got these estimates correct !

Code: Select all: In a 35 clue grid a clue is present in 25 ot of 35 ~ 75% therefore a 35 clue grid will 100/25 = 4.0 times more puzzles than a 34 In a 40 clue grid a clue is present in 25 ot of 40 ~ 62% therefore a 40 clue grid will 100/38 = 2.6 times more puzzles than a 39 In a 45 clue grid a clue is present in 25 ot of 45 ~ 55% therefore a 45 clue grid will 100/45 = 2.2 times more puzzles than a 44 In a 50 clue grid a clue is present in 25 ot of 50 ~ 50% therefore a 50 clue grid will 100/50 = 2.0 times more puzzles than a 49 In a 60 clue grid a clue is present in 25 ot of 60 ~ 41% therefore a 60 clue grid will 100/59 = 1.7 times more puzzles than a 59 In a 70 clue grid a clue is present in 25 ot of 70 ~ 35% therefore a 70 clue grid will 100/65 = 1.5 times more puzzles than a 69 In a full grid a clue is in 25 out of 81 ~ 30% therefore a 81 clue grid will 100/70 = 1.4 times more puzzles than a 80 Approximating ! adding 47 clues ! [35-36][37-38][39-42][43-45] [46-55] [56-65] [66-75] [76-81] 5000 * [4^2]*[3^2]*[2.5^4]*[2.2^3]*[2^10]*[1.7^10]*[1.5^10]*[1.4^6] = 5000*16*9*39*10*1024*201*57*7 = 23,060,356,300,800,000 ~ 2^16

Thats OK then

C

by **ravel** » Thu Sep 21, 2006 4:21 pm

Thanks Coloin,

without having it read accurately it seems, that your attempt - though somewhat similar to mine - is different enough (using additional premises like the distribution of random sudokus concerning the numbers of clues and another way of extrapolation) to be looked as independant.
So i am happy for now to see about the same result.

by **coloin** » Fri Sep 22, 2006 11:08 am

From page 15 of the Sudoku Maths thread.....I have just found this snippet from Ocean and dukoso, which is relevant to this thread....

Ocean wrote:Surprisingly I have not been able to find discussions of these two open problems:

1. What is the number of UNIQUE VALID SUBGRIDS?
2. What is the number of unique valid subgrids with N clues?

(Where a "Unique Valid Subgrid" means a puzzle with one and exactly one solution).

Some trivial numbers:

The total number of subgrids with zero, one or many solutions, is 10^81 - covering all from 0 to 81 clues.

For each valid grid, we can find 2^81 (=2.4e24) subgrids where that particular grid is a solution (mostly among other solutions).

For each valid grid, we can find "81 over N" subgrids with N clues, where that particular grid is among the solutions.

(81 over 15) = 8.1e+15
(81 over 16) = 3.3e+16
(81 over 17) = 1.2e+17
(81 over 18) = 4.5e+17
(81 over 19) = 1.5e+18
(81 over 20) = 4.6e+18
(81 over 21) = 1.3e+19
(81 over 22) = 3.7e+19
(81 over 23) = 9.5e+19
(81 over 24) = 2.3e+20
(81 over 25) = 5.2e+20

Since there are 6670903752021072936960 valid sudokugrids (from 5472730538 equivalence classes), we have the upper bounds:

the number of valid subgrids is less than 2^81 * 6670903752021072936960 (=1.61e46).

(and also: the number of essentially different valid subgrids is less than 2^81 * 5472730538 [=1.3e34]).

The number of unique valid subgrids is probably a lot lower than these upper bounds, but still a huge number.

...............

dukuso wrote:sudokugrids:1e22 from 1e10 classes
sudokus:1e47 from 1e35 classes (1-solution)
minimal sudokus:1e37 from 1e25 classes (no superfluous clues)

Ocean wrote:Thank you for the numbers!

It immediately surprised me that the number of sudokus "equals" my rough upper bounds, but the explanation is simple: the vast majority of sudokus have 30-50 clues, and in this range the fraction of puzzles with more than one solution is very small. For small number of clues, the opposite is true (the fraction with exactly one solution is small).

I don't quite see how you get the numbers for minimal sudokus, but they seem "reasonable". Good to know there are plenty enough

Dukoso never explained how he got his answer....

1e37/1e22 = 1e15...

Absolute counts for number of distinct minimal puzzles generated from 6 different 34 clue sub-grids :

Code: Select all: seed 10000 100000 500000 puzzle ravel34 1259 1305 1305 25clueplus9-1 886 910 910 25clueplus9-2 5591 11577 12104 25clueplus9-3 3038 4068 4080 25clueplus9-4 885 894 894 25clueplus9-5 5456 11407 11955

The number of grids found at the 10000 is proportionate to the final number.......so I could certainly analyse a few more different subgrids and confirm the average of around 5000

I will also confirm the increments following the addition of a single clue.

C

by **ravel** » Sat Sep 23, 2006 12:46 pm

Coloin,

very interesting, dukusos estimation (however he achieved it), which only differs by a factor of 10 to ours.

I have read your post now more carefully and the tricky way, how you get your estimation, seems plausible for me.

Here is another idea, how to verify the estimations. I found an old statistic by dukuso about the distribution of a million randomly generated sudokus, number 4 is for a random grid:

Code: Select all: clues , 1) 2) 3) 4) ---------------------------------- 17, 0 0 0 0 18, 0 0 0 0 19, 0 4.3 0 5 20, 59 182 0 254 21, 2428 6051 85 8268 22, 33966 61826 1775 80869 23,170727 227480 21648 273518 24,342620 352289 116766 364111 25,298349 248568 286836 209158 26,122691 86061 329853 56006 27, 25237 15908 185028 7284 28, 2733 1547 50469 505 29, 205 74 7040 22 30, 7.6 8.6 486 0 31, 0 0 12 0 32, 0 0 2.4 0 ------------------------------------- aver.24.38 24.10 25.72 23.88

(Maybe you have a better one ?)

According to it, 36% - about a 3rd - of the minimal puzzles in an average grid have 24 clues.
There are 24 out of 81, about 2e21 (hope its correct, calculated manually) 24-clue sudokus in a grid. If we assume, that there are about 2e16 minimal sudokus and a third of them has 24 clues, then 1 out of 300000 random 24-clues should be minimal (3 in a million). It should not be very time consuming to verify this.

by **Ocean** » Sat Sep 23, 2006 4:08 pm

Interesting discussions and statistics. Have a few comments though.

ravel wrote:According to it, 36% - about a 3rd - of the minimal puzzles in an average grid have 24 clues.

Not so sure about this. The only thing we can say is that dukuso's method finds 36% minimal 24s in an average grid. The method used appears to be (strongly or weakly?) biased towards the lower number of clues.

Why? - the explanation is quite simple, as the example below shows:

#
100000002020030040000105000006000300030010070008000900000608000070040020300000008 #G21
000000002020030040000105000006000300030010070008000900000608000070000020300000008 #M19
100000002020030040000105000006000300030010070008000900000608000070040020000000008 #M20
#

The non-minimal 21 contains two minimals (one 19 and one 20). Dukosu's method would find 67% minimal 19s and 33% minimal 20s in such grids (compared to the actual 50-50).

Note also coloin's observation, which is the same fenomenon:

coloin wrote:The program that I use finds it easier to come up with smaller puzzless..not sure why.

For verification (or disproof):
1. Count the number of minimal 33s, 32s, ... etc. in the grid ravel34 (sums up to 1305?).
2. Find 'random' minimals in a set of 13050 samples (say) of ravel34, one minimal per sample. (Or better use 13050 different scrambled versions of ravel34, in order not to get confused by 'duplicate' finds).
3. Compare the two results and decide if there is a bias or not.

Thought it was worth to mention, but I have not checked how much influence the bias has on calculations or statistics - whether it is significant or negligible.

#
[Edit/Added:]
Actual stats for grid ravel34. It contains only 24s, 25s, 26s, 27s and 28s, and is probably not too typical. The grid contains 1305 different minimal sudokus. The 'random' column is the number of puzzles found in 13050 grids divided by 10. If there was no bias, the two coulumns should be roughly equal.

Code: Select all: Distribution of minimal puzzles in grid 'ravel34': 000000080000700163020000097002300000000004702910508306740036025050807009039200601 clues, actual random bias factor ------------------------------------------------------- 24, 104 ( 7.97%) 284.1 (21.78%) 2.73 25, 691 (52.95%) 733.3 (56.19%) 1.06 26, 444 (34.02%) 257.3 (19.72%) 0.58 27, 63 ( 4.83%) 29.3 ( 2.23%) 0.46 28, 3 ( 0.23%) 1.1 ( 0.08%) 0.37 -------------------------------------------------------- aver. 25.36 25.03

The bias is quite clear. The fraction of 24s (the smallest occuring) is overestimized by a factor 2.7, the fraction of 28s is underestimated by a factor 0.37, while the average is not much affected in this case.

by **ravel** » Sat Sep 23, 2006 5:33 pm

Good observations, Ocean,

we cannot be sure, that dukusos method really gives the correct distribution.
Maybe Coloin knows, how he did it it. When i understood it right, he started with the full grid and eliminated clues. When he stopped with the first minimal grid, i think, the distribution should be correct (?) [edit: i think, i got it now - no, because more 20-clues would lead to the 19 than to the 20]. When he continued to try to find a unique sudoku with less clues, then not.

But maybe other statistics are already available (Coloin said, the average sudoku would have 25 clues, not 24).

by **ravel** » Sat Sep 23, 2006 7:36 pm

Think, we could do it this way:
When we generate say a million 25-clue sudokus we e.g. could find, that 10 are minimal, 550000 have multiple solutions and the rest is not minimal. If we do it also for other clue numbers, we should be able to make an estimation for the total number then (though i am no expert in statistics).

by **coloin** » Mon Sep 25, 2006 8:25 pm

Thanks Ocean for explaining that.........dukuso's program does remove clues untill multiple solutions are obtained and minimality is achieved.

I estimated an extra clue to 25 to allow for this discrepancy - it showed up particularly in generating the minimal puzzles in other subgrids.The larger puzzles were the last ones to be found as a distinct puzzle amongst the many duplicates.

If you can effectivly randomize the placements of 25 clues from a full grid then ravel's latest method might confirm our results.
I have a hunch that our special grids such as the SF and SFB have more puzzles !!!!! This would be reflected by this method. [More non minimals , ? less minimals, less multiple solutions]

I wonder what a more "accurate" distribution of puzzle stats is now.

C

by **ravel** » Tue Sep 26, 2006 12:19 pm

My last idea does not seem to be very fruitful (and i had a logical mistake, because most random grids are both not unique and not minimal). In 25 mio random sudokus with 24-28 givens i only found 12 minimal ones, which is too few to be representative (though it confirms the earlier estimation of e16 minimals in this grid):
(the search for minimals in the 28-clues is not finished)

Code: Select all: For 5 mio random sudokus each: clues, unique(all minimals), minimal, all possible, estimated minimals 28: 7050(?>20000) 0(?) 4.4e21 27: 2186(14256) 3 2.3e21 1.4e16 26: 473(1627) 1 1.1e21 2e14 25: 127(206) 8 5.2e20 9e14 24: 10(16) 0 2.3e20 <5e13

Btw, gsf, i dont know, how to stop calculating the minimals after one is found, -m1 does not work.
And is there a possibility to specify a column offset in the input file ?

by **gsf** » Tue Sep 26, 2006 3:05 pm

ravel wrote:Btw, gsf, i dont know, how to stop calculating the minimals after one is found, -m1 does not work.
And is there a possibility to specify a column offset in the input file ?

what was the full command line where -m1 misbehaved?

by column offset you mean a one puzzle per line file where the puzzle
is in the Nth space separated column? -cN
if the separator character is , then -cN,
if you'd like to label the columns for -f%(name)f then -cN,name-1,name-2,...

you can put this magic string as the first line in a puzzle data file

Code: Select all: #!sudoku -c5,

which means one puzzle per line, comma-separated fields, puzzle in 5th field
and the sudoku command will pick out the puzzles when it reads the file
without specifying the -c command line option

I have been using this for some puzzle collection flat-file databases:

Code: Select all: -c5,SYMMETRY,STEPS,CLUES,MINIMAL,PUZZLE,SUBMITTER,DATE

The New Sudoku Players' Forum

How many minimal sudokus has an average grid ?

How many minimal sudokus has an average grid ?