## Su-Doku's maths

Everything about Sudoku that doesn't fit in one of the other sections
Hi, I wrote the post below. Some more comments:

Besides narrowing down the number of solutions by 9!*5184 (as opposed to 9!*72 by the other method), I think are other reasons why this approach may be faster.

One of the things to note is that if the digits 1-8 have been placed, then there is one and only one way to place the 9's, giving a valid grid. So there are only 7 steps to make (place 2,3,..,8) before finding a solution.

Another thing is that after, say, the 1's and 2's have been placed, we care only which squares have been used, not which digits are in which squares. That is, a 12 above a 21 counts the same as a 21 above a 12. The data structure that would be used then, is something to store the board (maybe boolean of length 81) plus an integer storing the number of ways to get to that specific board state - and this would have to be stored in, say, a binary tree for easy searching ("has this board state been reached before?"). This would cut down on the speed of the program, at the expense of added storage.

(How much will it cut down the speed? I don't know. I looked at one random completed Sudoku grid and found 19 instances of the "12 above a 21" phenomenon. Note that if we have fixed where the 1's go, then there is no saving when placing the 2's. But there is a little with placing the 3's. My guess is that at least ~1/8 of those diagrams are repeats. This fraction would increase greatly when placing the 4's. How much storage is required? I don't know.)

I am not the most gifted programmer and thus am not sure whether I can implement the method in the preceeding paragraphs. However, I can do the naive brute-forcing, and the first step of finding the ~2000*7 possible fillings for the digits 2,3,...,8. If anyone sees promise with this method and would like to help, I would appreciate it.

Brian

Anonymous wrote:My guess: 2.8 x 10^25.

Given any digit, there are (3!)^6 = 46656 ways of filling the grid satisfying the rules. Given a filling with one digit and a filling with another, the probability that the two fillings are compatible is roughly (8/9)^9 = 34.6% (in each of the nine boxes we check that the two digits aren't in the same square).

Now, we choose a filling for all nine digits, (46656)^9 possible ways. The probability that these fillings are compatible is (.346)^36 (since there are (9 choose 2) = 36 choices of pairs of digits to check compatibility). Multiplying these two numbers yields the answer of 2.8 x 10^25.

This, of course, is a rough estimate, as I have assumed a lot of independence that is not true. But I think it is a reasonable guess.

Has anyone tried this method using a computer? That is, filling in all the 1's first, then filling in the 2's, and so on, rather than filling the upper left box with 1-9, then the upper middle box, and so on?

We may assume (by multiplying by 9! afterwards) that the upper left box is 123, 456, 789, which cuts down the possible fillings to 46656/9 = 5184 for each digit. Furthermore, we may choose exactly where the 1's go (by later multiplying by 5184). Then we eliminate the fillings for the digits 2,3,... that aren't compatible with the filling of 1's, leaving about (8/9)^8 * 5184 = 2000 possible fillings for each digit. With an explicit list of the 46656 possible fillings, it is simple to check whether two fillings are compatible (just list the squares the digits occupy and see whether any are the same).

Is this a practical method?
Guest

Posts: 312
Joined: 25 November 2005

coloin wrote:Extrapolating "guest" idea

Boxes 1 5 an 9 are independant and each are 9!

boxes 2 and 6 are similar [x possibly valid ways]
boxes 4 and 8 are similar [y possible valid ways]
boxes 3 and 7 are similar [1 valid way to fill in]

represented as
9! x 1
y 9! x
1 y 9!

Something of which I'm really confused:

does this mean that you can choose what you fill in box 1, box 6 and box 9 and that the number of possibilities to fill in these cells are always the same?

As in, x and y will always have the same value for all values of box 1, box 6 and box 9

(A basic test of mine showed that that is not the case... I'm currently going over the code and the results to see if I made a mistake somewhere)
Animator

Posts: 469
Joined: 08 April 2005

### It was attractive but wrong

No the x = 56 number, attractive though it is, is fundementally flawed the way I have outlined. There are, as you have righly posted, potentiallly small numbers of variatians which will give rise to problems in box 6 and 4 and thereafter 3 which cannot be filled validly.

I get the feeling someone much cleverer has been down this line.

for value of x1 [ box 6] is < 56
for value of x2 [box 2] probably less than x1
for value of y1 [box 4] is ?
for value of y2 [box 8] is ?

I am sure a program can be written to can calculate the proportion of x1 [56] which gives the number of ways to populate box 6 with a given small series of variations of boxes in 5 and 9.

This may or may not be easy !

Then we can work on x2
coloin

Posts: 1715
Joined: 05 May 2005

The main thing that botters and confuses me is this:

Assume the following grid:

1 2 3 | * * * | * * *
4 5 6 | * * * | * * *
7 8 9 | * * * | * * *
-----------------------
* * * | 1 2 3 | * * *
* * * | 4 5 6 | * * *
* * * | 7 8 9 | * * *
-----------------------
* * * | * * * | 1 2 3
* * * | * * * | 4 5 6
* * * | * * * | 7 8 9

If I try to fill this grid in then I get a result of 283576 valid solutions.

Now assume we change box 9 into:
4 5 6
7 8 9
1 2 3

Then the result reamins the same: 283576 valid solutions.

Now assume we change box 9 into (8 and 7 are swithed):
1 2 3
4 5 6
8 7 9

Then the number of valid solutions drops to 271012...

More and more I'm beginning to believe that the only way to get the answer is to do a brute force on it without making a single assumption... :(
Animator

Posts: 469
Joined: 08 April 2005

### Bothering

Yes this is bothering and has been pointed out by tinfoil and afj although I didnt really consider it odd as they were working on boxes 1,2 and 3.

su bloku also got differing values - average 133700 - from which he calculated his estimate of the number.

The obvious answer is that the change in the digits in box 9 changes the number of invalid boxes thereby created.

I think others have been working on these invalids and have made reasonable progress.

However using boxes 1 5 and 9 as independant free spirits maybe simplify the calculation.

Guest/Brian seems to have a workable program - the clock is ticking !
coloin

Posts: 1715
Joined: 05 May 2005

Anonymous wrote:However, I can do the naive brute-forcing, and the first step of finding the ~2000*7 possible fillings for the digits 2,3,...,8.

I did this, found the 17972 possible fillings for the digits 2,3,...,8 (half were 2097 and half were 2396), and wrote the brute-force program.

My computer is nearly 6 years old, a Celeron 466Mhz (I believe). It finds about 400,000 solutions per minute. Given that the estimate of 6x10^21 is correct, dividing by 5184*9! gives 3x10^12 solutions, which would take more than 100,000 hours on my computer. Does anyone know by how much I can improve the 100,000 hours by using a faster computer (I have access to some at school)?

Brian
Guest

Posts: 312
Joined: 25 November 2005

Does anyone know by how much I can improve the 100,000 hours by using a faster computer

One thing you could do is, after placing digits 1-5, represent each entry in the (by now very short) list of possible digit "templates" as a 32-bit int. This means the arithmetic in the inner loop (i.e. digits 6-9) will be very quick. It may also pay to do a 81->64 bit compression right at the start (i.e. given the top left box fill and the placement of the 1s) so as to make the work for digits 2-4 reasonably quick as well.

I think your idea of fixing the top-left box contents + position of 1s is excellent. The solution is coming soon, I can feel it ...
Guest

Anonymous wrote:Does anyone know by how much I can improve the 100,000 hours by using a faster computer (I have access to some at school)?

That is hard, if not impossible to tell without knowing the language it has been written in, and without seeing the code... (Perhaps you should put the code online somewhere?)
Animator

Posts: 469
Joined: 08 April 2005

This thread has once again been extremely active! What an interesting problem! I thought I'd add some comments on today's posts.

Firstly, the method of su bloku, mentioned in coloin's posts. My feeling is that blocks 1, 5 and 9 really aren't independent at all. Once block 1 is fixed, then different choices for blocks 5 and 9 give different numbers of solutions. (It's just the same as with blocks 1, 2 and 3 which I've been thinking about.) As for coloin's numbers x and y, these have indeed been considered earlier, by IJ, tinfoil, myself and by Tim. The answer is that x lies between 384 and 448, depending on the arrangements for blocks 1 and 5, and is not 56, which came from something else. y is either 0 or 8, depending again on the previously filled in grids. Since each choice of blocks 1, 5 and 9 may give different answers, this means that the final answer need not be divisible by 9!^3.

Brian's method looks quite nice, and may be easy to program. However, the factor of 9!*5184 is actually the same as the brute force method (one gets a factor of 72 from permuting rows and another factor of 72 from the columns, making 72^2=5184 in total). However, the guess of 2.8x10^25 is larger than the upper bound of 9!*56^2*6^12*448*8^2 that tinfoil found earlier.

It seems clear that whichever brute force method is used, we need some clever tricks to reduce the search, and then a good programmer (Bertram's program looks most impressive, and extremely promising!). I'm optimistic that one can reduce the 1150 CPU hours mentioned by Bertram to around 50 with further clever tricks, which really will make it manageable! (I might also be able to get some time on quite a quick computer.) If only I were a good programmer...

Frazer
Guest

### Is the number 56 right ?

Is the number 56 right ?

The 56 was calculated for two boxes horizontally

I am not sure if this applies to two 9! at rightangles to each other.

take boxes 5 and 9 [z is any number]

z z z [6,5,4,3] [5,4,3] [4,3]
z z z [3,2,1 0] [2,1,0] [1,0]
z z z [ 1,0] .... [ 1,0 ]...[ 1,0]
.......... z ............ z ...... z
........... z ............ z .... z
........... z ............ z ..... z
The maximum and minimum options for box 6 are shown if the order of filling them in is left to right up to down

This may mean the maximum number x1 for box6 is 6!
It also means that the minimum number for x1 is 0 ie invalid

example - this is spooky

123 ? ? ?
456 ? ? ?
789 ? ? ?
4 7 1
5 8 2
6 9 3

Can you fill in this ? [no]

Also the "fundemental" issues regarding invalid boxes arnt relevant yet

I cant believe ive come this far and not seen this !

Am I totally wrong here ?
Guest

Posts: 312
Joined: 25 November 2005

### Ah yes

That was me as guest.

Thanks afj

somewhere between 384 and 448 was a long time ago ! [sorry]

Should that be somewhere between 0 and 448 [ or 432 or 400 or 392 or 384] as sometimes [see above] there is no valid box ?

We should be able to get the average number for a reasonably large number of 1 5 9 boxes

Regards
coloin

Posts: 1715
Joined: 05 May 2005

### Does it get simpler ?

I am still persevering with the 1,5,9 boxes as 9! each

The different values you get for different grids in box 9 is reflected by the different values you get for box 6 - and the rest of the unfiled boxes. All you are doing in working out the number of grids is exactly what I am doing only I am estimating an average value for each step and you are getting a single value for all the steps. The fact that the permutations for box 6 are variable and dependant on box 9 complies with fact that you get a different number of valid sudokos with different box9 s.

There will be a set of numbers for box 6 given the 9!^2 number of permutations for boxes 5 and 9.

starting with your values [I am not sure how you got them though]

448........? times
432........? times
400........? times
392........? times
384........? times
?..........? times

average no of times over a large number of box5 and box 9 = x1

Someone must be able to compute this one - it is only 2 boxes !

----------------------------------------------

Working backwards this time from a valid suduku.

based on my box labeling

9! x2 1
y1 9! x1
1 y2 9!

x1 is covered above
x2 is the last big problem
y1*y2 is covered below

If for a given valid sudoku grid and we are postulating that box 7 is 1

Then there will only be 1 way to fill in y1 and also y2

[Is this complicated problem now suddenly getting simpler ?]

2 9 5 .. 7 4 3 .. 8 6 1
4 3 1 .. 8 6 5 .. 9 2 7
8 7 6 .. 1 9 2 .. 5 4 3

z z z .. 4 5 9 .. 2 1 6
z z z .. 3 8 7 .. 4 9 5
z z z .. 2 1 6 .. 7 3 8

7 6 3 .. z z z .. 1 8 9
9 2 8 .. z z z .. 3 5 4
1 5 4 .. z z z .. 6 7 2

3 8 7
6 1 2
5 4 9 is the only box 4

and

5 2 4
6 7 1
9 3 8 is the only box 8

based on my equation the number of grids is

9!^3*x1*x2*y1*y2*1*1*1 approx = 6x10^21

if in the simple scenario y1*y2 = 1
then x1*x2 will approx = 125563
if x2 = 448
then x1 will be in the region of 280

If someone can estimate [or better still calculate] x1 and x2 [and verify y1*y2] slightly more accurately - then we might be in business.

Not sure about estimating x2 [box 2 ] but it should be feasible if it is the last step.

There is a final spanner to scupper this of course - inevitably.

I have not allowed for the numerous occaisions that box 7 and boxes 4 and 8 dont have valid boxes - or does that get sorted out when you estimate /calculate x1 and x2 ? It might be estimating is the only easy answer.

Incidently the average number of valid grids when you have fully filled in the boxes 1,2,3,5,6 & 9 with the data from a valid sudoku is 7.
Can this be used as y1*y2 ?

Help needed now.

Regards
Last edited by coloin on Mon May 23, 2005 6:38 am, edited 3 times in total.
coloin

Posts: 1715
Joined: 05 May 2005

afj wrote:I've finally got around to reuniting the runs I did on my home computer and the runs I left going on my work computer overnight. Here are the results so far. In all cases, the first two blocks were

123 478
456 139
789 256

Various blocks 3 : possible ways to complete to a grid:

569/278/134 : 97961464
569/287/134 : 97539392

I verified these two; my program finishes in less than 2 minutes for both.

Thanks to a hint by afj I could speed up the process by another factor of 33, so you can expect an exact number later today or tomorrow.
Guest

### Nearly I think

del
Last edited by coloin on Wed Jul 27, 2005 5:39 am, edited 2 times in total.
coloin

Posts: 1715
Joined: 05 May 2005

Red Ed wrote:One thing you could do is, after placing digits 1-5, represent each entry in the (by now very short) list of possible digit "templates" as a 32-bit int. This means the arithmetic in the inner loop (i.e. digits 6-9) will be very quick. It may also pay to do a 81->64 bit compression right at the start (i.e. given the top left box fill and the placement of the 1s) so as to make the work for digits 2-4 reasonably quick as well.

I wrote it in C. I did the 81->64 bit compression as you suggested, Red Ed, and this made things about 2.5 times faster. (I actually had stored the board states as integer arrays of length 9, each entry representing a box using bitmasking (is this what it's called?); it was very simple to permute rows/columns this way.) I don't know what you mean by the first comment though.

I had another thought. We are exploiting a lot of symmetry, a factor of 9! * 5184 for relabelling the digits and permuting rows/columns within blocks. But there is more symmetry to exploit: There are 3! to permute the blocks themselves, by row or column, plus we can rotate the whole grid by 90 degrees. This gives a factor of (3!)^2*2 = 72.

After the 1's were placed (in a unique way up to symmetry), I found 2097 ways to place the 2's. I did not take into account the "12 above a 21" symmetry (that is, worry about which squares have been used, not which numbers are in them) or the factor of 72. This might bring the total number of unique fillings of 1's and 2's to ~200 (a complete guess). For each of these fillings, there are only around 800 ways to fill in the 3's, and we can exploit symmetry again to avoid a lot of overcounting. This may bring the computation down to a reasonable level.

One of the problems with this approach is that it may take a lot of effort to determine whether two partially filled grids are equivalent. There are (3!)^6*72 = 3,359,232 total ways for two grids to be equivalent - we obviously can't go through all of them. Note that, for example, with 1's and 2's filled, there are only three possibilities (up to row/column permutations) for each box: either the two filled in squares are in the (A)same row, (B) same column, or (C) neither. We can first apply the factor of 72 (not doing any row/column permutations within a box), classify each as some string "ABBCACCCB" corresponding to the nine boxes, and consider only those of the 72 whose string comes first lexicographically. For each of these, we compare to fillings already found with the same string. There are potentially (3!)^6 different ways for them to be equivalent, but again, doing things smartly (e.g. first getting the upper-left boxes to match) cuts this factor down greatly.

Theoretically, this is all possible, and it seems to result in a much faster program--at the expense of a lot more programming (and a lot more opportunities to introduce human error). With Bertram's program looking very realistic (and with his comments about exploiting more symmetries as well), I think I will just sit back and wait for his program to complete.

Brian
Guest

Posts: 312
Joined: 25 November 2005

PreviousNext