## Su-Doku's maths

Everything about Sudoku that doesn't fit in one of the other sections

### estimate for 4x4

More on estimating N(4x4):

Back on April 22 "tinfoil" gave a method for estimating N(3x3) that produced 6.6571e21, which, although it wasn't known at the time, is only about 0.2% off of the correct value. This method easily generalizes to any size Sudoku grid:
Code: Select all
2k
b(k)
N(k x k) = -----------
2  k(k-2)
(k )!

where b(k) is the total number of solutions for band 1 when box 1 contians one fixed permutation of symbols. For a 4x4 grid this is
Code: Select all
8
b(4)
N(4x4) = ------
8
(16!)

Here are the values:

b(4) = 46,506,177,615,378,500,246,568,960 = 4.65e25 = 2^39 * 3^14 * 5 * 37 * 95603
N(4x4) = (389370619663220736/175175)^8 = 5.9584e98

Back when tinfoil posted his method someone remarked that it was a leap of faith, but I have reason to believe that it's pretty reliable, and I think I have a way to estimate the accuracy of the estimate. My analysis is somewhat emperical and not at all rigorous, however, and I'll leave it up to the more serious mathematicians to decide. I'll post the details when I get a chance.

Posts: 63
Joined: 18 October 2005

### Re: estimate for 4x4

That's Kevin Kilfoil's method (I got the name from the Felgenhauer/Jarvis paper).

In his description of the method, he talks not about the number of band 1 solutions, but the number of band 1 solutions excluding the column rule. Have you done the same?

EDIT: oh, I have no brain. The column rule has no effect in a band. Perhaps if I make the font size small enough we can pretend I never posted this ...
Last edited by Red Ed on Mon Dec 12, 2005 5:34 pm, edited 1 time in total.
Red Ed

Posts: 633
Joined: 06 June 2005

### Re: estimate for 4x4

Back on April 22 "tinfoil" gave a method for estimating N(3x3) that produced 6.6571e21, which, although it wasn't known at the time, is only about 0.2% off of the correct value. This method easily generalizes to any size Sudoku grid:
Code: Select all
2k
b(k)
N(k x k) = -----------
2  k(k-2)
(k )!

where b(k) is the total number of solutions for band 1 when box 1 contians one fixed permutation of symbols.

Nice generalization, this can be used on nxm-sudoku for every n and m, and becomes a generalization of my 3xn-estaimates in http://forum.enjoysudoku.com/viewtopic.php?t=44&start=412 and http://forum.enjoysudoku.com/viewtopic.php?t=44&start=416. (What a coincidence on the url: 44 and 416... )

In general: Let b(n,m) be the number of ways to create the bands of nxm boxes (that is n rows, m columns in each box, n boxes in a band). To make the formula clearer, I say any band, regardless of symbol permutation in the first box.

Then the estimate of numbers of nxm-sudoku is
Code: Select all
m         n
b(n,m)  * b(m,n)
N(n x m) = -----------------
nm
(nm)!

This is how: Let X be the space of (nm)x(nm)-grids built by legal sudoku bands, but with no attention put on wether the columns follow the rules of Sudoku. The size of X is b(n,m)^m. Let also Y be the set of grids built by legal stacks with no attention put on the rows, #Y is then b(m,n)^n.

The nxm-sudoku grids are then the intersection of X and Y. A random x in X and y in Y are identical in a given box with probability (nm)!, and under the assumption that these probabilities are independent for each box, we arrive at the estimate above.

Btw. you should be credited for the number of legal bands in 4x4-sudoku in the Wikipedia M.o.S-page, though you didn't spell it out. It's just 16!*b(4) = 973038982740573238251518542982676480000 [Edit: I have now put you on M.o.S]
Last edited by kjellfp on Mon Dec 12, 2005 7:58 am, edited 1 time in total.
kjellfp

Posts: 140
Joined: 04 October 2005

tinfoil's argument doesn't seem to depend on the size of the blocks.
Suppose, we choose the blocks to be 1*1, then we
have 9^81 ways to populate the puzzle.

There are 9! ways to fill the first row, when we add row-constraints
and (9!)^9 ways if we just only consider row-constraints.
That gives a reduction rate of R=9^81/(9!)^9=1.8e27.

If the same rate would independently apply to the rows and
columns , then we would estimate the number
of 9*9 latin squares as 9^81/R/R=6e22.
But the true value is 5.5e27.

----------------------------------------------------------

Now let's consider 3*3 blocks and still try to estimate the
number of 9*9 latin squares with tinfoil's method.

We get 4.3e7 ways to arrange one block and 4.3e7^9=5.0e68 ways
to populate the puzzle.

The first 3*9-band can be filled in 2.1e15 ways, or 9.3e45
ways for the whole 9*9.
Reduction rate is 5.4e22.

So the number of Latin squares were 5.0e68/R/R=1.7e23
But the true value is 5.5e27.

---------------------------------------------------------

Now let's use tinfoil's method to estimate the number of
magic sudokus.
Each block can be filled in 72 ways, so 5.2e16 ways to populate
the grid.

The first 3*9 band can be filled in 5184 ways so R=1e13
and we calculate 5e-10 magic sudokus,
while the true number is 6e6.
Last edited by dukuso on Mon Dec 12, 2005 12:44 pm, edited 1 time in total.
dukuso

Posts: 479
Joined: 25 June 2005

I don't see how you're using tinfoild's (or my) estimates here.

For nxn latin squares, the estimate is

N(n,1) = b(n,1)^1 * b(1,n)^n / n!^n.

Now, b(1,n) is n! and b(n,1) is the number of nxn latin squares, so the estimated number of latin squares is then the same as the exact number of latin squares, which is just as accurate as it is useless

It is for n,m>1 we get estimates which are not exact answers, but only require band counting and not entire grid counting.
kjellfp

Posts: 140
Joined: 04 October 2005

the formula would be really nice, if it's correct.
Can't we just test it on 2*3,3*4 sudokus ?
AFAIR we already counted the bands and the sudokus,
so let's just fill the numbers into the formula...

in my latin squares computation I was using 1*1 boxes,
not 1*9. Maybe we have to require that the constraints
on the boxes and rows,columns must be the same,
i.e. have the same number of cells and each symbol exactly
once.

Why are the probabilities (almost) independent ?
dukuso

Posts: 479
Joined: 25 June 2005

I realized now that you only grabbed some of the ideas, not the exact formula, for an estimation of latin squares.

I think your example shows that the distance between the truth and the estimate depends on the box sizes. You used 1x1-boxes, and then it is more likely that the probabilities for each box are more dependent.

We don't know the exact answer for 4x3 yet, but this used on 3x3 showed that we missed the target by only 0.2%. For 3x2-sudoku, the error was 1/16. Both experience and intution tells me that the bigger the boxes get, the more accurate are the estimates (relatively, not in difference)
kjellfp

Posts: 140
Joined: 04 October 2005

I remember, you had an estimate on 4*3
trying to walk through the old posts....

When I wrote 1e107 for 16*16s some time ago, I did spend
quite some time to convince myself that it should be correct.
But I don't remember exactly so I could say how much
convinced. I would have to search for the old notices
and redo the calculation.

Let S(a,b,c,d) be the number of sudokugrids of size a*b
with boxes of size c*d.

S(n*m,n*m,n,m)~S(n,m*n,n,m)^m*S(m,m*n,m,n)^n/(n*m)!^(n*m)

there could be other similar formulars, even for
(generalized) sudokus with different number of
cells for rows,columns,blocks

Wikipedia gives the number of bands as:
3×4: 3.1672366e19
4×3: 2.2736145e24
4×4: 9.7303898e38

and the number of 3*4 sudokus as
8.106417e46, while the formula gives 8.1064172

looks very good !!!
dukuso

Posts: 479
Joined: 25 June 2005

dukuso wrote:Wikipedia gives [...] the number of 3*4 sudokus as
8.106417e46, while the formula gives 8.1064172
looks very good !!!

The formula and Wikipedia use the same calculation, so it's not strange the numbers are equal. The exact number of 3x4-sudoku is not calculated yet.
kjellfp

Posts: 140
Joined: 04 October 2005

I'm confused.
The numbers were given by you in Oct., right ?
(found it in the archive)
Did you use the formula already at that time ?
I can't remember having seen it.
dukuso

Posts: 479
Joined: 25 June 2005

Dukosu is right in that the formula doesn't depend on the size of the blocks, and so it can be applied to a 9x9 Latin square. Why is the result a poor estimate? I think Kjellfp has got it right when he says that the assumption of independence of row and column constraints improves as the boxes get bigger, and 1x1 is the worst case. I'll add to that that I think the assumption improves as the number of boxes gets smaller, and again for a Latin square this is a worst case, with 81 boxes in the 9x9 case.

Consider a grid with m x n boxes, where the boxes are of any size. Let q(i,j) be the average number of ways to complete box (i,j) subject to the row constraints imposed by boxes (k,j), k < i, and subject to the column constraints imposed by boxes (i,k), k < j (i.e. subject to the boxes to the left of and above the box (i,j)). This is an an average, in general, because the number of ways to fill in the box depends on what is in the boxes above and to the left, and I'm averaging over all possibilities. So for example q(1,1) is just 9! for a 3x3 Sudoku, and 9 for a 9x9 Latin square, since there are no boxes above and to the left. More examples, just so this is clear: for a 3x3 Sudoku q(1,2) = q(2,1) = 12096, q(1,3) = q(3,1) = 216.

Tinfoil's method can be applied to each box individually. One can estimate q(i,j) with the formula

q'(i,j) = q(i,1) * q(1,j) / q(1,1) = q(1,i) * q(1,j) / q(1,1)

I'm leaving out some intermediate steps here, but they're easy to fill in. Note that I use the exact values for q(1,i) and q(1,j), not the estimates q', because these are calculated exactly by counting band 1. It is easy to prove that the product over all (i,j) of q'(i,j) (where we let q'(1,j) = q(1,j)) gives exactly the estimation formula that I posted and that Kjellfp generalized to rectangular grids. Therefore, evaluating the accuracy of the estimate reduces to evaluating the accuracy of these individual box estimates.

For 3x3 Sudoku, q'(2,2) = 403.2000, and q(2,2) = 403.2678, an error of only 0.017%. The other cases are easy to determine and are of similar accuracy. Furthermore, there are only 4 boxes where we are using the q' estimates in the product; for the other 5 boxes we have an exact value.

For a 9x9 Latin square, q'(2,2) = 7.111, and q(2,2) = 7.125, an error of 0.197%. Not bad, but there are 64 q' estimates in the product (I have not calculated the errors for those cases, but it wouldn't be hard to do), so it's not surprising that the total error is large. But dukosu reports that the final estimate is off by a factor of 10000, which is rather larger than one might expect just considering q'(2,2), so one might want to look into this further to see how accurate the other q' estimates are. And it would be good for someone to verify my calculations, because I haven't had time to check carefully.

For 4x4 Sudoku, I figure that it won't be too hard to get q(2,2) to compare against q'(2,2) from the formula, and I expect (following Kjellfp's experience and intuition) that the agreement will be excellent. It would be much harder to get any of the other q values, but perhaps once we know q(2,2) we can make an educated guess at the accuracy of the complete estimate.
Last edited by PatmaxDaddy on Mon Dec 12, 2005 12:25 pm, edited 1 time in total.

Posts: 63
Joined: 18 October 2005

For a 4x4 Sudoku:
Code: Select all
q(1,1) =   20,922,789,888,000
q(1,2) =      248,341,303,296
q(1,3) =          564,438,622
q(1,4) =              331,776

Note that q(1,3) is not an exact value; the others are. Multiply these and you'll get the band 1 count (but you have to use the exact value for q(1,3), a ratio whose denominator will cancel out in the complete product, leaving an integer).

 The exact value of q(1,3) is 5215977308160 / 9241

Posts: 63
Joined: 18 October 2005

dukuso wrote:I'm confused.
The numbers were given by you in Oct., right ?
(found it in the archive)
Did you use the formula already at that time ?
I can't remember having seen it.

The numbers given in October stemps from the discussion in http://forum.enjoysudoku.com/viewtopic.php?t=44&start=412

For 3xN-sudoku, the formula just suggested should give

N(Nx3) = b(N,3)^3 * b(3,N)^N / (3N)!^(3N)

We can express b(3,N) as follows: Let M be the number of Nx3 box column configurations that "fits" with a fixed one (fits in the sence that column k of the two boxes have no elements in common for every k). For a stack of Nx3 boxes, there are (3N)! ways to build the first box, M*N!^3 ways to build the second, and N!^3 ways to build the third. Therefore

b(3,N) = (3N)! * M * N!^6 and so

N(Nx3) = b(N,3)^3 * M^N * N!^(6N) / (3N)!^(2N)

which is what I used in October.
kjellfp

Posts: 140
Joined: 04 October 2005

### Re: estimate for 4x4

Re:
PatmaxDaddy wrote: More on estimating N(4x4):
Code: Select all
2k
b(k)
N(k x k) = -----------
2  k(k-2)
(k )!

where b(k) is the total number of solutions for band 1 when box 1 contians one fixed permutation of symbols
This form relates to the kjell formulation as
Code: Select all
b(n,m) = (k^2)!*b(k),   for k = n = m

Redefine b(k) as
B(K) =  (k^2)!*b(k)
the number of permutations of a band of length K = k^2.

The PatMax form then becomes

2k
B(K)
N(k x k) = -----------
2  k*k
(k )!
which is consistent with kjell's nm generalization, for n=m=k. The expression in the denominator here is N(GridBoxes) : the number of ways to fill k*k blocks, each with k^2 symbols (once each)
Code: Select all
let    2
K=k

the denominator is then

K    2  k*k
K! = (k )!

combined with the B(k) form gives

2k
B(K)
N(k x k) = -------
K
K!
the original tinfoil formulation.
LarryLACa

Posts: 32
Joined: 24 August 2005

### Re: estimate for 4x4

LarryLACa wrote:
Code: Select all
2k
B(K)
N(k x k) = -------
K
K!

I considered writing the formula this way in my post, but I chose to divide out the common factor of [(k^2)!]^2k and define b(k) so as not to include the permutations for box 1. Clearly one could write it either way, but I figured that I wanted the formula for computational purposes, and so 1) there is no reason to keep a common factor that just has to be divided out, and 2) when counting band 1 solutions, we always start by assuming a canonical form for box 1.