by **PatmaxDaddy** » Mon Dec 12, 2005 3:46 pm

Dukosu is right in that the formula doesn't depend on the size of the blocks, and so it can be applied to a 9x9 Latin square. Why is the result a poor estimate? I think Kjellfp has got it right when he says that the assumption of independence of row and column constraints improves as the boxes get bigger, and 1x1 is the worst case. I'll add to that that I think the assumption improves as the number of boxes gets smaller, and again for a Latin square this is a worst case, with 81 boxes in the 9x9 case.

Consider a grid with m x n boxes, where the boxes are of any size. Let q(i,j) be the average number of ways to complete box (i,j) subject to the row constraints imposed by boxes (k,j), k < i, and subject to the column constraints imposed by boxes (i,k), k < j (i.e. subject to the boxes to the left of and above the box (i,j)). This is an an average, in general, because the number of ways to fill in the box depends on what is in the boxes above and to the left, and I'm averaging over all possibilities. So for example q(1,1) is just 9! for a 3x3 Sudoku, and 9 for a 9x9 Latin square, since there are no boxes above and to the left. More examples, just so this is clear: for a 3x3 Sudoku q(1,2) = q(2,1) = 12096, q(1,3) = q(3,1) = 216.

Tinfoil's method can be applied to each box individually. One can estimate q(i,j) with the formula

q'(i,j) = q(i,1) * q(1,j) / q(1,1) = q(1,i) * q(1,j) / q(1,1)

I'm leaving out some intermediate steps here, but they're easy to fill in. Note that I use the exact values for q(1,i) and q(1,j), not the estimates q', because these are calculated exactly by counting band 1. It is easy to prove that the product over all (i,j) of q'(i,j) (where we let q'(1,j) = q(1,j)) gives exactly the estimation formula that I posted and that Kjellfp generalized to rectangular grids. Therefore, evaluating the accuracy of the estimate reduces to evaluating the accuracy of these individual box estimates.

For 3x3 Sudoku, q'(2,2) = 403.2000, and q(2,2) = 403.2678, an error of only 0.017%. The other cases are easy to determine and are of similar accuracy. Furthermore, there are only 4 boxes where we are using the q' estimates in the product; for the other 5 boxes we have an exact value.

For a 9x9 Latin square, q'(2,2) = 7.111, and q(2,2) = 7.125, an error of 0.197%. Not bad, but there are 64 q' estimates in the product (I have not calculated the errors for those cases, but it wouldn't be hard to do), so it's not surprising that the total error is large. But dukosu reports that the final estimate is off by a factor of 10000, which is rather larger than one might expect just considering q'(2,2), so one might want to look into this further to see how accurate the other q' estimates are. And it would be good for someone to verify my calculations, because I haven't had time to check carefully.

For 4x4 Sudoku, I figure that it won't be too hard to get q(2,2) to compare against q'(2,2) from the formula, and I expect (following Kjellfp's experience and intuition) that the agreement will be excellent. It would be much harder to get any of the other q values, but perhaps once we know q(2,2) we can make an educated guess at the accuracy of the complete estimate.

Last edited by

PatmaxDaddy on Mon Dec 12, 2005 12:25 pm, edited 1 time in total.