Su-Doku's maths

Everything about Sudoku that doesn't fit in one of the other sections

Re: Band 1 counting methods

Postby LarryLACa » Fri Jan 06, 2006 7:41 pm

PatmaxDaddy wrote:For my 4xC and 5xC counts, I find a set of equivalence classes (gangsters) for box 2 where box 2 permutations are in the same class if the "structure" of the mapping from box 1 rows to box 2 rows is the same. This structure is represented by a directed, labelled graph (DLG) containing R nodes.
(where the boxed nodes in the diagrams represent the box minirows for Boxes1,2 and the 'arc labels' represent the dependence/distribuition of cell value counts between the boxes)

Here are the number of box 2 gangsters for various cases:
Code: Select all
4x4:   28
4x5:   53
5x3:   89
5x4:  618
5x5: 3087
Does anyone see a (useful) connection between the 28 4x4 Box2 equivalence classes (or the 53 4x5 ones) and the number 'inner loop cycles' that would be required to implement Kjell's 4xC band expression (for the Box2 component)? Is it practical to use this type of DLG pattern matching as a strategy for implementing the nested sum loops implied by the Kjell 4xC band expression?

Has anyone explored using symbolic algebra programs (e.g. Mathematica (combinatorics pkg) / Maple) for computing (or developing) the band permutations counts?
LarryLACa
 
Posts: 32
Joined: 24 August 2005

Re: Band 1 counting methods

Postby kjellfp » Sat Jan 07, 2006 12:40 am

First of all. Good work on the 5x3- and 5x4-counting. I have plans for a 5xC-counting and may try to verify your results.

There is one place, however, where I disagree:

PatmaxDaddy wrote:Kjell's 4xC formula specifies the count but not an efficient way to compute it.


and later:

PatmaxDaddy wrote:Using the gangsters, my 4x4 running time is 32 ms, and my 4x5 time is 830 ms.


I implemented my method to do 4XC-counting when C runs from 1. It did all computations for C from 1 to 35 in 5 seconds, and from 1 to 47 in 30 seconds. It confirms the values already given by others for C<=7, so I'm pretty confident it's correct.

The method that lies behind the 4xC-formula can be extended to RxC, but the complexity grows so fast that for R>=5 it's much much more convenient to present an algorithm than a formula. I have no idea about the time it takes, but it certainly gets more complicated as R grows.
kjellfp
 
Posts: 140
Joined: 04 October 2005

Postby PatmaxDaddy » Sat Jan 07, 2006 2:07 pm

Kjell is correct, I was quite mistaken when I wrote that the 4xC formula doesn't specify an efficient computation. I must admit I didn't look at it carefully. In fact it seems to be of much lower algorithmic complexity than the band counting methods I've been using. Perhaps Kjell's formula might have complexity that is polynomial in C, whereas I've been using methods that are exponential in C. If Kjell has in fact developed a polynomial-time method for band 1 counting, that is a very significant result. The exponential-time methods have reached the limits of what they can do on GHz-class machines, and we would need a major algorithmic improvement to go any further. Perhaps Kjell's method is just that improvement. Beyond just practical considerations, however, developing P-time algorithms for seemingly exponential problems is a major achievement. But even if the formula is exponential in C, it clearly grows much slower than the methods I've used.

I'd love to determine the complexity of Kjell's formula, i.e. the function f where the running time is O(f(C)). Shouldn't be too hard to analyze, but Kjell, perhaps you already know it?

[Edit] The number of outer summand iterations is (2C+1)(2C+2)/2, making the outer summand O(C^2) (someone please confirm this), so if the inner summand is also polynomial then the whole formula is.
Last edited by PatmaxDaddy on Sat Jan 07, 2006 11:02 am, edited 1 time in total.
PatmaxDaddy
 
Posts: 63
Joined: 18 October 2005

Postby PatmaxDaddy » Sat Jan 07, 2006 2:10 pm

I've run more consistency checks on my 5xC results, and everything still checks out.
PatmaxDaddy
 
Posts: 63
Joined: 18 October 2005

Re: Band 1 counting methods

Postby PatmaxDaddy » Sat Jan 07, 2006 2:25 pm

kjellfp wrote:The method that lies behind the 4xC-formula can be extended to RxC, but the complexity grows so fast that for R>=5 it's much much more convenient to present an algorithm than a formula.

Have you considered writing what might be called a band 1 compiler? This would be a program that, give a value of R, outputs a C program (C being the language, not the number of columns) that computes band counts for RxC cases. This may be similar to Larry's idea of using symbolic algebra programs.

I used to do this sort of thing on a smaller scale in my professional work when I needed very fast programs. Sometimes I'd write C programs that generated a set of other C programs for various cases. Sometimes I'd write a C function that when called would generate and execute machine language inner loops on the stack, where the inner loops were optimized for the specific case with which the function was presented.

Just something to think about.
PatmaxDaddy
 
Posts: 63
Joined: 18 October 2005

Postby PatmaxDaddy » Sun Jan 08, 2006 2:51 am

I think the inner summand of Kjell's 4xC formula is O(C^2), which makes the whole thing O(C^4). The inner summand has four constraints in six parameters, so 2 independent variables each O(C).
PatmaxDaddy
 
Posts: 63
Joined: 18 October 2005

Postby kjellfp » Sun Jan 08, 2006 8:23 am

PatmaxDaddy wrote:I think the inner summand of Kjell's 4xC formula is O(C^2), which makes the whole thing O(C^4). The inner summand has four constraints in six parameters, so 2 independent variables each O(C).

I think the inner summand is O(C^3) and the total is O(C^5). That is because there is a relation between the relations, i.e. one constraint can be derived from the three others.

I could (and should?) leave one of them out, but for the symmetry, I thought it could stay.
kjellfp
 
Posts: 140
Joined: 04 October 2005

Postby PatmaxDaddy » Mon Jan 09, 2006 2:22 am

kjellfp wrote:I think the inner summand is O(C^3) and the total is O(C^5). That is because there is a relation between the relations, i.e. one constraint can be derived from the three others.
Didn't notice that before. I agree, O(C^5).

Note that the outer summand can be done much more efficiently than the (2C+1)(2C+2)/2 iterations that I posted, using the a<=b<=c symmetry that Kjell previously noted, but the result is still O(C^2).

This polynomial time algorithm for band 1 counting is perhaps the most interesting thing I've seen posted on this entire thread. I can't wait to see the general RxC method. I'm thinking that it will be exponential in R and polynomial in C, but we'll see. I'm hoping that it will put 5x5 within reach, and maybe others.

Right now I'd just like to understand the 4xC formula, and I'm not there yet. Kjell, can you give us some more info? Just some hints about how the a,b,c, and k parameters relate to the grid and the symbols would really help.
PatmaxDaddy
 
Posts: 63
Joined: 18 October 2005

Postby kjellfp » Mon Jan 09, 2006 9:19 am

PatmaxDaddy wrote:Right now I'd just like to understand the 4xC formula, and I'm not there yet. Kjell, can you give us some more info? Just some hints about how the a,b,c, and k parameters relate to the grid and the symbols would really help.


I will give the RxC-method, but I'll have to finish another Sudoku project just now (which is about to end as I believe I have detected the last bug, as always...).

The formulas for 2xC, 3xC and 4xC all pop out from the general RxC algorithm. And yes, as far as I can see it, the algorithm seems to be exponential in R and polynomial in C.
kjellfp
 
Posts: 140
Joined: 04 October 2005

Postby PatmaxDaddy » Mon Jan 09, 2006 4:41 pm

When the Wikipedia page says that Kjell's 3xC and 4xC formulas are "verified", what does that mean? One might expect it to mean that there is a proof that the formula holds for all C, but if so I haven't seen one (Kjell himself says for 4xC that he's "pretty confident it's correct", implying he doesn't have a proof). Perhaps it means that the formulas have been checked against counting methods for certain small values of C, and if that's what it means it should say so (i.e., instead of saying "yes" it should say "C <= k", or "all C" when proved).

I understand the 3xC formula and believe I could construct a proof for all C if I needed to, so I've got no problem with it being described as "verified". I don't understand the 4xC formula (not yet anyway, and I'm going to struggle with it until I do). I believe it's right, partly because it has been checked for small C and partly because I have great respect for Kjell, but that's not a proof.
PatmaxDaddy
 
Posts: 63
Joined: 18 October 2005

Postby Red Ed » Mon Jan 09, 2006 8:33 pm

PatmaxDaddy wrote:When the Wikipedia page says that Kjell's 3xC and 4xC formulas are "verified", what does that mean?
I put those comments on Wikipedia. 3xC is clear enough -- I think we can agree the formula is correct. 4xC ... well I thought I understood it, but perhaps Kjell Fredrik might like to post the details about how it works (just 4xC for now; not the whole RxC algorithm) so we can all nod sagely and not feel so bad about the blithe "Yes" on Wikipedia.
Red Ed
 
Posts: 633
Joined: 06 June 2005

Postby kjellfp » Mon Jan 09, 2006 9:51 pm

PatmaxDaddy wrote:When the Wikipedia page says that Kjell's 3xC and 4xC formulas are "verified", what does that mean? One might expect it to mean that there is a proof that the formula holds for all C, but if so I haven't seen one (Kjell himself says for 4xC that he's "pretty confident it's correct", implying he doesn't have a proof).


I'm sure 3xC is verified. I have explained it, and I find it simple enough.

About 4xC, I'm curious myself. As it has been put up as verified, it might be, I'm not the one to answer. But I would like to have a positive confirmation.

About having a proof, I do. When I say that I'm pretty confident, it's more a result of maybe my english not being completely accurate, and the fact that I always ask myself how buggy my implementations are. But the fact that my calculations are confirmed for C<=7 convinces me. A strict mathematical confirmation/proof can be judged by others, as I'm now going to post the general RxC algorithm.
kjellfp
 
Posts: 140
Joined: 04 October 2005

Postby kjellfp » Tue Jan 10, 2006 8:09 am

Red Ed wrote:...so we can all nod sagely and not feel so bad about the blithe "Yes" on Wikipedia.

I took the liberty of changing it to "No" on Wikipedia. After all, the fact that the values concide with those done by others for C<=7 is more a confirmation of these explicit examples than a mathematical proof for the formula itself. There is a difference between confirming a formula and confirming a number. The first is logic, the second a calculation.
kjellfp
 
Posts: 140
Joined: 04 October 2005

Postby PatmaxDaddy » Tue Jan 10, 2006 12:46 pm

A proof is more than a mathematical technicality, it teaches the reader why the formula works. I brought all this up about a proof because I don't understand 4xC (yet), and need to be taught. I assume others do as well (I suspect only Kjell understands 4xC, because no one else has offered an explanation).

A reader of Wikipedia who has not been immersed in Sudoku grids for months as we have will need even more help. I suspect most such readers will struggle to understand 3xC, even though it seems obvious to us.

As for me, I don't need a formal proof to understand the formula, but if I could just figure out what the a, b, c, and k parameters mean I'm sure the rest of it would make sense. If anyone out there has a clue, please speak up.
PatmaxDaddy
 
Posts: 63
Joined: 18 October 2005

Postby kjellfp » Tue Jan 10, 2006 1:10 pm

a is the number of symbols in row 1 and 2 in the first boxes (that is, symbols that are either in row 1 in box 1 and row 2 in box 2 OR in row 1 in box 2 and row 2 in box 1).

It will then also be the number of symbols in row 3 and 4 in the first two boxes, as well as the number of symbols in row 1 and 2 in the two last boxes, and the number of symbols in row 3 and 4 in the first two boxes.

b is the number of symbols in row 1 and 3 in the first two boxes, together with other combinations as for the variable a.

c is the number of symbols in row 1 and 4 in the first two boxes.

Fix a,b,c and the symbols they count. Among the a symbols that lie in row 1 and 2 in box 1 and 2, k12 counts how many of them that lie in row 1 in box 1 (and thus also in row 2 in box 2). In general, for i<j, among the symbols on row i and j in the first two boxes, k_ij tells how many of them that are in row i in box 1 and row j in box 2.

What I do is breaking the band in two parts, with box 1 and 2 in the first part and box 3 and 4 in the second part. This is R=4 and L=2 in the RxC band counting algorithm described here.
kjellfp
 
Posts: 140
Joined: 04 October 2005

PreviousNext

Return to General