## Minimum number of clues

Everything about Sudoku that doesn't fit in one of the other sections
Wolfgang wrote:Moschopulus, i ran my old program with your grid above over night. I only found a lot of 20-clues (must be over 200). Finding a 19-clue in this grid seems to be about that hard as to find an 18-clue in 'Moschopulus I'.

Thanks. If there were no 19 in this grid, that makes it pretty unusual.
Maybe it could be the only grid with no 19.
Moschopulus

Posts: 256
Joined: 16 July 2005

I'm thinking about a more illustrative/intuive/effective way to "normalize" sudokus, ie to calculate a unique representative for isomorphic sudokus. In a first step i would (re)order the boxes from top/left to bottom/right according to the number of clues, and under those with the same number, according to the cells occupied with clues from top left to bottom right (applying the transformations defined by gfroyle).
The prolific pattern would become this then:
Code: Select all
`... x.. xx.      xx. x.. ...x.x ... ...      x.. ... ...... x.. ...      ... .x. xx.x.. .x. ..x      x.. x.. ....x. x.. x..      ..x ..x x..... ... ...      ... ... x..... .xx ..x      ..x .x. ....x. ... .x.      ... x.. ..x... .x. ...      ... ... ..x`

Then, for each of the best "pattern normalized grids" (think there are 72*4*6^6 for a full grid, where all box patterns are full) i would renumber the clues starting with 1 from left/top to right/bottom. An empty cell would get 10. The "grid" with the lowest numbers from top/left to bottom/right defines the "normalized" sudoku then.

Is it correct that a sudoku "normalized" this way is unique? Has anyone done this already? If yes and no, i will try to work it out.
Wolfgang

Posts: 208
Joined: 22 June 2005

Wolfgang wrote:I'm thinking about a more illustrative/intuive/effective way to "normalize" sudokus, ie to calculate a unique representative for isomorphic sudokus. In a first step i would (re)order the boxes from top/left to bottom/right according to the number of clues, and under those with the same number, according to the cells occupied with clues from top left to bottom right (applying the transformations defined by gfroyle).
The prolific pattern would become this then:
Code: Select all
`... x.. xx.      xx. x.. ...x.x ... ...      x.. ... ...... x.. ...      ... .x. xx.x.. .x. ..x      x.. x.. ....x. x.. x..      ..x ..x x..... ... ...      ... ... x..... .xx ..x      ..x .x. ....x. ... .x.      ... x.. ..x... .x. ...      ... ... ..x`

Then, for each of the best "pattern normalized grids" (think there are 72*4*6^6 for a full grid, where all box patterns are full) i would renumber the clues starting with 1 from left/top to right/bottom. An empty cell would get 10. The "grid" with the lowest numbers from top/left to bottom/right defines the "normalized" sudoku then.

Is it correct that a sudoku "normalized" this way is unique? Has anyone done this already? If yes and no, i will try to work it out.

Yes, it is correct that a sudoku "normalized" 'this way' is unique - But I think it's necessary to clarify a few things. For instance, if you regard a "pattern normalized grid" the same way I would do, there is no need to introduce a 10 for empty cells - a 10 or a 0 makes no difference.

In general, several transformations (band/row/column permutations + transposition) may lead to identical "pattern normalized grid", and it is necessary to check ALL these (reassign the clue values, check ordering, and pick the lowest).

All sudokus with a given "clue pattern" will have identical "normalized clue patterns". But if one clue value is changed "near upper left", and the sudoku then normalized, the result will in general be a normalized sudoku with completely different clue values. If a clue value is changed "near bottom right", the new normalized sudoku will look "almost like its mother".

(There are numerous ways of normalizing sudokus, and ANY will do, as long as the process is unique. Normalization may be done based on clue pattern, on solution grid, or any 'class invariant' property.)
Ocean

Posts: 442
Joined: 29 August 2005

I would like to suggest some alternative simple normalized sudoku forms (or norms):

1. C0-Norm: The 'lowest' sudoku of all sudokus in a sudoku class (nearly all start 00...)

2. C9-Norm: The highest sudoku of all sudokus in a sudoku class (nearly all start 98...)

3. C1-Norm: From a C9-Norm, permute the symbol values 123456789->987654321 (nearly all start 12...)

4. G1-Norm: The sudoku for the 'lowest' solution grid of all in its S-class.

All the above norms are unique (as Wolfgangs 'Pattern Norm' is). They are all simple to compute in the labourous way (generate all instances for the class, and pick the norm), and it's also possible to find effective algorithms to compute the norms. The G1-Norm has the property that all sudokus with the same solution grid will group together in a sorted set of sudokus. I have a preference for the C1-Norm, which looks more natural than the more basic C0-Norm (normally starting with lots of zeros). The C9-Norm may be regarded as an intermediate form (simple way to reach the C1-Norm).

Conjecture: Any sudoku can be transformed to one of its normalized forms through a transformation operator P, and back again with the 'inverse' of P (denoted P^). Any two instances in a sudoku class can be transformed to each other. If two sudokus A and B have the same normalized form, (P1*A=P2*B, for the two transformation operators P1 and P2), we have A=P1^*P2*B, and B=P2^*P1*A.

(A transformation operator P consists of a row permutation matrix R, a column permutation matrix C, a value permutation matrix V, and a transposistion number T. The band permutations can be incorporated in R and C).

[Any corrections, new suggestions, improvement of the notation, and better explanation of the mathematics, will be appreciated...]

---
Last edited by Ocean on Mon Nov 21, 2005 11:18 pm, edited 1 time in total.
Ocean

Posts: 442
Joined: 29 August 2005

Ocean wrote:Any sudoku can be transformed to one of its normalized forms through a permutation matrix P, and back again with the 'inverse' of P (denoted P^).

I don't believe this...

The norms themselves are all perfectly sensible, but the representation of the transformation from a sudoku to its norm doesn't seem right to me, because a single matrix cannot represent all of the operations of row-permutation, column-permutation, symbol relabelling and transposition.

Normally, the term "permutation matrix" means a 0/1 matrix with exactly one 1 in each row and column, and it represents a single permutation. Thus to represent a permutation of the rows AND columns you would need two permutation matrices P and Q and then you would multiply one of them on the left, and the other on the right.

B = P A Q

But even if you are using the term in a different context, it is impossible to get a permutation of the rows AND columns by just one matrix multiplication.

And as far as I can see, symbol relabelling and transposition just can't be represented by a matrix multiplication in any way at all.

Cheers

Gordon
gfroyle

Posts: 214
Joined: 21 June 2005

gfroyle wrote:
Ocean wrote:Any sudoku can be transformed to one of its normalized forms through a permutation matrix P, and back again with the 'inverse' of P (denoted P^).

I don't believe this...

The norms themselves are all perfectly sensible, but the representation of the transformation from a sudoku to its norm doesn't seem right to me, because a single matrix cannot represent all of the operations of row-permutation, column-permutation, symbol relabelling and transposition.

Thank you for the corrections! It was obviously wrong to call the transformation a 'permutation matrix', since that concept has a precise mathematical meaning.

What I actually meant, was: It's possible to define a transformation operator P, and the related transformation operation P*A (NOT matrix multiplication). I think the operator P can be represented as three 9-element vectors R,C and V (all some specific permutation of 123456789), plus a number T (0 or 1). (The transformation operation consists of one row permutation, one column permutation, one symbol relabelling, and possibly one transposition).
If the operation i linear (and I believe it is) it is possible to define the corresponding inverse operation (and the inverse operator P^).

Hope I didn't go too wrong here - I have not written the algorithms yet or carried out any formal proofs, only done some 'back of the envelope' calculations...

PS.
Just a small comment on 'symbol relabelling': The vector V might be hard to FIND, but is simple to USE: Apply the unix command "tr" directly on a file, or a C-statement like "for (i=1;i<=81;i++) B[i]=V[A[i]];".

PPS.

- -Here are the ideas developed so far ...

Let the transformation operator P be represented by the set of three 9x9 permutation matrices R, C and V, plus a possible transposition t. [As a convention we say t=0 means no transpose, and t=1 means transpose].

P = {R; C; V; t}

For two consecutive transformations P1 and P2, we define the product P1 * P2:

P1 = {R1; C1; V1; t1}
P2 = {R2; C2; V2; t2}

P1 * P2 = {R1[(1-t1)R2+t1C2]; [(1-t1)C2+t1R2]C1; V1*V2; (t1+t2)mod 2}

If the transformation operation is linear, it's possible to define an inverse operator P^, such that P * P^ = {I; I; I; 0}, where I denotes the identity matrix.

P^ = {(1-t)R^+tC^; (1-t)C^+tR^; V^; t}

(where R^,C^,V^ denotes the respective inverse matrices).

... hope most of it is correct so far. The noncommutativity of the transformation operation (and matrix operations) might not have been taken proper care of...

If we represent the permutation matrices by their corresponding array representation:
For V, all 9! possible permutations of 123456789 are allowed (362880).
For R and C, only 6^4 (=1296) combinations of 123456789 are allowed.

!--------------------------------

The whole point of all this, was...

Conjecture:

Any two sudokus A1 and A2 belonging to the same class can be transformed to some common instance of the class (for example a chosen normalized form). If the transformation to this norm is represented by operators P1 and P1, the transformation between A1 and A2 is P1 * P2^:

P1 * P2^ = {R1[(1-t1)((1-t2)R2^+t2C2^)+t1((1-t2)C2^+t2R2^)]; [(1-t1)((1-t2)C2^+t2R2^)+t1((1-t2)R2^+t2C2^)]C1; V1*V2^; (t1+t2)mod 2}

The four cases for t1/t2 gives:

t1=0, t2=0 =>: P1 * P2^ = {R1*R2^; C2^*C1; V1*V2^; 0}
t1=0, t2=1 =>: P1 * P2^ = {R1*C2^; R2^*C1; V1*V2^; 1}
t1=1, t2=0 =>: P1 * P2^ = {R1*C2^; R2^*C1; V1*V2^; 1}
t1=1, t2=1 =>: P1 * P2^ = {R1*R2^; C2^*C1; V1*V2^; 0}

!-----------------
Last edited by Ocean on Mon Nov 21, 2005 10:49 pm, edited 4 times in total.
Ocean

Posts: 442
Joined: 29 August 2005

Ocean wrote:I would like to suggest some alternative simple normalized sudoku forms (or norms):

...
2. C9-Norm: The highest sudoku of all sudokus in a sudoku class (nearly all start 98...)

3. C1-Norm: From a C9-Norm, permute the symbol values 123456789->987654321 (nearly all start 12...)

Think the way i described above should give a C1 Norm.
Wolfgang

Posts: 208
Joined: 22 June 2005

Wolfgang wrote:I'm thinking about a more illustrative/intuive/effective way to "normalize" sudokus, ie to calculate a unique representative for isomorphic sudokus.

Yes, it is a slight shame that Gordon's Nauty program presents the transformed puzzles in a non human way !

I have long thought that box 1 should always be
Code: Select all
`123 ... ...456 ... ...789 ... ...`

and that the 6^6 could be represented as
Code: Select all
`1-- --- ------ 1-- ------ --- 1---1- --- ------ -1- ------ --- -1---1 --- ------ --1 ------ --- --1`
Last edited by coloin on Tue Nov 22, 2005 8:33 pm, edited 1 time in total.
coloin

Posts: 1702
Joined: 05 May 2005

As Ocean mentioned, the problem with the C1-norm is that if you make a slight change, eg add a clue out of the solution grid, you can get a totally different normalized sudoku. Since i need a norm also for multiple solution sudokus, i cannot define it over the solution grid.
But a fast function to normalize low clue sudokus should improve my search, because i saw that sometimes i wasted time trying sudokus that were (obviously) isomorphic to others before. With a simple heuristic (just dropping and adding a clue alternatively depending on the remaining number of candidates after "level 2 guesses") i had found two 17-clues (and hundreds of 18-clues) 2 weeks ago. But after "improving" it last week no more 17-clue appeared
Wolfgang

Posts: 208
Joined: 22 June 2005

The goal is to develop a canonical ordering of a set of clues that is invariant for transformations of the same set of clues. Transformations consist of reordering rows of columns within a band, reordering bands of rows or columns, transposing the rows and columns and reassigning the symbols

Start by building a list of the relationship between pairs of clues. The relationship values in order of strength are:
same row or col and same block
same row or col, different block
same block, different row and col
same symbol, same band vertical or horizontal
same symbol, different vertical and horizontal band
different symbol, same band vertical or horizontal
different symbol, different vertical and horizontal band
Each clue will have a rank which is the number of clues that it is better than. A clue is better than another clue if it has a higher rank or if it has the same rank and a stronger relationship to another clue or if its rank and relationship are the same and the related clue has a higher rank or if all that is the same the comparison continues with the next relations until all the relations have been compared.

Clues initially all start with a rank of zero and the ranks are updated after each round of comparisons. It is important that all the comparisons are made before the new ranks are assigned otherwise some clues may improperly be ranked higher due to the order of evaluation. The end result will usually leave each clue with a unique rank. However, there are some (rare?) symmetries that can leave clues sharing the same rank.
DanO

Posts: 40
Joined: 18 October 2005

There is a first version of my program to normalize sudokus to C1-norm. Of course it is not fully tested, but (on a 1.2 GB notebook) i converted gfroyles 17-clues in 23 seconds (avg < 1 ms) and got unique sudokus without duplicates. The conversion of the solution grids took 1 1/2 hours (avg 174 ms). The (rather raw) source code (linux/c++, 8K) can be downloaded with
http://wolfgang-sudoku.150m.com/normC1.zip
(use "save as")
It should be portable for other OSs with minor effort.
[edit:] oops, downloading over a link does not work from this free webspace, i will try to put it elsewhere in the next days ..
For the moment, you have to copy this address into the address field of the browser.
Wolfgang

Posts: 208
Joined: 22 June 2005

Lets come back to the minimum clue thread (or rename the 17-clue update thread to minimum clue part II).
I suppose, my impression that almost all 17-clues are already found, was too pessimistic (and maybe also reflected my disappointment that none of meanwhile 70 17-clues i found was new).
What i have found are no random 17-clues, but i would call them "mainstream 17-clues". In fact the simple heuristic is a greedy algorithm (though always starting with a random sudoku). it only steps from one "best" 16/17-clue to the next "best". If it would start with a sudoku only some clue-exchanges away from a 17-clue, it is not very unprobable that it would miss it.
It is clear that gfroyle's heuristic is more sophisticated and thus covers the 17-clues i can find this easy way. He obviously has searched almost the whole "heuristic space".
On the other side the only 17-clue i found with another method was not in the list.
BTW: I doubt that most of the grids contain an 18-clue. Is this examplified or only estimated (i know how hard it was to find some 18s in ssf)?
Wolfgang

Posts: 208
Joined: 22 June 2005

Moschopulus wrote:Some stats on the MCN.

MCN= max clique number = number of unavoidables in a maximal set of pairwise disjoint unavoidables

I took 13919 grids which have a 17 (from gfroyles list), and I took 13919 grids chosen completely at random (using dukuso's program).

Here are the MCNs.(*)

[MCN, number of grids that have a 17, number of random grids]

[4, 0, 0],
[5, 0, 0],
[6, 15, 0],
[7, 130, 2],
[8, 1146, 194],
[9, 3879, 1442],
[10, 5254, 4217],
[11, 2753, 4828],
[12, 672, 2451],
[13, 69, 690],
[14, 1, 88],
[15, 0, 7],
[16, 0, 0]

mean [9.83,10.74]
variance [1.12,1.22]

blue=random grids, red=grids with 17

(*) Actual MCNs may vary. These MCNs are calculated from a selection of unavoidable sets, not from *all* unavoidable sets. However the selection is chosen in the same way each time. It consists of all unavoidables of sizes 4 and 6, and a few others of a certain type.

[edit, added: The mean MCN of the known grids with more than 4 17s is 9.55.]

Interesting data....

Interesting in that there just happens to be one grid with a MCN of 14 which has one or more 17s.

This MCN 14 grid may prove interesting.....

In this grid there will be a reduced number of unavoidables.
There will also therefore be fewer collections of these maximal collections.

Asuming there is only one 17 ...in all of the reduced number of collections there will be at least 11 unique clues.[the unavoidable set will not be doubly covered]

What is this grid ?

......................................................................................................

It is also interesting that the 2nd most fertile grid with [19+1=20 ] 17s has a MCN of 12 !

These are the 19 [with the same 16 clue backbone]
Code: Select all
`5942.........4..3.1.........6..73.........5.1..........3.....7....5..4..8........5.42...8.....4..3.1.........6..73.........5.1..........3.....7....5..4..8........5.42......8..4..3.1.........6..73.........5.1..........3.....7....5..4..8........5.42.........4..321.........6..73.........5.1..........3.....7....5..4..8........5.42.........4..3.12........6..73.........5.1..........3.....7....5..4..8........5.42.........4..3.1...6.....6..73.........5.1..........3.....7....5..4..8........5.42.........4..3.1....8....6..73.........5.1..........3.....7....5..4..8........5.42.........4..3.1.....9...6..73.........5.1..........3.....7....5..4..8........5.42.........4..3.1........96..73.........5.1..........3.....7....5..4..8........5.42.........4..3.1.........6..732........5.1..........3.....7....5..4..8........5.42.........4..3.1.........6..73..8......5.1..........3.....7....5..4..8........5.42.........4..3.1.........6..73.........5.12.........3.....7....5..4..8........5.42.........4..3.1.........6..73.........5.1.....6....3.....7....5..4..8........5.42.........4..3.1.........6..73.........5.1.......9..3.....7....5..4..8........5.42.........4..3.1.........6..73.........5.1..........3....87....5..4..8........5.42.........4..3.1.........6..73.........5.1..........3.....7....58.4..8........5.42.........4..3.1.........6..73.........5.1..........3.....7....5..42.8........5.42.........4..3.1.........6..73.........5.1..........3.....7....5..4..8...2....5.42.........4..3.1.........6..73.........5.1..........3.....7....5..4..8.......9`

this is the grid
Code: Select all
`594231786786945132123768954965173248378492561241856397432619875619587423857324619`

the last unavoidable set to be covered includes these clues ? {12,18,22,29,32,35,36,37,41,47,49,61,66,68,77,85,88,95,99}

this is the infered 16 clue stage
Code: Select all
`+---+---+---+|5.4|231|7.6||7.6|945|13.||1.3|7..|.54|+---+---+---+|.65|173|.4.||378|4..|5.1||.41|.5.|3.7|+---+---+---+|43.|.1.|.75||61.|5.7|4.3||857|3.4|.1.|+---+---+---+`

The nine other clues {55,56,58,64,73,74,76,83,97} - I think they are clues which can solve the three other solution grids........hmm...........
Last edited by coloin on Sat Dec 31, 2005 9:43 pm, edited 2 times in total.
coloin

Posts: 1702
Joined: 05 May 2005

Here's the grid with MCN = 14 that has a 17.

439286157176593284258147396381754629795632841624918735542371968867429513913865472

You could use checker to find the puzzle. (Of course I had the puzzle once, but I can't find it now)
Moschopulus

Posts: 256
Joined: 16 July 2005

Thanks a lot - and "A happy new year" to you.
I was wondering how I was going to get the grid from Gordon's list !
Here is the 17 from the grid with MCN 14
Code: Select all
`+---+---+---+|...|..6|.5.||.7.|...|2..||..8|1..|...|+---+---+---+|3..|.54|...||...|...|8.1||...|...|7..|+---+---+---+|54.|...|.6.||...|.2.|...||...|8..|...|+---+---+---+with clues {16,18,22,27,33,34,41,45,46,57,59,67,71,72,78,85,94}`

from unavoid
Code: Select all
`Found 239 unavoidable sets (24 of size 4 or 6).The maximum # of disjoint unavoidable sets (max clique number -- MCN) is 14.One such maximal collection is:   {16,19,36,39}               1.......16   {17,18,87,88}               1.......18    {22,23,82,83}               1.......22   {27,29,97,99}               1.......27   {46,48,56,58}               1.......46   {62,63,72,73}               1.......72   {75,78,95,98}               1.......78   {11,14,31,35,84,85}         1.......85   {13,15,25,28,33,38}         1.......33   {42,49,52,57,77,79}         1.......57   {43,45,53,59,65,69}         2.......45&59   {44,47,51,54,61,67}         1.......67   {64,66,81,86,91,94}         1.......94   {21,24,32,34,71,76,92,96}   2.......34&71     41 not used in this disjoint collection`

The point of this exercise:-
Using the 12 most "efficient" clues I was not able to generate another 17 however.

Are there other "such maximal collections" of size 14 in this grid ?
coloin

Posts: 1702
Joined: 05 May 2005

PreviousNext