## Canonical Form

Everything about Sudoku that doesn't fit in one of the other sections

### Canonical Form

At first I was under the impression that the canonical form of a puzzle is when it is transformed to have the smallest row order lexicographic value. Now I'm finding that there is an additional constraint that box 1 be of the form:
Code: Select all
`1 2 34 5 67 8 9`

Does anyone know why there is this additional constraint? I would think it would make more sense to simply allow row 1 to be:
Code: Select all
`1 2 3 4 5 6 7 8 9`

This makes sense since most solvers process data by rows or columns, not boxes, and is consistent with the minimum lexicographic value (which must start 123456789...). Note that the two orderings may not be possible in the same puzzle. For example the following are isomorphic and canonical based on box 1 or row 1:
Code: Select all
` 1 2 3 | 4 5 7 | 6 8 9 4 5 6 | 1 8 9 | 2 3 7 7 8 9 | 3 6 2 | 1 4 5 ------+-------+------ 2 4 5 | 9 1 8 | 3 7 6 6 3 1 | 2 7 5 | 4 9 8 8 9 7 | 6 4 3 | 5 2 1 ------+-------+------ 3 7 2 | 5 9 6 | 8 1 4 5 1 8 | 7 3 4 | 9 6 2 9 6 4 | 8 2 1 | 7 5 3 1 2 3 | 4 5 6 | 7 8 9 4 5 7 | 1 8 9 | 2 3 6 6 8 9 | 3 7 2 | 1 4 5 ------+-------+------ 2 4 5 | 9 1 8 | 3 6 7 7 3 1 | 2 6 5 | 4 9 8 8 9 6 | 7 4 3 | 5 2 1 ------+-------+------ 3 6 2 | 5 9 7 | 8 1 4 5 1 8 | 6 3 4 | 9 7 2 9 7 4 | 8 2 1 | 6 5 3`
Mike Barker

Posts: 458
Joined: 22 January 2006

You should have a look at this thread :
and read the excellent definition given by G. Royle .

JPF
JPF
2017 Supporter

Posts: 3754
Joined: 06 December 2005
Location: Paris, France

I implemented the canonical search -pretty much following gsf's steps..........

The idea behing labelling the top left box as you said would limit unnecessary lexographic searches.........

tarek

tarek

Posts: 2861
Joined: 05 January 2006

The programmer's forum link is http://www.setbb.com/phpbb/viewtopic.php?t=795&start=0&mforum=sudoku . I've also looked at Glenn Fowler's description here. JPF's links support the row cononical form. These later support the box canonical form. I guess I don't see the advantage of the box form and do see advantages to the row form. Probably the most interesting thing to me was that they may not both exist at the same time for a given puzzle. Secondly I'd like to be consistent with the consensus. At this point I'd say there are several options.
Last edited by Mike Barker on Sun Jan 14, 2007 1:23 am, edited 1 time in total.
Mike Barker

Posts: 458
Joined: 22 January 2006

I have a terrible memory, but I have a recollection from my readings on Canonical Form. That recollection being ...

There is more than one possible way to construct a canonical form to place puzzles into equivalence classes. The important thing is to chose one and use it consistently for all of your comparisons.

I favor this (extended) Canonical Form that (I believe) was proposed by JPF somewhere in this forum. I (sometimes) use it to generate puzzles.
Code: Select all
`123|4..|...456|...|...789|...|4..---+---+---.4.|...|......|...|.4....|..4|...---+---+---..4|...|......|.4.|......|...|..4`

Now, please be gentle as you rip me apart for being incorrect.
daj95376
2014 Supporter

Posts: 2624
Joined: 15 May 2006

In fact canonize is a sort of your grid.
There is not ONE canon form but one by programmer
The gsf way is , I beleive, the better way. Why?
Because afte canonnize a gsf grid have the better
dispositions for the solver.

A little sample with just a bloch:

123 xxx 789
xxx xx3 21x
x7x xxx xxx

I scramble it to make an isomorph

81x xxx xx3
72x 183 xxx
xxx x7x xxx
And now I want to now if they are isomorph

I Just have to sort athe boxes. Just like I explain it in an other tread you always have the same number of clue in the isomorphs So if you sort the boxes with the number of clues you always have the same result!!!

123 xxx 789 Box 1 4 clues box 2 1 clue box 3 5 clues
xxx xx3 21x
x7x xxx xxx

81x xxx xx3 box 1 5 clues box 2 4 clues box 31 clue
729 183 xxx
xxx x7x xxx

so we have 415 and 541
We sort he grid by box 145

xxx 123 789 Box 1 1 clues box 2 4 clue box 3 5 clues
xx3 xxx 21x
xxx x7x xxx

Now do the same with the rows
R1 6 r2 3 r3 1

xxx x7x xxx
xx3 xxx 21x
xxx 123 789

Sort them (by block)
r1 1 r2 3 r3 6

At this time you can detect all permutations, rotations but NOT the digits swap!

Bur it's easy tou just have to renumber the digit in the order you find it

i/e the 7 bedomes a 1 and 1 becomes 7
3 becomes a 2 and 2 becomes 1

6 no change
7 becomes 4 ....

xxx x1x xxx
xx2 xxx 34x
xxx 723 618

You have now ONE canonize form because take two grid make the two sort the renumber and you have the same results for each isomorph of
a given grid

There is no standard. So it's very difficult to know if a grid always exist.
If a standard is given (I vote for he gsf standard!!!) every grid in the word will be give in this form and we can imagine a world wide daba
base with all the models.

Staring from a canonozed grid you just have to sramble it

PS Using this method you can have a problem two boxes have the same number of ckues. No matter begin by an first e-numerotation ans sort th two twins always with the same method.
You will have the result tan bien...

I repeat it'S MY method every coder has his own.
but the princip is here:always sort the grid(box,row,digits) with the same method
This method is speed but not top!

Papy
Papy

Posts: 131
Joined: 15 August 2006

### Re: Canonical Form

Mike Barker wrote:At first I was under the impression that the canonical form of a puzzle is when it is transformed to have the smallest row order lexicographic value. Now I'm finding that there is an additional constraint that box 1 be of the form:
Code: Select all
`1 2 34 5 67 8 9`

Does anyone know why there is this additional constraint? I would think it would make more sense to simply allow row 1 to be:
Code: Select all
`1 2 3 4 5 6 7 8 9`

This makes sense since most solvers process data by rows or columns, not boxes, and is consistent with the minimum lexicographic value (which must start 123456789...).

the two forms, lets's call them box-normal and row-normal, are both valid
row-normal is easier to specify: the the minimum lexicographic row order value
box-order adds: with the top left box of the form ...

as with many data formats there can be space-time tradeoffs
e.g., given a solution, what is the complexity of determining the row-normal form vs. box-normal form

for row-normal
9 rows could be the first row: 9 choices
the 3 chute portions of the first row can be permuted 6 ways: 6
each column in each chute can be permuted 6 ways: 6*6*6
the puzzle can be rotated 90 deg to make cols rows: 2
for a total of 9*6*6*6*6*2 = 23328 choices

for box-normal
9 boxes could be the first box: 9
the cols in the box can be permuted 6 ways: 6
the rows in the box can be permuted 6 ways: 6
the puzzle can be rotated 90 deg to transpose the boxes: 2
for a total of 9*6*6*2 = 648

each "choice" sets the labeling of all cells for the lexicographic value

for row-normal the remaining rows within the bands are swapped to form the minimal value
for box-normal the columns within the middle and right chutes and rows within the middle and bottom bands
are swapped to form the minimum value

box-normal form is much easier easier to calculate
gordon's catalog canonicalizes (to box-normal form) at ~2800 puzzles/sec/Ghz
I haven't verified the timing difference because I didn't bother coding the row-normal form
but it would probably come in at ~100 puzzles/sec/Ghz
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Thanks to GSF, DAJ and Papy for the explainations - looks like I'll be using box-normal from now on.
Mike Barker

Posts: 458
Joined: 22 January 2006

Let me retract that last comment. Thinking about it over night I think you can accomplish row-normal form almost as efficiently as box normal. The primary selections for row normal would be the same as for box normal, that is, selection of which box is box 1, the order of the rows and columns in the box, and a 90deg rotation (or transpose). Once box 1 is selected the logic is that r2c1 must equal 4 and r2c2 must equal 5. This means that columns 4-6 can be identified without additional iterations. Given this, columns 7-9 are identified to keep r2 columns 1-6 in the smallest lexicographic order again without iteration. This means the processing for both forms is almost the same and you're left, as DAJ suggested, with the selection based on personal preference.
Mike Barker

Posts: 458
Joined: 22 January 2006

Thanks to Ruud and gsf to have detect 2 valid solution in my collection
But starting from one solution i generate those
:
12_____8_4__9__________7__3_97___5______84____________8___2__1____5__7___________
12_____8_4__9__________7__3_97___5______84____________8___2__9____5__7___________
12_____8_4__9__________7__3_97___5______84____________8___2__1____5__9__________
12_____8_4__9__________7__6_97___5______84____________8___2__1____5__7___________
12_____8_4__9__________7__6_97___5______84____________8___2__9____5__7___________
32_____8_4__9__________7__1_97___5______84____________8___2__3____5__7___________
32_____8_4__9__________7__6_97___5______84____________8___2__3____5__7___________
32_____8_4__9__________7__6_97___5______84____________8___2__9____5__7___________

For me at less the 3 problem are original: Only one digit different and no permutations. the 1 at the end is replaces by a 9 but the 9 are not replace by 1.

Some grids are bad because all digits are not use but it's the canonoi job!
I you can tell me your ideas...

Papy
Papy

Posts: 131
Joined: 15 August 2006

All but the third are invalid (0 solutions). The third is valid if you add one more empty cell to the end. Papy, don't you have any solver that you could use to check for validity yourself?

RW
RW
2010 Supporter

Posts: 1000
Joined: 16 March 2006

If you have "new" verified 17s, it is probably better to post them in the "update" thread.
Most of us use "." or "0" for an empty cell rather than a "_" which is why your puzzles tend to have accidental typo-errors.

C
coloin

Posts: 1743
Joined: 05 May 2005

Mike Barker wrote:Let me retract that last comment. Thinking about it over night I think you can accomplish row-normal form almost as efficiently as box normal. The primary selections for row normal would be the same as for box normal, that is, selection of which box is box 1, the order of the rows and columns in the box, and a 90deg rotation (or transpose). Once box 1 is selected the logic is that r2c1 must equal 4 and r2c2 must equal 5. This means that columns 4-6 can be identified without additional iterations. Given this, columns 7-9 are identified to keep r2 columns 1-6 in the smallest lexicographic order again without iteration. This means the processing for both forms is almost the same and you're left, as DAJ suggested, with the selection based on personal preference.

the formulation I gave was for canonicalizing solutions (no empty cells)
when the top left box is fixed into one of the possible 6*6 permutations all cell values are also fixed
which means the rest of the work after each fix is a glorified sort

if you only fix 1-5 in the top left box the cell value choices for 6-9 must be determined
that's a 4*3*2*1 loop on top of the work done for box-normal, unless I'm missing a nuance
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Thanks for the reply, I know this may only be of theoretical interest to me, but it was a fun logical challenge. With r2c12 mapped to {4,5} then either the middle stack and the column order is determined directly since it must be the stack containing the candidates in row 1 columns 4-9 which map to {4,5} or if such a stack doesn't exist then this is not a cononical arrangement. Columns in the middle stack arrange to order values which map to {4,5} and the last to 6. The unmapped values in row 2 columns 1-6 then provide the mapping to {7,8,9} and the last stack can be ordered to achieve {7,8,9}. A little more complicated than box normal, but no iterations are required. Not a big deal, but it means the thruput arguement is not as significant and the simpler to define row-normal form is more viable. Also not a big deal, but I'd think using a transpose would be slightly more numerically efficient than a 90deg rotation.
Mike Barker

Posts: 458
Joined: 22 January 2006

Mike Barker wrote:Thanks for the reply, I know this may only be of theoretical interest to me, but it was a fun logical challenge. With r2c12 mapped to {4,5} then either the middle stack and the column order is determined directly since it must be the stack containing the candidates in row 1 columns 4-9 which map to {4,5} or if such a stack doesn't exist then this is not a cononical arrangement. Columns in the middle stack arrange to order values which map to {4,5} and the last to 6. The unmapped values in row 2 columns 1-6 then provide the mapping to {7,8,9} and the last stack can be ordered to achieve {7,8,9}. A little more complicated than box normal, but no iterations are required. Not a big deal, but it means the thruput arguement is not as significant and the simpler to define row-normal form is more viable. Also not a big deal, but I'd think using a transpose would be slightly more numerically efficient than a 90deg rotation.

thanks for providing a second set of eyes on the problem
I was stuck on the top left box
with this insight the gains from fixing the top left box can be used to optimize the row-normal computations
recoding to row-normal allowed a few more optimizations (basically early termination of the
current iteration when it can't improve the current best), so its faster than box-normal

I reposted my solver with row-order canonical for -f%c and -f%C
if you download it be sure to recalibrate any canonical catalog files you have to the new canonical form

here's to scientific discourse
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Next