## Canonical Puzzle Form

Everything about Sudoku that doesn't fit in one of the other sections

### Canonical Puzzle Form

Can anyone tell me how to put a puzzle in canonical form?

As I see it you just put the puzzle in as close to 123456789123456...... as possible with the top left corner as

Code: Select all
`123456789`

Or am I totally off ?
eclark

Posts: 14
Joined: 26 January 2006

A puzzle is in canonical form, when the solution grid, considered as number with 81 digits (the first 9 digits correspond to the first row - not box, the next 9 to the second row etc.) is the smallest under all equivalent grids. They all have a first row 123456789. See here for operations on a sudoku that keep it equivalent.
Therefore puzzles with zero or multiple solutions don't have a canonical form.
Wolfgang

Posts: 208
Joined: 22 June 2005

Wolfgang wrote:A puzzle is in canonical form, when the solution grid, considered as number with 81 digits (the first 9 digits correspond to the first row - not box, the next 9 to the second row etc.) is the smallest under all equivalent grids.

This is an interesting definition of canonical form, but it is a bit unusual and a bit restrictive.

The basic idea is that there a number of operations that turn a grid (partially filled or complete, valid puzzle or pseudo-puzzle) into one that is entirely equivalent. This means that each grid comes with a large collection of equivalent grids, called the "equivalence class" of that grid. In general, each of these sets of equivalent grids will contain 9! x 6^8 x 2 grids, but sometimes fewer.

For searching, counting and a variety of other mathematical purposes, we usually want to treat an entire equivalence class of grids as a single object, because every grid in that class is "essentially the same".

But for programming, solving and so on, we need to actually use a specific individual grid to represent the entire equivalence class.

So the concept of a CANONICAL grid is

- choose SOME RULE that picks out ONE individual grid from the entire equivalence class and DEFINE that grid to be the "canonical" one.

Wolfgang gave one example of such a rule: pick the grid from the class whose completion is the lowest 81-digit number.

But there are plenty of other possible rules and they don't need to involve finding the grid's completion, and so they don't need to be restricted to valid puzzles.

Here are some choices..

(a) Write each grid as a number in decimal with "0" for the empty spaces and from all of these choose the one that gives the SMALLEST number as the canonical one

[ this will agree with Wolfgang's order for complete grids ]

(b) .. as before .. but choose the LARGEST number as the canonical one

(c) Write each grid as a number, but block-by-block, rather than row-by-row... then choose the smallest, or largest or whatever.

One that I use is

(d) Convert grid to a graph, use the graph-labelling program nauty, and then convert graph back to a grid.

The ONLY advantage of the last one is that it is very fast for puzzles with few clues - for 17 clue puzzles I can find the canonical grids at the rate of about 4000 per second (on a standardish 2.4G PIV). Given that I am processing millions of 17/18-clue puzzles, I need to work quickly with the canonical version to avoid repetitions.

Its disadvantage is that the canonical version of a puzzle has no human interpretation - it just seems like an arbitrarily chosen grid..

Cheers

Gordon
gfroyle

Posts: 214
Joined: 21 June 2005

Thanks for the clarification. What i described above is what people in this forum mean, when they talk about a puzzle in canonical form or a canonicalized puzzle. The other forms have been referred to as normalization, eg (a) as C1-norm (which i personally use - note that a unique sudoku in C1-norm is different from that you get after normalizing the solution grid). Without having an exact definition of "canonical" it implies for me that you can bring them in an order, i.e. for 2 elements (or equivalence classes) you can say which one comes before the other. Is this the case for the "nauty norm" too (i mean an order implied by the method, of course you can order them like the others, because they are unique)?
Wolfgang

Posts: 208
Joined: 22 June 2005

gfroyle wrote:So the concept of a CANONICAL grid is

- choose SOME RULE that picks out ONE individual grid from the entire equivalence class and DEFINE that grid to be the "canonical" one.

thanks Gordon
on the programmer's forum I'm afraid I contrbuted to a restrictive definition similar to wolfgang's
by answering a specific question about how I programmed sudoku canonicalization it came across like it was the only way
gfroyle wrote:(d) Convert grid to a graph, use the graph-labelling program nauty, and then convert graph back to a grid.

The ONLY advantage of the last one is that it is very fast for puzzles with few clues - for 17 clue puzzles I can find the canonical grids at the rate of about 4000 per second (on a standardish 2.4G PIV). Given that I am processing millions of 17/18-clue puzzles, I need to work quickly with the canonical version to avoid repetitions.

are the nauty timings are for 17 clue puzzles (with holes) or for completed grids?
if not for completed grids then do the nauty timing estimates differ for completed grids?
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Is the nauty utility open source and or freely avaliable for *nix ?

Edit: never mind I found it here:
http://www.cs.sunysb.edu/~algorith/implement/nauty/implement.shtml
http://cs.anu.edu.au/people/bdm/

Though a description of how to use that to get a canocilzation would be cool
eclark

Posts: 14
Joined: 26 January 2006

I have been involved with the same discussion inr the Programmers index, Thanx for the info

tarek

Posts: 3759
Joined: 05 January 2006

gsf wrote:are the nauty timings are for 17 clue puzzles (with holes) or for completed grids?
if not for completed grids then do the nauty timing estimates differ for completed grids?

These are for 17 clue puzzles with holes; the times for completed puzzles are SIGNIFICANTLY slower... about 30 per second, rather than 4000 per second. I have not attempted to analyse the reasons behind this and to determine if it is an artefact of my implementation or a fundamental property of canonically labelling certain classes of graphs. My suspicion is that it is the latter, and it would be nice to figure out why, but time is my enemy at the moment... (and with a new baby in three weeks, no prospect of any improvement !)

Gordon
gfroyle

Posts: 214
Joined: 21 June 2005

eclark wrote:Though a description of how to use that to get a canocilzation would be cool

It is rather complex to describe from scratch, though I think it makes a great pedagogical example that I would very much like to pursue.

If you (ie the general reader) already understands the basics of permutation groups, stabiliser subgroups and canonically labelling vertex-coloured graphs, then it becomes a whole lot easier, but I was surprised by how fiddly it was when I implemented it.

I also implemented an elementary "relabel in all possible ways and choose the maximum" algorithm, but that took nearly 1/4 second for a completed grid. There are still some interesting programmatic and algorithmic issues in that as well though..

Unfortunately due to my current time constraints I can't see being able to do a proper complete write up, though if there is interest I can do one assuming prior knowledge of the concepts mentioned above..

Cheers

G
gfroyle

Posts: 214
Joined: 21 June 2005

gfroyle wrote:These are for 17 clue puzzles with holes; the times for completed puzzles are SIGNIFICANTLY slower... about 30 per second, rather than 4000 per second. I have not attempted to analyse the reasons behind this and to determine if it is an artefact of my implementation or a fundamental property of canonically labelling certain classes of graphs. My suspicion is that it is the latter, and it would be nice to figure out why, but time is my enemy at the moment... (and with a new baby in three weeks, no prospect of any improvement !)

interesting

I have coded a completed grid only canonicalization function based on the
sudoku counting techniques and it handles the sudoku17 completed grids
at 1500/sec/Ghz

at the moment I don't see how the function can be modified to handle holes
I'll repost my solver source with the 9x9 canon() code isolated in a separate file
so you can test

congratulations on the new arrival -- your first?
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

1500/s/g is great..

Seems that quite different techniques can be applied to completed grids, whereas for my graph-based technique they are the worst!

I'd like to explore it more.. but...

gsf wrote:congratulations on the new arrival -- your first?

When she arrives, she'll be a baby sister for Amy.. two girls !

Gordon
gfroyle

Posts: 214
Joined: 21 June 2005

at the moment I don't see how the function can be modified to handle holes
I'll repost my solver source with the 9x9 canon() code isolated in a separate file
so you can test

I'd love to see that too. I'm always looking for new code.

When she arrives, she'll be a baby sister for Amy.. two girls !

Congrats.
eclark

Posts: 14
Joined: 26 January 2006

### none

eclark wrote:Is the nauty utility open source and or freely avaliable for *nix ?

Edit: never mind I found it here:
http://www.cs.sunysb.edu/~algorith/implement/nauty/implement.shtml
http://cs.anu.edu.au/people/bdm/

Though a description of how to use that to get a canocilzation would be cool
emma

Posts: 1
Joined: 02 February 2006

eclark wrote:
gsf wrote:I'll repost my solver source with the 9x9 canon() code isolated in a separate file
so you can test

I'd love to see that too. I'm always looking for new code.

source reposted
details in http://forum.enjoysudoku.com/viewtopic.php?t=3009&highlight=
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

gfroyle wrote:1500/s/g is great..

I walked through the code and sped it up by 2X by amortizing the min lexicographic comparisons
so its canonicalizes at 3000/s/Ghz
posted source has been updated
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Next