Canonical Form

Everything about Sudoku that doesn't fit in one of the other sections

Postby gsf » Tue Feb 13, 2007 9:32 pm

the catalog generation/compression just completed
here's part of the writeup that gives the main result
--man wrote:The entire catalog of 5472730538 essentially different grids, in minlex
order, compresses to 5.34GiB, or 8.38 bits/grid. Uncompress rate is
~77K grids/sec/Ghz, or ~7 hours minimum to stream through the entire
catalog on a 3Ghz processor.

I still don't know how to share the 5.34GiB
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Postby ravel » Tue Feb 13, 2007 10:04 pm

Great (this is meant seriously) - and now all sudokus please:)
ravel
 
Posts: 998
Joined: 21 February 2006

Postby gsf » Wed Feb 14, 2007 4:31 am

ravel wrote:and now all sudokus please:)

so what do you mean by this?
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Postby udosuk » Wed Feb 14, 2007 6:22 am

I presume ravel means the whole 6670903752021072936960 possible grids...:)

To share the 5.34GiB, you can apply for 6 gmail accounts (each have 1GB of space), and pack the file into 6 different zip or rar files to store them... Or there must be some free public email accounts with spaces up to 10GB... For example each Yahoo China email account has 2.5 GB of space I believe...

Check out this link, there are some free hosting sites that allow files up to 5GB (Wikiupload?), but the downside is, your files might get deleted if no one touch them for a period (30 days or 90 days etc)...

Another option is to split your catalogue to 2 ~3GB archives, burn them into 2-DVD packs and find a publisher to help you put them on market... Perhaps you'll make a fortune out of this...:D
udosuk
 
Posts: 2698
Joined: 17 July 2005

Postby gsf » Wed Feb 14, 2007 7:18 pm

udosuk wrote:I presume ravel means the whole 6670903752021072936960 possible grids...:)

yes, but thats clearly not the answer if we're talking about storing grids
even at 1 bit per grid that's a lot of atoms

the catalog provides each essentially different grid and the #automorphisms
from that you can permute from now until forever to generate the 6e21

if you'r looking for 1 of the 6e21 then use #automorphisms to pick one of the permutations
for a selected essentially different grid
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Postby gsf » Wed Feb 14, 2007 7:21 pm

the number of automorphisms for any grid is in the range 1..648 inclusive
do we know the impossible ones (number of automorphisms) in that range?
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Postby coloin » Thu Feb 15, 2007 12:29 am

gsf wrote:well, it turns out we can use a smaller envelope for our calculations ...

But can we go furthur ?
If it is possible to reduce the size furthur - and also to maybe make it easier to recompile ?

This is the [B1]B2B3B4B7 pattern
Code: Select all
***|***|***
***|***|***
***|***|***
---+---+---
***|...|...
***|...|...
***|...|...
---+---+---
***|...|...
***|...|...
***|...|...


For each of the grid solutions of this you can always [?] add 2-6 clues and get a valid puzzle
Code: Select all
+---+---+---+
|123|498|765|
|456|127|983|
|789|365|142|
+---+---+---+
|214|...|...|
|367|..9|..4|
|598|...|...|
+---+---+---+
|631|2..|...|
|842|...|...|
|975|...|...|
+---+---+---+ 3 clues added


We could "filter" out all the possible [B1]B2B3B4B7 possibilities in the minlex grids It could be only 4 million. [?] Possibly much less.

Each of these B2347 will be fairly easily representable with 2-4 clues sometimes [around 10%] 5 extra clues [Edit - even more rarely 6 clues]

What we will have will be 4 miillion minlex B2347s

Each B2347 can have an average of around 2500 grid completions - but because of the min lex it will be much less. Each of the minlex grids could be represented by a subgrid/puzzle

the above grid might be
Code: Select all
1000000  ........9..4......2.................
[minlexB2347] - number 1000000
[B5689 clues] - ........9..4......2.................


Is this "compression" system less than 9 bytes per grid ?

C
coloin
 
Posts: 1624
Joined: 05 May 2005

Postby gsf » Thu Feb 15, 2007 3:10 am

coloin wrote:Is this "compression" system less than 9 bytes per grid ?

once I post the catalog and its layout you can experiment with alternatives
the layout I did takes advantage of diffs between adjacent minlex grids
and at first blush B47 seems to skip over adjacencies
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Postby gsf » Thu Feb 15, 2007 8:16 am

coloin wrote:Is this "compression" system less than 9 bytes per grid ?

I just reread your post
the catalog I generated averaged 8.34 bits per grid
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Postby udosuk » Thu Feb 15, 2007 1:43 pm

Even if we represent each solution grid in gsf's catalogue with a 17-19 clues puzzle (assuming it's possible) and then compress them together, I doubt if it would save much more space as of now... That's because his compression mechanism makes use of the similarity between the adjacent grids sorted in minlex order... Am I right gsf?:?:
udosuk
 
Posts: 2698
Joined: 17 July 2005

Postby RW » Thu Feb 15, 2007 2:26 pm

Superb work gsf! I want that catalogue!!
udosuk wrote:I presume ravel means the whole 6670903752021072936960 possible grids...

I presume ravel means all minimal sudoku puzzles... which is a whole lot more than the number above (and a whole lot more interesting:) ). How to find them all? Well, it's easy. Just buy yourself one of these!:D
RW
2010 Supporter
 
Posts: 1000
Joined: 16 March 2006

Postby gsf » Thu Feb 15, 2007 4:58 pm

udosuk wrote:Even if we represent each solution grid in gsf's catalogue with a 17-19 clues puzzle (assuming it's possible) and then compress them together, I doubt if it would save much more space as of now... That's because his compression mechanism makes use of the similarity between the adjacent grids sorted in minlex order... Am I right gsf?:?:

that's the main point
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Postby gsf » Thu Feb 15, 2007 7:14 pm

gsf wrote:the number of automorphisms for any grid is in the range 1..648 inclusive
do we know the impossible ones (number of automorphisms) in that range?

having the catalog in hand is ... handy
here is a table of #autmorphisms and #essentially-different-grids with that number of automorphisms
Code: Select all
   1 5472170387
   2     548449
   3       7336
   4       2826
   6       1257
   8         29
   9         42
  12         92
  18         85
  27          2
  36         15
  54         11
  72          2
 108          3
 162          1
 648          1
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Postby RW » Thu Feb 15, 2007 9:57 pm

gsf wrote:here is a table of #autmorphisms and #essentially-different-grids with that number of automorphisms

Nice! That confirms the number 6670903752021072936960 once again. Could you possibly post the grids from 27 up? Actually, maybe there should be an own thread for "the catalogue". I suspect this is otherwise going to drift very much off topic.

RW
RW
2010 Supporter
 
Posts: 1000
Joined: 16 March 2006

Postby kjellfp » Mon Feb 19, 2007 10:03 pm

gsf wrote:
gsf wrote:the number of automorphisms for any grid is in the range 1..648 inclusive
do we know the impossible ones (number of automorphisms) in that range?

having the catalog in hand is ... handy
here is a table of #autmorphisms and #essentially-different-grids with that number of automorphisms
Code: Select all
   1 5472170387
   2     548449
   3       7336
   4       2826
   6       1257
   8         29
   9         42
  12         92
  18         85
  27          2
  36         15
  54         11
  72          2
 108          3
 162          1
 648          1

You might have seen my post from November 2005 which leads to the same list as yours.

I see by the way that you have created the grids in compressed form. I think it should be possible with i little bit more than 2 bits/grid (uncomporessed, but I don't think compression will be significant), but I don't know how fast you expect the program that gets the grid to be. And you can't have the grids ordered in whatever way you like - actually they are forced in a very rigid order.

This is how:

First, index the 44 column configuration classes of the bands. Then index the 416 band classes, at least sorted by the configuration index.

Now, the grids are first sorted by the 416-index of the top band. With the top band on its place, we run through the 56*56*56 possible choices for the column configuration of band 2 and 3, and skip those where the 44-index of band 2 and 3 are smaller than the index of band 1 - or when the index of band 3 is smaller than the one for band 2. Then we can run through the possible bands for the configuration of band 2 and 3. For each choice, we need one single bit telling whether this is a grid representing its class or not. Almost 50% (*) of the grids are representing its class, that's why we need 2 bits for each class.

A lookup dictionary for the 416*56*56*56 cases will help us rapidly find the place to look for our grid. Getting all bands from a configuration is fast, it takes me 0.0025 seconds for all 44 configurations on 3 GHz, including generation of lookup tables.

(*) beacuse almost every grid (19999 out of 20000) has a trivial automorphism group, so the representation mentioned above will in general visit every class twice; the representative and some transposed grid. Ironically, if a grid is not identified with it's transposed, we get about twice as many classes (10945437157), but this time the here mentioned method would give a bit stream with basically ones. If we replace the stream with the leap sizes from one zero to another, we might need less than 1 bit/grid!
kjellfp
 
Posts: 140
Joined: 04 October 2005

PreviousNext

Return to General