## Minimalised solution grid

Everything about Sudoku that doesn't fit in one of the other sections

### Minimalised solution grid

Normally, to post a solution grid, we'll post all 81 digits... For example, the MC (most canonical) grid:
Code: Select all
`123456789456789123789123456231564897564897231897231564312645978645978312978312645`

However, many of these digits are just redundant information... Because once we know 8 cells of each row/column we can deduce the 9th easily:
Code: Select all
`12345678.45678912.78912345.23156489.56489723.89723156.31264597.64597831..........`

Also, once we know 8 cells in a box we don't need to see the 9th... That means 4 more redundant digits...
Code: Select all
`12345678.45678912.78.12.45.23156489.56489723.89.23.56.31264597.64597831..........`

But that's not the end... Because if we fill in 8 boxes the 9th box must be determined... So we don't need to see the 4 digits in box 9 here...
Code: Select all
`12345678.45678912.78.12.45.23156489.56489723.89.23.56.312645...645978............`

So 56 digits are what we need (so far)... That's about 2/3 of the initial information...

But can we do more? What I need is a template that when superimposed on any solution grid would still guarantee the solution stay unique...

Here is the symmetrical version of the 56-cell template we got so far:
Code: Select all
`.234.678.4567.91237891.3456231...897.........897...5643126.59786459.8312.783.264.`

This issue has some practical value... Especially for people who store millions/billions of solution grids... I'm sure they wouldn't mind to free up 1/3 or even more of the space occupied!
udosuk

Posts: 2698
Joined: 17 July 2005

The most efficient way would be to save the solution grid by a minimal puzzle with the lowest number of clues (17 to 22 ? clues) => 21% to 27% of the initial information.

But if you want to keep the same template :
You can remove the boxes B1B5B9 (see here ) and one clue in each of the remaining boxes B2B3B4B6B7B8.
You keep only 81 - 3x9 - 6 = 48 digits
= 59% of the initial information.

Code: Select all
` 1 2 3 4 5 6 7 8 9 4 5 6 7 8 9 1 2 3 7 8 9 1 2 3 4 5 6 2 3 1 5 6 4 8 9 7 5 6 4 8 9 7 2 3 1 8 9 7 2 3 1 5 6 4 3 1 2 6 4 5 9 7 8 6 4 5 9 7 8 3 1 2 9 7 8 3 1 2 6 4 5`

will be stored like this :
Code: Select all
` . . . 4 5 6 7 8 9 . . . 7 . 9 1 . 3 . . . 1 2 3 4 5 6 2 3 1 . . . 8 9 7 5 . 4 . . . 2 . 1 8 9 7 . . . 5 6 4 3 1 2 6 4 5 . . . 6 . 5 9 . 8 . . . 9 7 8 3 1 2 . . .`

JPF
JPF
2017 Supporter

Posts: 3755
Joined: 06 December 2005
Location: Paris, France

Thanks JPF! I suspected there must be some systematic method like this!

So is 48 the absolute minimum (for a universal template)?

An equivalent template (keeping the filled cells tightly connected in 2 sections would be):
Code: Select all
`.#####...######...#####....###....#####...#####....###....#####...######...#####.`

I'm studying this problem because I'm trying to find a way to shorten the current even 81-char coding for solutions (and perhaps puzzles)...
udosuk

Posts: 2698
Joined: 17 July 2005

udosuk wrote:So is 48 the absolute minimum (for a universal template)?

As there are 6,670,903,752,021,072,936,960 (~ 6.67 x 10^21) different solution grids, the minimum number of decimal digits to store a grid should be 22.

In that case the universal "template" has to be like this :
Code: Select all
`xxxxxxxxxxxxxxxxxxxxxx`

If you only want to store essentially different grids, you will need 10 decimal digits (5,472,730,538 solutions)

JPF
JPF
2017 Supporter

Posts: 3755
Joined: 06 December 2005
Location: Paris, France

JPF wrote:As there are 6,670,903,752,021,072,936,960 (~ 6.67 x 10^21) different solution grids, the minimum number of decimal digits to store a grid should be 22.

In that case the universal "template" has to be like this :
Code: Select all
`xxxxxxxxxxxxxxxxxxxxxx`

It'd be cool if someone could decide a simple well-defined coding that map the 6,670,903,752,021,072,936,960 grids to 22 decimal digits, but I doubt it... By "template" what I really mean is a "mask"... So is 48 the minimum size of a universal mask you can use to guarantee a unique solution?

Note if we use all 52 letters (2 cases) and 10 digits plus a few symbols we could code all grids using 12 characters only... But the key is how...
udosuk

Posts: 2698
Joined: 17 July 2005

I'm far from being a combinatorics expert, but here are 2 suggestions :

1) There are 9! = 362,880 different boxes.
You only need 6 decimal digits to label all the boxes.
I don't think it's a big task "to code this map".

So, as there are 6 remaining boxes, you will need 6 x 6 = 36 decimal digits to code the grid.

2) If you are ready to only consider essentially different grids :
The first row being 123456789 by definition, B2 and B3 can only have 6! =720 different values. Both B2 & B3 need only 3 decimal digits
For the all grid : 2x3 +4x6 = 30 decimal digits.

We can probably do better by considering the number of possible (B2B3) which is much lower than 720 x 720 = 518,400
If this number is lower than 100,000, we can save one decimal digit (down to 29)

etc...

with a more complex coding, it's possible to go down to 22... (or 10 for the ess. diff. grids)

The decimal system is the not the most appropriate to optimize and to compare the storage of these informations.
It would be better to use bits.

JPF
JPF
2017 Supporter

Posts: 3755
Joined: 06 December 2005
Location: Paris, France

Interesting topic, I've been thinking about the same issue for a while, mostly the possibility of using bits. I think 4 bits is the minimum for one value in the grid. This would give 4*81=324 bits for a whole grid, but using the 48 cell template we would only need 4*48=192 bits. Multiply by 5,472,730,538 and you get 1,050,764,263,296 bits or 131,345,532,912 bytes. So all essentially different grids could be stored in only 132 GB of hard disk space... Of course you could store them in less space if you use a minimal sudoku to define each grid, but if you actually want to do something with the whole collection I think this would probably slow it down too much, even if your solver solved 20,000 puzzles per second, any research on the full set would take 76 hours longer... Question is if it is worth it to even reduce the grids to 48 cells or just get a 222GB harddisk to store all full grids, and save a few hours on every test done on the grids.

Same goes for labelling all possible boxes etc. I'm definitely not a computer programmer, but wouldn't it be much slower to run a search on large amounts of grids for anything specific if they were saved as some complex template than if they actually are saved digit by digit? Or can you make the decoding faster than just reading 324 bits? As an example, say I wanted to count the 2-digit unavoidables in all possible grids and get the absolute statistics. This would require me to look at each grid separately, not just a bunch of box labels. With all full grids saved in a file it could probably be done in a couple of days.

RW
RW
2010 Supporter

Posts: 1000
Joined: 16 March 2006

RW wrote:I think 4 bits is the minimum for one value in the grid. This would give 4*81=324 bits for a whole grid, but using the 48 cell template we would only need 4*48=192 bits.

4 bits would allow you to store a data from 0 to 15, that's 16 different values, but a cell is only of 9 different values, so I consider it a waste of space (i.e. for every 4 bits, you're only using 0001 to 1001, and all the other 7 values are not appearing at all)...

How about this, for every box we only consider the "modular difference" between 2 cells, which must be of values 1 to 8, codable with 3 bits... However the 1st cell must require 4 bits to store anyway...

I agree that for computing purposes then you should just store them in bare format, i.e. no encoding at all... But what I had in mind is like if a site is storing solutions to billions of 9x9 puzzles (not necessarily vanilla ones, but also killers etc), and just need to output a particular solution when requested, then some encoding plus compression could save some storage space...
udosuk

Posts: 2698
Joined: 17 July 2005

Guys, I noticed that on the 48-cell mask each row/column has 5 or 6 filled cells... So I wonder if the following 45-cell masks could guarantee a unique solution:
Code: Select all
`###....######....######.....#####.....#####.....#####.....######....######....###.#####...#####....####....####....####....####....####....#####...#####...#####..`

There are possible 2-bit unavoidables but I wonder if the mechanism from other cells would prevent them from forming... So could anyone search if there is any multiple solutional grid with these masks?

Of course the first grid above has 3 filled boxes so in fact only 42 cells are necessary...
udosuk

Posts: 2698
Joined: 17 July 2005

udosuk wrote:How about this, for every box we only consider the "modular difference" between 2 cells, which must be of values 1 to 8, codable with 3 bits... However the 1st cell must require 4 bits to store anyway...

Interesting idea, but wouldn't you need a bit that tells if this difference is to be added or substracted from the previous cell? You could do 1*4 bits + 80*3 bits if the first four bits define which row have digit 9 in column 9. The next 8*3 bits define the column of the remaining 9s, one row at a time. Now you only need three bits each for the remaining cells. Alternatively you could save some more bits more by saving all digits 1-8 by defining the columns in 1*4+7*3 bits, the column of the last digit doesn't need to be defined, neither does the cells for digit 9. However, this approaches your 48 cell template as it would require some computing to decode the information.

udosuk wrote:So could anyone search if there is any multiple solutional grid with these masks?

Here's one that shows exactly why the last mask doesn't work, the first one falls for the same reason.
Code: Select all
` *-----------* |416|382|759| |925|764|183| |783|915|426| |---+---+---| |841|653|297| |697|128|345| |352|479|618| |---+---+---| |578|231|964| |.39|846|57.| |.64|597|83.| *-----------*`

RW
RW
2010 Supporter

Posts: 1000
Joined: 16 March 2006

RW wrote:
udosuk wrote:How about this, for every box we only consider the "modular difference" between 2 cells, which must be of values 1 to 8, codable with 3 bits... However the 1st cell must require 4 bits to store anyway...

Interesting idea, but wouldn't you need a bit that tells if this difference is to be added or substracted from the previous cell?

In "modular difference" I mean the result is "mod 9", so you don't need to worry about the add/subtract operation, it's always "add"... E.g 5+8=13=4 (mod 9)... Alternatively, each value is in fact one of the following 8 "wrap around" operations from the previous value: -4,-3,-2,-1,+1,+2,+3,+4...

RW wrote:You could do 1*4 bits + 80*3 bits if the first four bits define which row have digit 9 in column 9. The next 8*3 bits define the column of the remaining 9s, one row at a time. Now you only need three bits each for the remaining cells. Alternatively you could save some more bits more by saving all digits 1-8 by defining the columns in 1*4+7*3 bits, the column of the last digit doesn't need to be defined, neither does the cells for digit 9. However, this approaches your 48 cell template as it would require some computing to decode the information.

Nice scheme... I've thought about something similar but perhaps a tad bit more efficient, will post it later...

RW wrote:Here's one that shows exactly why the last mask doesn't work, the first one falls for the same reason.

Thanks! I must have been blinded not to see that...

So I suppose the 48-cell mask is the best we can do? Using the "modular difference" scheme, I could code it using 1*4+47*3 bits=145 bits... But I suspect we can do better...
Last edited by udosuk on Mon Oct 16, 2006 4:16 am, edited 3 times in total.
udosuk

Posts: 2698
Joined: 17 July 2005

udosuk wrote:In "modular difference" I mean the result is "mod 9", so you don't need to worry about the add/subtract operation, it's always "add"...

Thanks! I must have been blinded not to see that...

Still thinking about doing computations on the whole set, I'm not sure if I'd like to do 80*5*10^9 mod 9 calculations. For this my scheme of defining columns should be faster, as the data can be used as it is. For saving solutions, I don't think that it's worth it to save the vanilla solutions, an efficient backtracking solver can get it before the user has time to press the "show solution" button. Don't know how fast you can solve all variants this way.

The 48 cell mask unfortunately doesn't work for irregular box shapes, I don't think you can go much further than this with them:

Code: Select all
`********.********.********.********.********.********.********.********..........`

RW
RW
2010 Supporter

Posts: 1000
Joined: 16 March 2006

I wrote:I've thought about something similar but perhaps a tad bit more efficient, will post it later...

Here is another scheme...
Let's leave b357 empty, and consider the digit 1 in b1... It has 9 possible positions, so 4 bits are needed to locate it:
Code: Select all
`###......###......###............................................................`

Suppose it's in r1c1, that leaves 6 possible cells for the 1s in each of b2 & b4, 3 bits for each:
Code: Select all
`1...........###......###....##.......##.......##.................................`

Suppose they're in r2c4 & r4c2, that leaves 6 possibles cells for the 1s in each of b6 & b8, 3 bits for each:
Code: Select all
`1...........1...............1.............###......###....##.......##.......##...`

Suppose they're in r5c7 & r7c5, that leaves 4 possible cells for the 1s in b9, needing 2 bits at last...
Code: Select all
`1...........1...............1.............1...............1...........##.......##`

So 4+3+3+3+3+2=18 bits are enough to locate all the 1s in b124689, and we only need to locate digits 1 to 8 to determine the 6 boxes...

Therefore the total number of bits required = 18*8 = 144, a tad bit better than our previous 145-bit scheme!

Of course, for subsequent digits we don't need 4 bits to locate it in b1 (as only 8 cells are left), which enable us to use 7 bits fewer...

And we can save some more bits when store the later digits because the possible cells left are fewer and fewer...

Still looking for the most efficient scheme...
udosuk

Posts: 2698
Joined: 17 July 2005

Interesting discussion.
The trivial way to get a shorter representation is to compress it (using tar or zip). I tried it with the million puzzles i have in a file (but including the rating in this case) and got it down to 26% (using bz2). So when i divide all grids into some parts and only uncompress one part at a time to work with, 120 GB should be enough to operate on all grids.
[Added:]The compression rate of course is worse for full grids. I tried the solution grids of Gordon's list and got almost 40%.
ravel

Posts: 998
Joined: 21 February 2006

ravel wrote:Interesting discussion.
The trivial way to get a shorter representation is to compress it (using tar or zip). I tried it with the million puzzles i have in a file (but including the rating in this case) and got it down to 26% (using bz2). So when i divide all grids into some parts and only uncompress one part at a time to work with, 120 GB should be enough to operate on all grids.
[Added:]The compression rate of course is worse for full grids. I tried the solution grids of Gordon's list and got almost 40%.

there are a few driving factors that may be a cross purposes:
(1) full grid vs. puzzle (vs. candidate grid for candidates only)
(2) compress grids individually vs. compress grids in groups

some quick experiments on gordon's 17's (36628 puzzles, 82 bytes per raw puzzle)
shows that the solution grids can be compressed as a group at least down to
13.6 bytes per puzzle and the puzzle grids (17 clues) can be compressed at least down to 9.9 bytes per puzzle

I'll check with my compression colleague to see if any other compression methods
are suited to compressing grids
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Next