## Minimalised solution grid

Everything about Sudoku that doesn't fit in one of the other sections
[Edit: Improved the scheme a bit]

If the purpose is to store ALL grids, not just specific solutions, a combination of canonicalization and udosuks scheme with boxes 357 empty could get the required data at least down to 78 bits. I started with all grids as:
Code: Select all
`123|4..|...456|...|...789|...|4..---+---+---.4.|...|......|...|.4....|..4|...---+---+---..4|...|......|.4.|......|...|..4`

Could even more cells be locked in canonical grids? That's 53GB for a set of all unique grids, and then you might of course still try to zip it.

RW
RW
2010 Supporter

Posts: 1000
Joined: 16 March 2006

78 bits for a canonicalized puzzle is surely impressive!

I'm focusing more on the "generalized" puzzles... My current effort is making a 114 bit coding, using 19 bit to code each box (which is the lowest bound if you consider the boxes independently, because 2^18<9!<2^19)...

After that, I can use 19 64-base digits to represent it, so 19 characters from 81, or 23.4%!

Will post my scheme later...
udosuk

Posts: 2698
Joined: 17 July 2005

RW wrote:If the purpose is to store ALL grids, not just specific solutions, a combination of canonicalization and udosuks scheme with boxes 357 empty could get the required data at least down to 78 bits.

This canonicalization is a nice idea.
How do you get 78 bits ?

JPF
JPF
2017 Supporter

Posts: 3753
Joined: 06 December 2005
Location: Paris, France

JPF wrote:How do you get 78 bits ?

Starting from the locked pattern, first placing digit 1 as in udosuks example:
Code: Select all
`123|4--|... 456|...|... 789|...|4.. ---+---+--- |4.|...|... |..|...|.4. |..|..4|... ---+---+--- ..4|...|... ...|.4.|... ...|...|..4`

In box 4, 5 options=3 bits, box 2, 6 options=3 bits. Then boxes 6 and 8, also 3 bits each, finally box 9 requires only 2 bits. That's 14 bits for digit 1.
Code: Select all
`123|4..|... 456|...|... 789|-1-|4.. ---+---+--- |4.|...|... |.1|...|.4. |..|..4|..1 ---+---+--- ..4|...|.1. ...|14.|... ...|...|..4`

Next, go for digit 7. Max 4 possibilities in box 4 = 2 bits, the other boxes as digit 1, total of 13 bits for digit 7.
Code: Select all
`123|4.7|... 456|...|... 789|.1.|4.. ---+---+--- .4.|...|7.. .71|...|.4. ...|..4|..1 ---+---+--- ..4|.7.|.1. ...|14.|..7 ...|...|..4`

Next, digit 8, max 4 options in box 2, otherwise as digit 1 => 13 bits. Digit 9 is done in the same way, 13 bits.
Code: Select all
`123|4.7|... 456|98.|... 789|.1.|4.. ---+---+--- 84.|...|7.. .71|...|.49 9..|..4|8.1 ---+---+--- ..4|.7.|.18 ...|14.|.97 ...|8.9|..4`

From now on there can be max 4 possibilities for any digit, the next two digits can be solved in 5*2bits each.
After that there's max 2 possibilities left, the 8th digit can be solved in 5 bits, and the last one comes for free. That's 14+3*13+2*10+5=78 bits.

This should work on any grid, and all grids can be done in the same order to get the same 78 bit format. If we instead start with box 1 + all digits 1 locked and then proceed in numerical order, we would reach 80bits/grid.

RW
RW
2010 Supporter

Posts: 1000
Joined: 16 March 2006

I think a bit of care needs to be taken to ensure that you are clear on the goal: Are you storing 'solution grids' (with 81 cells filled in), or are you storing puzzles (with less than 81 cells filled in)? If you are storing puzzles, are we assuming that we are only concerned with puzzles that have a unique solution?

I have been a bit confused about the goal here, given some of the posts.
tinfoil

Posts: 16
Joined: 06 June 2005

tinfoil wrote:Are you storing 'solution grids' (with 81 cells filled in), or are you storing puzzles (with less than 81 cells filled in)? If you are storing puzzles, are we assuming that we are only concerned with puzzles that have a unique solution?

I think both options have been discussed, but mainly I think we wish to store solutions, though not necessarily solutions with 81 cells filled in. The 48 cell pattern presented by JPF would be unique for all grids, the abscent information can be retrieved as naked singles.

RW
RW
2010 Supporter

Posts: 1000
Joined: 16 March 2006

this is more along the lines of general compression
but provides a pretty neat bytes per puzzle lower bound improvement

assume an 82 byte text puzzle form [0-9]*81 + newline for each puzzle
and allow related puzzles to appear near to each other
e.g., solvers that generate minimal puzzles from a given puzzle
typically list the puzzle in some order where only a few clues / empty cells
differ from line to line

a compressor that compresses windows of lines column-wise with move-to-front
run length encoding with a huffman back end compresses my year old
225M puzzle catalog from 18,453,487,706 bytes down to 252,266,884 bytes,
or 1.121 bytes per puzzle
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Previous