High density files for solution grids and 18 clues puzzles

Programs which generate, solve, and analyze Sudoku puzzles

Re: High density files for solution grids and 18 clues puzzl

Postby champagne » Fri Oct 25, 2024 9:16 am

Assuming the virtual catalog tested and cleaned, I am back to the topic "high density files..."

Although I see low interest currently, I am convinced that one day, the problem will come back.

Having in mind the last data base extensions in the field of the tridagon pattern, I think more and more that a canonical form

rank of the solution
81 bit field of the minimal sudoku pattern in this solution grid


is a good target.

This is in text mode something as

10 characters for the rank,
81/6 14 character for the pattern expressed in text mode, 6 bits per printable character


with then a potential for a .zip compressor.

Such a pattern is very easy to decompress using the virtual catalog.
Compression is a little harder with several steps:

Code: Select all
find the solution of the puzzle
morphs it to min lexical (and solve auto morphs if any)
Get the rank.


All tools are there now , but my brute force solver is not in DLL mode. I intend to build this DLL as next step, with all entries used in my programs.
champagne
2017 Supporter
 
Posts: 7697
Joined: 02 August 2007
Location: France Brittany

Re: High density files for solution grids and 18 clues puzzl

Postby coloin » Fri Oct 25, 2024 6:03 pm

champagne wrote:81 bit field of the minimal sudoku pattern in this solution grid ....
81/6 14 character for the pattern expressed in text mode, 6 bits per printable character....

thats a big leap again !!! but maybe it can be done to go from minlex solution grid and minlex pattern -> actual puzzle.

starting from a min lex pattern [of the puzzle] ... what would be the best way to describe and then define the 6^8 x 2 permutes ....[3,359,232] ?

identify a box to be box1 [1 - 9]
+/- transpose - 0 or 1
row swap 1-3 [012345]
col swap 1-3 [012345]
identify box to be box5 [0123]
row swap 4-6 [012345]
col swap 4-6 [012345]
row swap 7-9 [012345]
col swap 7-9 [012345]

9x2x6x6x4x6x6x6x6 = 3,359,232

I guess which ever is easiest to program efficiently ....
coloin
 
Posts: 2624
Joined: 05 May 2005
Location: Devon

Re: High density files for solution grids and 18 clues puzzl

Postby champagne » Fri Oct 25, 2024 7:33 pm

coloin wrote:
champagne wrote:81 bit field of the minimal sudoku pattern in this solution grid ....
81/6 14 character for the pattern expressed in text mode, 6 bits per printable character....

thats a big leap again !!! but maybe it can be done to go from minlex solution grid and minlex pattern -> actual puzzle.

starting from a min lex pattern [of the puzzle] ... what would be the best way to describe and then define the 6^8 x 2 permutes ....[3,359,232] ?

identify a box to be box1 [1 - 9]
+/- transpose - 0 or 1
row swap 1-3 [012345]
col swap 1-3 [012345]
identify box to be box5 [0123]
row swap 4-6 [012345]
col swap 4-6 [012345]
row swap 7-9 [012345]
col swap 7-9 [012345]

9x2x6x6x4x6x6x6x6 = 3,359,232

I guess which ever is easiest to program efficiently ....


In my DLL minlexing a given, the permutation is defined in the following way

Code: Select all
Transpose first or not
reorder rows
reorder columns
1296 * 1296 * 2 = 3 359 232 permutations


And the first rows gives the digit map.

The minlexing operation delivers also the list of auto morphs.

So where I am, I don't see this as a big leap. The only new point is to apply grid auto morphs to get the canonical view.

But a DLL for the brute force would be welcome, to have a relatively simple process.

Another issue is to have it accepted as the canonical form of any puzzle;
champagne
2017 Supporter
 
Posts: 7697
Joined: 02 August 2007
Location: France Brittany

Re: High density files for solution grids and 18 clues puzzl

Postby RichardGoodrich » Sat Jun 21, 2025 11:00 pm

dobrichev wrote:Hi Champagne,

A 10 bytes/grid implementation is here.
You need a 50GB lookup file if you want to extract a grid by its number. gridchecker laso has functionalities to get range of grids by index.

There is one free bit in the grid catalogue which can be used to store one of the 81-bit uncompressed bitmask of the givens. In total 20 bytes/puzzle, w/o 50GB catalogue, and each record is self-consistent.
Using grids catalogue you can store the grid index (17 bits) + somehow compressed givens bitmap.

On the other hand you can ignore the solution and represent the 18 givens in base-9 format (< 56 bits = 7 bytes assuming min-lex relabelling; < 10 bytes for the solution), again plus the pattern.

The pattern could be compressed by storing for each given the distance to previous one and somehow encoding into variable number of bits depending on statistics. If the distance can be encoded in less than 4.5 bits in average (81 bits uncompressed / 18 = 4.5) then pattern compression wins.
Assuming pattern min-lex, several positions at the beginning are guaranted empty and could be excluded from encoding - both uncompressed or compressed.

Note that applying such acrobatics would result in a file that isn't well compressable by standard utilities like ZIP.


[Aside]
Compression upon Compression is probably a path of diminishing returns? I have Python code that compresses and encodes/decodes a puzzle to a 58 char [I think] line with a special aphabet that encodes both the original puzzle and its solution. Could send a reference to the paper. However, that is NOT really relavent here!
Big1952
RichardGoodrich
 
Posts: 136
Joined: 12 December 2012
Location: Josephine, TX (north of Dallas, TX)

Re: High density files for solution grids and 18 clues puzzl

Postby RichardGoodrich » Wed Aug 06, 2025 7:01 pm

JPF wrote:In addition to the link I provided previously, it may be interesting to read this comment provided in the user manual of the gsf program.
gsf wrote:COMPRESSED GRID FORMAT
The sudz format efficiently compresses catalogs of row order minlex
canonical solution grids. Grids are organized by the top band (top 3
rows). There are 416 essentially different minlex bands and 5472730538
essentially different grids. A byproduct of minlex ordering is that
earlier bands tend to account for more grids than later bands. For
example, band 001 contains 1007170 grids, band 006 (the largest)
contains 96229042 grids, and bands 395,397,398,400,402,403,404,406,
408,409,410,412,413,414,415 contain no grids.

The sudz format is a sequence of windows. Each window contains the number
of grids and initial band index. Each grid has the band index (if
different from the previous grid), the number of automorphisms (if > 1),
the number of cells that differ from the previous grid, and the list of
differing cell values encoded using a basic singles constraint solver.
The windows are compressed using the Burrows-Wheeler bzip library.

The entire catalog of 5472730538 essentially different grids, in minlex
order, compresses to 5.70GiB, an average 8.96 bits/grid. Uncompress rate
is ~100K grids/sec/Ghz, or ~5 hours minimum to stream through the entire
catalog on a 2.8Ghz processor.

Note that this only deals with compressing the grid catalog and does not directly concern a puzzle package.

JPF


Well that is my recent experience myself! My SQLite3 database was ~5.06 GiB and took about 4.5 hours to generate. The few searches I did on it took 20-25 minutes. I am working on a dynamic solution that creates no big files....
Big1952
RichardGoodrich
 
Posts: 136
Joined: 12 December 2012
Location: Josephine, TX (north of Dallas, TX)

Previous

Return to Software