The New Sudoku Players' Forum

by **champagne** » Tue Oct 29, 2024 5:23 pm

Having now a first version of a brute force solver in DLL, I had all the tools to make a test on one possible format for a canonical puzzle.

As proposed in this thread,
http://forum.enjoysudoku.com/high-density-files-for-solution-grids-and-18-clues-puzzles-t42669-60.html

I think that a good competitor for this is a puzzle shown as

The rank 1- 5 472 730 538 of the solution grid
A binary field of 81 bit for the active cells of the puzzle.

I made a test on “loki”, the first puzzle known with a tridagon pattern

Code: Select all: 57....9..........8.1.........168..4......28.9..2.9416.....2.....6.9.82.4...41.6.. 576843921493261578218759436951687342647132859832594167184326795365978214729415683 solution grid

Code: Select all: 123456789457189236698372145241893657589627314736514928314968572862735491975241863 min lexical 1.3.5......71.9...69.37......1..3..75.96.73.....51.......96..........4....5...86. min lexical loki

To get the min lexical in this case, we had first to transpose the grid, then
465 978 312 rows reordering
978 645 132 columns reordering
And this solution grid has no auto morph. (expected to be 1/10000 of the solution grids)

The solution grid has the rank 3 636 310 645.
Using this 64 characters equivalence
".123456789abcdefghiklmnopqrstuvwxyzABCDEFGHIKLMNOPQRSTUVWXYZ&"
The bit field becomes
mx5sxBL13p.84Z

so, in text mode, the canonical “loki” could be
3636310645mx5sxBL13p.84Z

Easy to decompress (some micro seconds per puzzle),
Not too hard to compress (28 milliseconds in my test)
with a sorting sequence per solution grid.

and for sure, 2 puzzles having a different canonical morph are different.
note: in my current code, the auto morphs issue is not covered

I’ll make a test on the 17 clues data base to see what gives the .zip compressor on such a file
but it would be interesting to test the new data base of puzzles having the tridagon pattern.
I am missing the link to download the current status of this data base;

by **coloin** » Tue Oct 29, 2024 10:19 pm

Very good.
I was going to say that you dont need to worry about automorphs in the minlex solution grid.
But I see now that two puzzle isomorphs would be different in the minlex solution grid...
Maybe the solution could be that on the rare occasion the solution grid is isomorphic .. you somehow generate all the isomorphic puzzles and take the min lex.

please could you show how the 64 code works to generate the pattern !

presumably this code gives the minlex solution grid and the puzzle

by **champagne** » Wed Oct 30, 2024 3:56 am

Hi coloin

coloin wrote:Very good.
I was going to say that you dont need to worry about automorphs in the minlex solution grid.
But I see now that two puzzle isomorphs would be different in the minlex solution grid...
Maybe the solution could be that on the rare occasion the solution grid is isomorphic .. you somehow generate all the isomorphic puzzles and take the min lex.

Right for all.
The code missing in my current draft is the comparison of 2 bit fields of size 81 to chose the min lexical one.
But even if it is not a frequent case, this must be done "live" in the clearing of redundant puzzles appearing in the scan of a solution grid to extract puzzles of size X.

coloin wrote:Very good.
please could you show how the 64 code works to generate the pattern !
presumably this code gives the minlex solution grid and the puzzle

The code uses all DLLS recently created.
my draft (main actions)

call the brute force solver to get the solution grid
call the minlexing function and get back the minimal sol and the perm descriptors (usually one with a maximum of 648)
Morph the puzzle to the first permutation (not finished do the same with auto morphs and keep the min lexical 81 bits field)
call the virtual catalog to get the rank
Turn all to the output format.
Here my working draft written yesterday

Hidden Text: Show

Code: Select all: const char* bit6 = ".123456789abcdefghiklmnopqrstuvwxyzABCDEFGHIKLMNOPQRSTUVWXYZ&"; void PuzzleInBitField(char* puz, char* d) { // d is 14*6=84 bits in 14 char for (int i = 0, k = 0; i < 14; i++) { register int v = 0; for (int j = 0; j < 6; j++,k++) { if (puz[k] != '.')v |= 1 << j; d[i] = bit6[v]; } } } void C11() { cout << " C11_ find canonical for a file of puzzles" << endl; char* ze = finput.ze,zem[82], zem2[82]; zem[81]=0; zem2[81] = 0; int smin[81]; ptpgc= SkbsSetModeWithAutos(); pzhe= SkbfGetZhePointer(); int* vv = pzhe->gsol; if (SkvcatSetModeGetVCDESK(2, &pvcdesc)) { cout << " set mode failed" << endl; return; } while (finput.GetLigne()) { if (strlen(ze) < 81)continue;// no empty line cout << ze << " puzzle to caniconalize" << endl; { int ir1 = SkbfCheckValidityQuick(ze); if (ir1 != 1) { cout << " invalid puzzle ir1="<<ir1 << endl; continue; } for (int i = 0; i < 81; i++) cout << vv[i] + 1; cout << " solback" << endl; //cout << pzhe->gsol << " sol to do min lex" << endl; SkbsGetMin(vv, smin); for (int i = 0; i < 81; i++) cout << smin[i] + 1; cout << " min solback" << endl; } GRIDPERM wgperm; char* pbase = ptpgc->t[0]; for (int i = 0; i < 19; i++) cout << pbase[i]; cout << " pbase nperm=" << ptpgc->nt << endl; wgperm.Import19(vv, pbase); wgperm.MorphPuzzle(ze, zem); cout << zem << " puz morphed" << endl; uint64_t r = SkvcatGetRankFromSolMin( smin); if (!r) { cout << " failed to find rank " << endl; continue; } for (int i = 0; i < 81; i++) cout << smin[i] + 1; cout << " solmin rank " << r << endl; char zbit[15]; zbit[14] = 0; PuzzleInBitField(zem, zbit); cout << zbit << " bbit pattern for the puzzle" << endl; cout << r << ";" << zbit << " proposal for canonical" << endl; if (ptpgc->nt > 1) { cout << " auto morphs seen, check more" << endl; for (int i = 1; i < ptpgc->nt; i++) {// check auto morph get the smallest wgperm.Import19(vv, pbase); wgperm.MorphPuzzle(ze, zem2); } } } cout << "end of file " << endl; }

BTW, as the use of such a canonical morph implies a compressor decompressor tool,
a simpler option can be

33 bits for the rank of the solution grid
81 bits for the puzzle

114 bits in total exactly 19 bytes using the text mode 6 bits per byte, saving 5 bytes.

by **champagne** » Thu Oct 31, 2024 11:43 am

I an running a first validation test on the file of 4 634 322 sudokus in the t&e 3 "mith"s data base.

a stupid bug in the previous post, one character is missing to cover the 64 possible values of a 6 bit field, but this doesn't change the following results.

757 254 puzzles have been processed in a little more than 3 and half hours
around 57 sudokus per seconds to get the "canonical"
Only 5978 different solution grids are there, giving an average 126 sudokus per solution grid.

I suspected something similar. The full run will take about 30 hours, so the final result is expected to-morrow.

next task is to write the canonical => puzzle code and see what is the average run time per canonical.

by **blue** » Thu Oct 31, 2024 11:54 am

champagne wrote:a stupid bug in the previous post, one character is missing to cover the 64 possible values of a 6 bit field, but this doesn't change the following results.

Before you get too far: there were 3 characters missing ... 'j' and 'J' included.
Another thing: this line in your code, is reading 3 bytes of garbage data.
There's a "fix" for it below.

Code: Select all: if (puz[k] != '.')v |= 1 << j; -> if (k < 81 && puz[k] != '.')v |= 1 << j;

by **champagne** » Thu Oct 31, 2024 12:29 pm

blue wrote:
champagne wrote:a stupid bug in the previous post, one character is missing to cover the 64 possible values of a 6 bit field, but this doesn't change the following results.

Before you get too far: there were 3 characters missing ... 'j' and 'J' included.
Another thing: this line in your code, is reading 3 bytes of garbage data.
There's a "fix" for it below.

Code: Select all
if (puz[k] != '.')v |= 1 << j; -> if (k < 81 && puz[k] != '.')v |= 1 << j;

As usual, sharp eyes from blue.
The "6 bits transformer" must have 64 four printable characters. i added the missing "jJ"
and for the second fix, I preferred this

Code: Select all: for (int j = 0; j < 6; j++,k++) { if (puz[k] != '.')v |= 1 << j; d[i] = bit6[v]; if(k==80) break; }

thanks for help

by **champagne** » Sat Dec 07, 2024 10:43 am

I thought that this topic would raise more interest.
As I need it for my own use, I finished the task with my personal choices.

The first target of a “canonical name” is to filter redundancy in the sudokus.

The proposal is the following in several steps

A==============
Take any sudoku
B===============
Find the solution grid and move it to the min lexical morph.
Move the puzzle in the same way, and if the grid has auto morphs, take the min lexical expression of the puzzle.
C============
Replace now the solution grid by its catalog number and the puzzle by a 81 bit field.
Express the 81 bit field as a sequence of 6x13+3 bits and uses a table to have it as a 14 printable characters
D=============
As the catalog require 33 bits, express all as a 19 printable characters
5x6 + 3 for the catalog number
3+13x6 for the 81 bits field.

B<=>C<=>D just requires some lines of code if you have the DLLs

A=>B is mainly 3 calls to the DLLs,
but B=>A is requiring the grid morphing parameters from the “given solution grid” to the ED solution grid. This is in my code a minimum of 28 values (transpose, rows map, columns map, digits map)
Most often, users will not use it but will keep the original puzzle attached to the “canonical name” (with usually other parameters such as the rating).

In this case, the “name” acts as the key for a database. The way the key has been built, the sudokus distribution should be well balanced for any “primary index” made of the first 5 characters

The “C” status is interesting as well. It keeps visible the solution grid number.
I describe the process in the next posts

by **champagne** » Sat Dec 07, 2024 10:49 am

Equivalence table.
The table is made of 64 printable characters, one for each of the 6 bits pattern. Any sequence could work, but to have a sort of the final key giving the same order as the numeric sort on the rank, the printable characters must have an increasing ASCII value. This was not the case with my first shot. I now use
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz()

Find the ED solution grid and convert the puzzle

With the published DLLs, this is a quite simple code

Hidden Text: Show

First, using the brute force, the puzzle is checked and solved.
Then using the minlexing DLL, the solution is pushed to min and the grid auto morphs are in the return area
The virtual catalog gives the rank,
And using the list of grid permutations (including auto morphs) the lowest morph of the puzzle is worked out.

The last step is to change the lowest puzzle to a 14 bytes printable sequence using the following function.

Code: Select all: void PuzzleInBitField(char* puz, char* d) { // d is 14*6=84 bits in 14 char for (int i = 0, k = 0; i < 14; i++) { register int v = 0; for (int j = 0; j < 6; j++,k++) { if (puz[k] != '.')v |= 1 << j; d[i] = bit6[v]; if(k==80) break; } } }

by **champagne** » Sat Dec 07, 2024 10:53 am

Expanding the rank + 14 status

As said, we here restore the min lexical status of the original puzzle.
The process is simple, but requires the virtual catalog DLL;
Here is my code for a given record of the entry file

Hidden Text: Show

Once the entry checked, we have the rank and the 14 bytes
Using the virtual catalog, the rank is expanded to the min solution grid
Then the 14 bytes are expanded to the puzzle to print.

This is done using a small converter printable => 0-511 value

Code: Select all: int Get6Bits(int i, char* bf14) { register int v = bf14[i]; if (v >= 97 && v <= 123)v -= 61; // a z ( else if (v >= 65 && v <= 90)v -= 55; // A Z else if (v >= 48 && v <= 57)v -= 48; // 0-9 else if (v == 125) v = 63; // ) else v = 0; // not valid put to empty return v; }

And the replacement of bits set to 1 by the solution grid value done by this function

Code: Select all: void Expand14to81(char* puz, char* bf14, char * sol) { int k = 0; for (int i = 0, k = 0; i < 14; i++) {// 6 bits to expand register int v = Get6Bits(i,bf14); int nd = (i == 13) ? 3 : 6; for (int j = 0; j < nd; j++,v>>=1,k++) { if (v & 1)puz[k] = sol[k] ; else puz[k] = '.'; } } }

by **champagne** » Sat Dec 07, 2024 10:57 am

Moving from rank + 14 to 19 and reverse.

We could think of a direct operation puzzle<-> 19.
As I think that the rank + 14 will be in use, I preferred to see the final “naming” step as a pure binary transformation of the rank+14 image.

Basically, the target is simple.
The rank needs 33 bits in binary mode.
Take 3 bits from the rank, use them to have 81+3 bits = 14 bytes left
The additional constraint added is to do it in such a way that the sorting sequence stays nearly the same in both cases. The highest bits of the rank must then come first.
But any way to do the conversion and reverse would work.

Here is my code

Code: Select all: //======================================== can r+14 =>19 int i = 0; ze[i] = 0; uint64_t rank = strtoll(ze, 0, 10); char* zeb = &ze[i + 1]; char z19[20]; memset(z19, 0, 20); register uint64_t r = rank, r2; { r2 = (r >> 27) & 0x3f; z19[0] = bit6[r2];// highest bits r2 = (r >> 21) & 0x3f; z19[1] = bit6[r2]; r2 = (r >> 15) & 0x3f; z19[2] = bit6[r2]; r2 = (r >> 9) & 0x3f; z19[3] = bit6[r2]; r2 = (r >> 3) & 0x3f; z19[4] = bit6[r2]; } register int rx=(int)r & 7;// last 3 bits for (int i = 0; i < 14; i++) { register int ry = Get6Bits(i, zeb); rx = (rx << 3) | ry &7; z19[i + 5] = bit6[rx]; rx = ry >>3; } fout1 << z19 << endl;

Code: Select all: //===================== 19 to r+14 6x5+3 14x6-3 uint64_t rank=0; for (int i = 0; i < 5; i++) { register int ry = Get6Bits(i, ze); rank = (rank <<= 6) | ry; } register int rx = Get6Bits(5, ze);// 3 bits rank rank = (rank <<= 3) | (rx >> 3); rx &= 7;// residual bits for the 14 bytes field char zb[15]; zb[14] = 0; for (int i = 6; i < 19; i++) { register int ry = Get6Bits(i, ze); rx = rx | ry&070; zb[i -6] = bit6[rx]; rx = ry &7; } zb[13] = bit6[rx];// last three bits fout1<<rank<<";" << zb << endl;

by **coloin** » Fri Jul 25, 2025 11:26 am

champagne wrote:Hi coloin,
The code doing it is still in dirty mode on my side. I did not see so far enough interest to put it in a DLL...

I was looking at Virtual catalogue DLL but I see now that you were just testing it !

I think your work is commendable and certainly the
puzzle -> solution grid rank plus the 14 digit code
is worth getting working !

Was trying to work out how the 14 digit code works ... does it compute a pattern / mask for the solution grid ?

by **champagne** » Fri Jul 25, 2025 1:56 pm

coloin wrote:I was looking at Virtual catalogue DLL but I see now that you were just testing it !

I did not update the comments, but I use the DLL daily and did not find any bug in it.

coloin wrote:I think your work is commendable and certainly the
puzzle -> solution grid rank plus the 14 digit code
is worth getting working !

I needed one canonical morph, so I use this one. I just (as I wrote) stick to a 10 bytes for the rank plus 14 bytes for the bit field to have a quick "solution grid analysis".

The 19 byte per puzzle could be better for big data bases.
We have also to keep in mind that such files will be "zipped" very efficiently.

coloin wrote:Was trying to work out how the 14 digit code works ... does it compute a pattern / mask for the solution grid ?

Basically, we have a 81 bits field showing cells active in the solution grid.
To print this 81 bits, we group them in 14 "6 bits field".
each bit field can have 64 values and is printed in one character out of the string of 64 characters.
So,
an empty 6 bits field is printed as '0'
and a bit field 111111 is printed as '}'

by **coloin** » Fri Jul 25, 2025 6:36 pm

Thanks for that .... now I understand it as you have explained it above !!!

Code: Select all: 0123456789 ABCDEFGHIJ KLMNOPQRST UVWXYZabcd efghijklmn opqrstuvwx yz()

and each of those characters stands for the 64 binary numbers up to 64 [000000,000001,000011.... up to 111111]
and these are the mask in chunks of six. 14 [14x6=84][13x6 +3 = 81] characters are used to map the mask of 81...

of course this could easily be utilized and does define a particular puzzle's pattern .....

Looking at the command lines .. unless im blinded to it I dont see the command for
puzzle > solution grid rank : 14-digit-code

by **champagne** » Sat Jul 26, 2025 5:03 am

coloin wrote:Looking at the command lines .. unless im blinded to it I dont see the command for
puzzle > solution grid rank : 14-digit-code

As I said this is not a released code. I suggest to continue in private mode if you want to get the current code (sources or .exe)

The New Sudoku Players' Forum

Proposal for a canonical puzzle.

Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Universal 19 bytes name for any sudoku

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.

Re: Proposal for a canonical puzzle.