Proposal for a canonical puzzle.

Programs which generate, solve, and analyze Sudoku puzzles

Proposal for a canonical puzzle.

Postby champagne » Tue Oct 29, 2024 5:23 pm

Having now a first version of a brute force solver in DLL, I had all the tools to make a test on one possible format for a canonical puzzle.

As proposed in this thread,
http://forum.enjoysudoku.com/high-density-files-for-solution-grids-and-18-clues-puzzles-t42669-60.html

I think that a good competitor for this is a puzzle shown as

The rank 1- 5 472 730 538 of the solution grid
A binary field of 81 bit for the active cells of the puzzle.

I made a test on “loki”, the first puzzle known with a tridagon pattern

Code: Select all
57....9..........8.1.........168..4......28.9..2.9416.....2.....6.9.82.4...41.6..
576843921493261578218759436951687342647132859832594167184326795365978214729415683 solution grid


Code: Select all
123456789457189236698372145241893657589627314736514928314968572862735491975241863 min lexical
1.3.5......71.9...69.37......1..3..75.96.73.....51.......96..........4....5...86. min lexical loki


To get the min lexical in this case, we had first to transpose the grid, then
465 978 312 rows reordering
978 645 132 columns reordering
And this solution grid has no auto morph. (expected to be 1/10000 of the solution grids)

The solution grid has the rank 3 636 310 645.
Using this 64 characters equivalence
".123456789abcdefghiklmnopqrstuvwxyzABCDEFGHIKLMNOPQRSTUVWXYZ&"
The bit field becomes
mx5sxBL13p.84Z

so, in text mode, the canonical “loki” could be
3636310645mx5sxBL13p.84Z

Easy to decompress (some micro seconds per puzzle),
Not too hard to compress (28 milliseconds in my test)
with a sorting sequence per solution grid.

and for sure, 2 puzzles having a different canonical morph are different.
note: in my current code, the auto morphs issue is not covered


I’ll make a test on the 17 clues data base to see what gives the .zip compressor on such a file
but it would be interesting to test the new data base of puzzles having the tridagon pattern.
I am missing the link to download the current status of this data base;
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: Proposal for a canonical puzzle.

Postby coloin » Tue Oct 29, 2024 10:19 pm

Very good.
I was going to say that you dont need to worry about automorphs in the minlex solution grid.
But I see now that two puzzle isomorphs would be different in the minlex solution grid...
Maybe the solution could be that on the rare occasion the solution grid is isomorphic .. you somehow generate all the isomorphic puzzles and take the min lex.

please could you show how the 64 code works to generate the pattern !

presumably this code gives the minlex solution grid and the puzzle
coloin
 
Posts: 2502
Joined: 05 May 2005
Location: Devon

Re: Proposal for a canonical puzzle.

Postby champagne » Wed Oct 30, 2024 3:56 am

Hi coloin

coloin wrote:Very good.
I was going to say that you dont need to worry about automorphs in the minlex solution grid.
But I see now that two puzzle isomorphs would be different in the minlex solution grid...
Maybe the solution could be that on the rare occasion the solution grid is isomorphic .. you somehow generate all the isomorphic puzzles and take the min lex.


Right for all.
The code missing in my current draft is the comparison of 2 bit fields of size 81 to chose the min lexical one.
But even if it is not a frequent case, this must be done "live" in the clearing of redundant puzzles appearing in the scan of a solution grid to extract puzzles of size X.


coloin wrote:Very good.
please could you show how the 64 code works to generate the pattern !
presumably this code gives the minlex solution grid and the puzzle


The code uses all DLLS recently created.
my draft (main actions)

call the brute force solver to get the solution grid
call the minlexing function and get back the minimal sol and the perm descriptors (usually one with a maximum of 648)
Morph the puzzle to the first permutation (not finished do the same with auto morphs and keep the min lexical 81 bits field)
call the virtual catalog to get the rank
Turn all to the output format.
Here my working draft written yesterday

Hidden Text: Show
Code: Select all
const char* bit6 = ".123456789abcdefghiklmnopqrstuvwxyzABCDEFGHIKLMNOPQRSTUVWXYZ&";

void PuzzleInBitField(char* puz, char* d) {
   // d is 14*6=84 bits in 14 char
   for (int i = 0, k = 0; i < 14; i++) {
      register int v = 0;
      for (int j = 0; j < 6; j++,k++) {
         if (puz[k] != '.')v |= 1 << j;
         d[i] = bit6[v];
      }
   }
}


void C11() {
   cout << " C11_ find canonical for a file of puzzles" << endl;
   char* ze = finput.ze,zem[82], zem2[82]; zem[81]=0; zem2[81] = 0;
   int smin[81];
   ptpgc= SkbsSetModeWithAutos();
   pzhe= SkbfGetZhePointer();
   int* vv = pzhe->gsol;
   if (SkvcatSetModeGetVCDESK(2, &pvcdesc)) {
      cout << " set mode failed" << endl;
      return;
   }
   while (finput.GetLigne()) {
      if (strlen(ze) < 81)continue;// no empty line
      cout << ze << " puzzle to caniconalize" << endl;
      {
         int ir1 = SkbfCheckValidityQuick(ze);
         if (ir1 != 1) {
            cout << " invalid puzzle ir1="<<ir1 << endl;
            continue;
         }

         for (int i = 0; i < 81; i++) cout << vv[i] + 1;
         cout << " solback" << endl;
         //cout << pzhe->gsol << " sol to do min lex" << endl;
         SkbsGetMin(vv, smin);
         for (int i = 0; i < 81; i++) cout << smin[i] + 1;
         cout << " min solback" << endl;

      }
      GRIDPERM wgperm;
      char* pbase = ptpgc->t[0];
      for (int i = 0; i < 19; i++) cout << pbase[i];
      cout << " pbase nperm=" << ptpgc->nt << endl;
      wgperm.Import19(vv, pbase);
      wgperm.MorphPuzzle(ze, zem);
      cout << zem << " puz morphed" << endl;

      uint64_t r = SkvcatGetRankFromSolMin( smin);
      if (!r) {
         cout << " failed to find rank  " << endl;
         continue;
      }
      for (int i = 0; i < 81; i++) cout << smin[i] + 1;
      cout << " solmin rank  " << r << endl;
      char zbit[15]; zbit[14] = 0;
      PuzzleInBitField(zem, zbit);
      cout << zbit << "  bbit pattern for the puzzle" << endl;
      cout << r << ";" << zbit << " proposal for canonical" << endl;
      if (ptpgc->nt > 1) {
         cout << " auto morphs seen, check more" << endl;
         for (int i = 1; i < ptpgc->nt; i++) {// check auto morph get the smallest
            wgperm.Import19(vv, pbase);
            wgperm.MorphPuzzle(ze, zem2);
         }
      }

   }
   cout << "end of file   " << endl;
}


BTW, as the use of such a canonical morph implies a compressor decompressor tool,
a simpler option can be

33 bits for the rank of the solution grid
81 bits for the puzzle

114 bits in total exactly 19 bytes using the text mode 6 bits per byte, saving 5 bytes.
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: Proposal for a canonical puzzle.

Postby champagne » Thu Oct 31, 2024 11:43 am

I an running a first validation test on the file of 4 634 322 sudokus in the t&e 3 "mith"s data base.

a stupid bug in the previous post, one character is missing to cover the 64 possible values of a 6 bit field, but this doesn't change the following results.

757 254 puzzles have been processed in a little more than 3 and half hours
around 57 sudokus per seconds to get the "canonical"
Only 5978 different solution grids are there, giving an average 126 sudokus per solution grid.

I suspected something similar. The full run will take about 30 hours, so the final result is expected to-morrow.

next task is to write the canonical => puzzle code and see what is the average run time per canonical.
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: Proposal for a canonical puzzle.

Postby blue » Thu Oct 31, 2024 11:54 am

champagne wrote:a stupid bug in the previous post, one character is missing to cover the 64 possible values of a 6 bit field, but this doesn't change the following results.

Before you get too far: there were 3 characters missing ... 'j' and 'J' included.
Another thing: this line in your code, is reading 3 bytes of garbage data.
There's a "fix" for it below.

Code: Select all
if (puz[k] != '.')v |= 1 << j;
->
if (k < 81 && puz[k] != '.')v |= 1 << j;
blue
 
Posts: 1052
Joined: 11 March 2013

Re: Proposal for a canonical puzzle.

Postby champagne » Thu Oct 31, 2024 12:29 pm

blue wrote:
champagne wrote:a stupid bug in the previous post, one character is missing to cover the 64 possible values of a 6 bit field, but this doesn't change the following results.

Before you get too far: there were 3 characters missing ... 'j' and 'J' included.
Another thing: this line in your code, is reading 3 bytes of garbage data.
There's a "fix" for it below.

Code: Select all
if (puz[k] != '.')v |= 1 << j;
->
if (k < 81 && puz[k] != '.')v |= 1 << j;

As usual, sharp eyes from blue.
The "6 bits transformer" must have 64 four printable characters. i added the missing "jJ"
and for the second fix, I preferred this
Code: Select all
   for (int j = 0; j < 6; j++,k++) {
         if (puz[k] != '.')v |= 1 << j;
         d[i] = bit6[v];
         if(k==80) break;
      }


thanks for help
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Universal 19 bytes name for any sudoku

Postby champagne » Sat Dec 07, 2024 10:43 am

I thought that this topic would raise more interest.
As I need it for my own use, I finished the task with my personal choices.

The first target of a “canonical name” is to filter redundancy in the sudokus.

The proposal is the following in several steps

A==============
Take any sudoku
B===============
Find the solution grid and move it to the min lexical morph.
Move the puzzle in the same way, and if the grid has auto morphs, take the min lexical expression of the puzzle.
C============
Replace now the solution grid by its catalog number and the puzzle by a 81 bit field.
Express the 81 bit field as a sequence of 6x13+3 bits and uses a table to have it as a 14 printable characters
D=============
As the catalog require 33 bits, express all as a 19 printable characters
5x6 + 3 for the catalog number
3+13x6 for the 81 bits field.


B<=>C<=>D just requires some lines of code if you have the DLLs

A=>B is mainly 3 calls to the DLLs,
but B=>A is requiring the grid morphing parameters from the “given solution grid” to the ED solution grid. This is in my code a minimum of 28 values (transpose, rows map, columns map, digits map)
Most often, users will not use it but will keep the original puzzle attached to the “canonical name” (with usually other parameters such as the rating).

In this case, the “name” acts as the key for a database. The way the key has been built, the sudokus distribution should be well balanced for any “primary index” made of the first 5 characters

The “C” status is interesting as well. It keeps visible the solution grid number.
I describe the process in the next posts
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: Proposal for a canonical puzzle.

Postby champagne » Sat Dec 07, 2024 10:49 am

Equivalence table.
The table is made of 64 printable characters, one for each of the 6 bits pattern. Any sequence could work, but to have a sort of the final key giving the same order as the numeric sort on the rank, the printable characters must have an increasing ASCII value. This was not the case with my first shot. I now use
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz()

Find the ED solution grid and convert the puzzle

With the published DLLs, this is a quite simple code

Hidden Text: Show
Code: Select all
if (SkbfCheckValidityQuick(ze) != 1) {            
   cout << ze  << " invalid puzzle "  << endl;
   continue;
}
SkbsGetMin(vv, smin);
uint64_t r = SkvcatGetRankFromSolMin(smin);
GRIDPERM wgperm;
char* pbase = ptpgc->t[0];
wgperm.Import19(vv, pbase);
wgperm.MorphPuzzle(ze, zem);
for (int i = 1; i < ptpgc->nt; i++) {// check auto morph get the smallest
   wgperm.Import19(vv, ptpgc->t[i]);
   wgperm.MorphPuzzle(ze, zem2);
   register int x = 0;
   for (; x < 81; x++)if (zem[x] != zem2[x])break;
   if (x < 81 && zem2[x] == '.') memcpy(zem, zem2, 81);            
}
char zbit[15]; zbit[14] = 0;
PuzzleInBitField(zem, zbit);
fout1 << r << ";" << zbit  << endl;


First, using the brute force, the puzzle is checked and solved.
Then using the minlexing DLL, the solution is pushed to min and the grid auto morphs are in the return area
The virtual catalog gives the rank,
And using the list of grid permutations (including auto morphs) the lowest morph of the puzzle is worked out.


The last step is to change the lowest puzzle to a 14 bytes printable sequence using the following function.

Code: Select all
void PuzzleInBitField(char* puz, char* d) {
   // d is 14*6=84 bits in 14 char
   for (int i = 0, k = 0; i < 14; i++) {
      register int v = 0;
      for (int j = 0; j < 6; j++,k++) {
         if (puz[k] != '.')v |= 1 << j;
         d[i] = bit6[v];
         if(k==80) break;
      }
   }
}
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: Proposal for a canonical puzzle.

Postby champagne » Sat Dec 07, 2024 10:53 am

Expanding the rank + 14 status

As said, we here restore the min lexical status of the original puzzle.
The process is simple, but requires the virtual catalog DLL;
Here is my code for a given record of the entry file

Hidden Text: Show
Code: Select all
int i = 0;
for (; i < 15; i++)if (ze[i] == ';') break;
if (i > 10) {
   cout<<ze << "not expected format" <<endl;
   continue;
}
ze[i] = 0;
uint64_t rank= strtoll(ze, 0, 10);
char* zeb = &ze[i + 1];
int ll =(int) strlen(zeb);
if (ll != 14) {
   cout << "not rigth length"   << endl;
   continue;
}
//======================================= start expand
char zout[82]; zout[81] = 0;
char zsol[82]; zsol[81] = 0;
//valid to expand
if (SkvcatFinSolForRank(rank) < 0) {
   cout << "rank false" << endl;
}
else memcpy(zsol, pvcdesc->g.b1, 81);
   Expand14to81(zout, zeb, zsol);
fout1 << zout << endl;
continue;


Once the entry checked, we have the rank and the 14 bytes
Using the virtual catalog, the rank is expanded to the min solution grid
Then the 14 bytes are expanded to the puzzle to print.

This is done using a small converter printable => 0-511 value

Code: Select all
int Get6Bits(int i, char* bf14) {
   register int v = bf14[i];
   if (v >= 97 && v <= 123)v -= 61; // a z (
   else if (v >= 65 && v <= 90)v -= 55; // A Z
   else if (v >= 48 && v <= 57)v -= 48; // 0-9
   else if (v == 125) v = 63; // )
   else v = 0; // not valid put to empty
   return v;
}


And the replacement of bits set to 1 by the solution grid value done by this function

Code: Select all
void Expand14to81(char* puz, char* bf14, char * sol) {
   int k = 0;
   for (int i = 0, k = 0; i < 14; i++) {// 6 bits to expand
      register int v = Get6Bits(i,bf14);
      int nd = (i == 13) ? 3 : 6;
      for (int j = 0; j < nd; j++,v>>=1,k++) {
         if (v & 1)puz[k] = sol[k] ;
         else puz[k] = '.';
      }
   }
}
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: Proposal for a canonical puzzle.

Postby champagne » Sat Dec 07, 2024 10:57 am

Moving from rank + 14 to 19 and reverse.

We could think of a direct operation puzzle<-> 19.
As I think that the rank + 14 will be in use, I preferred to see the final “naming” step as a pure binary transformation of the rank+14 image.

Basically, the target is simple.
The rank needs 33 bits in binary mode.
Take 3 bits from the rank, use them to have 81+3 bits = 14 bytes left
The additional constraint added is to do it in such a way that the sorting sequence stays nearly the same in both cases. The highest bits of the rank must then come first.
But any way to do the conversion and reverse would work.

Here is my code

Code: Select all
//======================================== can r+14 =>19
int i = 0;
ze[i] = 0;
uint64_t rank = strtoll(ze, 0, 10);
char* zeb = &ze[i + 1];
char z19[20]; memset(z19, 0, 20);
register uint64_t r = rank, r2;
{
   r2 = (r >> 27) & 0x3f; z19[0] = bit6[r2];//  highest bits
   r2 = (r >> 21) & 0x3f; z19[1] = bit6[r2];
   r2 = (r >> 15) & 0x3f; z19[2] = bit6[r2];
   r2 = (r >> 9) & 0x3f; z19[3] = bit6[r2];
   r2 = (r >> 3) & 0x3f; z19[4] = bit6[r2];
}
register int rx=(int)r & 7;// last 3  bits
for (int i = 0; i < 14; i++) {
   register int ry = Get6Bits(i, zeb);
   rx = (rx << 3) | ry &7;
   z19[i + 5] = bit6[rx];
   rx = ry >>3;
}
fout1 << z19 << endl;



Code: Select all
//===================== 19 to r+14  6x5+3 14x6-3
uint64_t rank=0;
for (int i = 0; i < 5; i++) {
   register int ry = Get6Bits(i, ze);
   rank = (rank <<= 6) | ry;
}
register int rx = Get6Bits(5, ze);// 3 bits rank
rank = (rank <<= 3) | (rx >> 3);
rx &= 7;// residual bits for the 14 bytes field
char zb[15]; zb[14] = 0;
for (int i = 6; i < 19; i++) {
   register int ry = Get6Bits(i, ze);
   rx = rx  | ry&070;
   zb[i -6] = bit6[rx];
   rx = ry &7;
}
zb[13] = bit6[rx];// last three bits
fout1<<rank<<";" << zb << endl;
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany


Return to Software