The New Sudoku Players' Forum

by **Mathimagics** » Tue Feb 27, 2018 12:18 pm

coloin wrote:a 10 clue would be particularly sparse

{311111110} or {221111110}

That's a good point ... and worth thinking about ...

by **Mathimagics** » Tue Feb 27, 2018 2:24 pm

Time to quickly review the errors and omissions to date!

Mathimagics wrote:McGuire & friends had to deal with a staggering number of Sudoku grids. Not just the 5,472,730,538 essentially different Sudoku grids, but for each of these we have 1296 isotopes which give potentially different puzzle forms. So roughly 7 trillion grids needed to be tested by them.

This appears to be incorrect. In Section 5.2 they do talk about checking the isotopes of each essentially different Sudoku grid, but in the context of finding UA sets. Earlier, in section 4.1, they make it clear that isotopes of a Sudoku grid are "clue-equivalent", that is to say if a grid has a 16-clue puzzle, then all of its isotopes (the grids that are isomorphic to this one) have 16-clue puzzles also.

(One can't help the impression, though, that checking all of a grid's isotopes to see if any match a UA "blueprint" (template) the way they describe it seems like a shocking waste of time, when the UA's for any grid are so easily obtained.)

Mathimagics wrote:in all probability, we can eliminate up to 95% of the grids

This refers to the process of UA-checking (Phase I) without doing clue-set generation, and I really stuffed up there. I had a bug which caused this procedure to include ALL Sudoku UA's, not just the valid SudokuP UA's. So the tables I gave are complete rubbish, and a more typical distribution of minimum clues is:

10 or less: 45%

11 clues: 35%

12 or more: 20%

So far less encouraging than first thought, but we do expect that a "red shift" will apply - these min clue figures refer only to the max clique size observed. It will rarely, if ever, be the case that a clique contains a Hitting Set, ie hits every UA set.

In fact the clique-checking process can be smarter. Eg: say we find a clique of size 9, then we check the UA's not included in this clique, if any are size 3 or more then the Hitting Set must have size 12 or more, so min clues is at least 12.

by **blue** » Wed Feb 28, 2018 4:17 pm

Mathimagics wrote:Hi coloin,

Thanks for those links.

I think it was Gordon Royle who posted the 11-clue examples. I can no longer find that post but here they are:

Code: Select all
1.2........3..............4.4..5.....6..............1..7...........8..........7.. 1.2........3..............4.4..5.....6..7...........2..8......................8.. 1.2........3..4...........5.5..6.....7..............1..8......................8.. 1.2........3..4...........5.5..6.....7..............2..8......................8.. 1.2......3..4...........5...6..7.....7..............2..8......................8.. 1.2..3....................4.4..5.....6..............1..7...........8..........7.. 1.2..3....................4.4..5.....6..7...........2..8......................8.. 1.2..3........4...........5.5..6.....7..............1..8......................8.. 1.2..3........4...........5.5..6.....7..............2..8......................8.. 1.23........4...........5...6..7.....7..............2..8......................8..

They were posted by Ocean, here -- Ocean's eleven's !

It was by following a link in a post near the top of that thread, that I found Dan Hoey's post (referenced here).

Ocean's comment -- "... all with eleven clues (not extensively checked for isomorphism)" -- is interesting.
The last 5, in sequence, can be produced by applying the "F" transformation, to the first 5 !

by **blue** » Wed Feb 28, 2018 4:31 pm

Mathimagics wrote:For SudokuP, we have a very much smaller number of different grids to check, just 214,038,113 (see here for details). This reduces the problem by a factor of roughly 33,000.

It's only 53,666,689, of course.
The transformations in the {D,F} group, map valid/"single solution" puzzles to valid/"single solution" puzzles.

by **blue** » Wed Feb 28, 2018 4:38 pm

coloin wrote:and a further thought....

presumably the clue frequency count in a 11-clue would have to be
{411111110} or {321111110} or {222111110}
all the above look like {222111110}
a 10 clue would be particularly sparse
{311111110} or {221111110}

With such a small clue count, you might be right.
However ... the usual "facts concerning Sudoku", don't apply here.
(Either my solver is buggy, or) these are valid SudokuP puzzles:

Code: Select all: +-----+-----+-----+ | ... | ... | ... | | ... | ... | ... | | ... | ... | ... | +-----+-----+-----+ | ... | ... | ... | | ... | 728 | 469 | | ... | 631 | 587 | +-----+-----+-----+ | ... | ... | ... | | ... | 215 | 643 | | ... | 469 | 278 | +-----+-----+-----+ +-----+-----+-----+ | ... | 876 | 954 | | ... | 921 | 863 | | ... | ... | ... | +-----+-----+-----+ | 461 | ... | ... | | 389 | ... | ... | | ... | ... | ... | +-----+-----+-----+ | 215 | ... | ... | | 674 | ... | ... | | ... | ... | ... | +-----+-----+-----+

by **coloin** » Wed Feb 28, 2018 7:23 pm

Well ... i was worried that that might be the case
but there might be an easier way - but probably with thse valid puzzles it will make an exhaustive search somewhat difficult !

take the 6-clue transversal in one of the 11-clue puzzles

Code: Select all: *-----------* |1.2|...|...| |..3|...|...| |...|...|..4| |---+---+---| |.4.|.5.|...| |.6.|.7.|...| |...|...|.2.| |---+---+---| |.8.|...|...| |...|...|...| |...|...|8..| *-----------* *-----------* |1..|...|...| |...|...|...| |...|...|..4| |---+---+---| |...|.5.|...| |.6.|...|...| |...|...|.2.| |---+---+---| |...|...|...| |...|...|...| |...|...|8..| *-----------* add 5 more specific clues will get you the 11 clue

certainly easier to test to get 10-clues or not
1x 9-transversal plus 1 clue
1x 8-transversal plus 2 clues
2x 7-transversal plus 3 clues

the ? 4x 6-transversals plus 4 clues - might take longer - but excluding higher transversals in the potential final 10-puzzle might be an option

The existance of those puzzles mith max transversal of 4 is certainly a spanner in the works .....

by **Mathimagics** » Wed Feb 28, 2018 7:42 pm

Hi blue,

Nothing wrong with your solver, they are indeed valid SudokuP puzzles ...

Thanks for that reminder about PF-equivalence, that reduces the job by a factor of 4! 8-)

by **Mathimagics** » Fri Mar 02, 2018 10:47 am

.
Thinking about the {MC} case above, and blue's observations, can't we in fact reduce the number of grids that need to be tested here to 30,603,892?

That is, any two SudokuP grids that are in the same S-equivalence class will be clue-equivalent: puzzles for one grid are valid for the other, if transformed in the same way as the solution grids.

This argument applies equally to F-equivalence, so we need only look for 10-clue puzzles among the SF-different grids.

(See SudokuP Analysis for definitions of F-equivalence, S-equivalence, etc).

Furthermore, should a grid under consideration have a low UA count, then we can check its "brothers" in the same S-equivalence class and choose the one that has the most valid UA's.

So for example, if our {MC} grid representative in our test set has the unfortunate property of having 0 valid UA's, we simply choose the maximal case, ie the one with 378 UA's.

Does blue agree with this proposition?

by **Mathimagics** » Fri Mar 02, 2018 5:16 pm

.
Ok, we are off and running (again) with Phase 1. We have created a catalog of the 30,603,892 SF-different SudokuP grids, selecting in each case the representative yielding the maximum possible UA count.

Phase 1 just tries to eliminate as many grids as possible as cheaply as possible. No Hitting Sets are enumerated here, we just look at the UA sets, looking for a sizeable clique (a set of mutually disjoint UA's). There is an iteration limit so we don't waste too much time. The largest clique found gives us an initial lower bound for N, the minimum clue requirement.

It's possible to "bump" this value in most cases by 1, 2 or 3. We look to see there is any UA not included in the clique but disjoint wrt all UA's in the clique. If so we can add 1 to N, and then look for pairs of UA's. If pairs are found we add another 1 to N and check for triplets. These checks don't cost much extra but give us a bump in most cases.

This process takes 3-4 minutes for 10,000 grids. Running 4 copies in parallel means we should have completion in around 48 hours. Initial indications are that it will eliminate somewhere between 50% and 65% of cases from 11-clue consideration (N > 11), and 80% from 10-clue consideration (N > 10). That saves us a heck of lot of time in Phase 2, where we start enumerating Hitting Sets (clues that actually hit every UA) and that is very much more time-consuming.

by **blue** » Fri Mar 02, 2018 5:40 pm

Mathimagics wrote:Does blue agree with this proposition?

No.

[ Your last post, came while I was preparing this one. Sorry I wasn't quicker on the reply. ]

The kind of claim that you would be looking to justify, would run like this:

... where "(?) is a valid puzzle for (??)" means that (?) is a puzzle having (??) as its only solution.

In that, if P was a valid puzzle for X, then the Q that we would expect to find, could only be Q = (T')*P, where T' is the inverse of T.
[ Right (? ...): (T*Q = P) => (Q = (T')*P) ]

That Q -- Q = (T')*P -- would have Z as one of it's solutions, since Z is (T')*X, and it's a valid grid, and a "superset" of (T')*P.
Usually, we would argue that Z is the only solution to (T')*P, on the grounds that for any (valid) solution, W, to (T')*P we have:

valid grid

If that's a valid argument, then given the "arbitrariness" of W, we could properly conclude that Z is the only solution to Q := (T')*P, and that the set of T*Q's would, indeed, include P.

The argument doesn't carry through, if T can map (some) valid grids to invalid grids.

For a concrete counterexample to the potential claims, you can refer back to this post.
[ P would be the puzzle on the right, and X its solution. Q would be the puzzle on the left, with two solutions. ]

by **Mathimagics** » Fri Mar 02, 2018 7:06 pm

Hi blue,

You have me bang to rights, I must confess to yet another blunder ...

We'll have to restart Phase 1 (for the third time), using the PF-different data set, ie 53,666,689 grids

Thanks for pointing that out, and the supporting analysis!

by **Mathimagics** » Sat Mar 03, 2018 6:20 pm

.
Ok, just one minor problem, we don't actually have a catalog of PF-different SudokuP grids! 8-)

We only know the number, ie 53,666,689. But this was produced via a Burnside's Lemma calculation using invariants and conjugacy classes.

So to produce them directly, we need to look at the 214,038,113 P-different grids (fortunately we do have that), and identify which of them are PF-different. Here we run into the Canonical Form problem - we do not have an easy way of testing for P-equivalence of SudokuP grids. We do have a simple S-equivalence test, ie comparing Minlex forms, but that doesn't help here.

For each X that is a member of the P-different set, we need to link X somehow with F(X), E(X) and G(x), where E, F and G are the F-transformations. The CF problem is identifying which of the P-different catalog entries does F(X) etc actually correspond to.

I'm running a job which uses brute-force to do the equivalence testing - our catalog of {X} is in normalised form (r1 = 123456789), so we enumerate all 2592 P-transformations of F(X), normalise the result, then search our table for a match. Ditto for G(X) and E(X).

This works, but is very slow. I think it will take around 3 days to complete. Fortunately it only has be done once (assuming I haven't stuffed things up again)!

by **blue** » Sat Mar 03, 2018 9:38 pm

Mathimagics wrote:The CF problem is identifying which of the P-different catalog entries does F(X) etc actually correspond to.

You bypass that issue, by doing this:

Loop over the 214,038,113 inputs, and for each input, X ...
Produce its row-minlex (or similar) P-canonical form, Y.
Do the same thing for F*Y (or F*X).
If the result is "minlex smaller" than Y, move on to the next X.
Do it again, for G*Y (or G*X).
If the result is "minlex smaller" than Y, move on to the next X.
Do it again, for G*(F*Y) (or G*(F*X)) [ or for E*Y or E*X ].
If the result is "minlex smaller" than Y, move on to the next X.
Output Y.

P.S.: You can speed things along, by checking intermediate results against Y, in the 2nd,3rd & 4th canonicalization loops.
If an intermediate result is strictly smaller than Y, you don't need to finish the job -- just move on to the next X.

by **blue** » Sat Mar 03, 2018 10:20 pm

For anyone that's interested, this is C++ code, to produce the row-minlex canonical form for a SudokuP grid.
The routine takes a optional parameter that allows selecting the P-canonical form, over the default FP-canonical form.

It uses "unsigned char" arrays, for grids.
The array values should be numbers 1-9 [ rather than ASCII characters '1'-'9' ]

For Mathimagics: You can modify the routine to do the equivalent of what's in the post above, my having it return true or false, with "true" meaning that the result is valid and should be kept (or ouput to disk). Have it return 'true' if it makes it to the end of the routine, and at the "keep:" label, have it return false if "(n >= 2)".

Hidden Text: Show

Code: Select all: enum { TrCount = 8, }; int transforms[TrCount][81]; void init_transforms() { for (int rb = 0; rb < 3; rb++) for (int rp = 0; rp < 3; rp++) for (int cb = 0; cb < 3; cb++) for (int cp = 0; cp < 3; cp++) { transforms[0][9 * (3 * rb + rp) + (3 * cb + cp)] = 9 * (3 * rb + rp) + (3 * cb + cp); // r' = r , c' = c , b' = b , p' = p e transforms[1][9 * (3 * rb + rp) + (3 * cb + cp)] = 9 * (3 * cb + cp) + (3 * rb + rp); // r' = c , c' = r , b' = b*, p' = p* D transforms[2][9 * (3 * rb + rp) + (3 * cb + cp)] = 9 * (3 * rp + rb) + (3 * cp + cb); // r' = r*, c' = c*, b' = p , p' = b E transforms[3][9 * (3 * rb + rp) + (3 * cb + cp)] = 9 * (3 * rb + cb) + (3 * rp + cp); // r' = b , c' = p , b' = r , p' = c F transforms[4][9 * (3 * rb + rp) + (3 * cb + cp)] = 9 * (3 * cp + rp) + (3 * cb + rb); // r' = p*, c' = b*, b' = c*, p' = r* G transforms[5][9 * (3 * rb + rp) + (3 * cb + cp)] = 9 * (3 * cp + cb) + (3 * rp + rb); // r' = c*, c' = r*, b' = p*, p' = b* D*E transforms[6][9 * (3 * rb + rp) + (3 * cb + cp)] = 9 * (3 * cb + rb) + (3 * cp + rp); // r' = b*, c' = p*, b' = c , p' = r D*F transforms[7][9 * (3 * rb + rp) + (3 * cb + cp)] = 9 * (3 * rp + cp) + (3 * rb + cb); // r' = p , c' = b , b' = r*, p' = c* D*G } } const int p3[6][3] = { { 0, 1, 2, }, { 2, 0, 1, }, { 1, 2, 0, }, { 2, 1, 0 }, { 0, 2, 1, }, { 1, 0, 2, } }; int sudokup_canonicalize(unsigned char *dst, const unsigned char *src, bool use_restricted_subgroup = false) { // SudokuP (row-minlex) canonicalization routine (using p3[6][3] and transforms[TrCount][81]) if (!transforms[0][1]) init_transforms(); unsigned char best[81]; best[9] = 10; int automorphisms = 0; int tr_cnt = use_restricted_subgroup ? 2 : TrCount; for (int n = 0; n < tr_cnt; n++) { unsigned char ts[81]; { unsigned char *t = transforms[n]; for (int i = 0; i < 81; i++) ts[i] = src[t[i]]; } // choose a row to map to row 1 for (int rb = 0; rb < 3; rb++) for (int rp = 0; rp < 3; rp++) { int r1 = 3 * rb + rp; // make initial choices for other rows int r2 = (rp < 2) ? r1 + 1 : r1 - 2; int r3 = rp ? r1 - 1 : r1 + 2; int r4 = (r1 < 6) ? r1 + 3 : r1 - 6; int r7 = (r1 >= 3) ? (r1 - 3) : r1 + 6; int r_order[9]; r_order[0] = r1; // choose a column order for row 1 for (int pcb = 0; pcb < 6; pcb++) for (int pcp = 0; pcp < 6; pcp++) { int c_order[9]; unsigned char map[10]; for (int cb = 0; cb < 3; cb++) for (int cp = 0; cp < 3; cp++) c_order[3 * cb + cp] = 3 * p3[pcb][cb] + p3[pcp][cp]; // build a symbol map, mapping the final r1, to "123456789" for (int c = 0; c < 9; c++) map[ts[9 * r1 + c_order[c]]] = c + 1; int c1 = c_order[0]; // swap the final rows 2&3, as necessary, to force r2c1 < r3c1 if (map[ts[9 * r2 + c1]] < map[ts[9 * r3 + c1]]) { r_order[1] = r2; r_order[2] = r3; } else { r_order[1] = r3; r_order[2] = r2; } // swap the final bands 2&3, as necessary, to force r4c1 < r7c1 if (map[ts[9 * r4 + c1]] < map[ts[9 * r7 + c1]]) { r_order[3] = r4; r_order[6] = r7; } else { r_order[3] = r7; r_order[6] = r4; } r_order[4] = r_order[1] + (r_order[3] - r_order[0]); r_order[5] = r_order[2] + (r_order[3] - r_order[0]); r_order[7] = r_order[1] + (r_order[6] - r_order[0]); r_order[8] = r_order[2] + (r_order[6] - r_order[0]); for (int r = 1; r < 8; r++) for (int c = 0; c < 8; c++) { if (map[ts[9 * r_order[r] + c_order[c]]] > best[9 * r + c]) goto next; // result will be strictly worse (for "minlex" comparison) if (map[ts[9 * r_order[r] + c_order[c]]] < best[9 * r + c]) goto keep; // result will be strictly better (for "minlex" comparison) } ++automorphisms; goto next; // result will exactly match best case (in "minlex" comparison) keep: // update "best case" for (int r = 0; r < 9; r++) for (int c = 0; c < 9; c++) best[9 * r + c] = map[ts[9 * r_order[r] + c_order[c]]]; automorphisms = 1; next: ; } } } memcpy(dst, best, sizeof(best)); return automorphisms; }

by **Mathimagics** » Sun Mar 04, 2018 12:51 am

.
Thank you blue! That is a significant contribution indeed ...

I have to call this function from standard C, but that shouldn't be too much drama ...

The New Sudoku Players' Forum

SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

SudokuP - Min Clue Project - Errors

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

SudokuP MC Project - Phase 1

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project

Re: SudokuP - Min Clue Project