The goal is finding all valid puzzles with a given number of clues within a given solution grid.

Here is the algorithm:

1) Find several thousands of unavoidable sets (UA).

2) Select a leading sequence of cells based on disjoint UA. This is [Issue 1].

3) Remap the grid to the selected sequence.

4) Perform a full enumeration of puzzles using some optimizations by UA.

5) Check for uniqueness all the puzzles which passed the optimization test.

Details.

Enumeartion is performed by placing all the givens on the rightmost positions, then moving them (traversing trough all possible combinations) to the leftmost positions.

Basic (naive) approach example for 5 givens

- Code: Select all
`Possible positions for the leftmost clue`

.............................................................................0000

The rightmost 5 - 1 = 4 cells are reserved for the rest (less significant) clues.

Initial position of the leftmost clue

00000000000000000000000000000000000000000000000000000000000000000000000000001xxxx

Final position of the leftmost clue

1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The positions marked with "x" are available for the rest of the clues.

Start of enumeration

000000000000000000000000000000000000000000000000000000000000000000000000000011111

000000000000000000000000000000000000000000000000000000000000000000000000000101111

000000000000000000000000000000000000000000000000000000000000000000000000000110111

...

111101000000000000000000000000000000000000000000000000000000000000000000000000000

111110000000000000000000000000000000000000000000000000000000000000000000000000000

End of enumeration

Optimization 1

Order the UA by the rightmost cell position and move each clue to hit the rightmost cell of the first not hit UA set.

Example after Optimization 1

- Code: Select all
`Ordered list of the UA not hit`

000001000100010001000001000001000000000000000000000000000000000000000000000000000

000100100010001000100000100110001000000000000000000000000000000000000000000000000

...

Leftmost clue is placed at the rightmost position of the first UA.

000001000100010001000001000001000000000000000000000000000000000000000000000000000 #topmost UA

00000000000000000000000000000000000000000000000000000000000000000000000000001xxxx #non-optimized

000000000000000000000000000001xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx #optimized

Optimization 2

Since the order of the enumeration is independent of the order of the cells in the grid, it is possible to rearrange (map) the cell indexes in some more convenient order.

Currently the cells are ordered by the first of the Maximal Disjoint Sets having minimum combinations.

In other words, a list of Maximal Disjoint Sets is generated (Maximal Cliques). If there is more than one such set, the number of combinations of each set is calculated (product of the number of cells in each UA) and one with minimum combinations is chosen. This is not neccesarily the best choice [Issue 2].

The cells of the smaller UA are placed at right, respectively the largest UA is placed at the leftmost.

A maximal position for each of the clues is calculated, so that at right at this position there are enough clues to hit the disjoined UA sets.

Example after Optimization 2

Let maximal disjoint set consist ot U4+U4+U6+U40, MCN=4, 54 cells occupied.

Remapping is done so the remapped cel positions become in this order:

- Code: Select all
`<...rest of the cells...><U40 cells><U6 cells><U4 cells><U4 cells>`

For clue 1 (rightmost one) the limit is 3 (the valid position range is [0,1,2,3].

For clue 2 the leftmost possible position is 3 + 4 = 7.

For clue 3 the position limit is 7 + 6 = 13.

For clue 4 the limit is 13 + 40 = 53.

For clue 5 there is no limit (= 80).

TheÑ€e are "degrees of freedom" currently not used in this optimization.

It is obvious that for maximum benefit from Optimization 2, as small as possible and as much as possible UA should be placed at right.

Is it better to choose the Maximal Disjoint Set for the mapping, or some non-maximal but with more small sized UA? [Issue 1]

If there are more than one similar Disjoint Set candidates, how to weight them and choose the best one? [Issue 2]

How the UA of equal size to be ordered while determining the mapping? [Issue 3]

How the cells within each UA to be ordered? [Issue 4]

How the remaining cells (these that are not members of the chosen Disjoint Set) to be ordered? [Issue 5]

One way to resolve the issues above is to collect the possible candidates and to rate them by the estimated benefit.

For example it is easy to calculate the benefit of upper limit for each optimized step.

Each skipped due to the optimization 2 position of the rightmost clue is weighted by 1.

The second clue steps are weighted by size of the rightmost (smallest) UA, in our case by 4.

The third's weight is a product of the second and next UA size, in our case 4 * 4 = 16.

Next is 16 * 6 = 96, next is 96 * 40 = ...

I am at a loss in estimating the frequency of occurence of each of these cases.

The frequency of Optimization 2 (left limit reached) depends on Optimization 1 (right limit), which depends on reordering (Optimization 2) and threfore must be averaged.

So, any help in calculating the probabilities how many times in average each of Optimization 2 cases will occur is welcome.

Let we have a list of Disjoint Sets. My question is how to estimate the optimization benefit for each of the members, and choose the best one?

I tried to collect some statistics empirically, and conclude that

a) Optimized steps distribution depends of the grid and the number of the clues.

b) Scanning for less clues is faster but not representative (couldn't be simply scaled for more clues).

c) First few minutes of scanning are not representative of the whole scan.

d) Scanning itself is slow.

I hope the method in general is not stillborn. It gives good results for some of the grids. For example, on a 2.8GHz PC:

- Code: Select all
`439286157176593284258147396381754629795632841624918735542371968867429513913865472 #MCN14 with 17`

Checked for 17, generated 274 puzzles, found 1 valid.

Iterations done in 256.454 seconds

123456789456789123798132546237915468864273915915648237342567891581394672679821354 #MCN10 with 9 17s

Checked for 17, generated 921280 puzzles, found 9 valid.

Iterations done in 94514.610 seconds (=26h15')

123456789456789123789123456231564897564897231897231564312645978645978312978312645 #Most Canonical

Checked for 17, generated 0 puzzles, found 0 valid.

Iterations done in 1.640 seconds (+ ~15 minutes in enumerating the 495396 maximal cliques)

123456789457893612986217354274538196531964827698721435342685971715349268869172543 #Max Minlex Grid

Checked for 17, generated 0 puzzles, found 0 valid.

Iterations done in 182.563 seconds

123456789456789123789123465238591647617348952945267318374815296591632874862974531 #MCN 12 with 39

Checked for 17, generated 1910 puzzles, found 0 valid.

Iterations done in 12359.031 seconds (=3h26')

123456789457189326689327154231645897745891632896732541318264975574918263962573418 #Pt MCN15

Checked for 20, generated 20234 puzzles, found 6 valid.

Iterations done in 500.563 seconds

594231786786945132123768954965173248378492561241856397432619875619587423857324619 #MCN12 with 20 17s

Checked for 17, generated 11784 puzzles, found 20 valid.

Iterations done in 1420.218 seconds (another 2 attempts with different UA collections gave 1003" and 1825")

Thanks,

MD