Hi,
Afmob!
Afmob wrote:Serg, could you please explain your algorithm? The number of essentially puzzle is far too large (> 64,7 billion) for checking all ED puzzles of each pattern in just 5 CPU hours.
The general idea of my method is borrowed from
blue's exhaustive search for Fractal Pattern done in May, 2010 (many thanks to
blue!). He published his method's description at setbb.com/sudoku forum (see thread
Search for maximal pattern, containing both diagonals).
blue treated his method as very special, being suited for specific patterns only. But I treat now his method of exhaustive seach as universal and very efficient. I successfully used it several times for exhaustive search exploring different patterns. It works well! Certainly I added some techniques to this method to make it more universal and efficient.
Assume we are doing exhaustive search for valid puzzles exploring given pattern. A possible approach to this search is to check out all essentially different solution grids and select those grids that produce puzzles having unique solution for given pattern. It is not necessary to loop over all 5e9 e-d solution grids and check all isomorphs of given patterns for each solution grid. We can
decompose overall search, looping over possible configuirations of alone bands, for example, and considering all combinations of these bands. At the first glance this approach requires the same volume of work as looping over all e-d solution grids, but, but, but... The main idea of the method discussed is decomposition of the grid to several parts, decomposition of the pattern to the same parts, store parts fillings variants (subset of solutions grids) having no unhit UA sets (all UA sets must be hit by this parts clues). At this stage many parts configurations having unhit UA sets are filtered out. So, when we'll combine that parts we'll forced to consider substaintially less variants comparing with looping over all possible solution grids.
For example, let's consider exhaustive search for valid puzzles for map
- Code: Select all
1 1 1
1 1 1
9 9 9
First, we should search all valid configurations of the top band B123. It can have 2 possible e-d patterns only (configurations with 2 empty rows were rejected):
- Code: Select all
A1 A2
+-----+-----+-----+ +-----+-----+-----+
|x . .|x . .|. . .| |x . .|. . .|. . .|
|. . .|. . .|x . .| |. . .|x . .|. . .|
|. . .|. . .|. . .| |. . .|. . .|x . .|
+-----+-----+-----+ +-----+-----+-----+
I checked 653184 configurations for A1 pattern and found 163312 non-isomorphic configurations (I accounted for not all possible automorhisms). 310 configurations only have no unhit UA sets. So, I stored these 310 variants of band's filling for further usage.
I checked 653184 configurations for A2 pattern and found 55228 non-isomorphic configurations (I accounted for not all possible automorhisms). 168 configurations only have no unhit UA sets. So, I stored these 168 variants of band's filling for further usage.
Next, I found all valid configurations of the left (sub)stack B1+B4. It can have 2 possible e-d patterns:
- Code: Select all
C1 C2
+-----+-----+ +-----+-----+
|x . .|x . .| |x . .|. . .|
|. . .|. . .| |. . .|x . .|
|. . .|. . .| |. . .|. . .|
+-----+-----+ +-----+-----+
I checked 6048 configurations for C1 pattern and found 1516 non-isomorphic configurations. 888 configurations only have no unhit UA sets.
I checked 6048 configurations for C2 pattern and found 1582 non-isomorphic configurations. 946 configurations only have no unhit UA sets.
Next, I checked out all valid configurations of the combination: top band B123 plus left (sub)stack B1+B4. It can have 6 possible e-d patterns:
- Code: Select all
D1 D2 D3 D4
+-----+-----+-----+ +-----+-----+-----+ +-----+-----+-----+ +-----+-----+-----+
|x . .|x . .|. . .| |x . .|x . .|. . .| |x . .|. . .|. . .| |x . .|. . .|. . .|
|. . .|. . .|x . .| |. . .|. . .|x . .| |. . .|x . .|x . .| |. . .|x . .|x . .|
|. . .|. . .|. . .| |. . .|. . .|. . .| |. . .|. . .|. . .| |. . .|. . .|. . .|
+-----+-----+-----+ +-----+-----+-----+ +-----+-----+-----+ +-----+-----+-----+
|x . .| |. x .| |x . .| |. x .|
|. . .| |. . .| |. . .| |. . .|
|. . .| |. . .| |. . .| |. . .|
+-----+ +-----+ +-----+ +-----+
D5 D6
+-----+-----+-----+ +-----+-----+-----+
|x . .|. . .|. . .| |x . .|. . .|. . .|
|. . .|x . .|. . .| |. . .|x . .|. . .|
|. . .|. . .|x . .| |. . .|. . .|x . .|
+-----+-----+-----+ +-----+-----+-----+
|x . .| |. x .|
|. . .| |. . .|
|. . .| |. . .|
+-----+ +-----+
At this stage I got 153364 variants for D1 pattern, 350788 variants for D2 pattern, 49848 variants for D3 pattern, 113896 variants for D4 pattern, 112610 variants for D5 pattern, 241210 variants for D6 pattern. Because I used
valid variants only of band B123 and substack B1+B4, I avoided checking out huge amount of invalid combinations of band and substack. This is central idea of the method. We can also search for all valid variants of band and substack "from scratch", filling boxes B1-B4 by all possible digits combinations. But we would forced to check out much more variants in that case in comparison with combining "pre-cleaned" (from unhit UA sets) band and substack. So, using "pre-cleaned" band and substack allows us to check out only configurations having unhit UA sets spanning all 4 boxes or boxes B1, B2 and B4, but we don't expect see UA sets spanning 2 boxes only or boxes B1-B3.
At this stage I use a trick speeding up unhit UA check several times. I call this trick as "UA caching".
Suppose we found unhit UA set for some combination of band B123 and substack B1+B4. If we will check a combination of the
same band and another substack we can test only cells of the box B4 beloning to previously found UA set to determine - do that cells contain UA set digits?
For example, assume we found this unhit UA set (U8):
- Code: Select all
+-----+-----+-----+
|1 . .|. 7 .|5 . .|
|. 5 .|. 1 .|7 . .|
|. . .|. . .|. . .|
+-----+-----+-----+
|. . .|
|5 1 .|
|. . .|
+-----+
This implies that during checking out combinataion of the same band and another substack we can simply check cells r5c1 and r5c2 - do r5c1 contains "5" and r5c2 contains "1"? If so - the combination definitely has unhit UA set. So, we should keep "UA cache", containing some patterns of the box B4 cells, during checking out combinations of band and substack having the given band. This trick speeds up the search 2-5 times (or more, I don't remember precise ratio).
Next, we should combine 4-boxes comfiguration (B1+B2+B3+B4) and band B456. It turns out that there are no valid 6-boxes configurations having 1 clue in each box.
Some words about durability of the search stages described above.
1. Finding all valid bands took me 3 minutes of CPU time.
2. Finding all valid 2-boxes subbands took me 1 second of CPU time.
3. Finding all valid B1+B2+B3+B4 configurations took me 6 minutes of CPU time.
4. Finding all valid B1+B2+B3+B4+B5+B6 configurations took me 16804 seconds (4.7 hours) of CPU time.
Serg
[Edited: I deleted wrong statement "If I would use intermediate stage "Finding all valid B1+B2+B3+B4+B5 configurations", I would take about 15 minutes per all stages because there are 20 valid B1+B2+B3+B4+B5 one-clue-box configurations only". Really it is optimal to combine full band (B258) with B1+B2+B3+B4 configuration, as it was described in this post.]