apologies for the long note
I'm purging before a two week sans electronic device holiday
thanks to posts and pm's from
Havard,
coloin, and
ronk I was able to track
down a few problems with the -go (off/on tour) code in my solver, especially for the order of
magnitude difference in runtime between my solver and
Havard's generation sequence at the top of this thread
define
verifier: a backtracking algorithm that verifies that a puzzle has exactly one solution
it turns out that having a fast verifier may encourage lazy programming
Havard uses a DLX verifier which is slightly slower than the small verifier that I use
the main difference is a higher startup cost for the DLX verifier
(that startup cost ends up being a wash for larger problems, but is significant for 9x9 sudoku)
the hard part of off/on tours is the
on phase where one or more clues are added to pseudo-puzzles
(puzzles with more than one solution) in order to generate valid 1 solution puzzles
a straightforward way to do this would be
- Code: Select all
for i in candidates
set candidate in puzzle
verify that the puzzle has one solution
end
Havard provided a hint that instead of #candidates calls to the verifier he amortized the
DLX startup cost by starting up the verifier once, and set/unset each candidate within the verifier
that insight had little effect on my verifier implementation
(because startup cost is not a significant issue)
but the idea of doing one search instead of #candidates stuck
when checking for multiple solutions the verifier backtracking algorithm is allowed to proceed past the first solution:
either it will find no more solutions, which means a valid puzzle, or it will find another solution
the set of all solutions can be modeled as a tree, with candidate assignments at the nodes
this also models the verifier backtrack search tree
early candidate assignments appear higher up in the tree, close to the root
a branch with multiple solutions will have one or more nodes near the root with the same candidate values,
and then fork at a critical candidate (a candidate required for one solution)
each of the candidates that are the same from the root to the critical candidate cannot be a critical candidate
(why? because they were assigned earlier on the branch and the branch still led to more than one solution)
so all of the candidates from the root to the point where the solutions branch can be pruned from
the critical candidate search as the verifier backtracks
this insight, along with a few verifier tweaks, recovered the order of magnitude time loss
once the verifier was fixed I was able to add a better interface that allows multiple
off/on sequences to be combined into one command
for example,
Havard's sequence can be specified as
- Code: Select all
sudoku -goc{-1+1}x3{-2+1}{-2+2}{-1+1}x9{-2+1} -Ff%2F%8#et%9#an%9#sn jpf-19.dat
(three 1-off/1-on, one 2-off/1-on, one 2-off/2-on, nine 1-off/1-on, and a final 2-off/1-on)
-go
c uses the -Fc format to output puzzles grids (the grid with no constraint method stats)
-Ff... lists the elapsed time, #puzzles generated, and #puzzles searched on the standard error
on a linux 3.2Ghz pentium this reported
- Code: Select all
45m23s 6 26286371
~46 min elapsed time
6 puzzles (the same
Havard generated)
26,286,371 possibly dup (pseudo) puzzles checked
I have a small catalog (~20 puzzles) of 22's generated by -gH (using the hardest sudoku clue placement formula)
with -q1 rating > 9999 and extracted two new 17's (#40726 #40727) using
- Code: Select all
sudoku -go{-2+1}x5 -f'%v # %T' h-22.dat
where each application of {-2+1} drops one clue, so 5 rounds goes from 22 to 17 clues
and the -f format lists the puzzle grid and the time it was found
I then ran this on the two new entries to see if there were any more new ones in the neighborhood
- Code: Select all
sudoku -go{-2+2}x2 -Ff%2F%8#et%9#an%9#sn -f'%v # %T' new-17.dat
it ran for ~20m and only produced the original 2 puzzles -- so its a pretty tight neighborhood
you can use -gon... to show what a sequence would do without executing the sequence
some sequences can go on for hours @2Ghz, and some sequences have the potential
to exhaust memory (dup pruning is done at each off/on phase using splay trees on subgrid canonicalized keys)
one (known) caveat -- a recalcitrant bug sometimes omits a small number of puzzles in combined
off/on sequences that would otherwise be listed if the sequence were done separately; if exact
results are required (no omissions) then use separate commands, each with one off/on sequence and optional xN
solver version 2007-06-28 posted