apologies for the long note

I'm purging before a two week sans electronic device holiday

thanks to posts and pm's from

Havard,

coloin, and

ronk I was able to track

down a few problems with the -go (off/on tour) code in my solver, especially for the order of

magnitude difference in runtime between my solver and

Havard's generation sequence at the top of this thread

define

verifier: a backtracking algorithm that verifies that a puzzle has exactly one solution

it turns out that having a fast verifier may encourage lazy programming

Havard uses a DLX verifier which is slightly slower than the small verifier that I use

the main difference is a higher startup cost for the DLX verifier

(that startup cost ends up being a wash for larger problems, but is significant for 9x9 sudoku)

the hard part of off/on tours is the

on phase where one or more clues are added to pseudo-puzzles

(puzzles with more than one solution) in order to generate valid 1 solution puzzles

a straightforward way to do this would be

- Code: Select all
`for i in candidates`

set candidate in puzzle

verify that the puzzle has one solution

end

Havard provided a hint that instead of #candidates calls to the verifier he amortized the

DLX startup cost by starting up the verifier once, and set/unset each candidate within the verifier

that insight had little effect on my verifier implementation

(because startup cost is not a significant issue)

but the idea of doing one search instead of #candidates stuck

when checking for multiple solutions the verifier backtracking algorithm is allowed to proceed past the first solution:

either it will find no more solutions, which means a valid puzzle, or it will find another solution

the set of all solutions can be modeled as a tree, with candidate assignments at the nodes

this also models the verifier backtrack search tree

early candidate assignments appear higher up in the tree, close to the root

a branch with multiple solutions will have one or more nodes near the root with the same candidate values,

and then fork at a critical candidate (a candidate required for one solution)

each of the candidates that are the same from the root to the critical candidate cannot be a critical candidate

(why? because they were assigned earlier on the branch and the branch still led to more than one solution)

so all of the candidates from the root to the point where the solutions branch can be pruned from

the critical candidate search as the verifier backtracks

this insight, along with a few verifier tweaks, recovered the order of magnitude time loss

once the verifier was fixed I was able to add a better interface that allows multiple

off/on sequences to be combined into one command

for example,

Havard's sequence can be specified as

- Code: Select all
`sudoku -goc{-1+1}x3{-2+1}{-2+2}{-1+1}x9{-2+1} -Ff%2F%8#et%9#an%9#sn jpf-19.dat`

(three 1-off/1-on, one 2-off/1-on, one 2-off/2-on, nine 1-off/1-on, and a final 2-off/1-on)

-go

c uses the -Fc format to output puzzles grids (the grid with no constraint method stats)

-Ff... lists the elapsed time, #puzzles generated, and #puzzles searched on the standard error

on a linux 3.2Ghz pentium this reported

- Code: Select all
` 45m23s 6 26286371`

~46 min elapsed time

6 puzzles (the same

Havard generated)

26,286,371 possibly dup (pseudo) puzzles checked

I have a small catalog (~20 puzzles) of 22's generated by -gH (using the hardest sudoku clue placement formula)

with -q1 rating > 9999 and extracted two new 17's (#40726 #40727) using

- Code: Select all
`sudoku -go{-2+1}x5 -f'%v # %T' h-22.dat`

where each application of {-2+1} drops one clue, so 5 rounds goes from 22 to 17 clues

and the -f format lists the puzzle grid and the time it was found

I then ran this on the two new entries to see if there were any more new ones in the neighborhood

- Code: Select all
`sudoku -go{-2+2}x2 -Ff%2F%8#et%9#an%9#sn -f'%v # %T' new-17.dat`

it ran for ~20m and only produced the original 2 puzzles -- so its a pretty tight neighborhood

you can use -gon... to show what a sequence would do without executing the sequence

some sequences can go on for hours @2Ghz, and some sequences have the potential

to exhaust memory (dup pruning is done at each off/on phase using splay trees on subgrid canonicalized keys)

one (known) caveat -- a recalcitrant bug sometimes omits a small number of puzzles in combined

off/on sequences that would otherwise be listed if the sequence were done separately; if exact

results are required (no omissions) then use separate commands, each with one off/on sequence and optional xN

solver version 2007-06-28 posted