Here are results for this dataset across a range of solvers using tdoku's benchmark program:
|quasi_uniform_834.fmt                 |  puzzles/sec|  usec/puzzle|   %no_guess|  guesses/puzzle|
|--------------------------------------|------------:|------------:|-----------:|---------------:|
|minisat_minimal            S/s...+./m+|        366.5|      2,728.5|        0.0%|          309.62|
|minisat_natural            S/s...+./m+|        381.1|      2,624.1|        0.0%|          285.63|
|minisat_complete           S/sh..+./m+|      1,904.5|        525.1|        3.1%|           37.38|
|minisat_augmented          S/shrc+./m+|      1,458.1|        685.8|        5.7%|           29.34|
|_tdev_dpll_triad_scc_ih    S/shrc++/m+|      2,019.9|        495.1|       23.6%|            5.85|
|lhl_sudoku                 G/s...../m.|      6,685.6|        149.6|        0.0%|          268.60|
|_tdev_basic_heuristic      G/s...../m.|     10,557.5|         94.7|        0.0%|          292.49|
|zerodoku                   G/sh..../m.|     14,659.2|         68.2|        3.3%|           27.33|
|fast_solv_9r2              E/sh..../m.|     10,840.2|         92.2|        3.3%|           26.04|
|kudoku                     E/sh..../m.|     11,579.6|         86.4|        3.3%|           22.70|
|norvig                     C/sh..../m.|      6,028.4|        165.9|        3.4%|           27.57|
|bb_sudoku                  C/shrc../m.|     27,794.9|         36.0|        5.6%|           36.77|
|fsss                       C/shrc../m.|     32,108.2|         31.1|        5.6%|           19.66|
|jsolve                     C/shrc../m.|     33,215.5|         30.1|        5.6%|           18.38|
|fsss2_locked               D/shrc../m.|     36,768.0|         27.2|        5.6%|           19.34|
|fsss2                      D/sh..../m.|     54,460.8|         18.4|        3.3%|           23.22|
|jczsolve                   B/shr.../m.|     57,504.4|         17.4|        4.3%|           26.38|
|sk_bforce2                 B/shrc-./m+|     58,558.6|         17.1|        4.9%|           21.63|
|rust_sudoku                B/shr.../m.|     69,744.9|         14.3|        4.3%|           25.58|
|tdoku                      T/shrc+./m+|     88,578.9|         11.3|        8.5%|           13.03|
The cryptic code following each solver's name helps group solvers by their similarities and differences. The leading letter gives the primary representation (SAT, Groups, Exact Cover, Cells, Digits, Bands, or Tdoku's thing), the middle field gives supported inference (singles, hidden singles, locked candidates, + for something extra), and the trailing field gives supported heuristics (generally min-remaining-values, + for something extra).
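For concreteness, here's a hypothetical decoder for these codes. It just follows the field layout described above; the output labels are my own, not anything tdoku emits:

```cpp
#include <iostream>
#include <map>
#include <string>

// Decode a classification code like "B/shr.../m." per the description above.
std::string DecodeSolverClass(const std::string &code) {
    static const std::map<char, std::string> reps = {
        {'S', "SAT"},   {'G', "Groups"}, {'E', "Exact cover"}, {'C', "Cells"},
        {'D', "Digits"}, {'B', "Bands"}, {'T', "Tdoku"}};
    std::string out = "representation=" + reps.at(code[0]) + " inference=";
    // Characters between the two '/' delimiters flag supported inference:
    // s=singles, h=hidden singles, r/c=locked candidates, +=something extra.
    for (size_t i = 2; code[i] != '/'; ++i)
        if (code[i] != '.') out += code[i];
    // Trailing field: m=min-remaining-values heuristic, +=something extra.
    out += " heuristics=" + code.substr(code.rfind('/') + 1);
    return out;
}

int main() {
    std::cout << DecodeSolverClass("T/shrc+./m+") << "\n";
    // prints: representation=Tdoku inference=shrc+ heuristics=m+
}
```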
The benchmark program does several things to try to make comparisons as useful as possible (see the sketch after this list):
- makes sure every solver is operating in the same mode (count up to 2 solutions)
- builds an in-memory dataset before benchmarking so I/O is out of the picture
- randomly permutes the puzzles so benchmarking is not biased by presentation (e.g., as would occur if using a minlex dataset to benchmark a solver whose heuristics were tuned on minlex datasets).
- runs for a while before benchmarking to warm CPU caches and branch predictors
- runs for an integral number of passes over the dataset, and for at least a few seconds: long enough to minimize timing variance, and without biasing which puzzles get solved for slow solvers
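Put together, the timed loop looks roughly like the sketch below. This is my own approximation, not tdoku's actual benchmark code, and `solve` is a hypothetical stand-in for whichever solver is under test:

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <random>
#include <string>
#include <vector>

// Rough sketch of the measurement loop. `solve` wraps the solver under
// test; it's asked for up to 2 solutions so every solver does equivalent work.
double PuzzlesPerSecond(
    std::vector<std::string> puzzles,
    const std::function<void(const std::string &, int)> &solve) {
  // Permute once so timing isn't biased by presentation order
  // (e.g., by a minlex-sorted dataset).
  std::shuffle(puzzles.begin(), puzzles.end(), std::mt19937_64{12345});

  // Warm-up pass: populate CPU caches and train branch predictors
  // before the timed region begins.
  for (const auto &p : puzzles) solve(p, 2);

  // Timed region: complete whole passes over the dataset, continuing for at
  // least a few seconds, so slow solvers are measured on the same puzzle mix
  // as fast ones and timing variance stays small.
  const auto start = std::chrono::steady_clock::now();
  size_t attempted = 0;
  std::chrono::duration<double> elapsed{};
  do {
    for (const auto &p : puzzles) {
      solve(p, 2);
      ++attempted;
    }
    elapsed = std::chrono::steady_clock::now() - start;
  } while (elapsed < std::chrono::seconds(3));
  return attempted / elapsed.count();
}
```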
Of course, if you really want to compare solvers, it's best to give each one the benefit of whichever compiler produces the fastest code for it. I didn't do that here. These results are from clang-11 on an i7-1065G7 (a platform that's relatively favorable for tdoku).
I also used to turn off ASLR and pin the benchmark to a single core that's otherwise excluded from scheduling. I didn't do that here either; those measures are aimed more at reducing variance than bias.
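For the record, on Linux both of those measures can be applied along these lines (a sketch; `sched_setaffinity` and `personality` are Linux-specific, and core 3 is just an arbitrary example):

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for sched_setaffinity / CPU_SET on glibc
#endif
#include <sched.h>
#include <sys/personality.h>
#include <unistd.h>

// Pin the calling process to one core. In practice the chosen core would be
// one excluded from normal scheduling (e.g., via the isolcpus= kernel
// option); core 3 here is just an arbitrary example.
void PinToCore(int core) {
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(core, &set);
  sched_setaffinity(0, sizeof(set), &set);
}

// ASLR has to be off before the benchmark's address space is laid out, so
// set the ADDR_NO_RANDOMIZE personality flag and re-exec the binary (a real
// harness would guard this with a flag to avoid exec'ing in a loop).
// Launching under `setarch -R` achieves the same thing without code changes.
void DisableAslrAndExec(char **argv) {
  personality(ADDR_NO_RANDOMIZE);
  execv("/proc/self/exe", argv);
}
```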