Benchmark data set

Programs which generate, solve, and analyze Sudoku puzzles

Benchmark data set

Postby m_b_metcalf » Wed Jun 22, 2022 12:22 pm

When comparing different solvers, or timing individual ones, there is often some discussion about the suitability of the data set used. This is most noticeable when the 17.txt file is used, as many think it's a hard test for solvers whereas, in fact, the puzzles are mostly very easy and none is very difficult.

In an attempt to pour more oil on the flames, I attach a new file of 834 puzzles. They are chosen at random from all the Patterns Games, such that all ER values are present 10 times, where possible. Thus, most ER ratings are represented by 10 puzzles each, but some rarer ones by fewer (the rarer ones being concentrated towards the end of the file).

Any solver should be able to solve all of these puzzles; the 3-micosecond solver in 3ms (more if uniqueness is not assumed).

Since the puzzles are from the Patterns Game, they are all symmetric and minimal, and have between 19 and 29 clues. They explore nearly every corner of sudoku space.

FWIW.

Mike
Attachments
quasi_uniform_834.txt
random selection
(90.4 KiB) Downloaded 119 times
User avatar
m_b_metcalf
2017 Supporter
 
Posts: 13631
Joined: 15 May 2006
Location: Berlin

Re: Benchmark data set

Postby Hajime » Wed Jun 22, 2022 3:58 pm

Total time: 00:00:02.8843780 seconds for all 834 puzzles (including uniqueness check), with my new dlx solver.
Is that a factor 1000 to slow?
m_b_metcalf wrote: ... to solve all of these puzzles; the 3-micosecond solver in 3ms...
User avatar
Hajime
 
Posts: 1377
Joined: 20 April 2018
Location: Fryslân

Re: Benchmark data set

Postby m_b_metcalf » Thu Jun 23, 2022 6:52 am

Hajime wrote:Total time: 00:00:02.8843780 seconds for all 834 puzzles (including uniqueness check), with my new dlx solver.
Is that a factor 1000 to slow?

Not if it suits your purpose. Most time spent using solvers is in checking whether a candidate puzzle has a single solution, so the important time is that to find a second solution, if present (which it isn't in this file). My singles+brute force solver gets to 1st. solutions in 1.7s for this data set, longer if uniquness not assumed.

Maybe someone who has the 3-microsecond solver could run it on this file?

Regards,

Mike
User avatar
m_b_metcalf
2017 Supporter
 
Posts: 13631
Joined: 15 May 2006
Location: Berlin

Re: Benchmark data set

Postby dobrichev » Thu Jun 23, 2022 7:54 am

Up to the second solution.

Code: Select all
cat /proc/cpuinfo
model name: Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz

gridchecker --solve < quasi_uniform_834.txt > /dev/null
Total time 0.046 seconds.

fsss2 < quasi_uniform_834.txt > /dev/null
834+0+0 puzzles in 0.035 seconds.

PencilmarkSudoku --solve --vanilla < quasi_uniform_834.txt > /dev/null
Total time 0.031 seconds.
dobrichev
2016 Supporter
 
Posts: 1863
Joined: 24 May 2010

Re: Benchmark data set

Postby Mathimagics » Thu Jun 23, 2022 9:48 am

dobrichev wrote:fsss2 < quasi_uniform_834.txt > /dev/null
834+0+0 puzzles in 0.035 seconds.

A gentle reminder to us all, that Mladen's FSSS2 solver is PDQ! 8-)
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Re: Benchmark data set

Postby blue » Thu Jun 23, 2022 11:58 am

I/O time must be influencing Mladen's timings, too.
His i5-6600 has a turbo-boost speed of 3.9GHz.

If I do this:
  1. limit my processor speed to 3.9GHz
  2. load the puzzles into RAM
  3. do 100 reps through the list, and
  4. divide solving time by 100
I get 0.0170 seconds, or (0.0161 seconds) to process the list once … depending on whether I do (or don't) ask for the solution to be recorded.

For Hajime : I have a "slightly smart" DLX solver, that takes 0.086s (@ 3.9GHz) to do the list.
If I dumb it down, it still only takes 0.130s.

Cheers.
blue
 
Posts: 1046
Joined: 11 March 2013

Re: Benchmark data set

Postby tdillon » Thu Jun 23, 2022 7:15 pm

Here are results for this dataset across a range of solvers using tdoku's benchmark program:

Code: Select all
|quasi_uniform_834.fmt                 |  puzzles/sec|  usec/puzzle|   %no_guess|  guesses/puzzle|
|--------------------------------------|------------:|------------:|-----------:|---------------:|
|minisat_minimal            S/s...+./m+|       366.5 |     2,728.5 |       0.0% |         309.62 |
|minisat_natural            S/s...+./m+|       381.1 |     2,624.1 |       0.0% |         285.63 |
|minisat_complete           S/sh..+./m+|     1,904.5 |       525.1 |       3.1% |          37.38 |
|minisat_augmented          S/shrc+./m+|     1,458.1 |       685.8 |       5.7% |          29.34 |
|_tdev_dpll_triad_scc_ih    S/shrc++/m+|     2,019.9 |       495.1 |      23.6% |           5.85 |
|lhl_sudoku                 G/s...../m.|     6,685.6 |       149.6 |       0.0% |         268.60 |
|_tdev_basic_heuristic      G/s...../m.|    10,557.5 |        94.7 |       0.0% |         292.49 |
|zerodoku                   G/sh..../m.|    14,659.2 |        68.2 |       3.3% |          27.33 |
|fast_solv_9r2              E/sh..../m.|    10,840.2 |        92.2 |       3.3% |          26.04 |
|kudoku                     E/sh..../m.|    11,579.6 |        86.4 |       3.3% |          22.70 |
|norvig                     C/sh..../m.|     6,028.4 |       165.9 |       3.4% |          27.57 |
|bb_sudoku                  C/shrc../m.|    27,794.9 |        36.0 |       5.6% |          36.77 |
|fsss                       C/shrc../m.|    32,108.2 |        31.1 |       5.6% |          19.66 |
|jsolve                     C/shrc../m.|    33,215.5 |        30.1 |       5.6% |          18.38 |
|fsss2_locked               D/shrc../m.|    36,768.0 |        27.2 |       5.6% |          19.34 |
|fsss2                      D/sh..../m.|    54,460.8 |        18.4 |       3.3% |          23.22 |
|jczsolve                   B/shr.../m.|    57,504.4 |        17.4 |       4.3% |          26.38 |
|sk_bforce2                 B/shrc-./m+|    58,558.6 |        17.1 |       4.9% |          21.63 |
|rust_sudoku                B/shr.../m.|    69,744.9 |        14.3 |       4.3% |          25.58 |
|tdoku                      T/shrc+./m+|    88,578.9 |        11.3 |       8.5% |          13.03 |

The cryptic code following each solver's name is to help group solvers by their similarities and differences. It indicates the primary representation (SAT, Groups, Exact Cover, Cells, Digits, Bands, or Tdoku's thing), as well as supported inference (singles, hidden singles, locked candidates, + for something extra) and supported heuristics (generally min-remaining-values, or + for something extra).

The benchmark program does a couple of things to try to make comparisons as useful as possible:
  • makes sure every solver is operating in the same mode (count up to 2 solutions)
  • builds an in-memory dataset before benchmarking so I/O is out of the picture
  • randomly permutes the puzzles so benchmarking is not biased by presentation (e.g., as would occur if using a minlex dataset to benchmark a solver whose heuristics were tuned on minlex datasets).
  • runs for a while before benchmarking to warm cpu caches and branch predictors
  • runs for an integral number of passes over the dataset, and at least for a few seconds. long enough to minimize timing variance, and without bias in which puzzles were solved for slow solvers.
Of course, if you really want to compare solvers it's best to give each one the benefit of whatever compiler produces the fastest code for it. I didn't do that here. These results are clang-11 on an i7-1065G7 (a platform that's pretty favorable for tdoku relatively speaking).
I also used to turn off ALSR and pin to a single core that's otherwise excluded from scheduling, which I didn't do here either, those are aimed more at reducing variance than bias.
tdillon
 
Posts: 66
Joined: 14 June 2019

Re: Benchmark data set

Postby m_b_metcalf » Fri Jun 24, 2022 8:15 am

Lots of intersesting results. Thanks.

It did occur to me that we make these timings on valid puzzles, always looking for a second solution that never exists. I attached a new file of the same 834 puzzles, but each with one clue removed, at random. In these puzzles, with bewtween 18 and 28 clues, a second solution does exist. This is the situation we have when testing that a puzzle is minimal. The surprise for me was that my humble code finds that second solution twice as fast as it finds the first solution in the original file.

Curious.

Mike
Attachments
mult_solns.txt
Multiple solutions
(74.12 KiB) Downloaded 95 times
User avatar
m_b_metcalf
2017 Supporter
 
Posts: 13631
Joined: 15 May 2006
Location: Berlin

Re: Benchmark data set

Postby Hajime » Fri Jun 24, 2022 9:29 am

Without I/O my dlx solver is a bit faster, but not as near as all those turbo solvers. Impressive.
m_b_metcalf wrote:The surprise for me was that my humble code finds that second solution twice as fast as it finds the first solution in the original file.
Curious.

Not so curious: If each puzzle has, lets say, 100 solutions and you stop searching after the 2nd solution, the total time needed is much smaller.
I did a full search for all the solutions. See file for result.
The first puzzle noted: 15 1=1270/2=60/ which means 15 solutions 1270 naked/hidden clues processed and 60 times a guess between 2 candidates in 1 cell or 2 same candidates in a house was needed.
Of all the puzzles only one puzzle (#617) needs a guess in 3 cells/houses to find all the 1078 solutions ( 1078 1=12503/2=1128/3=8/).
In the quasi_uniform_834.txt all puzzles are solved with 2 candidates in a cell or 2 same candidates in a house (row/col/box) and no "/3" is needed.
User avatar
Hajime
 
Posts: 1377
Joined: 20 April 2018
Location: Fryslân


Return to Software