The New Sudoku Players' Forum

by **tdillon** » Sat Jun 15, 2019 4:33 pm

Hi all,

Lately I've caught the Sudoku solver-writing bug and have produced a new tool I hope may be of interest to folks in this forum.

You can find the code here, and you can find a (long) writeup of the motivation and design here.

To whet your appetite, I offer the following benchmark results:

Code: Select all: |puzzles2_magictour_top1465 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |jczsolve | 123,779.7 | 8.1 | 2.3% | 12.53 | |tdoku_dpll_triad_simd | 165,371.1 | 6.0 | 7.9% | 6.13 |

The aim was to make a solver that's fast for harder puzzles, and it's been an interesting journey.

Here are results relative to JCZSolve on a number of datasets of increasing difficulty:

Cheers,

by **champagne** » Tue Jun 18, 2019 4:52 am

Interesting, but too late for me.
On the other side, working with Microsoft Visual Basic, I see in your comments some "warnings" about the performances using it.
And I have to confess that I am too old to have learned at scool the underlying basis

by **Mathimagics** » Tue Jun 18, 2019 7:50 am

It would be interesting to see Mladen's fsss2 solver included in this comparison. It's faster than jczsolve, I think ...

by **tdillon** » Tue Jun 18, 2019 8:37 am

In my experiments fsss2 has indeed been faster for the easy puzzles in the kaggle dataset. But for the harder puzzles which have been my chief interest it's been slower. Here's a rerun for all data sets including fsss2:

UPDATE based on what was learned below: FSSS2 was handicapped in these runs vs other solvers because it was looking for 2 solutions instead of 1. This doesn't make a difference for datasets where no search is required, but for the harder datasets FSSS2 should be ~2x faster. FSSS2 also benefits from using locked candidates on the hard datasets, and these were not used here. So basically ignore this comparison and see updates below.

Code: Select all: |data/puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 1,104,367.8 | 0.9 | 100.0% | 0.00 | |jczsolve | 595,396.5 | 1.7 | 100.0% | 0.00 | |tdoku_dpll_triad_simd | 787,194.9 | 1.3 | 100.0% | 0.00 | |data/puzzles1_17_clue | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 210,084.9 | 4.8 | 44.6% | 4.43 | |jczsolve | 317,164.6 | 3.2 | 69.6% | 1.25 | |tdoku_dpll_triad_simd | 304,769.0 | 3.3 | 78.7% | 0.48 | |data/puzzles2_magictour_top1465 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 42,455.1 | 23.6 | 0.0% | 37.39 | |jczsolve | 123,248.7 | 8.1 | 2.2% | 12.55 | |tdoku_dpll_triad_simd | 166,265.4 | 6.0 | 7.9% | 6.11 | |data/puzzles3_forum_hardest_1905 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 14,067.1 | 71.1 | 0.0% | 117.89 | |jczsolve | 30,561.1 | 32.7 | 0.0% | 73.23 | |tdoku_dpll_triad_simd | 44,999.4 | 22.2 | 0.0% | 30.61 | |data/puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 11,506.7 | 86.9 | 0.0% | 139.28 | |jczsolve | 23,936.7 | 41.8 | 0.0% | 89.97 | |tdoku_dpll_triad_simd | 37,867.1 | 26.4 | 0.0% | 35.83 | |data/puzzles5_forum_hardest_1106 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 6,197.0 | 161.4 | 0.0% | 279.77 | |jczsolve | 12,751.2 | 78.4 | 0.0% | 188.35 | |tdoku_dpll_triad_simd | 24,711.8 | 40.5 | 0.0% | 60.32 |

As before, i5-8600k@3.6GHz; clang++-8 -O3 -mavx2 -march=native; Ubuntu 18.04

by **dobrichev** » Tue Jun 18, 2019 8:49 am

Hi tdillon,

Your work is really impressive!
This brings me back to the hope that other revolutionary improvements in sudoku solvers are possible prior to switching to parallel GPU algorithms.

I just began reading your article.
Few comments so far, mainly from the code you published.
- You focused benchmarking on scrambled puzzles, which is OK. Till now to my knowledge only lazy benchmarks were done.
- You focused on finding first solution which isn't OK. For practical purposes usually an answer "no solutions" or "single solution" or "multiple solutions" answer is needed.
- Benchmarking code using older instructions family that a particular processor supports usually isn't very representative. Setting compiler option -march=native by default and testing on different hardware is more reliable.
- Minimization of the number of guesses is only of academical interest, although it is an approximation of the effectiveness. Seems the real power of your solvers family is in the overall performance.

by **dobrichev** » Tue Jun 18, 2019 9:31 am

The results from my machine.

I used my local copy of fsss2 which can differ from the one used by tdillon.
build info: GNU 6.4.0 -O3 -march=native (reported by ./build/run_benchmark -h)
i5-3470 CPU @ 3.20GHz

For searching for second solution in run_benchmarks.cc I modified output buffer size from 82 to 256 and number of solutions from 1 to 2

Code: Select all: char output[256]{0};

and later

Code: Select all: if (!solver.Solve(puzzle.c_str(), 2, output, &num_guesses) || (validate_ && !ValidateSolution(output))) { cout << "Error during benchmark" << endl;

Solving up to the first solution

Hidden Text: Show

Code: Select all: |data/puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 453,638.3 | 2.2 | 0.0% | 32.89 | |tdoku_basic{0x1} | 100,985.0 | 9.9 | 100.0% | 0.00 | |tdoku_dpll_triad_scc | 5,665.8 | 176.5 | 100.0% | 0.00 | |tdoku_dpll_triad_scc{0x1} | 5,318.2 | 188.0 | 100.0% | 0.00 | |tdoku_dpll_triad_scc{0x2} | 5,321.4 | 187.9 | 100.0% | 0.00 | |tdoku_dpll_triad_scc{0x3} | 5,329.4 | 187.6 | 100.0% | 0.00 | |tdoku_dpll_triad_simd | 516,376.1 | 1.9 | 100.0% | 0.00 | |fsss2 | 1,228,958.5 | 0.8 | 100.0% | 0.00 | |data/puzzles1_17_clue | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 2.8 | 358,715.7 | 0.0% | 10,010,427.23 | |tdoku_basic{0x1} | 150.1 | 6,660.7 | 0.0% | 2,995.37 | |tdoku_dpll_triad_scc | 5,337.2 | 187.4 | 78.6% | 0.61 | |tdoku_dpll_triad_scc{0x1} | 4,699.9 | 212.8 | 86.8% | 0.23 | |tdoku_dpll_triad_scc{0x2} | 4,792.6 | 208.7 | 78.6% | 0.26 | |tdoku_dpll_triad_scc{0x3} | 4,824.8 | 207.3 | 86.8% | 0.15 | |tdoku_dpll_triad_simd | 203,760.4 | 4.9 | 78.6% | 0.48 | |fsss2 | 189,804.0 | 5.3 | 44.5% | 4.45 | |data/puzzles2_magictour_top1465 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 38.9 | 25,691.0 | 0.0% | 743,543.89 | |tdoku_basic{0x1} | 1,147.9 | 871.1 | 0.0% | 319.26 | |tdoku_dpll_triad_scc | 2,944.6 | 339.6 | 7.9% | 8.15 | |tdoku_dpll_triad_scc{0x1} | 1,486.0 | 672.9 | 12.7% | 4.51 | |tdoku_dpll_triad_scc{0x2} | 1,994.2 | 501.5 | 7.8% | 3.35 | |tdoku_dpll_triad_scc{0x3} | 1,956.2 | 511.2 | 12.7% | 2.65 | |tdoku_dpll_triad_simd | 112,759.3 | 8.9 | 7.9% | 6.11 | |fsss2 | 37,963.4 | 26.3 | 0.0% | 37.28 | |data/puzzles3_forum_hardest_1905 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 290.1 | 3,447.2 | 0.0% | 89,964.92 | |tdoku_basic{0x1} | 794.9 | 1,258.0 | 0.0% | 377.43 | |tdoku_dpll_triad_scc | 655.2 | 1,526.2 | 0.0% | 58.07 | |tdoku_dpll_triad_scc{0x1} | 307.7 | 3,249.6 | 0.0% | 28.28 | |tdoku_dpll_triad_scc{0x2} | 448.1 | 2,231.8 | 0.0% | 22.22 | |tdoku_dpll_triad_scc{0x3} | 463.4 | 2,158.2 | 0.0% | 16.58 | |tdoku_dpll_triad_simd | 30,724.4 | 32.5 | 0.0% | 30.63 | |fsss2 | 12,475.7 | 80.2 | 0.0% | 117.90 | |data/puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 210.8 | 4,743.7 | 0.0% | 124,058.69 | |tdoku_basic{0x1} | 656.3 | 1,523.7 | 0.0% | 441.15 | |tdoku_dpll_triad_scc | 521.3 | 1,918.1 | 0.0% | 70.53 | |tdoku_dpll_triad_scc{0x1} | 241.7 | 4,138.1 | 0.0% | 35.34 | |tdoku_dpll_triad_scc{0x2} | 361.0 | 2,770.3 | 0.0% | 26.70 | |tdoku_dpll_triad_scc{0x3} | 370.2 | 2,701.2 | 0.0% | 20.39 | |tdoku_dpll_triad_simd | 26,095.0 | 38.3 | 0.0% | 35.81 | |fsss2 | 10,209.6 | 97.9 | 0.0% | 139.13 | |data/puzzles5_forum_hardest_1106 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 72.9 | 13,709.9 | 0.0% | 369,968.86 | |tdoku_basic{0x1} | 318.8 | 3,136.8 | 0.0% | 960.21 | |tdoku_dpll_triad_scc | 295.9 | 3,379.4 | 0.0% | 138.38 | |tdoku_dpll_triad_scc{0x1} | 151.1 | 6,618.0 | 0.0% | 57.95 | |tdoku_dpll_triad_scc{0x2} | 188.1 | 5,316.1 | 0.0% | 54.29 | |tdoku_dpll_triad_scc{0x3} | 199.4 | 5,014.5 | 0.0% | 39.19 | |tdoku_dpll_triad_simd | 16,733.6 | 59.8 | 0.0% | 60.43 | |fsss2 | 5,385.6 | 185.7 | 0.0% | 280.23 |

Solving up to the second solution

Hidden Text: Show

Code: Select all: |data/puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 299,774.8 | 3.3 | 0.0% | 54.36 | |tdoku_basic{0x1} | 105,974.3 | 9.4 | 100.0% | 0.00 | |tdoku_dpll_triad_scc | 5,635.6 | 177.4 | 100.0% | 0.00 | |tdoku_dpll_triad_scc{0x1} | 5,306.7 | 188.4 | 100.0% | 0.00 | |tdoku_dpll_triad_scc{0x2} | 5,317.3 | 188.1 | 100.0% | 0.00 | |tdoku_dpll_triad_scc{0x3} | 5,305.1 | 188.5 | 100.0% | 0.00 | |tdoku_dpll_triad_simd | 519,773.3 | 1.9 | 100.0% | 0.00 | |fsss2 | 1,234,165.1 | 0.8 | 100.0% | 0.00 | |data/puzzles1_17_clue | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 1.5 | 683,599.9 | 0.0% | 19,588,925.61 | |tdoku_basic{0x1} | 77.0 | 12,986.7 | 0.0% | 6,133.72 | |tdoku_dpll_triad_scc | 5,118.9 | 195.4 | 78.9% | 0.81 | |tdoku_dpll_triad_scc{0x1} | 4,578.4 | 218.4 | 87.0% | 0.27 | |tdoku_dpll_triad_scc{0x2} | 4,755.7 | 210.3 | 78.9% | 0.27 | |tdoku_dpll_triad_scc{0x3} | 4,806.4 | 208.1 | 87.0% | 0.15 | |tdoku_dpll_triad_simd | 195,873.9 | 5.1 | 78.8% | 0.61 | |fsss2 | 186,010.1 | 5.4 | 44.7% | 4.41 | |data/puzzles2_magictour_top1465 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 19.4 | 51,560.6 | 0.0% | 1,563,017.98 | |tdoku_basic{0x1} | 591.8 | 1,689.7 | 0.0% | 641.73 | |tdoku_dpll_triad_scc | 1,969.5 | 507.8 | 7.8% | 12.74 | |tdoku_dpll_triad_scc{0x1} | 1,091.4 | 916.2 | 12.4% | 6.25 | |tdoku_dpll_triad_scc{0x2} | 1,570.1 | 636.9 | 7.8% | 4.46 | |tdoku_dpll_triad_scc{0x3} | 1,590.8 | 628.6 | 12.5% | 3.35 | |tdoku_dpll_triad_simd | 77,484.3 | 12.9 | 7.9% | 9.05 | |fsss2 | 36,896.2 | 27.1 | 0.0% | 37.35 | |data/puzzles3_forum_hardest_1905 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 145.7 | 6,863.2 | 0.0% | 182,446.90 | |tdoku_basic{0x1} | 409.3 | 2,443.0 | 0.0% | 746.99 | |tdoku_dpll_triad_scc | 335.1 | 2,984.5 | 0.0% | 111.23 | |tdoku_dpll_triad_scc{0x1} | 167.7 | 5,963.1 | 0.0% | 51.02 | |tdoku_dpll_triad_scc{0x2} | 222.8 | 4,487.9 | 0.0% | 45.53 | |tdoku_dpll_triad_scc{0x3} | 236.5 | 4,229.2 | 0.0% | 32.62 | |tdoku_dpll_triad_simd | 16,549.3 | 60.4 | 0.0% | 55.02 | |fsss2 | 12,121.3 | 82.5 | 0.0% | 117.95 | |data/puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 102.4 | 9,765.0 | 0.0% | 257,049.58 | |tdoku_basic{0x1} | 342.6 | 2,919.3 | 0.0% | 871.18 | |tdoku_dpll_triad_scc | 267.3 | 3,741.6 | 0.0% | 132.40 | |tdoku_dpll_triad_scc{0x1} | 129.5 | 7,721.4 | 0.0% | 64.98 | |tdoku_dpll_triad_scc{0x2} | 181.0 | 5,524.0 | 0.0% | 53.32 | |tdoku_dpll_triad_scc{0x3} | 190.0 | 5,263.0 | 0.0% | 39.44 | |tdoku_dpll_triad_simd | 13,788.1 | 72.5 | 0.0% | 65.03 | |fsss2 | 9,881.2 | 101.2 | 0.0% | 139.35 | |data/puzzles5_forum_hardest_1106 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_basic | 37.8 | 26,472.5 | 0.0% | 709,085.96 | |tdoku_basic{0x1} | 161.5 | 6,192.5 | 0.0% | 1,961.00 | |tdoku_dpll_triad_scc | 144.6 | 6,914.6 | 0.0% | 272.94 | |tdoku_dpll_triad_scc{0x1} | 79.7 | 12,548.4 | 0.0% | 109.65 | |tdoku_dpll_triad_scc{0x2} | 93.4 | 10,708.1 | 0.0% | 112.39 | |tdoku_dpll_triad_scc{0x3} | 100.1 | 9,986.7 | 0.0% | 79.58 | |tdoku_dpll_triad_simd | 8,899.9 | 112.4 | 0.0% | 113.14 | |fsss2 | 5,376.2 | 186.0 | 0.0% | 280.15 |

by **dobrichev** » Tue Jun 18, 2019 10:22 am

This is the result for solving the top 2M and bottom 2M puzzles from the 34M puzzles cache for patterns game 314 cache I found on my machine. Up to the second solution.

Hidden Text: Show

The cache is undoubtedly highly biased - all puzzles share the same pattern.
On the other hand, the pattern is expected to be biased to have harder puzzles than average (pure speculation), and the puzzles are harder than average for this pattern.
All puzzles are minimal and have exactly one solution.
Of course, all puzzles are far below the hardest.

Below are the distributions by ER for the whole set and for the top and bottom subsets

Hidden Text: Show

Top 2M

Code: Select all: cat <(head -n2000000 /home/mladen/sudoku/pg/314/57 | cut -f2 -d" " | cut -f1 -d"/" | cut -f2 -d"=" | sort -n | uniq -c) 1764 1.2 80196 1.5 2546 1.7 54459 2.0 11507 2.3 3646 2.5 27315 2.6 14793 2.8 15736 3.0 6470 3.2 20766 3.4 30399 3.6 45550 3.8 1610 4.0 30441 4.2 5425 4.4 31339 4.5 53789 4.6 1736 4.7 203 4.8 142 5.0 502 5.2 5437 5.6 814 5.7 17 5.8 1 5.9 339 6.2 985 6.5 112077 6.6 34583 6.7 5117 6.8 3887 6.9 8187 7.0 193761 7.1 213786 7.2 104766 7.3 5799 7.4 2856 7.5 35286 7.6 19512 7.7 30272 7.8 2995 7.9 241 8.0 5 8.1 13911 8.2 156850 8.3 150995 8.4 75489 8.5 12486 8.6 9457 8.7 55690 8.8 110509 8.9 133380 9.0 39385 9.1 17852 9.2 2159 9.3 110 9.4 101 9.5 151 9.6 226 9.7 98 9.8 6 9.9 1 10.0 7 10.1 18 10.2 37 10.3 19 10.4 5 10.5 1 10.6

Bottom 2M

Code: Select all: cat <(tail -n2000000 /home/mladen/sudoku/pg/314/57 | cut -f2 -d" " | cut -f1 -d"/" | cut -f2 -d"=" | sort -n | uniq -c) 9106 1.2 124986 1.5 1730 1.7 64747 2.0 22694 2.3 5318 2.5 19099 2.6 6791 2.8 10916 3.0 3987 3.2 16649 3.4 63681 3.6 295809 3.8 636 4.0 108675 4.2 10298 4.4 12926 4.5 1474 4.6 207 4.7 37 4.8 110 5.0 47 5.2 7697 5.6 1076 5.7 21 5.8 13 5.9 1 6.0 475 6.2 596 6.5 76985 6.6 103831 6.7 8871 6.8 5528 6.9 9283 7.0 319044 7.1 260035 7.2 100314 7.3 3884 7.4 2190 7.5 10947 7.6 6561 7.7 7000 7.8 524 7.9 34 8.0 2 8.1 8896 8.2 94162 8.3 78537 8.4 30730 8.5 4435 8.6 2858 8.7 14866 8.8 27229 8.9 25581 9.0 5696 9.1 1994 9.2 150 9.3 3 9.4 4 9.5 8 9.6 9 9.7 6 9.8 1 10.2

Entire set

Code: Select all: cat /home/mladen/sudoku/pg/314/57 | cut -f2 -d" " | cut -f1 -d"/" | cut -f2 -d"=" | sort -n | uniq -c 103100 1.2 1711803 1.5 39152 1.7 985352 2.0 310553 2.3 77687 2.5 383194 2.6 193192 2.8 226119 3.0 91009 3.2 254948 3.4 1178015 3.6 2743104 3.8 16324 4.0 960029 4.2 136009 4.4 349303 4.5 87444 4.6 7723 4.7 799 4.8 7 4.9 2148 5.0 2460 5.2 111403 5.6 15730 5.7 313 5.8 145 5.9 19 6.0 1 6.1 6701 6.2 13128 6.5 1493937 6.6 1451039 6.7 106803 6.8 84148 6.9 144163 7.0 4385823 7.1 4008716 7.2 1743156 7.3 87175 7.4 45962 7.5 336388 7.6 289664 7.7 235497 7.8 17825 7.9 2410 8.0 40 8.1 203924 8.2 2258561 8.3 2077834 8.4 980085 8.5 154144 8.6 111272 8.7 634832 8.8 1280719 8.9 1584998 9.0 523807 9.1 259753 9.2 33506 9.3 1669 9.4 1906 9.5 2550 9.6 3541 9.7 1213 9.8 116 9.9 31 10.0 78 10.1 267 10.2 296 10.3 207 10.4 22 10.5 4 10.6

by **dobrichev** » Tue Jun 18, 2019 11:55 am

And finally a comparison between tdoku_dpll_triad_simd and fsss2 with locked candidates and pairs enabled.

They are both disabled by default because in real puzzle generation, when loads of invalid puzzles are involved, overall performance with them gets worse.
Even when enabled, they are performed once per puzzle, just before the first guess.

Hidden Text: Show

fsss2 with locked candidates enabled.
in fsss2.h the following row is uncommented

Code: Select all: #define USE_LOCKED_CANDIDATES

Code: Select all: |data/puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 511,622.1 | 2.0 | 100.0% | 0.00 | |fsss2 | 1,204,295.3 | 0.8 | 100.0% | 0.00 | |data/puzzles1_17_clue | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 196,093.2 | 5.1 | 78.6% | 0.62 | |fsss2 | 254,068.8 | 3.9 | 72.5% | 1.32 | |data/puzzles2_magictour_top1465 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 77,589.6 | 12.9 | 7.9% | 9.04 | |fsss2 | 61,292.8 | 16.3 | 1.7% | 19.15 | |data/puzzles3_forum_hardest_1905 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 16,762.3 | 59.7 | 0.0% | 55.00 | |fsss2 | 12,436.9 | 80.4 | 0.0% | 117.53 | |data/puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 14,023.0 | 71.3 | 0.0% | 65.03 | |fsss2 | 10,212.8 | 97.9 | 0.0% | 138.63 | |data/puzzles5_forum_hardest_1106 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 8,828.5 | 113.3 | 0.0% | 113.18 | |fsss2 | 5,532.8 | 180.7 | 0.0% | 277.00 |

fsss2 with both locked candidates and pairs enabled
in fsss2.h the following rows are uncommented

Code: Select all: #define USE_LOCKED_CANDIDATES #define USE_SUBSETS

Code: Select all: |data/puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 516,905.0 | 1.9 | 100.0% | 0.00 | |fsss2 | 1,212,056.5 | 0.8 | 100.0% | 0.00 | |data/puzzles1_17_clue | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 196,489.4 | 5.1 | 78.7% | 0.62 | |fsss2 | 255,922.9 | 3.9 | 83.3% | 0.51 | |data/puzzles2_magictour_top1465 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 77,383.4 | 12.9 | 7.9% | 9.06 | |fsss2 | 61,631.6 | 16.2 | 8.7% | 15.15 | |data/puzzles3_forum_hardest_1905 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 16,766.5 | 59.6 | 0.0% | 55.00 | |fsss2 | 11,995.4 | 83.4 | 0.0% | 117.58 | |data/puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 13,953.8 | 71.7 | 0.0% | 65.04 | |fsss2 | 9,898.8 | 101.0 | 0.0% | 138.13 | |data/puzzles5_forum_hardest_1106 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 8,801.1 | 113.6 | 0.0% | 113.34 | |fsss2 | 5,424.2 | 184.4 | 0.0% | 275.81 |

fsss2 with pairs enabled but locked candidates disabled. Just for completeness.

Code: Select all: |data/puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 519,121.0 | 1.9 | 100.0% | 0.00 | |fsss2 | 1,198,383.7 | 0.8 | 100.0% | 0.00 | |data/puzzles1_17_clue | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 195,032.5 | 5.1 | 78.7% | 0.61 | |fsss2 | 217,317.6 | 4.6 | 60.2% | 2.01 | |data/puzzles2_magictour_top1465 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 76,805.3 | 13.0 | 7.9% | 9.06 | |fsss2 | 47,141.1 | 21.2 | 0.8% | 25.01 | |data/puzzles3_forum_hardest_1905 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 16,764.0 | 59.7 | 0.0% | 54.97 | |fsss2 | 12,081.6 | 82.8 | 0.0% | 117.91 | |data/puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 13,982.7 | 71.5 | 0.0% | 65.06 | |fsss2 | 9,969.6 | 100.3 | 0.0% | 138.97 | |data/puzzles5_forum_hardest_1106 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 8,874.2 | 112.7 | 0.0% | 113.42 | |fsss2 | 5,300.4 | 188.7 | 0.0% | 279.31 |

by **tdillon** » Tue Jun 18, 2019 8:26 pm

Thanks very much for the feedback and the detailed comparison!

I've been permuting the puzzles for benchmarking because in early experiments the impact of different ordering heuristics seemed strongly contingent on the dataset. Also I wasn't sure what bias might arise from datasets which are presented in a normalized form. What do you mean by lazy benchmarks?

Thanks for pointing out that the most practical use case is to determine whether there are 0,1,2+ solutions. The way I was calling FSSS2 in other_solvers.cc didn't anticipate this and was just using getSingleSolution whether we're looking for 1 or 2. I've changed this to use hasOneSolution instead when we pass a limit other than 1. I expected this to slow FSSS2 down by ~50% on the hard data sets, but actually there was no change because getSingleSolution was already looking for 2 solutions:

Code: Select all: bool getSingleSolution::solutionFound() { return (++nsol == 2); //stop after possible second solution }

This seems link an unnecessary handicap, no?

Here are some results after changing getSingleSolution to stop when the first solution is found, which puts FSSS2 on the same footing as other benchmarked solvers when we just look for one solution:

Code: Select all: $ lscpu | grep Model.name Model name: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz $ CC=clang-8 CXX=clang++-8 ./BUILD.sh -DJCZSOLVE=on -DFSSS2=on -DNATIVE=on $ build/run_benchmark -t20 -w5 -m0 -s fsss2,jczsolve,tdoku_dpll_triad_simd data/puzzles0_kaggle data/puzzles4_forum_hardest_1905_11+ |data/puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 908,150.8 | 1.1 | 100.0% | 0.00 | |jczsolve | 452,353.7 | 2.2 | 100.0% | 0.00 | |tdoku_dpll_triad_simd | 535,667.6 | 1.9 | 100.0% | 0.00 | |data/puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 17,928.5 | 55.8 | 1.1% | 69.40 | |jczsolve | 17,202.5 | 58.1 | 0.0% | 90.00 | |tdoku_dpll_triad_simd | 25,984.5 | 38.5 | 0.0% | 35.78 | $ build/run_benchmark -t20 -w5 -m1 -s fsss2,jczsolve,tdoku_dpll_triad_simd data/puzzles0_kaggle data/puzzles4_forum_hardest_1905_11+ |data/puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 958,620.7 | 1.0 | 100.0% | 0.00 | |jczsolve | 452,055.5 | 2.2 | 100.0% | 0.00 | |tdoku_dpll_triad_simd | 530,393.7 | 1.9 | 100.0% | 0.00 | |data/puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 9,458.5 | 105.7 | 0.0% | 139.09 | |jczsolve | 9,039.3 | 110.6 | 0.0% | 170.87 | |tdoku_dpll_triad_simd | 14,017.9 | 71.3 | 0.0% | 64.99 |

In the tables above the first pair of runs looks for one solution and the second pair of runs looks for two. As expected, none of the solvers slow down on the kaggle data set because none of the puzzles require search to find the unique solution; and all of the solvers slow down by ~50% on the permuted puzzles in the hard data set because on average the unique solution is found halfway through the search space.

Note: I've generally been running benchmarking with clang since it produces faster code than gcc for both JCZSolve and Tdoku. But it does look like the opposite is true for FSSS2:

Code: Select all: $ CC=gcc-6 CXX=g++-6 ./BUILD.sh -DJCZSOLVE=on -DFSSS2=on -DNATIVE=on $ build/run_benchmark -t20 -w5 -m1 -s fsss2,jczsolve,tdoku_dpll_triad_simd data/puzzles0_kaggle data/puzzles4_forum_hardest_1905_11+ |data/puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 1,218,974.0 | 0.8 | 100.0% | 0.00 | |jczsolve | 438,702.6 | 2.3 | 100.0% | 0.00 | |tdoku_dpll_triad_simd | 505,281.5 | 2.0 | 100.0% | 0.00 | |data/puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |fsss2 | 9,773.4 | 102.3 | 0.0% | 139.26 | |jczsolve | 8,723.8 | 114.6 | 0.0% | 171.06 | |tdoku_dpll_triad_simd | 12,934.9 | 77.3 | 0.0% | 65.12 |

WRT -march=native, I've had the different instruction targets not so much for benchmarking but to make sure things will work for different architectures (including testing AVX512BITALG for Ice Lake via the Intel SDE). But I've made -march=native the default now.

by **dobrichev** » Wed Jun 19, 2019 12:32 am

tdillon wrote:Also I wasn't sure what bias might arise from datasets which are presented in a normalized form.

For example representing puzzle in its lexicographically minimal form trends to move the denser band to the bottom of the puzzle. fsss2 exploits this fact.

tdillon wrote:What do you mean by lazy benchmarks?

I mean that despite the efforts of the guys that collected these excellent puzzle sets, some testers (me and some others) neglected the necessity of buffering, scrambling, automatically adjusting the set size, etc. This results to lower quality benchmarking.
Maybe here is reasonable to note that in your benchmark in theory the compiler could see that nothing is done with the solutions (they aren't validated by default at the benchmark phase) and then progressively optimize the code up to counting the guesses.

tdillon wrote:Thanks for pointing out that the most practical use case is to determine whether there are 0,1,2+ solutions. The way I was calling FSSS2 in other_solvers.cc didn't anticipate this and was just using getSingleSolution whether we're looking for 1 or 2. I've changed this to use hasOneSolution instead when we pass a limit other than 1. I expected this to slow FSSS2 down by ~50% on the hard data sets, but actually there was no change because getSingleSolution was already looking for 2 solutions:

Code: Select all
bool getSingleSolution::solutionFound() { return (++nsol == 2); //stop after possible second solution }

This seems link an unnecessary handicap, no?

hasAnySolution::solve(const char* p) solves up to the first solution and returns 0 for no-solution or 1 for 1+ solutions.
hasSingleSolution::solve(const char* p) solves up to the second solution and returns 0 or 1 or 2 respectively for no-solution, single solution, and 2+ solutions.
solutionFound() returns true if the solving process must be forced to stop.

tdillon wrote:Note: I've generally been running benchmarking with clang since it produces faster code than gcc for both JCZSolve and Tdoku. But it does look like the opposite is true for FSSS2:

This isn't surprise because fsss2 is developed and optimized on GCC.

P.S. To minimize the off-topic here, I will send you few more comments in private message.

by **champagne** » Wed Jun 19, 2019 5:50 am

tdillon wrote:Thanks for pointing out that the most practical use case is to determine whether there are 0,1,2+ solutions. ...

More on this :

IMO, looking for the unique solution of a puzzle is not a key issue (in terms of performances).
This is the first step in my logical solver, but this is only to be sure that the solver can use uniqueness properties in the logical solving process.

I have currently a program calling the brute force billions times per day. (the check that all 17s are known); the answer is always "multiple solutions" except for the 49157 known 17.
The vicinity search is another intensive user of the brute force. Here, most puzzles have multiple solutions, and a majority of them are "easy" (the rating is meaningless in a puzzle with multiple solutions). Depending on the way the code is written, the vicinity search can produce many puzzles with 0 solutions, but often, there are ways to avoid such puzzles in most cases.

At the end, a benchmark done exclusively on sudokus can be highly misleading for the choice of the brute force to use.

by **tdillon** » Fri Jun 21, 2019 7:15 am

Thanks to Mladen for calling attention offline to the fact that I'd only been focusing on valid puzzles and had a performance bug for invalid ones.

This is fixed, and I've updated the benchmarks in doc as follows:

* all results now reflect a search for up to 2 solutions instead of 1.
* fsss2 is now included in the comparison (configured to use locked candidates since this is fastest for the datasets I focus on).
* results for each solver reflect that solver's most favorable compiler (clang-8 for tdoku, jczsolve; gcc-6 for fsss2).

As Mladen and champagne both noted that performance on invalid and multi-solution puzzles is of great practical importance, I've added results for a dataset rich in such puzzles (http://www.enjoysudoku.com/gen_puzzles.zip).

A rough summary:
* fsss2 is especially fast for invalid or easy almost-solved puzzles; for these jczsolve and tdoku are similar to each other.
* all 3 solvers perform about the same for 17-clue puzzles.
* tdoku is especially fast for hard puzzles; for these jczsolve and fsss2 are similar to each other.

Code: Select all: |gen_puzzles | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 1,592,351.2 | 0.6 | 97.4% | 0.29 | |jczsolve | 1,779,151.9 | 0.6 | 97.4% | 0.31 | |fsss2 | 2,337,581.1 | 0.4 | 97.4% | 0.03 | |puzzles0_kaggle | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 623,855.8 | 1.6 | 100.0% | 0.00 | |jczsolve | 596,311.1 | 1.7 | 100.0% | 0.00 | |fsss2 | 1,456,305.4 | 0.7 | 100.0% | 0.00 | |puzzles1_17_clue | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 287,053.8 | 3.5 | 78.7% | 0.62 | |jczsolve | 289,520.0 | 3.5 | 69.6% | 1.86 | |fsss2 | 299,584.6 | 3.3 | 72.5% | 1.32 | |puzzles2_magictour_top1465 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 112,243.0 | 8.9 | 7.9% | 9.07 | |jczsolve | 77,760.3 | 12.9 | 2.3% | 20.77 | |fsss2 | 70,821.8 | 14.1 | 1.7% | 19.16 | |puzzles3_forum_hardest_1905 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 24,335.7 | 41.1 | 0.0% | 55.00 | |jczsolve | 16,197.5 | 61.7 | 0.0% | 138.69 | |fsss2 | 14,470.4 | 69.1 | 0.0% | 117.71 | |puzzles4_forum_hardest_1905_11+ | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 20,277.9 | 49.3 | 0.0% | 64.94 | |jczsolve | 12,564.5 | 79.6 | 0.0% | 170.99 | |fsss2 | 11,881.4 | 84.2 | 0.0% | 138.67 | |puzzles5_forum_hardest_1106 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 12,944.1 | 77.3 | 0.0% | 113.24 | |jczsolve | 6,567.1 | 152.3 | 0.0% | 366.84 | |fsss2 | 6,483.2 | 154.2 | 0.0% | 277.75 |

Cheers, Tom

by **Serg** » Thu Jun 27, 2019 5:14 pm

Hi, Tom!
Your solver demonstrates impressive performance!
Just to "crosscheck" your performance numbers - would you run benchmark with my benchmark file, downloadable by the link http://sites.google.com/site/sergsudoku/benchmark.zip. Benchmark file contains 10000 puzzles, generated in real exhaustive search project. All puzzles have multiple solutions (it is typical in my sudoku studies) and are very simple. For every puzzle must be determined - has this puzzle unique solution or not (case with 0 solutions doesn't differ from the case "2+ solutions" for me).

Serg

by **tdillon** » Fri Jun 28, 2019 2:50 am

Hi Serg,

Here's what I get for your benchmark (searching for up to 2 solutions):

Code: Select all: $ lscpu | grep Model.name Model name: Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz $ CC=clang-8 CXX=clang++-8 ./BUILD.sh -DJCZSOLVE=on -DFSSS2=on $ build/run_benchmark -t30 -w10 -s tdoku_dpll_triad_simd,jczsolve,fsss2 data/benchmark.txt |data/benchmark.txt | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 373,831.6 | 2.7 | 0.0% | 7.13 | |jczsolve | 308,694.3 | 3.2 | 0.0% | 7.09 | |fsss2 | 261,295.2 | 3.8 | 0.0% | 1.41 | $ CC=gcc-6 CXX=g++-6 ./BUILD.sh -DJCZSOLVE=on -DFSSS2=on $ build/run_benchmark -t30 -w10 -s tdoku_dpll_triad_simd,jczsolve,fsss2 data/benchmark.txt |data/benchmark.txt | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 333,917.7 | 3.0 | 0.0% | 7.13 | |jczsolve | 284,379.3 | 3.5 | 0.0% | 7.09 | |fsss2 | 285,270.4 | 3.5 | 0.0% | 1.41 |

Tom

by **Mathimagics** » Fri Jun 28, 2019 5:41 am

Hi Tom,

Firstly, a little problem with your guess counting is evident, your figures for fsss2 should be more or less the same as the others.

I have Serg's benchmark dataset, I assume it is the same as that which he sent me a few months back. I consider that set to be a fairly good basis for comparing solver performance under "battlefield conditions", which is where we usually are with common tasks like minimal-puzzle searching, puzzle reduction etc.

My results are:

Code: Select all: CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz g++: MinGW-64 (8.2.0) |Serg\benchmark.txt Win10(64) | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle| |--------------------------------------|------------:| -----------:| ----------:| --------------:| |tdoku_dpll_triad_simd | 233,263 | 4.3 | 0.0% | 7.04 | |jczsolve | 448,631 | 2.2 | 0.0% | 7.42 | |fsss2 | 496,031 | 2.0 | 0.0% | 7.45 |

Clearly we have some anomalies here! I have a faster CPU than yours, but get significantly worse tdoku performance than you.

Perhaps running on Windows is one complicating factor (it usually is!).

Perhaps compiler options? I just used "-O2 -march=native".

I don't have CLANG, but will try and get hold of it, and try that …

Cheers
MM

The New Sudoku Players' Forum

Tdoku: a speedy Sudoku solver

Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver

Re: Tdoku: a speedy Sudoku solver