Hi, Tom!
Thank you for benchmarking my file!
At the first glance performance numbers are quite realistic. I need some time to ponder your results.
Serg
Mathimagics wrote:I have Serg's benchmark dataset, I assume it is the same as that which he sent me a few months back.
Mathimagics wrote:My results are:
...
Clearly we have some anomalies here!
|38puz2123548 | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 146,185.9 | 6.8 | 0.1% | 3.92 |
|fsss2 | 228,022.7 | 4.4 | 0.0% | 5.49 |
|../data/serg.txt | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 228,896.7 | 4.4 | 0.0% | 7.13 |
|fsss2 | 330,324.1 | 3.0 | 0.0% | 7.76 |
m
Tom in https://t-dillon.github.io/tdoku/#GettingVectorized wrote:When determining the next guess to make instead of finding the most constrained cell or box, it turns out to be best to find the band with the fewest possible configurations across all values, and then to branch on the value that has the fewest configurations in the band.
dobrichev wrote:but likely when guessing bi-value cells the preferred band (on otherwise equal conditions) should be the one with least pencilmarks.
######################
# Desktop1
######################
$ grep -i model <(lscpu); grep -i flags <(lscpu) | tr ' ' '\n' | egrep "(sse|avx)" | xargs
Model: 158
Model name: Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
sse sse2 ssse3 sse4_1 sse4_2 avx avx2
## build 1:
$ build/run_benchmark -h
build info: GNU 6.5.0 -O3 -march=native
$ build/run_benchmark -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 317,339.9 | 3.2 | 0.0% | 7.13 |
|jczsolve | 278,821.3 | 3.6 | 0.0% | 7.09 |
|fsss2 | 362,665.7 | 2.8 | 0.0% | 1.44 |
|fsss2{0x1} | 278,274.6 | 3.6 | 0.0% | 1.41 |
## build 2:
$ build/run_benchmark -h
build info: Clang 8.0.1 -O3 -march=native
$ build/run_benchmark -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 363,350.3 | 2.8 | 0.0% | 7.13 |
|jczsolve | 301,407.6 | 3.3 | 0.0% | 7.09 |
|fsss2 | 326,511.8 | 3.1 | 0.0% | 1.44 |
|fsss2{0x1} | 254,249.8 | 3.9 | 0.0% | 1.41 |
######################
# Desktop2
######################
$ grep -i model <(lscpu); grep -i flags <(lscpu) | tr ' ' '\n' | egrep "(sse|avx)" | xargs
Model: 62
Model name: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
sse sse2 ssse3 sse4_1 sse4_2 avx
## build 1:
$ build/run_benchmark -h
build info: GNU 5.5.0 -O3 -march=native
$ build/run_benchmark -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 220,922.6 | 4.5 | 0.0% | 7.13 |
|jczsolve | 210,224.4 | 4.8 | 0.0% | 7.09 |
|fsss2 | 286,110.7 | 3.5 | 0.0% | 1.44 |
|fsss2{0x1} | 222,263.6 | 4.5 | 0.0% | 1.41 |
## build 2:
$ build/run_benchmark -h
build info: GNU 6.5.0 -O3 -march=native
$ build/run_benchmark -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 219,589.2 | 4.6 | 0.0% | 7.13 |
|jczsolve | 211,990.3 | 4.7 | 0.0% | 7.09 |
|fsss2 | 283,084.8 | 3.5 | 0.0% | 1.44 |
|fsss2{0x1} | 224,056.9 | 4.5 | 0.0% | 1.41 |
## build 3:
$ build/run_benchmark -h
build info: Clang 8.0.1 -O3 -march=native
$ build/run_benchmark -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 237,551.6 | 4.2 | 0.0% | 7.13 |
|jczsolve | 220,150.0 | 4.5 | 0.0% | 7.09 |
|fsss2 | 269,831.4 | 3.7 | 0.0% | 1.44 |
|fsss2{0x1} | 208,989.4 | 4.8 | 0.0% | 1.41 |
## build 4:
$ build/run_benchmark -h
build info: GNU 6.5.0 -O2 -march=native
$ build/run_benchmark -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 201,596.9 | 5.0 | 0.0% | 7.13 |
|jczsolve | 207,840.0 | 4.8 | 0.0% | 7.09 |
|fsss2 | 267,627.5 | 3.7 | 0.0% | 1.44 |
|fsss2{0x1} | 204,536.7 | 4.9 | 0.0% | 1.41 |
######################
# GCP n1-standard-1 (Haswell) windows-server-2016 + CygWin
######################
$ grep -i model /proc/cpuinfo; grep -i flags /proc/cpuinfo | tr ' ' '\n' | egrep "(sse|avx)" | xargs
model : 63
model name : Intel(R) Xeon(R) CPU @ 2.30GHz
sse sse2 ssse3 sse4_1 sse4_2 avx avx2
$ build/run_benchmark.exe -h
build info: GNU 7.4.0 -O3 -march=native
$ build/run_benchmark.exe -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 167,451.0 | 6.0 | 0.0% | 7.13 |
|jczsolve | 157,077.9 | 6.4 | 0.0% | 7.09 |
|fsss2 | 205,386.9 | 4.9 | 0.0% | 2.87 |
|fsss2{0x1} | 159,964.2 | 6.3 | 0.0% | 2.83 |
$ build/run_benchmark.exe -h
build info: Clang 5.0.1 -O3 -march=native
$ build/run_benchmark.exe -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 187,838.6 | 5.3 | 0.0% | 7.13 |
|jczsolve | 168,512.0 | 5.9 | 0.0% | 7.09 |
|fsss2 | 192,897.5 | 5.2 | 0.0% | 2.87 |
|fsss2{0x1} | 148,797.4 | 6.7 | 0.0% | 2.82 |
######################
# Pixelbook
######################
$ grep -i model <(lscpu); grep -i flags <(lscpu) | tr ' ' '\n' | egrep "(sse|avx)" | xargs
Model: 142
Model name: 06/8e
sse sse2 ssse3 sse4_1 sse4_2 avx avx2
## build 1:
$ build/run_benchmark -h
build info: GNU 6.3.0 -O3 -march=native
$ build/run_benchmark -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 175,763.0 | 5.7 | 0.0% | 7.13 |
|jczsolve | 160,024.1 | 6.2 | 0.0% | 7.09 |
|fsss2 | 218,493.1 | 4.6 | 0.0% | 1.44 |
|fsss2{0x1} | 166,447.2 | 6.0 | 0.0% | 1.41 |
## build 2:
$ build/run_benchmark -h
build info: Clang 3.8.1 -O3 -march=native
$ build/run_benchmark -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 190,984.8 | 5.2 | 0.0% | 7.13 |
|jczsolve | 170,079.8 | 5.9 | 0.0% | 7.09 |
|fsss2 | 190,961.1 | 5.2 | 0.0% | 1.44 |
|fsss2{0x1} | 144,091.1 | 6.9 | 0.0% | 1.41 |
## build 3:
$ build/run_benchmark -h
build info: GNU 6.3.0 -O2 -march=native
$ build/run_benchmark -s tdoku_dpll_triad_simd,jczsolve,fsss2,fsss2:1 data/puzzles6_serg_benchmark
|data/puzzles6_serg_benchmark | puzzles/sec| usec/puzzle| %no_guess| guesses/puzzle|
|--------------------------------------|------------:| -----------:| ----------:| --------------:|
|tdoku_dpll_triad_simd | 163,148.0 | 6.1 | 0.0% | 7.13 |
|jczsolve | 155,002.3 | 6.5 | 0.0% | 7.09 |
|fsss2 | 197,388.3 | 5.1 | 0.0% | 1.44 |
|fsss2{0x1} | 147,621.8 | 6.8 | 0.0% | 1.41 |
dobrichev wrote:Surely for his implementation it is the best choice, but likely when guessing bi-value cells the preferred band (on otherwise equal conditions) should be the one with least pencilmarks.
champagne wrote:After many tests, I came just to the opposite conclusion.
Take in priority the band with more candidates. The underlying strategy being { a guess killing more candidates has a good chance to be the best guess).
c:\tdSolver\tdoku-master\buildCL>clang -O2 -march=native -c tdSolver.cpp -o tdSolver.o
In file included from tdSolver.cpp:2:
In file included from ./simd_vectors.h:5:
In file included from C:\LLVM\lib\clang\8.0.0\include\immintrin.h:32:
In file included from C:\LLVM\lib\clang\8.0.0\include\xmmintrin.h:39:
In file included from C:\LLVM\lib\clang\8.0.0\include\mm_malloc.h:27:
c:\LLVM\include\stdlib.h:9:10: fatal error: 'crtdefs.h' file not found
#include <crtdefs.h>
^~~~~~~~~~~
1 error generated.
c:\tdSolver\tdoku-master\buildCL>
clang -O2 -march=native -c tdSolver.cpp -o tdSolver.o -Ic:\MinGW\x86_64-w64-mingw32\include
In file included from tdSolver.cpp:2:
In file included from ./simd_vectors.h:5:
In file included from C:\LLVM\lib\clang\8.0.0\include\immintrin.h:28:
C:\LLVM\lib\clang\8.0.0\include\mmintrin.h:81:40: error: cannot initialize a parameter of type
'__attribute__((__vector_size__(2 * sizeof(int)))) int' (vector of 2 'int' values) with an rvalue of type '__v2si'
(aka 'int')
return __builtin_ia32_vec_ext_v2si((__v2si)__m, 0);
^~~~~~~~~~~
(… many errors of similar form follow …)
Serg wrote:I should say that your solver is rather sensitive to C++ compiler in use. It optimized for clang compiler.
Serg wrote:My computer is too outdated. The most modern feature of its CPU - SSE4.1 instruction set, processing 128-bit registers. But AVX and AVX2 process 256-bit registers.
champagne wrote:a guess killing more candidates has a good chance to be the best guess
dobrichev wrote:champagne wrote:a guess killing more candidates has a good chance to be the best guess
I tried this. On some really exotic puzzle batches the performance boost is 50 times!
For ordinary puzzles the preliminary implementation works ~30% slower.