The New Sudoku Players' Forum

by **champagne** » Wed Feb 06, 2019 3:56 am

emerentius_ wrote:I compiled it without optimizations and now it works. With optimizations, all puzzles are counted in the leftmost column as invalid sudokus.
You may be depending on undefined behaviour somewhere. I'm using g++ version 7.3.0 on Ubuntu.

Hi emerentius,
I have seen something wrong in the "One digit guess"
I'll do more checks to-day before the fix of the code

EDIT: fix done, I did not run the regressive test on hardest puuzzles after the last code change. There was an error.
I also increased to be safe the size of the "one digit solutions" table. (a problem in the 17 clues search, may be linked to the other error).
The repository is ok,

by **Serg** » Wed Feb 13, 2019 11:23 pm

Hi, people!
I need fast Sudoku solver for my exhaustive search projects. For 3 last years I used JCZSolver 1.0 and I was quite satisfacted by that solver. Two weeks ago I had a possibility to test fsss2 solver by Mladen Dobrichev. Mathimagics help me to plug in fsss2 solver and explain the way to call this solver from traditional C programs.I am very grateful to him for his help!

Some words about by computer. It's rather old notebook with 64-bit CPU Intel Core2 Duo CPU T8300 @ 2.40GHz. (CPU supports SSE4.1 instruction set.) OS - MS Windows 7 32-bit, MinGW32 with gcc.

In fact I need 2 functions only to perform exhaustive search.

is_puzzle_valid(puzzle) - it returns "yes", if the puzzle is valid (has unique solution), otherwise - "no". No solutions at the output are required.

fill_solution_grid(puzzle,solution_grid) - it returns "yes" and solution_grid (1st available solution), if the puzzle has at least one solution, "no" - if the puzzle has no solutions.

Speed of the fill_solution_grid(puzzle,solution_grid) function isn't important, because it is called less frequently, than the is_puzzle_valid(puzzle) function. But speed of the is_puzzle_valid(puzzle) function is crucial for my exhaistive search procedure. So, is_puzzle_valid(puzzle) function only was tested in benchmark tests. This function was modelled by dSolve(puzzle,NULL,NULL) call for fsss2 solver and by JCZSolver(puzzle,NULL,2) call for JCZSolver.

I prepared benchmark file with 10000 puzzles from my real exhaustive search project. All puzzles have multiple solutions. Each puzzle is solved 1000 times to remove time to read this puzzle from the benchmark file. So, 10000 puzzles are solved 1000 times each, totally 10 millions puzzles are solved during benchmark run. Here are results.

Code: Select all: JCZSolver - 113000 puzzles/sec (8.8 us per puzzle) fsss2 - 254000 puzzles/sec (3.9 us per puzzle)

So, you can see that on my benchmark fsss2 is 2.25 times faster than JCZSolver. I suspect, that cause is that fsss2 uses SSE4.1 (128-bit registers), but perhaps JCZSolver doesn't exploit 128-bit registers feature.

One can find my benchmark puzzles by the link http://sites.google.com/site/sergsudoku/benchmark.zip, JCZSolver benchmark program - by the link http://sites.google.com/site/sergsudoku/JCZSolver_benchmark.c, fsss2 benchmark program - by the link http://sites.google.com/site/sergsudoku/fsss2-32_benchmark.c.

Serg

by **champagne** » Thu Feb 14, 2019 2:10 pm

Hi Serg,
As I wrote earlier, a benchmark on the brute force is not easy.

I down loaded your file. It is a very special one. On my laptop one run lasted 59 ms say 5.9 µs per puzzle.
With such a lot of puzzles, the guess sequence can be very simple. Likely in sk_bruteforce only the bivalue in cell is used in guess.

And zcjsolve uses the 128 bits set of instructions but the process is heavily band oriented (32 bits)

by **Mathimagics** » Thu Feb 14, 2019 5:54 pm

blue and I are also somewhat stunned by Serg's benchmark results.

One theory is that maybe the binaries I supplied really do employ the full MMX register capabilities of Serg's cpu, even in Win32 mode, whereas normally one might expect the compiler to use only those that would be available in "native" 32-bit mode?

It's certainly true that the guy who built my GCC compilers is very smart! If it's possible, then I think that he would have done this ...

by **Serg** » Thu Feb 14, 2019 9:31 pm

Hi, all!
I'd like to comment Mathimagics words:

Mathimagics wrote:... the binaries I supplied really do employ the full MMX register capabilities of Serg's cpu, even in Win32 mode, whereas normally one might expect the compiler to use only those that would be available in "native" 32-bit mode?

It's certainly true that the guy who built my GCC compilers is very smart! If it's possible, then I think that he would have done this ...

First, I should say that I use fsss2 in already compliled form, i.e. Mathimagics compiled fsss2 and sent me objective file "fsss2-32.o", which was called from my program (traditional C).

Here is fragment from Mathimagics PM, addressed to me.

Mathimagics wrote:I build fsss2 for my Win64 system with this:
g++ -march=native -O2 -c fsss2.cpp -o fsss2.o
To make your Win32 version I only changed this to:
g++ -m32 -msse4.1 -O2 -c fsss2.cpp -o fsss2-32.o

The -m32 option says "compile for Win32 target", and the -msse4.1 option says "please assume that the Win32 target also has SSE 4.1 instruction support".

So, if I am not mistaken, fsss2 for my benchmark was compiled by C++ compiler, described by Mathimagics in the thread Painless Win32/64 FSSS2.

Serg

P.S. I got similar to benchmark figures in my exhaustive search task. That task was executed 3.49 hours with JCZSolver and 1.83 hours with fsss2.

by **blue** » Fri Feb 15, 2019 1:46 am

Hi Serg,

Serg wrote:P.S. I got similar to benchmark figures in my exhaustive search task. That task was executed 3.49 hours with JCZSolver and 1.83 hours with fsss2.

I guess that's that, then.

For my sanity, would you please do this: modify your benchmark code, so that it loads the full list of puzzles, and then loops over the list 1000 times, rather than solving each each puzzle 1000 times in a row.

On newer machines, with 64-bit code and fsss2, there's an apparent 2-to-1 speedup if each puzzle is solved 1000 times before moving on to the next one, rather than the entire list being looped 1000 times.

I don't know what you'll see on your machine, and if there are changes, whether the JCZSolve and fsss2 times will maintain thier ratio.
Your "P.S." suggests that both will be effected in the same way, though ... if at all.

TIA,
Blue.

by **champagne** » Fri Feb 15, 2019 3:17 am

I added a 100x loop in my code and got on my laptop an average 2.7 µs per puzzle.
It will come in sk_bruteforce as an option for the process 11

Seems to be similar results

by **Mathimagics** » Fri Feb 15, 2019 3:21 am

I think blue is on the money here. That 2x factor should come down a notch or 3 when a different puzzle is solved at each call ...

by **champagne** » Fri Feb 15, 2019 3:26 am

blue wrote:Hi Serg,

For my sanity, would you please do this: modify your benchmark code, so that it loads the full list of puzzles, and then loops over the list 1000 times, rather than solving each each puzzle 1000 times in a row.

On newer machines, with 64-bit code and fsss2, there's an apparent 2-to-1 speedup if each puzzle is solved 1000 times before moving on to the next one, rather than the entire list being looped 1000 times.

Blue.

What I did is
read a puzzle
solve it n times.
loop

Doing as suggested blue should give s slight bonus throuh the cache effect. I'll test it with lots of 256 puzzles and a loop 1000x

but this is another bias in the benchmark to compare to operative conditions

by **Serg** » Fri Feb 15, 2019 9:11 am

Hi, blue!

blue wrote:Hi Serg,

Serg wrote:P.S. I got similar to benchmark figures in my exhaustive search task. That task was executed 3.49 hours with JCZSolver and 1.83 hours with fsss2.

I guess that's that, then.

For my sanity, would you please do this: modify your benchmark code, so that it loads the full list of puzzles, and then loops over the list 1000 times, rather than solving each each puzzle 1000 times in a row.

Done. I've got 194000 puzzles/sec instead of previous 254000 puzzles/sec. Therefore, some effect is visible ...
I am planning to repeat such experiment for JCZSolver tonight.

Serg

by **Serg** » Sat Feb 16, 2019 12:23 am

Hi, blue!
I've done experiments with both solvers - JCZSolver and fsss2.
I considered 2 benchmark procedures.

Code: Select all: Benchmark A Benchmark B read_10000_puzzles(); read_10000_puzzles(); start_timer(); start_timer(); for (i = 0; i < 1000; i++) { for (j = 0; j < 10000; j++) { for (j = 0; j < 10000; j++) { for (i = 0; i < 1000; i++) { solve(puzzle[j]); solve(puzzle[j]); } } } } stop_timer(); stop_timer();

So, you can see, that "Benchmark A" is your test, but "Benchmark B" is my original test (each puzzle is solved 1000 times successively).
Here are results (numbers denote "puzzles/sec").

Code: Select all: Solver Benchmark A Benchmark B Ratio JCZSolver 106,340 115,806 1 : 1.09 fsss2 195,265 258,099 1 : 1.32

So, you can see that "speeding up" effect of successively solving the same puzzle many times is more stronger for fsss2 (1.32 times speeding up), but some effect is visible for JCZSolver too (1.09 times speeding up).

fsss2 is 1.84 times faster, than JCZSolver on Benchmark A (non-repetitive puzzles).
fsss2 is 2.23 times faster, than JCZSolver on Benchmark B (repetitive puzzles).

Serg

by **Serg** » Sat Feb 16, 2019 12:27 am

Hi, champagne!

champagne wrote:I added a 100x loop in my code and got on my laptop an average 2.7 µs per puzzle.
It will come in sk_bruteforce as an option for the process 11

Could you compile sk_bruteforce solver for Windows 32-bit environment and send me objective file to be called from traditional C?

Serg

by **Mathimagics** » Sat Feb 16, 2019 1:35 am

Serg wrote:fsss2 is 1.84 times faster, than JCZSolver on Benchmark A (non-repetitive puzzles).

Zounds!

So, gentlemen, still a suprising result, methinks. What are we to make of this? :?:

Serg, I need a copy of your 10000 benchmark puzzles, can you zip them up and send to me, please?

Meanwhile I will get JCZsolve from here.

by **champagne** » Sat Feb 16, 2019 2:14 am

Serg wrote:Hi, champagne!
champagne wrote:I added a 100x loop in my code and got on my laptop an average 2.7 µs per puzzle.
It will come in sk_bruteforce as an option for the process 11

Could you compile sk_bruteforce solver for Windows 32-bit environment and send me objective file to be called from traditional C?

Serg

Sorry Serg, as I wrote earlier, sk_bruteforce uses some 64 bits instructions. No chance to compile it in a 32 bits environment.
What could be possible would be to revise the code in ZCJsolve (guess sequence) to include the best process, another possibility would be to see what you get from "emerentius" changes in ZCJsolve,

But having a "good solution" with fss2, you have no reason to go in this direction.

BTW; in the seacrh of 17 clues, I have some clue sets to test similar to your test file. I have a specific guess sequence for these sets using the fact that the solution is known..

by **Mathimagics** » Sat Feb 16, 2019 3:54 am

Ok, I think that I know what is going on here!

I made a JCZSolver from the link above, and in my Win64 x86_64 environment, it's perhaps 3 x faster than fsss2.

My bench test here was solving 2 puzzles, average 50 guesses required (for fsss2, anyway), repeated 500,000 times. 1 million solves all up. fsss2 took 9s, JCSolver took 3s. Not very rigorous, but revealing.

Given that these 2 puzzles are "harder" than typical 17-clue puzzles (I removed 4-6 clues from 2 different 17-clue puzzles), that JCZsolver time for my 4GHz CPU seems to be pretty darn good, and I am suitably impressed, gentlemen!! :shock:

IMHO, this points the finger firmly in the direction of Serg's copy of JCZsolver. I understand that it was also a bespoke version, made just for him, and since this was done before any of us knew he actually had SSE4.1 instruction support, perhaps this explains the mystery?

I will make a 32-bit linkable JCZSolver for him, and I expect that he will find that will give very different results! 8-)

The New Sudoku Players' Forum

3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)