The New Sudoku Players' Forum

by **Mathimagics** » Sat Feb 16, 2019 4:22 am

Oh dear, I've been led into a trap, it seems!

My theory above was based on the (false) assumption that JCZSolver used "128-bit registers", eg SSE4.1. A comment in the code seemed to confirm this, so I didn't really look any further. I did wonder why it compiled cleanly without requiring "-march=native" or "-msse4.1" or similar.

But it doesn't use any 128-bit register code at all, it seems ...

Thus the version I just sent to Serg will probably have no effect whatsoever, and his performance mystery remains unresolved.

If anything, the mystery deepens. From 3x faster to nearly 2x slower is some kind of turnaround (!), and one that is really, really weird :!:

[EDIT] Unless, of course, my GCC compiler is very much smarter than whatever was used to produce Serg's JCZSolver … but that seems rather far-fetched! Data sensitivity? fsss2 seems to be better at singles-only cases, but only by 15-20%. Perhaps when I see his actual benchmark puzzle set that may shed some light ?

by **Serg** » Sat Feb 16, 2019 10:56 am

Hi, Mathimagics!

Mathimagics wrote:Serg, I need a copy of your 10000 benchmark puzzles, can you zip them up and send to me, please?

I sent my bencmark file to you by e-mail (though it can be also downloaded by the link from one of my previous posts).

Mathimagics wrote:Meanwhile I will get JCZsolve from here.

Yes, I use the same distributive of JCZSolver (1.0 version), posted by Jason.

Serg

by **Serg** » Sat Feb 16, 2019 11:21 am

Hi, champagne!

champagne wrote:Sorry Serg, as I wrote earlier, sk_bruteforce uses some 64 bits instructions. No chance to compile it in a 32 bits environment.

My CPU is 64-bit. It can execute SSE4.1 and EM64T sets of instructions, so CPU can use 128-bit registers. I installed 32-bit Windows 7, because my notebook contains 2 GB memory only. So, using 64-bit address space has small sense in my case.

What kind of 64-bit instructions do sk_bruteforce use? If it has array larger than 2 GB, then I cannot use it, but otherwise it looks like possible to execute it in my environment. I think you should produce objective file in Windows-format by command like that

g++ -m32 -msse4.1 -O2 -c bk_bruteforce.cpp -o bk_bruteforce.o

and send result to me with solver functions description (to call them from my traditional C program).

champagne wrote:But having a "good solution" with fss2, you have no reason to go in this direction.

Unfortunately I cannot be quite sure that my variant of fsss2 works correctly (performance measurements results are too curious). And it would be very interesting to compare your solver on the same machine and for the same benchmark.

champagne wrote:BTW; in the seacrh of 17 clues, I have some clue sets to test similar to your test file. I have a specific guess sequence for these sets using the fact that the solution is known.

In my case one solution is always known. The question is - has the puzzle additional solutions or no?

Serg

by **Mathimagics** » Sat Feb 16, 2019 11:39 am

Serg wrote:Unfortunately I cannot be quite sure that my variant of fsss2 works correctly (performance measurements results are too curious).

That should be a simple matter to check! Surely you can knock up a batch of puzzles where the total number of solutions (0, 1 or 2) is known, and check that each solver gets this right.

That is exactly what I do for any solver I send you, including fsss2. I have a batch that is half unique-solution, half multi-solution. So the total expected solutions is 3N/2, where N is the batch size.

Getting this value right doesn't necessarily prove that the solver works in all cases, but it does do wonders for the confidence level! :lol:

by **Mathimagics** » Sat Feb 16, 2019 12:01 pm

.
Ok, we have made some progress!

Serg sorted out his benchmark code, and now reports that, whereas the JCZsolver that he compiled on his machine gave these results (posted above):

Code: Select all: Solver Benchmark A Benchmark B Ratio JCZSolver 106,340 115,806 1 : 1.09 fsss2 195,265 258,099 1 : 1.32

... now we find the very same JCZsolver code compiled on my system (but targeted for his), gets THIS:

Code: Select all: Solver Benchmark A Benchmark B Ratio JCZSolver 210,655 256,509 1 : 1.22 fsss2 195,265 258,099 1 : 1.32

So he finds fsss2] and JCZsolver to be very similar for this benchmark set. I will verify that on my system (he has supplied the actual puzzles), but I am willing to bet that these are generally "low-guess" puzzles, and that JCZsolver (MY version) will most likely prove to have a clear advantage as we make the puzzle sets "harder" (ie more guessing!).

Thus the mystery would finally be resolved, and my admiration for JCZsolver would increase a notch! 8-)

Ah, but can it do SudokuPX? How about SudokuJW? :lol:

by **Serg** » Sat Feb 16, 2019 12:33 pm

Hi!

Mathimagics wrote:... now we find the very same JCZsolver code compiled on my system (but targeted for his), gets THIS:
Code: Select all
Solver Benchmark A Benchmark B Ratio JCZSolver 210,655 256,509 1 : 1.22 fsss2 195,265 258,099 1 : 1.32

So he finds fsss2] and JCZsolver to be very similar for this benchmark set.

I confirm these numbers.

Does Mathimagics found magic C++ compliler, speeding up solvers twice? :shock:

Serg

by **champagne** » Sat Feb 16, 2019 2:04 pm

Serg wrote:Hi, champagne!

My CPU is 64-bit. It can execute SSE4.1 and EM64T sets of instructions, so CPU can use 128-bit registers. I installed 32-bit Windows 7, because my notebook contains 2 GB memory only. So, using 64-bit address space has small sense in my case.

What kind of 64-bit instructions do sk_bruteforce use? If it has array larger than 2 GB, then I cannot use it, but otherwise it looks like possible to execute it in my environment. I think you should produce objective file in Windows-format by command like that

g++ -m32 -msse4.1 -O2 -c bk_bruteforce.cpp -o bk_bruteforce.o

Serg

let me try to answer first to this point

I did a similar action compiling in "x86 mode" instead of x6 mode and I got this list of errors

Hidden Text: Show

So the problem is not in the SSE set of instructions, but in the intrinsic intructions.

It would not be too hard to adjust the code switching to the 32 bit mode for these instructions, but this would be another code to maintain, just the opposite of what I am trying to do.

From the above list, It seems that only the BF128 class has to be changed.

by **Mathimagics** » Sat Feb 16, 2019 2:55 pm

.
I had thought that my old PC (Win7 32-bit) would provide the answer to the "magic compiler" question. Serg has a standard MinGW-32 build of GCC v6.3, and until recently I had the same on my old PC (which is a true 32-bit CPU, running Win7).

However, when I compile JCZsolver on that PC, and test this against the version I produced for Serg, it is just as fast. But now I realise this compiler is also quite different to Serg's!

Apparently I upgraded the GCC on that old PC a few months ago, but forgot all about it! It's GCC is actually a v8.1 build, and was produced by the MinGW-64 team. That project has advanced the compiler to its current form, and I think that any "magic" is probably down to them. MinGW-64 builds run on both Win32 and Win64, obviously. They can also produce code for either one.

by **blue** » Sat Feb 16, 2019 11:21 pm

Hi Serg,

Thank you for reporting on the modified benchmark code ... interesting results.

Mathimagics' last post, was interesting.
It seems like he should send you a "JCZSolve.o" built with the same tools that he used for "fsss2.o".
(Probably, he already has).

Best regards,
Blue.

by **champagne** » Sun Feb 17, 2019 2:52 am

Hi Serg,
I'll try to-day to produce a 32 bits version of sk_bruteforce. If the changes needed are limited to the BF128 class, it should be easy.

Serg wrote:In my case one solution is always known. The question is - has the puzzle additional solutions or no?
Serg

If your file can be something as
"81 characters puzzle";"81 characters known solution"
I can duplicate the process used in the 17 clues search.

by **Mathimagics** » Sun Feb 17, 2019 4:08 am

blue wrote:It seems like he should send you a "JCZSolve.o" built with the same tools that he used for "fsss2.o".
(Probably, he already has).

Indeed, I did just that! The "fsss2 twice as fast" myth is busted! 8-)

(see 6 and 7 posts up)

by **champagne** » Sun Feb 17, 2019 4:59 am

champagne wrote:Hi Serg,
I'll try to-day to produce a 32 bits version of sk_bruteforce. If the changes needed are limited to the BF128 class, it should be easy.

Done. I see a small deterioration in the performance eg on the file of potential hardest 1 637 062 hard puzzles

2mn 25 s with sk_bruteforce
2mn 33 s with the 32 bits version

note serg's file does not need the last guess step, but I see a similar effect

Serg, I can post somewhere the corresponding sources, may-be after an attempt to have a tailor made guess sequence for your specific case.

by **Serg** » Sun Feb 17, 2019 7:07 pm

Hi, colleagues!
I have good news, but I am confused, because my illiteracy was the cause of my problems :oops:

I tried to use GCC optimization flags "-O1"/"-O2"/"-O3" (letter "O", not zero), having "problem" setup: 32-bit Winows 7 + 32-bit GCC (version 8.1.0 from MinGW-W64 team). Here are results for "Benchmark A" (non-repetitive puzzles solving).

Code: Select all: No flags 100052 puzzles/sec -O1 204149 puzzles/sec -O2 212158 puzzles/sec -O3 207418 puzzles/sec

As you can see, performance of benchmark program (including the source code of JCZSolver), compiled with flag -O2 is similar to variant, when benchmark program was linked with objective file "JCZSolver32.o", compiled in 64-bit environment with the same flag -O2 (210655 puzzles/sec)!

I spied the idea to use these flags from Mathimagics, many thanks to him!

I think, variant of using objective file of another solver is still actual, when that solver is written in C++. Mathimagics explained me the way to call such C++ solvers from the main program, written in traditional C, by linking with corresponding objective file.

I am still interesting in trying BK_BRUTEFORCE solver. It's very interesting to compare it with other "grands" - JCZSolver and fsss2.

Serg

by **Serg** » Sun Feb 17, 2019 7:23 pm

Hi, champagne!

champagne wrote:I'll try to-day to produce a 32 bits version of sk_bruteforce. If the changes needed are limited to the BF128 class, it should be easy.

I am waiting for your result. Hope, producing a 32-bit version of sk_bruteforce will be not complicated.

champagne wrote:
Serg wrote:In my case one solution is always known. The question is - has the puzzle additional solutions or no?
Serg

If your file can be something as
"81 characters puzzle";"81 characters known solution"
I can duplicate the process used in the 17 clues search.

Do you mean new benchmark file, having 163-symbol strings of such form: puzzle + ";" + solution (81+1+81 symbols)?

Serg

by **Serg** » Sun Feb 17, 2019 7:39 pm

Hi, people!

Mathimagics wrote:The "fsss2 twice as fast" myth is busted!

Yes, it's obvious now, that both solvers - JCZSolver and fsss2 have similar performance and are grands in Sudoku solver's world!

Serg

P.S. I executed my exhaustive search task with "JCZSolver32.o" objective file, produced by Mathimagics in 64-bit environment (and with -O2 GCC flag), and got execution time 1.79 hours (6439 sec). Similar run with "fsss2-32.o" objective file lasted 1.83 hours (6596 sec). So, for my particular task and in my particular environment JCZSolver was 1.024 times faster than fsss2.

The New Sudoku Players' Forum

3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)