3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Programs which generate, solve, and analyze Sudoku puzzles

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Fri Jan 08, 2016 7:55 am

JasonLion wrote:Back when I was working on my adaptation of zhouyundong's code, I found significant speed improvements (20%) when compiled for 32 bit rather than 64 bit. This was very much not what I was expecting, as nearly all applications speed up noticeably by simply compiling with the 64 bit compiler flag, with no other changes. This might not still be true for champagne's version, but I suspect it probably is.


good topic to keep my brain working along the beach for the next 10 weeks.

starting from my last version of the "zhou" process, I see what to do to optimize the code for that, but this will be C++ and will keep use of some native instructions as popcount and bitscanforward
champagne
2017 Supporter
 
Posts: 7367
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby Serg » Fri Jan 08, 2016 8:43 am

Hi, tarek!
tarek wrote:
Serg wrote:OS: Microsoft Windows XP Professional version 2002 SP3 (Is it 64-bit? I don't know.)
MinGW (I use "gcc" compiler, but again don't know is it 64-bit).
Wow Serg, I think that is a bit out of date!!!!

Thanks, but to my mind it would be best if Microsoft would keep Windows unchanged from XP times. Instead we are forced to migrate to new Windows versions. Each new version is, as rule, uncompatible with previous one at GUI level, etc. It can be possible, of course, migrate to Linux, but I must use Office and some other important for me applications. So, I stay at Windows XP. It works fine and doesn't install every day Microsoft patches :) (because it is out of date).

Serg
Serg
2018 Supporter
 
Posts: 865
Joined: 01 June 2010
Location: Russia

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby Serg » Fri Jan 08, 2016 8:52 am

Hi, JasonLion!
JasonLion wrote:gcc supports both 32 bit and 64 bit, as do most other compilers. Likewise ZSolve works in both 32 bit and 64 bit modes. Newer compiler versions tend to default to 64 bit, older compiler versions tend to default to 32 bit. Both old and new compilers accept command line options to switch which mode to use. This isn't something you need to worry about unless you really want that last 15-20% of speed optimization and the solver accounts for most of your programs run time.

Thanks for information! I am planning to experiment with gcc options.

Serg
Serg
2018 Supporter
 
Posts: 865
Joined: 01 June 2010
Location: Russia

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby Serg » Fri Jan 08, 2016 6:40 pm

Hi, colleagues!
I've just done some experiments with "gcc" switches while compiling and executing ZSolver 1.0. I tried several switches of "gcc" during complilation of my program including ZSolver:

1. "gcc test.c" (all default switches values)
2. "gcc -mtune=generic test.c"
3. "gcc -mtune=native test.c"
4. "gcc -mtune=core2 test.c"

I observed equal performance in all 4 cases during test running, i.e. no dependence on this parameter. "gcc" says its version: 4.8.1 (rather up to date).

Testing program generates puzzles for given pattern, solves them to find valid puzzles and writes valid puzzles to file.
54 millions of puzzles were solved by ZSolver during test execution. Only 5% of them were valid (the rest have multiple solutions). Total time of test execution - 17 minutes. Speed of ZSolver's puzzles solving for this "mix" of puzzles - 81800 puzzles/sec. (My PC characteristics were published above in this thread.) Excellent result! One can compare this result with the speed of my home-made solver for the same "mix" of puzzles - 8880 puzzles/sec - 9.2 times slower.

Serg
Serg
2018 Supporter
 
Posts: 865
Joined: 01 June 2010
Location: Russia

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Fri Jan 08, 2016 9:24 pm

Try the -m32 command line option and see if that does anything.
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby Serg » Fri Jan 08, 2016 11:58 pm

Hi, Jason!
JasonLion wrote:Try the -m32 command line option and see if that does anything.

Compilation with this switch "gcc -m32 test.c" gives the same result as compilation without this switch. It is very likely that my MinGW contains 32-bit "gcc" only. I found nothing about it at their site http://sourceforge.net/projects/mingw/?source=typ_redirect, but calling "gcc -v" shows:
...
... --host=mingw32 --build=mingw32 ...
...
Thread model: win32

Moreover, I found another MinGW project - "MinGW-w64 - for 32 and 64 bit Windows" (http://sourceforge.net/projects/mingw-w64/). It's curious, but my 64-bit notebook contains 2 Gb memory only. So, I see little sense in replacing existing my old MinGW "gcc" by this new (maybe 64-bit, but - ?).

Jason, I see very little sense in futher experiments with 32/64 bit compilation modes. Thank you for your help!

Serg
Serg
2018 Supporter
 
Posts: 865
Joined: 01 June 2010
Location: Russia

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Sat Jan 09, 2016 2:09 am

So you are already in 32 bit mode, which is ideal for ZSolve.

64 bit mode has some significant advantages even in very small memory environments. Don't make the mistake of thinking it is only about using more memory. However, given what you have said, things are probably ideal for your specific uses as they are.
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Tue Jan 12, 2016 3:28 pm

JasonLion wrote:Back when I was working on my adaptation of zhouyundong's code, I found significant speed improvements (20%) when compiled for 32 bit rather than 64 bit. This was very much not what I was expecting, as nearly all applications speed up noticeably by simply compiling with the 64 bit compiler flag, with no other changes. This might not still be true for champagne's version, but I suspect it probably is.


I made a quick adjustment in my current version of the zhouyundong's process to be nearly optimal (IMO) for a 32 bits processor.

My first tests lead to the following conclusions using microsoft visual C++ to compile

the new code is faster by about 5% in 64 bits mode
but the lay-out prepared for a 64/128 bits process remains faster by about 30%

I'll check later whether the last changes I made can improve the current 64/128 bits version.


BTW I have been impressed looking at the assembly code of the Update() function (80% of the runtime usually) to see the optimization made at compilation .
champagne
2017 Supporter
 
Posts: 7367
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Mon Jan 25, 2016 2:13 pm

I tried this out and it seemed to slow everything down by a tiny tiny bit. Then I tried taking out all use of TblMaskDouble, and that speed up solving, so my latest versions don't use TblMaskDouble at all.

zhouyundong_2012 wrote:void CreateTblMaskDouble()
{
int T[8] = { 0, 0, 0, 4, 0, 2, 1, 7 };
for (int i = 0; i < 512; i++)
{
int t1 = T[i >> 6];
int t2 = T[i >> 3 & 0x7];
int t3 = T[i & 0x7];
if (t1 == 0 || t2 == 0 || t3 == 0)
TblMaskDouble[i] = 0;
else
{
TblMaskDouble[i] = ((t1 << 6) | (t2 << 3) | t3) ^ 0x1FF;
TblMaskDouble[i] = TblCombineMask[TblMaskDouble[i]];
}
}
}
Maybe it should be fix for the sake of U[].
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Mon Jan 25, 2016 2:35 pm

Since I have been snowed in the last few days, I have been working on an easy to compile, portable, C code version of champagne's version of zhouyundong_2012's solver. I now have JCZSolve, which uses the key ideas from champagne's version but codes everything in C, uses my standard interface, removes all of the 128 bit register stuff (which isn't very portable) and adds some optimizations of my own.

A source code archive for version 1.0 is attached.

Removing the use of 128 bit registers slowed things down a little, but I was able to find enough new optimizations to get things back into the same range of speed as champagne's version. The code is now all standard C, much cleaner and far more portable.

This solver is 10% to 30% faster than ZSolve, and often, but not always, faster than champagne's v1 release.
Attachments
JCZSolve10.zip
(12.47 KiB) Downloaded 1040 times
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby Serg » Mon Jan 25, 2016 11:53 pm

Hi, JasonLion!
JasonLion wrote:... I now have JCZSolve, which uses the key ideas from champagne's version but codes everything in C, uses my standard interface, removes all of the 128 bit register stuff (which isn't very portable) and adds some optimizations of my own.

...

This solver is 10% to 30% faster than ZSolve, and often, but not always, faster than champagne's v1 release.

I saw 45% higher speed of JCZSolve relative to ZSolve on my test (a step of exhaustive search - a bunch of 54 millions of puzzles with 5% of them being valid (the rest have multiple solutions)). Speed of JCZSolve on my test - 118400 puzzles/sec. (Speed of ZSolve on my test was 81800 puzzles/sec.)

Congratulations to zhouyundong_2012, champagne and JasonLion! Thanks to JasonLion for portable C code making! Excellent result! I couldn't imagine 1.5 times speed improvement of ultimate fast ZSolve solver.

Serg
Serg
2018 Supporter
 
Posts: 865
Joined: 01 June 2010
Location: Russia

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Tue Jan 26, 2016 2:41 am

Hi Jasonlion,

good work.

On my side, I had no problem for a quick overview of your code.

2 small remarks in that first reading.

UPDN

Your idea to test whether the stack cleaning through the double mask pays was very good.
If this is ok, you should kill the now unnecessary code

Code: Select all
   B=g->bands[I*3+K]; \
   C=g->bands[I*3+L]; \



Guess sequence

You go directly from the bivalue in cell to the first unsolved cell.


This gives a much shorter code for the guess sequence.
I know that by far, the pair in cell is the most frequent guess in my code.

Do you have a measurement effect of the direct jump in the escape lane.
In my code, I test then bi_value in row (original first selection) and triplet in cell.

EDIT BTW the idea to use first the bi-value in cell was in fss2
champagne
2017 Supporter
 
Posts: 7367
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Tue Jan 26, 2016 5:06 am

champagne,

Thanks for pointing out the now unused B and C variables. Apparently my compiler was optimizing them out, as there was no further speedup. Still the code simplification is welcome.

I saw an improvement of roughly 1% getting rid of GuessHiddenBivalueinRow(). GuessTripletInCell() is more equivocal. It seems to help just a hair in your code, and hurt just a hair in my code.

The idea of searching for bi-value cells as preferred places to guess goes way back. I first saw it in code from Glenn Fowler in 2007. For me, the beauty of your version is that you get naked singles, cells with no candidates, and good guess locations from the same unified scan routine. I had previously tried naked singles and various guess routines in ZSolve one at a time, and discarded each individually as not worth the overhead. It appears to be the combination of the three uses for the scan that makes the payback attractive.

One minor note on your code that you might have missed in my version. In ZHOU::ApplySingleOrEmptyCells you have a line "if (idig == 8) return 1;", which I am almost certain should be 9 instead of 8. I couldn't find a puzzle in my collection where it made any difference however.
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Tue Jan 26, 2016 5:38 am

JasonLion wrote:

One minor note on your code that you might have missed in my version. In ZHOU::ApplySingleOrEmptyCells you have a line "if (idig == 8) return 1;", which I am almost certain should be 9 instead of 8. I couldn't find a puzzle in my collection where it made any difference however.



In fact, I have seen that.

In my code, the test is inside the loop, after a digit appears as non active.

Your test is outside the loop, assuming that the exit of the loop is granted to be the limit value of the loop index.

This is better but I was not sure of that.

EDIT the change I'll do

a)adding a label
nextr1: at the end of the loop
b) replacing the break by a goto nextr1
c) having the
return 1
without test if the index exceeds the limit
champagne
2017 Supporter
 
Posts: 7367
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby zhouyundong_2012 » Sat Jan 30, 2016 9:30 am

Hi Jason and champagne.
Good Job! The Code Update keeps clean and gives a right direction to improve. I think it can be improved furtherly.

A little improvement:
if (!((AR>>9) & 0x1ff)) goto digit2;
==> if (!(AR & 0x3FE00)) goto digit2;
zhouyundong_2012
 
Posts: 148
Joined: 10 February 2012

PreviousNext

Return to Software