3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Programs which generate, solve, and analyze Sudoku puzzles

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Mon Jan 25, 2016 2:35 pm

Since I have been snowed in the last few days, I have been working on an easy to compile, portable, C code version of champagne's version of zhouyundong_2012's solver. I now have JCZSolve, which uses the key ideas from champagne's version but codes everything in C, uses my standard interface, removes all of the 128 bit register stuff (which isn't very portable) and adds some optimizations of my own.

A source code archive for version 1.0 is attached.

Removing the use of 128 bit registers slowed things down a little, but I was able to find enough new optimizations to get things back into the same range of speed as champagne's version. The code is now all standard C, much cleaner and far more portable.

This solver is 10% to 30% faster than ZSolve, and often, but not always, faster than champagne's v1 release.
Attachments
JCZSolve10.zip
(12.47 KiB) Downloaded 1018 times
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby Serg » Mon Jan 25, 2016 11:53 pm

Hi, JasonLion!
JasonLion wrote:... I now have JCZSolve, which uses the key ideas from champagne's version but codes everything in C, uses my standard interface, removes all of the 128 bit register stuff (which isn't very portable) and adds some optimizations of my own.

...

This solver is 10% to 30% faster than ZSolve, and often, but not always, faster than champagne's v1 release.

I saw 45% higher speed of JCZSolve relative to ZSolve on my test (a step of exhaustive search - a bunch of 54 millions of puzzles with 5% of them being valid (the rest have multiple solutions)). Speed of JCZSolve on my test - 118400 puzzles/sec. (Speed of ZSolve on my test was 81800 puzzles/sec.)

Congratulations to zhouyundong_2012, champagne and JasonLion! Thanks to JasonLion for portable C code making! Excellent result! I couldn't imagine 1.5 times speed improvement of ultimate fast ZSolve solver.

Serg
Serg
2018 Supporter
 
Posts: 860
Joined: 01 June 2010
Location: Russia

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Tue Jan 26, 2016 2:41 am

Hi Jasonlion,

good work.

On my side, I had no problem for a quick overview of your code.

2 small remarks in that first reading.

UPDN

Your idea to test whether the stack cleaning through the double mask pays was very good.
If this is ok, you should kill the now unnecessary code

Code: Select all
   B=g->bands[I*3+K]; \
   C=g->bands[I*3+L]; \



Guess sequence

You go directly from the bivalue in cell to the first unsolved cell.


This gives a much shorter code for the guess sequence.
I know that by far, the pair in cell is the most frequent guess in my code.

Do you have a measurement effect of the direct jump in the escape lane.
In my code, I test then bi_value in row (original first selection) and triplet in cell.

EDIT BTW the idea to use first the bi-value in cell was in fss2
champagne
2017 Supporter
 
Posts: 7352
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Tue Jan 26, 2016 5:06 am

champagne,

Thanks for pointing out the now unused B and C variables. Apparently my compiler was optimizing them out, as there was no further speedup. Still the code simplification is welcome.

I saw an improvement of roughly 1% getting rid of GuessHiddenBivalueinRow(). GuessTripletInCell() is more equivocal. It seems to help just a hair in your code, and hurt just a hair in my code.

The idea of searching for bi-value cells as preferred places to guess goes way back. I first saw it in code from Glenn Fowler in 2007. For me, the beauty of your version is that you get naked singles, cells with no candidates, and good guess locations from the same unified scan routine. I had previously tried naked singles and various guess routines in ZSolve one at a time, and discarded each individually as not worth the overhead. It appears to be the combination of the three uses for the scan that makes the payback attractive.

One minor note on your code that you might have missed in my version. In ZHOU::ApplySingleOrEmptyCells you have a line "if (idig == 8) return 1;", which I am almost certain should be 9 instead of 8. I couldn't find a puzzle in my collection where it made any difference however.
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Tue Jan 26, 2016 5:38 am

JasonLion wrote:

One minor note on your code that you might have missed in my version. In ZHOU::ApplySingleOrEmptyCells you have a line "if (idig == 8) return 1;", which I am almost certain should be 9 instead of 8. I couldn't find a puzzle in my collection where it made any difference however.



In fact, I have seen that.

In my code, the test is inside the loop, after a digit appears as non active.

Your test is outside the loop, assuming that the exit of the loop is granted to be the limit value of the loop index.

This is better but I was not sure of that.

EDIT the change I'll do

a)adding a label
nextr1: at the end of the loop
b) replacing the break by a goto nextr1
c) having the
return 1
without test if the index exceeds the limit
champagne
2017 Supporter
 
Posts: 7352
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby zhouyundong_2012 » Sat Jan 30, 2016 9:30 am

Hi Jason and champagne.
Good Job! The Code Update keeps clean and gives a right direction to improve. I think it can be improved furtherly.

A little improvement:
if (!((AR>>9) & 0x1ff)) goto digit2;
==> if (!(AR & 0x3FE00)) goto digit2;
zhouyundong_2012
 
Posts: 148
Joined: 10 February 2012

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Sat Jan 30, 2016 7:30 pm

Good catch zhouyundong_2012! There are five other instances of very similar issues (shifts followed by comparing to a constant) in the Update routine. Fixing all of them speeds things up by less than 1/2 of a percent.
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Removing candidates

Postby Afmob » Sun Jan 31, 2016 9:12 am

Guess I must be one of the few people for whom JCZSolve is actually slower (by about 6-7% on the subset method) than ZSolve (Compiler: TDM/MinGW 4.7.0, CPU: Core i5 2400, Win 7).

On another note, I tried adding a routine to InitSudoku to remove candidates (useful for checking minimality) like here. The code addition I use for ZSolve is:
Code: Select all
B = TblMult3[num-1] + TblBoard_Block[field]; %num from  1..9, field from 0...80
F[B] &= ~(1 << (field%27));

I tried porting this to JCZSolve but with no success. Do you've got any hints on how to do this?
Afmob
 
Posts: 132
Joined: 28 June 2011

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby zhouyundong_2012 » Sun Jan 31, 2016 1:29 pm

Hi Jason,
Why do you use T1[v >> 9] .. T2[v >> 18] .. T3[v >> 27] to calculate Shrink?
I think to cut v to 2 sections will be more effecient, although the table is large. see it in my code, T1[the high 15 bits of v]..T2[the low 15 bits of v]

Maybe we should make it the same way to calculate Shrink when we test speed.
zhouyundong_2012
 
Posts: 148
Joined: 10 February 2012

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Sun Jan 31, 2016 3:35 pm

Afmob,

Speed depends rather a lot on what kinds of puzzles you are trying to solve. JCZSolve scarifies some speed on very "easy" puzzles in order to detect invalid puzzles more quickly. There are other related but more difficult to describe tradeoffs as well.

I believe that the code you want is:
Code: Select all
int band = digitToBaseBand[field] + cellToSubBand[field];
g->bands[band] &= ~TblSelfMask[field];
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Sun Jan 31, 2016 3:57 pm

zhouyundong_2012,

Using two lookups into a larger shrink table uses fewer instructions, but results in more cache misses. Cache misses are expensive, and almost exactly cancel out the advantages of fewer instructions. Given they are about the same speed, using the smaller table saves memory and leaves more of the cache available to whatever other code is running that is calling the solver.
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Postby Afmob » Sun Jan 31, 2016 4:58 pm

Thanks for the help! Though the code should probably read:
Code: Select all
int band = digitToBaseBand[zahl-1] + cellToSubBand[field];
g->bands[band] &= ~(1 << mod27[field]);

Still slower than ZSolve. :?
Afmob
 
Posts: 132
Joined: 28 June 2011

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Sun Jan 31, 2016 5:13 pm

Afmob,

I was wrong when I said TblSelfMask. Still, a table lookup is faster and there is already a suitable table to use. cellToMask is the one you actually want. Of course your code will work and is only trivially slower.
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby Afmob » Sun Jan 31, 2016 6:00 pm

You're right about "cellToMask". I missed that table. You can also use it to speed up the "SetSolvedDigit" routine by replacing:
Code: Select all
g->unsolvedRows[rowBit/27] &= ~(1<<(mod27[rowBit]));

with
Code: Select all
g->unsolvedRows[rowBit/27] &= ~cellToMask[rowBit];

Maybe one should also use a lookup table for the division?

Edit: Note that this makes the "mod27" table useless.
Last edited by Afmob on Thu Feb 11, 2016 10:00 am, edited 1 time in total.
Afmob
 
Posts: 132
Joined: 28 June 2011

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Sun Jan 31, 2016 6:36 pm

Division is often a hair better than a table lookup, though it can come out either way (depending on instruction scheduling and cache pressure).
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

PreviousNext

Return to Software

cron