3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Programs which generate, solve, and analyze Sudoku puzzles

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby zhouyundong_2012 » Fri Nov 29, 2013 11:10 am

I don't think so.
Research of this code is just starting.
Don't give up.
Update is not the most effecient, Guess is not the most effecint.
zhouyundong_2012
 
Posts: 148
Joined: 10 February 2012

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby dukuso2 » Fri Nov 29, 2013 3:08 pm

the Chinese description, can we have it ?
It could be translated with bing or such
dukuso2
 
Posts: 13
Joined: 28 November 2013

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby dukuso2 » Fri Nov 29, 2013 3:11 pm

what's the most inner critical loop ?

you could include some counters at different places in the program
and see, how often each piece is executed in average

then you could write that critical part in assembly
I remember, we did this in CLAX 1-2 decades ago ...

would it be better to use 128-bit-registers ?

in theory they could make a sudoku-processor
dukuso2
 
Posts: 13
Joined: 28 November 2013

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby dukuso2 » Fri Nov 29, 2013 3:45 pm

snippets from the code :

//High, Middle, Low 9 bit of a 27-bit number
int F[27]; //30-bit Full Mask
int Block; //The Block Number of guess [0, 27)
int Mask; //The 9-bit Mask of guess
int Left; //The 9-bit Left Mask of guess

static byte Board[81+1]; //Board
static CGame G[60]; //Guess Board
static int Index; //Guess Index
static int BlockMask[27]; //Block
static int BlockMaskSum[3]; //Sum of BlockMask
static int F[27]; //27-bit Full Mask
static int CompF[27]; //Use for Compare


if (CompF[1] != F[1]) {
Shrink = FULL_TO_SHRINK(F[1]);
if ((F[1] &= TblComplexMask[Shrink]) == 0) return false;
S = FULL_TO_COLUMN(F[1]);
C1 = S | FULL_TO_COLUMN(F[0]); C2 = S | FULL_TO_COLUMN(F[2]);
F[2] &= TblMaskSingle[S] & TblMaskDouble[C1];
F[0] &= TblMaskSingle[S] & TblMaskDouble[C2];
S = TblRowUniq[TblShrinkSingle[Shrink] & TblColumnSingle[S]];
if ((F[1] >> 27) != S) {
F[1] &= S << 27 | BIT_SET_27;
S = ~(F[1] & TblRowMask[S]);
AN(F[4],S);AN(F[7],S);AN(F[10],S);AN(F[13],S); AN(F[16],S); AN(F[19], S); AN(F[22], S); AN(F[25], S);
} CompF[1] = F[1]; }

if (CompF[2] != F[2]) {
...
CompF[26] = F[26];}
dukuso2
 
Posts: 13
Joined: 28 November 2013

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Fri Nov 29, 2013 6:35 pm

dukuso2 wrote:what's the most inner critical loop ?

you could include some counters at different places in the program
and see, how often each piece is executed in average

then you could write that critical part in assembly
I remember, we did this in CLAX 1-2 decades ago ...

would it be better to use 128-bit-registers ?

in theory they could make a sudoku-processor


Thanks to mladen dobritchev, we made an intensive use of the 128 bits logic in the SKFR project.

I just started investigations in that code, my first feeling is that the main reason for the good performance could come from another point.

BTW, the statistics I have seen here show a high performance, but in a solver looking for other rules, the check of the validity of a puzzle is not by far the bottleneck. The situation is slightly different when we check the validity of a puzzle generated.

EDIT I was just analysing the piece of code you posted
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Fri Nov 29, 2013 6:39 pm

There is one quite large loop where almost all of the time is spent, the Update function. Converting it to assembly is very unlikely to help. Using 128 bit registers would require a complete redesign from scratch as the current code is heavily optimized towards 32 bit operations.

dukuso2 wrote:what's the most inner critical loop ?

you could include some counters at different places in the program
and see, how often each piece is executed in average

then you could write that critical part in assembly
I remember, we did this in CLAX 1-2 decades ago ...

would it be better to use 128-bit-registers ?

in theory they could make a sudoku-processor
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Sat Nov 30, 2013 1:15 pm

zhouyundong_2012 wrote:I don't think so.
Research of this code is just starting.
Don't give up.
Update is not the most effecient, Guess is not the most effecint.


After a first quick overview,

I see 2 or 3 very specific and efficient ideas. Both the band bit map and the shrinking are new concepts for me.

I also see some possibilities of improvement to explore.

I have one question.

The code is heavily oriented in band processing. Reversely. I see no use of column singles and no use of cell singles.
I can understand that using a column single would bring a negative balance in efficiency.
Reversely, my feeling would be in favour of using cell singles on top of row singles.

Did you try that and if yes, why did you give up.

Another question. say

Code: Select all
         C1 = S | FULL_TO_COLUMN(F[0]);
         C2 = S | FULL_TO_COLUMN(F[1]);
         F[1] &= TblMaskSingle[S] & TblMaskDouble[C1];
         F[0] &= TblMaskSingle[S] & TblMaskDouble[C2];


in update() seems to clear alignments in column (locked digit in a mini column)
1) is it correct
2) are you sure that this is a global improvement



Part of the code
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Sat Nov 30, 2013 3:41 pm

champagne, the code you quoted appears to be involved with the clearing of candidates in other bands from solved digits and locked candidates in mini-columns (those two are not really distinguished from each other) as well as detection of locked candidates & hidden singles in columns.

FULL_TO_COLUMN is a simple OR of the three rows in the band.

TblMaskSingle is used to clear candidates in other bands which are in the same column as a locked block mini-column (which could also be a solved digit or block hidden single).

TblMaskDouble is used to clear candidates in a band when the candidates in the other two bands corresponding blocks only have candidates in two of their three columns. That is equivalent to locked candidates in column->block and gets you hidden singles in columns on the next pass (as they will now be hidden in their block).
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Sat Nov 30, 2013 5:08 pm

This was my understanding, but I have doubts that this is a good choice, compared to for example

the detection of singles in cells
a better control of the while loop in the Update function. (IMO we have at least one full cycle for nothing)
a true recursive mode to handle guesses


My feeling is that we don't save that much "guess steps" trying to eliminate locked candidates in columns. In rows, it comes nearly for free.

I am looking at drafting a revised code stressing on the points I addressed above, sometimes downgrading slightly the current performance, but keeping untouched what I see as main breakthrough.
May be later I'll compare to a 128 bitmaps solution.

Note: I have in my code the fss process, so I'll benchmark the code against fss
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby dukuso2 » Sat Nov 30, 2013 6:26 pm

didn't gsf have a fast solver in 2005
I remember several contests

----------------------------------------------------------

Code: Select all
17 20x1468 50k 500K gen
Zsorver 0.484 1.468 0.687 1.953
ZsolverO 0.593 1.874 0.859 2.093
Jsolve12 0.781 1.687 1.046 1.843
fsss 0.718 1.562 1.062 1.343
dukuso2
 
Posts: 13
Joined: 28 November 2013

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion » Sat Nov 30, 2013 6:33 pm

champagne, I look forward to seeing what you come up with.

I spent a week trying similar things and got nowhere. My take on why the current code is so fast is four fold.

First, the data structure provides for a great deal of parallelism.

Second, there is very little code that isn't a direct consequence of eliminating candidates contradicted by solved cells. Most of the work beyond that is being done simply by having more complex tables, without any additional code. That streamlining makes extra loops very cheap, so any code additions need to have relatively large paybacks. For example, naked singles are fairly rare given that all hidden singles have already been processed, so whatever you use to detect them will need to be very efficient.

Third, the line:
Code: Select all
if ((F[26] &= TblComplexMask[Shrink]) == 0) return false;
is remarkably efficient at getting out of the loop as soon as a contradiction is detected. This is something JSolve, and other similar solvers, spend a relatively higher percentages of their time taking care of.

And finally, there is a high ratio of arithmetic operations for each conditional branch. That gets you higher utilization of the various functional units internal to the CPU on modern processors.

dukuso2, zhouyundong_2012's solver is roughly twice as fast as any of the solvers from the JSolve/FSSS era, which were in turn more than twice as fast as gsf's code.
User avatar
JasonLion
2017 Supporter
 
Posts: 642
Joined: 25 October 2007
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Sat Nov 30, 2013 9:15 pm

Hi Jason,

Basically, I agree with you,

I am convinced that the band + shrink mode is a great breakthrough. Your third point is part of that.

I did not catch clearly your last point, but it's not so important.

What I intend to understand in that work is to see more precisely what are the options of value in that code.
I 'll keep the main option, the band process, in that deep investigation

One week should be a good target to get the first results.
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby zhouyundong_2012 » Sun Dec 01, 2013 12:36 pm

Another Software named Chinese Chess Continuous Check Designer and Cracker, is the most fast cracker and the most helpful designer.

It can produce a cchess game puzzle (you can consider it as a sudoku puzzle), that normal person hardly to crack out, only 0.01% of people can crack(need a large of time).

They call me the Father of modern Chinese Chess Continuous Check Game.

In CChess Code, I also use a mount of bit operation, and a specific algrithm(although the specific algrithm has a little little wrong :) )

8 years has past from V7.7.x is published.The new version is V7.8, nobody could write a same cracker beyond mine within next 10 years,I think so.
zhouyundong_2012
 
Posts: 148
Joined: 10 February 2012

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby dukuso2 » Sun Dec 01, 2013 1:18 pm

spend your time on writing something more useful

analysing and predicting financial markets from historical quotes and economic statistics data

analysing DNA.sequences

...
dukuso2
 
Posts: 13
Joined: 28 November 2013

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Tue Dec 03, 2013 5:48 pm

EDIT of the previous post.

I tried a process mixing zhouyundong_2012's code and a cell view. In line wit Jasonlion forecast, I had very disappointing results.
I then came back to a slightly downgraded version of the update function (sub program) coupled with a recursive mode.

The overall situation should be fairly balanced.

I made 2 tests against my integrated version of fss.

a) a file of 4.847.466 puzzles rating 1.2 to 4.4 with a pattern of a past game
b) a file of 963.779 puzzles "potential hardest".

All puzzles where qualified "valid" (I knew it for fss), but I had to filter, in my guess step, 2 cases that I expected solved within the Update() function. Is it a bug I introduced or something already in the previous code, I don't know. There is some "opacity" in the precise definition of each table and it would take to much time for me to dig in that part of the code. (I did not see the code generating the tables, what could have been a way to go in more details)

The results where nearly equal, with a tiny advantage to fss.

a) fss:1mn 10 against 1mn 18
b) fss:4mn 37 against 4mn 42.

I would be very reluctant to change my process with that preliminary results.
champagne
2017 Supporter
 
Posts: 7465
Joined: 02 August 2007
Location: France Brittany

PreviousNext

Return to Software