3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Programs which generate, solve, and analyze Sudoku puzzles

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby JasonLion-Admin » Sat Feb 03, 2018 1:52 am

Hey, this is getting more and more off the topic of Sudoku, which is the purpose of this forum. Please stick to Sudoku related topic

Thanks
Jason
System Admin
User avatar
JasonLion-Admin
Site Admin
Site Admin
 
Posts: 89
Joined: 17 April 2010
Location: Silver Spring, MD, USA

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby emerentius_ » Sat Feb 17, 2018 2:23 am

champagne wrote:In most cases, the user does not care about the results, only by the count of solutions (cancelling the task if the count is >1).
In some cases (UA collector on my side for example) the solution is needed, but in a specific context and a tailor made extraction can be done.


Need is a strong word. I'd be surprised if there are any users of my library :)
I'm just offering a few apis and one of them returns the solutions. Another just counts. I don't have any specific use cases in mind just yet but I'm still expanding the feature set. These are the libraries docs btw. External dependencies are very easy to use in Rust.

The recent clang versions give a nice boost to the solver. I'm seeing a ~2-3% performance jump going from LLVM 4 -> 6 (in Rust) and similarly ~3% for clang 4 -> clang 5 with JCZsolve.
emerentius_
 
Posts: 23
Joined: 09 January 2018

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby emerentius_ » Mon May 28, 2018 7:10 pm

I think I've found a sizeable performance improvement. I'm not sure if it's valid under all circumstances.

In the places that look like the following code snippet in the Update routine, just before calling UPWCL, deleting the if-branch has improved the speed for most sudoku sets by 5-12% (in my Rust code).
Code: Select all
if (((AR>>3) & 7) != S) {
   AR &= 0777777707 | (S<<3);
   UPWCL(1, 1, 4, 7, 13, 16, 19, 22, 25);
   }

It should have the same speedup in C given that both use LLVM. I also have a small dataset of 10 sudokus for which there's a pessimization of ~15%, but top50k, sudoku17 and 20 of the hardest sudokus all show speedups.

Was the check added for performance only or is it needed for correctness in some edge case?
I didn't find any deviations in recognizing multi-solution sudokus or finding the correct solution for proper sudokus. I've checked sudoku17, top50k and FT_1_124, which is 130k sudokus total, among them 15k non-proper ones. I've also generated 1.000.000 proper sudokus. No differences.

Edit: What I find even more exciting is this: Rust has bounds checks on array indexing. With this change and a few others, the performance penalty for that vanishes. My implementation can be 100% memory safe Rust at full speed. I'll refrain from guessing what exactly it is that lets the optimizer do its magic.
emerentius_
 
Posts: 23
Joined: 09 January 2018

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Tue May 29, 2018 10:33 pm

emerentius_ wrote:I think I've found a sizeable performance improvement. I'm not sure if it's valid under all circumstances.

In the places that look like the following code snippet in the Update routine, just before calling UPWCL, deleting the if-branch has improved the speed for most sudoku sets by 5-12% (in my Rust code).
Code: Select all
if (((AR>>3) & 7) != S) {
   AR &= 0777777707 | (S<<3);
   UPWCL(1, 1, 4, 7, 13, 16, 19, 22, 25);
   }

....


What you are doing is not clear, but in the original code, this says

If you have new assignments in the band for the current digit,
update the new "unassigned status"
update the current digit status in other bands
update the other digits status.


Each of these operations has to be done and must be evaluated in the general scope.
I don't foresee a possible improvement skipping the test but ...
champagne
2017 Supporter
 
Posts: 7453
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby emerentius_ » Wed May 30, 2018 1:19 pm

champagne wrote:What you are doing is not clear, but in the original code, this says

Sorry, I meant: Skip the check and execute the contained code unconditionally.

champagne wrote:If you have new assignments in the band for the current digit,
update the new "unassigned status"
update the current digit status in other bands
update the other digits status.

Where exactly are you quoting this from?
emerentius_
 
Posts: 23
Joined: 09 January 2018

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Wed May 30, 2018 1:57 pm

emerentius_ wrote:
champagne wrote:If you have new assignments in the band for the current digit,
update the new "unassigned status"
update the current digit status in other bands
update the other digits status.

Where exactly are you quoting this from?



This looks like a code written by me, so I hope that it is doing what I expected :lol:

emerentius_ wrote:Sorry, I meant: Skip the check and execute the contained code unconditionally.


This was my understanding, but if I identified the code, I confirm that I see no reason to speed up the process
champagne
2017 Supporter
 
Posts: 7453
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby emerentius_ » Wed May 30, 2018 7:12 pm

champagne wrote:This looks like a code written by me, so I hope that it is doing what I expected :lol:

Did you post the code in this thread already? If so, where? I'd be happy for anything that'd help me understand the intricacies of the solver.

champagne wrote: This was my understanding, but if I identified the code, I confirm that I see no reason to speed up the process

Do you mean, you don't see why that would speed it up? There is not much code inside and branch misprediction can be costly.
emerentius_
 
Posts: 23
Joined: 09 January 2018

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Wed May 30, 2018 9:09 pm

emerentius_ wrote:
champagne wrote:This looks like a code written by me, so I hope that it is doing what I expected :lol:

Did you post the code in this thread already? If so, where? I'd be happy for anything that'd help me understand the intricacies of the solver.


See the first post page 13 in this thread. JasonLion also posted a C version of the same code

emerentius_ wrote:
champagne wrote: This was my understanding, but if I identified the code, I confirm that I see no reason to speed up the process

Do you mean, you don't see why that would speed it up? There is not much code inside and branch misprediction can be costly.


I don't understand "branch misprediction". This code has been studied by many guys (and is directly derived from the original code) The inside code is activated if and only if an assignment took place in the update sequence just above. Even using the pre processor, it contains about 10 instructions.
champagne
2017 Supporter
 
Posts: 7453
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby zhouyundong_2012 » Thu May 31, 2018 8:46 am

I think there are few people actually understand the code, this block the research of this code.
So I have an idea to make a viewable and operatable program which has next step button, prev step button.
what can reader view? in each one grid, it displays candidate digits

1 2 3
4 5 6
7 8 9
next step
------------------->
1 2 3
4 5 6
7 8 9
next step
------------------->
1 2 3
4 5 6
7 8 9

this is like the “Watch Window" in visual studio, action occurred when F10, F11 is clicked

Maybe one week is more than enough, but I am so lazy now!
If you want to do this helpful work, you can ask me for original code which is written by me. A viewable soduku with GT,LT,Killer, GT/Killer,
and graphics library, animation library, photoshop libraray, text libraray(simplified)
zhouyundong_2012
 
Posts: 148
Joined: 10 February 2012

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby emerentius_ » Thu May 31, 2018 4:53 pm

champagne wrote:I don't understand "branch misprediction". This code has been studied by many guys (and is directly derived from the original code) The inside code is activated if and only if an assignment took place in the update sequence just above. Even using the pre processor, it contains about 10 instructions.

At a branch the cpu may try to predict the correct path and will speculatively execute it. If it guessed wrong it has to flush the pipeline and revert all changes. A correct guess makes the branch almost free, an incorrect guess is very costly. There are three such if-blocks inside another if-block. If the outer if is entered, then at least one of the inner ones must also be entered but not necessarily all of them. This seems hard to predict.
The branch condition only contains things that should already be readily available. I'm not an expert on this and it seems to me like it shouldn't need speculative execution but my measurements show a definite improvement in branch misses.

Here are some high level stats collected with perf (linux profiling tool). The numbers for instructions, branches and misses are cumulative. Execute an instruction twice, it counts twice.
My solver in the 2nd column has some slight differences but is mostly equivalent to the original. In my solver in the 3rd column the innermost if-checks in the update function were removed.
Note how the removal of the branches drops the branch misses by 1/3 and how the instructions/cycle jump up. Of course, more instructions are executed but the total speed increases.
Code: Select all
                jczsolve         mine            mine with change
sudoku17           
time [s]        0.202            0.204           0.182
instructions    683,716,125      744,529,659     825,465,246
instr/cycle     1.16             1,25            1.5
branches        110,258,289      94,151,698      95,898,610
misses          10,447,095       10,346,431      7,355,023
           
top50k           
time [s]        0.34             0.335           0.294
instructions    1,060,183,483    1,137,223,944   1,221,260,048
instr/cycle     1.08             1,13            1.41
branches        170,937,968      143,608,199     145,150,639
misses          19,495,500       19,436,784      13,925,122
           
hard20x500           
time [s]        1.646            1.665           1.536
instructions    6,226,479,344    6,650,678,851   6,905,523,959
instr/cycle     1,28             1,35            1,53
branches        1,015,336,637    923,752,040     913,476,946
misses          82,979,035       83,671,900      61,916,728


It's not a complete apples to apples comparison because my solver loads the file completely in advance but my parser does more work. Parsing is not a significant part of the total runtime though.
The jczsolve version I'm using is from Jason Lion's post on p15 (direct link).
I've compiled it with clang5 which makes it a couple % faster than clang 4, as I've noted before. The rust compiler version is 1.27 (nightly, 2018-05-16). Rust uses clang6 already but the big improvement came with clang5.
emerentius_
 
Posts: 23
Joined: 09 January 2018

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Thu May 31, 2018 7:09 pm

I never fight again facts, so this is something to test on my computer later and to try to understand
champagne
2017 Supporter
 
Posts: 7453
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby emerentius_ » Wed Jun 06, 2018 4:02 am

I've changed the heuristic in GuessFirstCell to always check the first cell for each of the 3 bands and to choose the cell with the minimum number of possibilities (rather than stop at the first existing unsolved cell). This has no effect on easy sudokus where GuessFirstCell is never called but speeds up very hard sudokus on average by over 10%. I've checked it with your collection of potential hardest from 2017-12. The guess function is also sometimes called in sudoku17 but has no meaningful effect on speed there.

I'm hosting my code here, in case someone is interested. This is the solver's source file and this is the command line program I'm using for benchmarks.

Here are some updated timings (all times in seconds):
Code: Select all
                10x_hard375.txt  500x_hard20.txt  10x_top1465.txt  sudoku17.txt  top50k.txt
jczsolve                  0.834            1.625            0.286         0.201       0.341
rusty jczsolve            0.684            1.350            0.252         0.177       0.286

ratio                      1.22              1.2             1.13          1.14        1.19
Last edited by emerentius_ on Wed Jun 06, 2018 12:54 pm, edited 1 time in total.
emerentius_
 
Posts: 23
Joined: 09 January 2018

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby champagne » Wed Jun 06, 2018 6:42 am

emerentius_ wrote:I've changed the heuristic in GuessFirstCell to always check the first cell for each of the 3 bands and to choose the cell with the minimum number of possibilities (rather than stop at the first existing unsolved cell). This has no effect on easy sudokus where GuessFirstCell is never called but speeds up very hard sudokus on average by over 10%. I've checked it with your collection of potential hardest from 2017-12. The guess function is also sometimes called in sudoku17 but has no meaningful effect on speed there.

I'm hosting my code here, in case someone is interested. This is the solver's source file and this is the command line program I'm using for benchmarks.


Not a very common case, but one could also extract the cell with the smallest number of candidates using the parallel process as in the search of singles in cells, bi-values in cell ... with a good chance to speed up the process. (4 candidates in cell should be always possible, so it is 3 or 4 candidates)

I tested also a morphing of the puzzle, but with nothing exciting
champagne
2017 Supporter
 
Posts: 7453
Joined: 02 August 2007
Location: France Brittany

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby emerentius_ » Sun Jun 24, 2018 4:13 am

I went over the entire solver to clean up the code and understand it better. I've improved the variable names, simplified the code and added a lot of comments. I could get rid of the manual loop unrolling in ApplySinglesOrEmpty and UPWCL. I've deleted TblRowUniq and I'm using TblShrinkMask in its stead. Several more LUTs turned out to be not really helping earlier.
In total, the solver's file is now at 660 lines of code (a good deal of that just boiler plate) as opposed to jczsolve's 950 (or 800 without the code commented out via #if 0). I never figured out what UPDN or UPWCL was supposed to stand for. :lol:
I think it's much easier to understand now.

As I wrote earlier, I've removed the innermost branches in Update. I've also made some changes to how I unset the bits in unsolvedRows after UPWCL where I've used xor instead of and-masking. At first I didn't get why it worked. Now I understand, what I've been doing was complete bogus but also harmless because, worst case, the solver just falls back to guessing. That indicated however, that the unsolvedRows checks were not that important at all anymore with the innermost if-checks already removed. I've removed any use of unsolvedRows which actually sped up the solver a bit more. All the changes during the cleanup together improved performance by another 4-5%.
I still have to test how that affected solving speed on impossible sudokus. I don't have a data set handy.

In the process of removing unsolvedRows, I've run into a very weird performance issue. This if-check is always true because I never delete any bits from unsolvedRows but the optimizer can't recognize that and therefore doesn't remove the branch. If I remove the if-branch myself or replace it with something predictable, I lose 5%(!) performance. I have no idea why that is. Edit: Some profiling leads me to believe that the if check that the optimizer can't see through leads it to be more conservative with optimizations that would increase the amount of code. Without the branch, the amount of instructions executed rises by 15%. Instructions / cycle goes up as well but not enough. Branch misses stay roughly the same.

Code still available at the Github repo
Last edited by emerentius_ on Thu Jun 28, 2018 10:24 am, edited 1 time in total.
emerentius_
 
Posts: 23
Joined: 09 January 2018

Re: 3.77us Solver(2.8G CPU, TestCase:17Sodoku)

Postby dobrichev » Sun Jun 24, 2018 7:53 pm

emerentius_ wrote:I never figured out what UPDN or UPWCL was supposed to stand for. :lol:
...
Code still available at the Github repo


And that did not stop you from licensing the code without even mentioning the original author and other contributors?
dobrichev
2016 Supporter
 
Posts: 1862
Joined: 24 May 2010

PreviousNext

Return to Software