The New Sudoku Players' Forum

by **daj95376** » Tue Nov 30, 2010 6:03 pm

champagne wrote:I hope to stay with an improvement in performance somewhere between 1 to 10 and 1 to 100. The key point remains to see if ratings will match.

How about a specific performance rating (against SE) on puzzles with a 6.5 rating or lower?

Does your use of "parallel" mean the same thing as performing numerous equally-rated steps concurrently?

by **champagne** » Tue Nov 30, 2010 7:03 pm

daj95376 wrote:
champagne wrote:I hope to stay with an improvement in performance somewhere between 1 to 10 and 1 to 100. The key point remains to see if ratings will match.

How about a specific performance rating (against SE) on puzzles with a 6.5 rating or lower?

Does your use of "parallel" mean the same thing as performing numerous equally-rated steps concurrently?

I did it and as far as i can remember, it was about one second to one minute, but I will rerun the benchmark to-morrow and I'll come back with fresh data carefully checked.

I intend to make a second benchmark in the area 6.6 7.0 as soon as I have a reasonable draft.

Now, what means "parallel".

Applying the tagging process, you expand the weak links in all directions At the end, you find the tags included in "nice loops" or the invalid tags.

If i say it differently in the situation of SE, this equivalent to looking in once (in parallel) for any candidate.

Expansion of weak links is very fast. (in fact, a - b is transformed in a->~b and b->~a ( using the taggging rules a -> B and b ->A) )

It means also that you get in once chains of any legnth.

Reversely, for SE, I have to check separatly the different kinds of chains X cycle, Y cycle, X chain ..... to find the lowest rating.

As I wrote above, I intend to skip that as soon as I get a rating below or equal to the current highest ER .

champagne

by **PIsaacson** » Wed Dec 01, 2010 2:36 am

Daj,

Here's a quick benchmark using Champagne's sort list of puzzles from the Patterns Game posted results. There were 4665 puzzles with ratings of 6.5 and lower from his collection. My prototype code took less than 3 seconds and clocked over 1600 puzzles/second on an Intel core i7 860 @ 2.8 GHz. Using LK's ver 5 modified SE code and the x64 Sun Java VM, serate took 97.5 seconds and clocked about 48 puzzles/second so the C++ code is over 30x faster for these level puzzles using 64 bit code. Using the 32 bit JVM takes 3 minutes and 16 seconds which is 2x slower than than x64 Java and results in only 24 puzzles/second.

Code: Select all: $ r 691 sudoku -b -f '%g %n %.2r %.2d %.2p %.2e' < pat65.in > c:/temp/log1 Alpha (x86_64-w64-mingw32-g++) Sudoku Explainer C++ engine ver 1.0 C:\msys\home\Paul\se_cpp\sudoku.exe -b -f %g %n %.2r %.2d %.2p %.2e 4665 out of 4665 solved leaving 0 unsolved using 31 queues size 34x244 depth 4/32 avg givens 23.60 avg scores 4.10 total cpu time = 2882.8889 milliseconds solving rate = 1618.1685 puzzles/second 0.6180 msecs/puzzle full_house updates/calls 92984/308774 30.11% eff 46.0991 msec tot 0.0001 msec/call 0.0005 msec/update hidden_single_box updates/calls 142437/215790 66.01% eff 54.1547 msec tot 0.0003 msec/call 0.0004 msec/update hidden_single updates/calls 20486/73353 27.93% eff 20.1755 msec tot 0.0003 msec/call 0.0010 msec/update direct_pointing updates/calls 683/52867 1.29% eff 70.9229 msec tot 0.0013 msec/call 0.1038 msec/update direct_claiming updates/calls 0/52184 0.00% eff 82.7685 msec tot 0.0016 msec/call 1.#INF msec/update direct_hidden_pair updates/calls 7768/52184 14.89% eff 301.9797 msec tot 0.0058 msec/call 0.0389 msec/update naked_single updates/calls 2655/44557 5.96% eff 20.2980 msec tot 0.0005 msec/call 0.0076 msec/update direct_hidden_triple updates/calls 752/41902 1.79% eff 248.4021 msec tot 0.0059 msec/call 0.3303 msec/update locked_cands_t1 updates/calls 18693/41155 45.42% eff 29.9537 msec tot 0.0007 msec/call 0.0016 msec/update locked_cands_t2 updates/calls 7367/22462 32.80% eff 29.3681 msec tot 0.0013 msec/call 0.0040 msec/update naked_pair updates/calls 12910/15095 85.53% eff 43.9078 msec tot 0.0029 msec/call 0.0034 msec/update x_wing updates/calls 4388/11611 37.79% eff 17.8462 msec tot 0.0015 msec/call 0.0041 msec/update hidden_pair updates/calls 11950/10314 115.86% eff 19.2386 msec tot 0.0019 msec/call 0.0016 msec/update naked_triple updates/calls 3679/7254 50.72% eff 33.9676 msec tot 0.0047 msec/call 0.0092 msec/update swordfish updates/calls 3936/6558 60.02% eff 15.0387 msec tot 0.0023 msec/call 0.0038 msec/update hidden_triple updates/calls 2177/5707 38.15% eff 17.7608 msec tot 0.0031 msec/call 0.0082 msec/update xy_wing updates/calls 2327/5312 43.81% eff 37.8568 msec tot 0.0071 msec/call 0.0163 msec/update xyz_wing updates/calls 605/3741 16.17% eff 11.9451 msec tot 0.0032 msec/call 0.0197 msec/update ur_type_1 updates/calls 786/3159 24.88% eff 8.9829 msec tot 0.0028 msec/call 0.0114 msec/update ur_type_2 updates/calls 133/2766 4.81% eff 0.4826 msec tot 0.0002 msec/call 0.0036 msec/update ur_type_4 updates/calls 648/2633 24.61% eff 0.7829 msec tot 0.0003 msec/call 0.0012 msec/update ur_type_3 updates/calls 640/2309 27.72% eff 3.2982 msec tot 0.0014 msec/call 0.0052 msec/update ur_loop_type_1 updates/calls 292/11871 2.46% eff 167.9249 msec tot 0.0141 msec/call 0.5751 msec/update ur_loop_type_2 updates/calls 168/10911 1.54% eff 699.4343 msec tot 0.0641 msec/call 4.1633 msec/update ur_loop_type_4 updates/calls 72/10467 0.69% eff 179.9959 msec tot 0.0172 msec/call 2.4999 msec/update ur_loop_type_3 updates/calls 125/10147 1.23% eff 305.6810 msec tot 0.0301 msec/call 2.4454 msec/update naked_quad updates/calls 667/1679 39.73% eff 10.8015 msec tot 0.0064 msec/call 0.0162 msec/update jellyfish updates/calls 1107/1579 70.11% eff 4.3273 msec tot 0.0027 msec/call 0.0039 msec/update hidden_quad updates/calls 325/1374 23.65% eff 4.8041 msec tot 0.0035 msec/call 0.0148 msec/update bug_type_1 updates/calls 584/1331 43.88% eff 8.9293 msec tot 0.0067 msec/call 0.0153 msec/update bug_type_2 updates/calls 405/1039 38.98% eff 0.3316 msec tot 0.0003 msec/call 0.0008 msec/update bug_type_4 updates/calls 90/859 10.48% eff 0.1820 msec tot 0.0002 msec/call 0.0020 msec/update bug_type_3 updates/calls 318/814 39.07% eff 0.4666 msec tot 0.0006 msec/call 0.0015 msec/update aligned_pair updates/calls 220/557 39.50% eff 122.6229 msec tot 0.2201 msec/call 0.5574 msec/update x_cycle updates/calls 484/448 108.04% eff 98.6115 msec tot 0.2201 msec/call 0.2037 msec/update y_cycle updates/calls 9/43 20.93% eff 3.8020 msec tot 0.0884 msec/call 0.4224 msec/update xy_cycle updates/calls 5/9 55.56% eff 3.8286 msec tot 0.4254 msec/call 0.7657 msec/update Paul@pgisaacs2 ~/se_cpp $ r java java64.sh pat65.in > c:/temp/log2 real 1m37.551s user 0m0.000s sys 0m0.000s Paul@pgisaacs2 ~/se_cpp $ compare c:/temp/log1 c:/temp/log2 -- my scores vs SE (mine on left SE on right for those I scored higher) 000100020003000400050020003400600010001070600000008000030080090200400500006000000 4504 6.60 6.50 001008003040000050600050100000803009009000200800607000006040001050000080400200900 4510 6.60 6.50 001000200020003040300050006000200060002070100080006000600090004010800030007000800 4516 6.60 6.50 100002030004500100060030005070000050002080003900000400020004600600900001007060080 4548 6.60 6.50 000001020003200100040050006070000010005060003300000400060008200700400008009070050 4555 6.60 6.50 001023004050600002700000000000800090030070050040002000000000006500006010600910700 4611 6.60 6.50 Run time 1) 2.75 2) 0.08 Paul@pgisaacs2 ~/se_cpp $ compare c:/temp/log2 c:/temp/log1 -- my scores vs SE (mine on right SE on left for those SE scored higher) 000706000020030010100000005006000700050802030300000008900000004010040020008000300 4187 7.10 6.50 001002003020010400300900070006000004010090030800000900030004007004050090600700200 4258 6.00 5.60 Run time 1) 0.08 2) 2.75 Paul@pgisaacs2 ~/se_cpp $ java32.sh pat65.in > c:/temp/log4 real 3m15.939s user 0m0.015s sys 0m0.015s $

So while the C++ code is significantly faster, the bad news is that there are 6 puzzles which my code scores 0.1 points higher and 2 which score significantly lower than SE. Looking at the detail logs shows that the problems stem from how I find chains (by size) rather than by score. SE runs a technique, accumulates the potential eliminations/assignments, sorts the collection, and then selects the lowest scoring one. My code sets the chain length, finds the first qualifying chain, calculates the score and then processes it. Since the length is controlled by a for loop that starts from some specified minimum and expands the allowed length to some specified maximum and since all the chaining code uses BFS, this should very closely emulate what SE does. And that's the rub... It does closely emulate SE, but not exactly. This deviation becomes worse with the more complex chains since I use really different alogorithms which generate AICs/NLs/nrczts instead of SE's predominant use of forcing chains of various natures.

The long and short is that in order to duplicate SE's scores, I'll have to duplicate SE's algorithms in a rigourous manner. Which means adopting a very limited and somewhat arcane library of solving techniques, running each technique to completion, scoring all the potentials, then only selecting the lowest scoring ones all the while making sure the code is replete with the various quirks and oddities that SE possesses. Well, maybe it's not that bad, but I'm having a difficult time working up the necessary enthusiasm.

Cheers,
Paul

P.S. Although heavily modified from the code e-mailed to Champagne and LKSudoku, the above code contains LK's bug fixes and suggested changes merged in as best I could considering the code base deltas.

by **daj95376** » Wed Dec 01, 2010 3:57 am

Thanks Paul for the detailed comparison based on your solver!!! Thanks also for the overview on the software differences betweeen your solver and SE.

Paul wrote:SE runs a technique, accumulates the potential eliminations/assignments, sorts the collection, and then selects the lowest scoring one.

I've asked L.K. to investigate the possibility of having a version of 1.2.5.0 serate that concurrently performs all steps it has (at any given time) with the lowest rating. It's something he has on the back burner, but I'm patient and the results may prove to be interesting.

Regards, Danny

by **PIsaacson** » Wed Dec 01, 2010 5:47 am

daj95376 wrote:Thanks Paul for the detailed comparison based on your solver!!!

Danny,

I deliberately moved the emphasis to remind myself of exactly that point. My code is based on a solver instead of a scorer, which pretty much sums up my fundamentally mistaken assumption that a scorer is just a solving machine which tracks the "hardest" move.

Cheers,
Paul

by **champagne** » Wed Dec 01, 2010 8:50 am

I run the Benchmark on my computer

P8600 dual core (32 bits) 2.4 GHz
I use I guess the same file as Paul

Code: Select all: SE 277 seconds SE clone 8.5 seconds

so the ratio is the same as in Paul's test, about 1/30

some remarks;

This is done with a new lay-out, normally fitting with SE specifications
Are still missing some Bugs patterns
I have to redo all tests to finally debug that part of the program.

This will not change the order of magnitude and, moreover, the ratio for low ratings is not so important.

I'll revise these figure in due time

champagne

by **BryanL** » Wed Dec 01, 2010 11:24 am

Surely it is too much trouble to emulate SE scoring exactly.

Why not agree to a rating methodology based on solving technique - probably similar to SE but with the few noted inconsistencies ironed out.

Once that is agreed upon it takes away so much of the time and effort Paul, lk and Champagne are putting in, allowing them and whoever to move ahead.

I see no issue with a new rating method that differs from SE. Once agreed, g.r.emlin could post (append - leaving original results in place) new ratings for all past games. And after a trial period where both sets of scores could be posted, switch to the new method.

So could we have some consensus on the idea of NOT emulating SE? A poll perhaps?

Bryan

by **lksudoku** » Wed Dec 01, 2010 11:33 am

PIsaacson wrote:There were 4665 puzzles with ratings of 6.5 and lower from his collection.

I only modified Paul's code to provide the same rating for puzzles with rating lower than 6.5, puzzles with rating 6.5 are not supposed to match and there is no point in comparing them

The reason they do not match is both due to the first found return result, and a different rating calculation

Therefore the only difference (below 6.5) is in the puzzle

001002003020010400300900070006000004010090030800000900030004007004050090600700200 4258 6.00 5.60

Which is the puzzle I mentioned in which there are 3 possible UR type 1, and choosing two of them yields 6.00 rating while choosing the third one yields a 5.60 rating

This rating difference cannot be avoided because it depends on the choice of a UR out of a list of more than one, when choosing one destroys the other (and all choices are correct for the specific step as they have the same rating)

PIsaacson wrote:So while the C++ code is significantly faster, the bad news is that there are 6 puzzles which my code scores 0.1 points higher and 2 which score significantly lower than SE.

Wrong, until rating 6.5 (not containing 6.5) the score is equal (up to one UR destroys another UR case)

For rating at 6.5 or above, code modifications are required, but as I mentioned before, as long as no executable or source code is published, I do not know if there is any point in performing these modifications

PIsaacson wrote: Looking at the detail logs shows that the problems stem from how I find chains (by size) rather than by score. SE runs a technique, accumulates the potential eliminations/assignments, sorts the collection, and then selects the lowest scoring one. My code sets the chain length, finds the first qualifying chain, calculates the score and then processes it. Since the length is controlled by a for loop that starts from some specified minimum and expands the allowed length to some specified maximum and since all the chaining code uses BFS, this should very closely emulate what SE does. And that's the rub... It does closely emulate SE, but not exactly.

This is one of the problems, another problem is that your code simply computes wrong ratings for the same chains/loops (compared to SE)

PIsaacson wrote:The long and short is that in order to duplicate SE's scores, I'll have to duplicate SE's algorithms in a rigourous manner.

not necessarily, I think that it may be that small modifications can make it match, at least for x-chains (did not check y/xy-chains), anyway, if you will calculate the score for the same chain/loop differently than SE, it does not matter how you find it, the rating will not match

As I mentioned before, if executable or sources will be published I may try to match the x-chains/loops

by **champagne** » Wed Dec 01, 2010 1:12 pm

lksudoku wrote:As I mentioned before, if executable or sources will be published I may try to match the x-chains/loops

whatever will be the status, I promiss to publish the draft "as it will be" till the area 6.5 7.0 within one week.

Meantime, i'll clean the draft, make some debugging and prepare comments.

champagne

by **daj95376** » Wed Dec 01, 2010 1:49 pm

champagne wrote:I run the Benchmark on my computer

P8600 dual core (32 bits) 2.4 GHz
I use I guess the same file as Paul

Code: Select all
SE 277 seconds SE clone 8.5 seconds

so the ratio is the same as in Paul's test, about 1/30

Thanks Champagne for posting your comparison results.

Obviously, Paul's and Champagne's efforts to produce an SE-clone is worthwhile through ratings <6.5 .

BryanL: Although I agree with you, I don't think you'll get a consensus to scrap the SE-clone effort. The objective now seems to be locked into simply making a faster version that produces identical results. I think Paul is starting to realize that the effort to redesign his software to produce identical results is approaching unrealistic. I'm still hoping that LK will find the time to tackle a version of serate that'll perform identically rated steps concurrently.

Regards, Danny

by **champagne** » Wed Dec 01, 2010 3:08 pm

I'll try to comment on Bryanl and Danny's remarks.

One very positive point with SE is that it gives a good game.

Another one is that it finally gives for a pattern a good idea of a spread of possible puzzles.

To-day, the sluggyhness of SE is a true drawback.

I have heavy doubts that we can emulate 100% of SE (a supposed bug free version) with a significant improvement in speed.

Reversely, I am convinced that it will be possible to approach SE specifications with a performance ratio growing when the difficulty grows. As we start with a 1 to 30 ratio, we can show somehow some optimism.

On the other side, SE brings me ideas to slightly improve my own solver (at least I have the feeling that..);

So at the end, I like the challenge to set up a performing clone.

Also, I believe that if the results are not to far from SE's one, This could be accepted.

Champagne

PS in case the result woul not be accepted, I'll have still the possibility to use the clone in pre-rating and filtering

by **lksudoku** » Wed Dec 01, 2010 3:18 pm

champagne wrote:
lksudoku wrote:As I mentioned before, if executable or sources will be published I may try to match the x-chains/loops

whatever will be the status, I promiss to publish the draft "as it will be" till the area 6.5 7.0 within one week.

Meantime, i'll clean the draft, make some debugging and prepare comments.

champagne

I was hoping for Paul's version also to be published since

I already modified it for <6.5 rating
I have some knowledge of its structure

But if your version is the only one published I may switch to it

Does your version match SE for ratings <6.5? if not additional work will be required for modifying your version to match these ratings first

It is probably better to decide on one version as base and continue from it

by **champagne** » Wed Dec 01, 2010 4:16 pm

lksudoku wrote:I was hoping for Paul's version also to be published since
I already modified it for <6.5 rating
I have some knowledge of its structure
But if your version is the only one published I may switch to it

Does your version match SE for ratings <6.5? if not additional work will be required for modifying your version to match these ratings first

It is probably better to decide on one version as base and continue from it

I understand that Paul has not really redesigned it's solver to follow SE specifications.
I did it in a free way, but I faced problems and I made several trials (so I have also to clean the new code).

If Paul has a draft you can use, I agree that it would be better to continue with that one.

Normally the draft I'll produce will not give higher rating than SE. But it could be that it shows what I consider as bugs. I have in mind, for example the "non detection" of some hidden pairs in the UR process.

In the chain area, I am not to far from SE computation of length, but I have still some checking work due to the fact that pieces of the chains are hidden within the tagging.

I am now trying to cover the BUGs patterns I did not code to cover all the puzzles below 6.5

champagne

I

by **lksudoku** » Wed Dec 01, 2010 4:50 pm

champagne wrote:I understand that Paul has not really redesigned it's solver to follow SE specifications.

I modified Paul's code to follow SE specification until rating <6.5, and as the above analysis shows, it matches SE version 1.2.5.0 exactly until rating <6.5 (besides cases where one UR destroys another UR and both have euqal ratings)

champagne wrote:If Paul has a draft you can use, I agree that it would be better to continue with that one.

Yes, but if the draft is internal, than why bother

champagne wrote:Normally the draft I'll produce will not give higher rating than SE. But it could be that it shows what I consider as bugs. I have in mind, for example the "non detection" of some hidden pairs in the UR process.

When comparing to SE version 1.2.5.0, all these bugs are fixed, my method involves fixing both SE and the clone, so that if SE has bugs, I create another SE version (such as version 1.2.5.0) if the clone has bugs, I fix the clone code, for now the clone matches version 1.2.5.0, if you can show SE bugs in that version I can try to fix them too, but since there is a clone match (<6.5) I assume it is not likely that there are more bugs for these ratings

champagne wrote:I am now trying to cover the BUGs patterns I did not code to cover all the puzzles below 6.5

Paul's code already covers all SE BUGs patterns

L.K.

by **champagne** » Wed Dec 01, 2010 5:52 pm

lksudoku wrote:1) I modified Paul's code to follow SE specification until rating <6.5, and as the above analysis shows, it matches SE version 1.2.5.0 exactly until rating <6.5 (besides cases where one UR destroys another UR and both have euqal ratings)

2) Yes, but if the draft is internal, than why bother

3)When comparing to SE version 1.2.5.0, all these bugs are fixed, my method involves fixing both SE and the clone, so that if SE has bugs, I create another SE version (such as version 1.2.5.0) if the clone has bugs, I fix the clone code, for now the clone matches version 1.2.5.0, if you can show SE bugs in that version I can try to fix them too, but since there is a clone match (<6.5) I assume it is not likely that there are more bugs for these ratings

4) Paul's code already covers all SE BUGs patterns

L.K.

items 1) and 3) I got it. To know if I found other bugs (either in SE or in my draft), I would like to be sure to work with the last version of your SE fixed program. My last version is called FFIXED3

item 2) I don't catch your point. For sure, I assume the draft (which one it is) has been published.

item 4) I know also, I only think it will be faster for me to finish the draft.
I have another reason to do so. I want somehow to have, at the end, a better idea of what SE is doing.
Drafting a solution and studying what is not solved is an alternative to digging in the code.
(and at the end, I intend to bring back that code in my solver)

champagne

The New Sudoku Players' Forum

Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program