PIsaacson wrote:There were 4665 puzzles with ratings of 6.5 and lower from his collection.
I only modified Paul's code to give the same rating as SE for puzzles rated below 6.5. Puzzles rated 6.5 are not supposed to match, so there is no point in comparing them.
They do not match for two reasons: the code returns the first result found, and the rating calculation is different.
Therefore the only difference (below 6.5) is in this puzzle:
001002003020010400300900070006000004010090030800000900030004007004050090600700200 4258 6.00 5.60
This is the puzzle I mentioned, in which there are 3 possible type 1 URs: choosing either of two of them yields a 6.00 rating, while choosing the third yields a 5.60 rating.
This rating difference cannot be avoided, because it depends on which UR is chosen from a list of more than one, when choosing one destroys the others (and all the choices are correct for that step, since they have the same rating).
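To make the tie-break effect concrete, here is a minimal sketch in Python. The labels UR-A, UR-B and UR-C, the 4.5 step rating and the enumeration orders are all made up for illustration; only the 6.00 and 5.60 final ratings come from the puzzle above, and which UR leads to which rating is an assumption.

```python
# Hypothetical labels for the three type 1 URs. STEP_RATING is a placeholder:
# the only thing that matters is that all three candidates share it.
STEP_RATING = 4.5

# Final puzzle rating depending on which UR was applied.
# The values 6.00 / 5.60 are from the post; the mapping to labels is assumed.
FINAL_RATING = {"UR-A": 6.0, "UR-B": 6.0, "UR-C": 5.6}


def pick_step(candidates):
    """Pick the lowest-rated candidate; min() keeps the first one on ties."""
    return min(candidates, key=lambda c: c[1])


# Two solvers that find the same three URs, but enumerate them in a different order.
solver_1 = [("UR-A", STEP_RATING), ("UR-B", STEP_RATING), ("UR-C", STEP_RATING)]
solver_2 = list(reversed(solver_1))

choice_1 = pick_step(solver_1)[0]
choice_2 = pick_step(solver_2)[0]

print(choice_1, FINAL_RATING[choice_1])
print(choice_2, FINAL_RATING[choice_2])
```

Both solvers make a valid choice for the step, yet they end up with different puzzle ratings, purely because of the order in which equal-rated candidates were listed.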
PIsaacson wrote:So while the C++ code is significantly faster, the bad news is that there are 6 puzzles which my code scores 0.1 points higher and 2 which score significantly lower than SE.
Wrong. For ratings below 6.5 (6.5 itself excluded) the scores are equal, apart from the case where one UR destroys another.
For ratings of 6.5 or above, code modifications are required, but as I mentioned before, as long as no executable or source code is published, I do not know whether there is any point in making them.
PIsaacson wrote: Looking at the detail logs shows that the problems stem from how I find chains (by size) rather than by score. SE runs a technique, accumulates the potential eliminations/assignments, sorts the collection, and then selects the lowest scoring one. My code sets the chain length, finds the first qualifying chain, calculates the score and then processes it. Since the length is controlled by a for loop that starts from some specified minimum and expands the allowed length to some specified maximum and since all the chaining code uses BFS, this should very closely emulate what SE does. And that's the rub... It does closely emulate SE, but not exactly.
This is one of the problems. Another problem is that your code simply computes the wrong rating (compared to SE) for the same chains/loops.
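A rough sketch of the first problem, in Python, following the two strategies as described in the quote above. The `Chain` type, the function names and the ratings are all hypothetical; this is neither SE's code nor the C++ code, just the two selection rules side by side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chain:
    name: str
    length: int    # number of nodes in the chain
    rating: float  # made-up score, for illustration only


def first_by_length(chains, min_len, max_len):
    """Widen the allowed length step by step and return the first chain that fits."""
    for allowed in range(min_len, max_len + 1):
        for chain in chains:
            if chain.length == allowed:
                return chain
    return None


def lowest_rated(chains):
    """Collect every candidate first, then select the lowest-scoring one."""
    return min(chains, key=lambda c: c.rating, default=None)


# Assumed example: the shorter chain happens to score higher than the longer one.
chains = [Chain("A", 4, 6.7), Chain("B", 5, 6.6)]

by_length = first_by_length(chains, 2, 8)
by_rating = lowest_rated(chains)
print(by_length.name, by_rating.name)
```

Whenever the shortest chain is not also the lowest-scoring one, the two rules pick different chains, so the step rating can differ even if both solvers score each individual chain identically.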
PIsaacson wrote:The long and short is that in order to duplicate SE's scores, I'll have to duplicate SE's algorithms in a rigorous manner.
Not necessarily. I think small modifications may be enough to make it match, at least for x-chains (I did not check y/xy-chains). In any case, if you calculate the score for the same chain/loop differently than SE does, it does not matter how you find it: the rating will not match.
As I mentioned before, if an executable or the sources are published, I may try to match the x-chains/loops.