The New Sudoku Players' Forum

by **dobrichev** » Fri Dec 10, 2010 4:34 am

The internal representation of the rating during the calculation of a separate technique doesn't matter.
Currently the rating consists of 3 values in format xx.x, but soon some stars, exclamation marks, or other flags may appear. Increasing precision makes integers the better candidate.
The parameterized part of the rating should not necessarily use a built-in interpreter. It could be fixed in code, leaving only some coefficients in the configuration file. These coefficients will change during initial acceptance, and after new technique is implemented. I am expecting more troubles in printing the solution path, where most users would like to customize something to fit their jargon.

by **PIsaacson** » Fri Dec 10, 2010 6:31 am

Ron,

If you set an arbitrarily long length and executed the SE getLengthDifficulty, the while loop ceil variable will generate the steps sequence (4, 6, 8, 12, 16...). It's an algorithm that approximates the fn(X) = (0.2 * log2 (X) - 0.4) equation. I'm assuming that if I offered the ability to add code in place of my proposed function parser with equations, it would be even more flexible, but orders of magnitude harder to implement.

Danny,

There's no reason to be concerned with the time to print an integer vs a double or even the time to calculate a score using integers vs doubles. I'll stand by my declaration that the most significant optimizations (to my current code) involve algorithmic changes.

Dobrichev,

Everyone keeps thinking within the SE box and, for some reason that I have yet to fathom, fearing the use of floating point doubles for something as mathematically simple as a sudoku scoring system. 16 digits of significance is plenty for this. Double arithmetic is plenty fast for this. Rounding and comparison algorithms are plentiful if even needed for this. I have a default calculation for each method based on the sudoku.ini configuration file base score plus code that approximates the SE length calculations. That's already in place and uses type float, so even 7 digits of significance suffices. All I intend is to provide a means to overload these default SE scoring algorithms with a user specified equation PLUS change it to use doubles with 16 significant digits of precision.

Your comments on output are intriguing to me. I already have functions that parse a format-like string similar to the serate --format param, but with additional formatting options such that you could specify something like %8.1r which converts to %8.1f or %81.81g which converts to %81.81s. There are several standard parts to my step display: step no, score, affected cell (RnCn), assignment or elimination symbol "<=" and "<>", digit(s) assigned/removed, string defining the pattern/chain. If you merely want to re-arrange the order with possible decoration, that would be relatively easy using something similar to the --format parsing. But, if the intent is to override the string defining the pattern/chain, that's something relatively difficult. I have a function somewhere that takes my internal chain stack and prints it either in NL, Eureka, AIC or nrczt chain formats. If that's all you need, then it would be easy enough to add that function and a runtime param to specify the preferred output formatting to use. If you're talking about something else, then I need to understand better what you mean by "most users would like to customize something to fit their jargon."

Cheers,
Paul

by **champagne** » Fri Dec 10, 2010 7:51 am

PIsaacson wrote:
Code: Select all
double get_score (double x_in) { return (0.2 * log2 (x_in) - 0.4); } int main () { double steps[] = {4,6,8,12,16,24,32,48,64,96,128,192,256,384,512,768,1024,1536,2048,3072,4096,6144,8192}; for (int n = 0; n <= 22; n++) { double score = get_score (steps[n]); printf ("%4d %6.0f %6.1f\n", n, steps[n], score); } }

Cheers,
Paul

Hi Paul,

I undestand that you try to have a wider scope than just emulating SE.

I am just doing the contrary.( I would not use that design for a rating program).

As a matter of fact, I am doing, unles I am wrong, exactly what SE is doing.

For any even number, the formula gives a relative "+0.2"
For any odd number, you get the value in between.

If I am right,

SE look in the table using (length-2) and stops as soon as it finds step[n]>= (length-2)
then makes a scoring adjustment 0.1* index

champagne

by **PIsaacson** » Fri Dec 10, 2010 9:10 am

Champagne,

Exactly so. Ron listed the updated procedure

Code: Select all: protected double getLengthDifficulty() { double added = 0.0; int ceil = 4; int length = getComplexity() - 2; boolean isOdd = false; while (length > ceil) { added += 0.1; if (!isOdd) ceil = (ceil * 3) / 2; else ceil = (ceil * 4) / 3; isOdd = !isOdd; } /* final int[] steps = new int[] {4, 6, 8, 12, 16, 24, 32, 48, 64, 96, 128, 192, 256, 384, 512, 768, 1024, 1536, 2048, 3072, 4096, 6144, 8192}; int length = getComplexity() - 2; double added = 0; int index = 0; while (index < steps.length && length > steps[index]) { added += 0.1; index++; } */ return added; }

The equivalent using the steps array has been commented out, but it produces identical results. Again, both are algorithms in that they require looping and conditionals in order to approximate the exact formula (0.2 * log2 (length) - 0.4). One of my objections is that it lumps different length chains into the same scoring range. Chains in the 16-24 range get the same increment. Plus why subtract 2 from the overall length in the first place??? Or is complexity not the same as length??? SE baffles me most of the time.

Using the equation, the length exactly computes a unique adjustment - using a modified score.cpp as follows:

Code: Select all: $ cat score2.cpp #include <stdio.h> #include <math.h> double get_score (double x_in) { return (0.288539008177792683 * log (x_in) - 0.4); // the first constant is a pre-computed 0.2/ln(2) } int main () { for (int n = 4; n <= 64; n += 2) { double score = get_score ((double) n); printf ("%4d %6.1f %6.4f\n", n, score, score); } } Paul@pgisaacs2 ~/teststl $ score2 4 0.0 0.0000 6 0.1 0.1170 8 0.2 0.2000 10 0.3 0.2644 12 0.3 0.3170 14 0.4 0.3615 // The SE algorithm leaves this as 0.3 16 0.4 0.4000 18 0.4 0.4340 20 0.5 0.4644 // The SE algorithm leaves this as 0.4 22 0.5 0.4919 // The SE algorithm leaves this as 0.4 24 0.5 0.5170 26 0.5 0.5401 28 0.6 0.5615 // The SE algorithm leaves this as 0.5 30 0.6 0.5814 // The SE algorithm leaves this as 0.5 32 0.6 0.6000 34 0.6 0.6175 36 0.6 0.6340 38 0.6 0.6496 40 0.7 0.6644 // The SE algorithm leaves this as 0.6 42 0.7 0.6785 // The SE algorithm leaves this as 0.6 44 0.7 0.6919 // The SE algorithm leaves this as 0.6 46 0.7 0.7047 // The SE algorithm leaves this as 0.6 48 0.7 0.7170 50 0.7 0.7288 52 0.7 0.7401 54 0.8 0.7510 // The SE algorithm leaves this as 0.7 56 0.8 0.7615 // The SE algorithm leaves this as 0.7 58 0.8 0.7716 // The SE algorithm leaves this as 0.7 60 0.8 0.7814 // The SE algorithm leaves this as 0.7 62 0.8 0.7908 // The SE algorithm leaves this as 0.7 64 0.8 0.8000 Paul@pgisaacs2 ~/teststl

The equation produces finer scoring granularity than the algorithm and probably runs faster since Pentiums have the FYL2X opcode. Speed is not critical here, so accuracy should be the main criteria and clearly the equation produces a smoother curve. Plus the SE increment points (0.1 transitions) are at the precise locations they should be at instead of those provided by the steps series algorithms.

Okay, so this is all hypothetical because clearly everyone wants to maintain the SE scoring, which is currently done using floating point arithmetic. You can't have it both ways. Either you want to maintain the identical scoring techniques of SE, including floating point behaviour which seems more of a presentation problem than a computation problem, or the sky's the limit. I vote for the sky... But with floating point doubles.

Cheers,
Paul

by **BryanL** » Fri Dec 10, 2010 12:19 pm

Hi,

the above comments on floats vs ints intrigues me. Delphi / Pascal being my language of choice, and knowing just enough c to get by, I am still sure that what one would do in Pascal would be just as one would do in c. Given the desire to use equations like Paul does, I would somewhere make a choice to either up the precision of ints and divide and round at print time, or, do the smart thing and use doubles throughout and round accordingly at print time. I have been in this situation with the desire to use ints before, and sticking with them has only come around and bit me and I end up being forced to change to floats.
It is a fraction unfortunate - pun intended - that SE has the steps in the chain scoring but that is unavoidable with xx.x level of precision. Smoothing it out a bit sounds like a good idea and would probably help at some point where comparisons need to be made - 0.105 says this rates higher than that, even if in the end it shown as 0.1. And there are many ways around comparing two supposedly equal floats for equality.
Sure in tight loops ints may be preferred but as Paul says, much higher gains will be realised here from higher algorithm changes compared to a probable less than 0.1% gain in one rating technique. And if a given loop / function has to be called a large number of times, there is nothing wrong with doing a once before and once after conversion to ints and back outside the call.

Saying all the above, I still stupidly agonise over trivial bits of code and then forget where I was in the bigger picture...

by **dobrichev** » Fri Dec 10, 2010 12:53 pm

PIsaacson wrote:Everyone keeps thinking within the SE box and, for some reason that I have yet to fathom, fearing the use of floating point doubles for something as mathematically simple as a sudoku scoring system. 16 digits of significance is plenty for this. Double arithmetic is plenty fast for this. Rounding and comparison algorithms are plentiful if even needed for this. I have a default calculation for each method based on the sudoku.ini configuration file base score plus code that approximates the SE length calculations. That's already in place and uses type float, so even 7 digits of significance suffices. All I intend is to provide a means to overload these default SE scoring algorithms with a user specified equation PLUS change it to use doubles with 16 significant digits of precision.

Your comments on output are intriguing to me. I already have functions that parse a format-like string similar to the serate --format param, but with additional formatting options such that you could specify something like %8.1r which converts to %8.1f or %81.81g which converts to %81.81s. There are several standard parts to my step display: step no, score, affected cell (RnCn), assignment or elimination symbol "<=" and "<>", digit(s) assigned/removed, string defining the pattern/chain. If you merely want to re-arrange the order with possible decoration, that would be relatively easy using something similar to the --format parsing. But, if the intent is to override the string defining the pattern/chain, that's something relatively difficult. I have a function somewhere that takes my internal chain stack and prints it either in NL, Eureka, AIC or nrczt chain formats. If that's all you need, then it would be easy enough to add that function and a runtime param to specify the preferred output formatting to use. If you're talking about something else, then I need to understand better what you mean by "most users would like to customize something to fit their jargon."
Paul

Fix rating to a double. Some may change it later if necessary. It is not significant at this stage.

What we want the puzzles to be sorted by? If the hardest move, represented as scalar, is insufficient, we need a vector. Implement a structure keeping the values of this vector in some convenient to you form. Implement operator "<". Later it can be changed. Let framework use this and only this operator for sorting the puzzles by difficulty. Keep in mind this is applicable to rating the moves within a solution path too - store "native" parameters of the specific move, and later remap to a scalar by tabular function instead by interpolating.

Predefine several levels of detailed output for one-line-per-move. Example is step number, method applied, difficulty rating, number of candidate eliminations, candidates left, list of eliminated candidates, list of solved cells. At runtime you could simply skip some of the fields according to the given level. Don't do it if you find difficulties, it is not much significant.

Implement internal representation of a "chain", and your favorite method for printing it. Others could implement their methods for printing, and call them at runtime just like you said.

Implement your favorite method for printing the whole information for a technique applied at a single move. Left alternative representations for others.

Implement simple queries like
- rate and print one-line-per-move info using details level ...;
- print steps from s1 to s2 using details level ...;
- rate puzzle list and sort it by rating, printing one line per puzzle.

Such technique allows by several executions of the tool to obtain the information you need for a specific step sequence, not scanning megabytes of heterogeneous dump. No interpreter needed.

Some will be interested in implementation of pure computational techniques like systematic guessing, eliminations by template, etc. Leave room such methods to have their place in the general rating mechanism - by additional coding and recompiling, not at runtime.

My final suggestion is to read the forum, but ignore the stuff you don't like to deal with, including mine posts. The community needs of far not perfect, but alive tool for rating.

Cheers,
MD

by **gsf** » Fri Dec 10, 2010 1:27 pm

agreed that this is possibly too nitpicky at this stage
but some decisions affect the api seen by all other solving techniques and library users of the rating program
by library users I mean users that may use the rating code as a library instead of a standalone program

its ok to talk about overloading operators to handle rounding errors
but then that may force library users to use C++ or write special functions in their own language to do the same
re-implementing those rounding/tweaking functions is where differences will creep in

I do think that having an api compatible with C will provide the most leverage
the internals can be C++, but the C api will make it easier for other language implementations to use the rater as a library

so I stand by using ints to store ratings
if a particular solving technique can use floating point to its advantage for score computation
then it can do so inside its own code and take care of rounding before exposing the rating

by **daj95376** » Fri Dec 10, 2010 3:52 pm

[Withdrawn]

by **lksudoku** » Fri Dec 10, 2010 7:47 pm

BryanL wrote:Thanks Pat,

but what a can of worms!

The hidden pair example, where the hidden pair is in both a column and a box, fails to recognise the resultant single in the box as a direct hidden pair. If the pair was in the column only (in two boxes) I could understand.

But, you go on to show that for direct locked candidates (i hate the pointing and claiming terms), the resultant single has to be in the box. Which makes me wonder if there could ever be a direct locked candidate type 2???

Bryan

I do not see any bug in direct methods,

In the SE definition of direct hidden pair, the hidden single must be in the same house of the hidden pair (in which the hidden pair was found)

For claiming/pointing, if there is a locked set based on 3 cells in one box aligned in one column inside that box, the hidden single must lie in one of the boxes inside the chute containing the box, in one of the other boxes above/below it
For 3 cells in a row inside one box, again the hidden single must be in the same chute in the box to the left/right of the locked set box

if the locked set is in a column with 3 cells in the column in the same box, the hidden single must lie in one of the columns crossing that box
if the locked set is in a row with 3 cells in the row in the same box, the hidden single must lie in one of the rows crossing that box

That is the code implementation, and it is not necessarily a bug, just a definition

If you can find an example to contradict this definition, post it and I will see what part of the code definition was not correct

by **PIsaacson** » Fri Dec 10, 2010 8:35 pm

Wow! I'm actually getting excited by all the activity and interest here. So, here's my proposal:

After studying the SourceForge project manifesto, I'm starting to buy into the concept of "release early and update often". I have a bunch of mass changes yet to apply, so I'm reluctant to do this and that's going to be the case for some time yet. But if enough of you would like to have hands on to explore the code as it now exists without going though my "send me a PM and I'll e-mail it back" gate, then I'll release early. If there's no interest, then I'll continue slogging away (mostly) solo. The advantage of early release is that once it's posted, this group will control its future content and direction.

If there's a better open-source project management site than SourceForge.net, please let me know.

Glenn,

Converting the calling convention from C++ methods to C by passing the current grid as a param is one of those massive change items that I have on my to-do list, but I'm torn here. If I commit to this, then releasing early will be postponed until I can complete this task and it's immense. How important is this in your opinion???

Cheers,
Paul

by **lksudoku** » Fri Dec 10, 2010 10:35 pm

Regarding the fixed point versus floating point rating representation, in terms of printed values they are both equal since the float (or double) can hold number of many digits (64 bits for double), and when truncated to printed value, it is accurate enough to round to the correct value (at least for c/c++)

When comparisons are made there can be odd effects such as (1.0/3.0+1.0/3.0+1.0/3.0) !=1.0
but this only affects comparisons, not printing

In terms of performance effect, the float/string parsing may be longer, but this is negligent compare to the overall runtime since printing is not a common function, it is called only once per puzzle while the algorithms take much longer

Regarding adding new techniques, for SE clone, there will not be much need to add new techniques since most techniques of SE are implemented, the problem is that they are not implemented to generate the same rating as SE (or find the same findings as SE), so the hard work is to modify/reimplement existing techniques

About chain scoring algorithm, I prefer the loop of SE code over the log function, since it has a logic to it and it is not a bug, the time it takes to perform this loop is again negligent since it is done rarely compare to all the other operations, when thinking of the loop, it can probably be calculated equally by performing one log operation, one power operation and a division

I suggest that for rating that is not just a number or table, there will be a possibility to register a user defined rating function per technique so the regular one number rating can be a default function, and for more complex mechanism the user can register a user defined function that gets information from the move being rated

by **BryanL** » Fri Dec 10, 2010 11:49 pm

lksudoku wrote:I do not see any bug in direct methods,

In the SE definition of direct hidden pair, the hidden single must be in the same house of the hidden pair (in which the hidden pair was found)

In the example Pat linked to

Code: Select all: 1 2 4 | 9 6 8 | 5 7 3 3 . 9 | 2 . . | 8 6 4 6 . 8 | 3 . 4 | 1 9 2 -------+-------+------ . 3 . | 7 9 . | . 8 5 . 9 5 | 4 . 6 | 7 3 1 . . . | 5 . 3 | . 2 9 -------+-------+------ . . . | 8 3 . | 2 . 6 . 8 . | 6 4 . | 3 . 7 . 6 3 | 1 . 5 | 9 4 8

there is a hidden pair present in both c1 and b7. After removing the candidates a hidden single is revealed in b7 but SE fails to rate it as direct hidden pair.
Is the pair in c1 or b7? Neither. It is in both and should be rated as same. Especially in light of the fact that SE rates a hidden single in a box lower than a hidden single in a row/col.

For claiming/pointing, if there is a locked set based on 3 cells in one box aligned in one column inside that box, the hidden single must lie in one of the boxes inside the chute containing the box, in one of the other boxes above/below it
For 3 cells in a row inside one box, again the hidden single must be in the same chute in the box to the left/right of the locked set box

if the locked set is in a column with 3 cells in the column in the same box, the hidden single must lie in one of the columns crossing that box
if the locked set is in a row with 3 cells in the row in the same box, the hidden single must lie in one of the rows crossing that box

That is the code implementation, and it is not necessarily a bug, just a definition

If you can find an example to contradict this definition, post it and I will see what part of the code definition was not correct

I thought I had seen an example - also from pat's link - for locked candidates where the resultant single was in the col/row intersecting with the box. I was probably a bit eager to "find a flaw"...

Bryan

by **lksudoku** » Sat Dec 11, 2010 12:25 am

lksudoku wrote:In the SE definition of direct hidden pair, the hidden single must be in the same house of the hidden pair (in which the hidden pair was found)

BryanL wrote:In the example Pat linked to
Code: Select all
1 2 4 | 9 6 8 | 5 7 3 3 . 9 | 2 . . | 8 6 4 6 . 8 | 3 . 4 | 1 9 2 -------+-------+------ . 3 . | 7 9 . | . 8 5 . 9 5 | 4 . 6 | 7 3 1 . . . | 5 . 3 | . 2 9 -------+-------+------ . . . | 8 3 . | 2 . 6 . 8 . | 6 4 . | 3 . 7 . 6 3 | 1 . 5 | 9 4 8

there is a hidden pair present in both c1 and b7.

Are you sure there is a hidden pair in b7?

b7 contains the following candidates:

Code: Select all: 4579 | 1457 | 17 --------------------- 259 | 8 | 12 --------------------- 27 | 6 | 3

Can you specify where is the hidden pair in the block?

There is a hidden pair found in column 1, but because it was found in c1, the hidden single for direct method must be in c1 also, and there is no consequent hidden single in c1

by **BryanL** » Sat Dec 11, 2010 3:56 am

lksudoku wrote:
BryanL wrote:In the example Pat linked to
Code: Select all
1 2 4 | 9 6 8 | 5 7 3 3 . 9 | 2 . . | 8 6 4 6 . 8 | 3 . 4 | 1 9 2 -------+-------+------ . 3 . | 7 9 . | . 8 5 . 9 5 | 4 . 6 | 7 3 1 . . . | 5 . 3 | . 2 9 -------+-------+------ . . . | 8 3 . | 2 . 6 . 8 . | 6 4 . | 3 . 7 . 6 3 | 1 . 5 | 9 4 8

there is a hidden pair present in both c1 and b7.

Are you sure there is a hidden pair in b7?

Can you specify where is the hidden pair in the block?

After removing the 5 in r7c2, the 5,9 hidden pair is also in b7.

However, I should have checked for myself. SE does indeed see the resultant single 4 in r7c2 (b7), and rates it as a direct hidden pair. Maybe there has been a change/fix or two since back in 2007 when the example was posted???

by **lksudoku** » Sat Dec 11, 2010 8:41 am

BryanL wrote:
lksudoku wrote:Can you specify where is the hidden pair in the block?

After removing the 5 in r7c2, the 5,9 hidden pair is also in b7.

However, I should have checked for myself. SE does indeed see the resultant single 4 in r7c2 (b7), and rates it as a direct hidden pair. Maybe there has been a change/fix or two since back in 2007 when the example was posted???

It seems that the last original SE version was created on the 27 dec 2007 (according to files timestamp), so between that time
and 2010 there were no changes, it is more likely that the example post assumed that if a hidden pair is found in a column but exist also in a box, a direct hidden single can be found in the box

The New Sudoku Players' Forum

Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program

Re: Team project: C or C++ Explainer-like rating program