## Rating rules / Puzzles. Ordering the rules

Advanced methods and approaches for solving Sudoku puzzles
champagne wrote:1)Super candidates are used when other process fail. It is out of the scope of that thread.
2)Super candidates are limited to AALS AC2 for the time being

Super candidates, even limited to AALS, have a size (number of cells). This size is an important parameter, as the number of super candidates increases exponentially with the number of cells. (candidates are super candidates of size 1).
Another parameter is the size (length and/or breadth) of the chains or nets based on these super candidates. Here again, we may have an exponential increase of possibilities when this size increases.
What'd be very surprsising is that your algorithm wouldn't dispaly any increase in computation time and/or memory as a function of these parameters if if was modified to take them into account.

Such problems are in the scope of this thread, even though your algorithm in its present state doesn't allow to obtain the answer.

My idea when I opened this thread was to compare different ratings of the rules and/or puzzles. The question was (and still is): what is the mean complexity of using such rule (or what are the relative mean complexities of several rules)? With a possible corollary: how should we order the rules?
The results above show that the usual ratings are reasonably well correlated - at least for the scale of complexities available in the first 1.000 puzzles in sudogen0 (from L0 to L7).
I agree that this first approach of the problem should be completed with other puzzles, including harder ones. Don't forget this question was raised only a few days ago and we can't have all the answers at once.
I could use the full 10.000 collection to extend the range. Or we could assemble another collection of puzzles with a broader range.
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

denis_berthier wrote:
tarek wrote:...unless T&E is an acceptable incorporation in the method to deal with non-T&E failed ratings.

Same answer as to gsf: how can we take this discontinuity into account?

the same way you take techniques/constraints into account for non-T&E
you define a model and assign weights on how the model reacts to a given input
this has been the gist of my q1 posts from the start

for sudoku T&E you take a lead from the constratint satisfaction literature
split T&E into a polynomial time sub-solver (a set of sudoku constraints, e.g., just singles, or even the logical kitchen sink)
and a backtracking algorithm that applies the sub-solver at each ply
but, different from the pseudo-random selection done by the backtrack suex* raters,
rein in the backtrack algorithm by ordering the next ply candidates by degree
this models the usual backtrack (and human player) optimization of selecting the most constrained candidates first
(e.g., check bivalue/bilocation candidates first, etc.)

the main idea for a q1 like strategy is to quantify the average worst case work that such a backtrack
algorithm would have to do for a given input
suex* does this by averaging a backtrack over several pesudo-random instantiations of the input
q1 does this in one pass by examining all branches of candidates with the same lowest degree

there is much room for experimentation in this model
especially with the function that maps the model stats for a given input to a single number
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Denis wrote

Super candidates, even limited to AALS, have a size (number of cells).

1)First of all, I only consider AALS/AAHS/AC2 which mean two digits valid out of 4.
2)The number of cell is not relevant
3) but each AALS/AAHS/AC2 brings 6 super candidates.

In my sample file, the pic values have been 382 AALS/AAHS/AC2 and 928 super candidates tags. (likely not at the same moment)

Denis wrote
This size is an important parameter, as the number of super candidates increases exponentially with the number of cells. (candidates are super candidates of size 1).

Not exactly, the key factor is the number of unknown digits and the growth factor is Cn,2 (maximum 9). But this was a good reason to try first AALS/AC2.

Denis wrote
Another parameter is the size (length and/or breadth) of the chains or nets based on these super candidates.
Here again, we may have an exponential increase of possibilities when this size increases.

This is not true. As without super candidates and if you erase all redundancies, the number of chains grows quickly at the beginning and reach very soon a ceiling.
Full tagging reinforce that tendancy creating equivalences in chains having a sub chain within the same layer.

Denis wrote
What'd be very surprsising is that your algorithm wouldn't display any increase in computation time and/or memory as a function of these parameters if if was modified to take them into account.

entering level 4 has a significant cost for sure but here we did not enter level 3, so we can not see it. In the processing time for Golden Nugget (about 80 seconds) the main part comes from level 4.

Denis wrote
I could use the full 10.000 collection to extend the range
.

I am afraid it would not help. Don't forget that 98% of these 10000 are not requesting more that linear chains, and 68% no chain at all.
champagne
2017 Supporter

Posts: 6706
Joined: 02 August 2007
Location: France Brittany

gsf wrote:
denis_berthier wrote:
tarek wrote:...unless T&E is an acceptable incorporation in the method to deal with non-T&E failed ratings.

Same answer as to gsf: how can we take this discontinuity into account?

the same way you take techniques/constraints into account for non-T&E
you define a model and assign weights on how the model reacts to a given input
this has been the gist of my q1 posts from the start
....

This doesn't address exactly the problem I was speaking of:
- your model would allow to compare different DFS algorithms, depending on the chosen sub-solvers (i.e. on the rules used to prune the search) - admitting the search optimisations you mentionned are kept fixed;
- on the contrary, my problem is about the intrusion of T&E as a last resort weapon when solving a puzzle resisting all the rules in a given set (e.g. NRCZT).
Notice that, for this 2nd problem, there has to be some a priori classification of T&E above all the rules we consider. But then, we are changing the meaning of the initial hierarchy, because "above" can no longer mean "more complex": one of the results of experimentations I've done is that the fastest solution (in the mean) is obtained for T&E pruned by elementary constraints and singles (i.e. the most "stupid" form of T&E is less computationally complex than almost anything else). I guess you've already made the same observations.

Another problem is that T&E is an algorithm, not a resolution rule (and I've shown that it can't be the implementation of any resolution rule). So T&E introduces a theoretical heterogeneity and this is another reason for expecting it to be associated with a leap in any purely logical classification.

This said, my current interest is not so much in T&E than in comparing the nrczt-rating with other ratings involving chains with subsets.
With Mike, we've shown that the coverage of the two sets of rules is roughly the same. The following questions are therefore natural:
- what are the main parameters for the chains based on subsets? (length of the chains, maximum size of the subsets,..?)
- can the two kinds of rules be joined into a unique classification?
- can we estimate (statistically) how the computation times and the memory requirements change with the main parameters?
- can we estimate this if (as in SudoRules) we add the constraint that no rule classified at some level can fire before any rule at a lower level. A priori exponential increase might then be expected (simply because this constraint implies that all the possibilities at lower levels must be considered even if they lead to no elimination.)
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

As I announed in a previous post, here is the pseudo-code for computing correlation coefficients.

Suppose we have two statistical variables X and Y depending on the puzzle P under consideration (e.g. X = the NRCZT-rating of P and Y = the SER rating of P).
Suppose we have a sequence of n puzzles.
Suppose, for definitness, that the X and Y variables corresponding to this sequence are respectively in x-file and y-file, with one value per line in each file.
The following algorithm (in its standard recursive form) gives the correlation coefficient r between X and Y:

Code: Select all
`(deffunction correlation (?x-file ?y-file ?n)   (open ?x-file "x-file" "r")   (open ?y-file "y-file" "r")      i = 1   EX = 0   EY = 0   EX2 = 0   EY2 = 0   EXY = 0      while i < (n + 1) do      ;; get the i-th element in x-file:      xi = eval (readline "x-file")      ;; get the i-th element in y-file:      yi = eval (readline "y-file")      ;; here I use "eval" to indicate a cast (if necessary) to the proper type (integer, float,...)      EX = EX + (xi - EX) / i      EY = EY + (yi - EY)  / i      EX2 = EX2 + (xi^2 - EX2) / i      EY2 = EY2 +(yi^2 - EY2) / i      EXY = EXY + ((xi * yi) - EXY) / i      i = i + 1   next i      VX = EX2 - EX^2   VY = EY2 - EY^2   CovXY= EXY - EX * EY   r = CovXY / [sqrt (VX) * sqrt (VY)]   ;; if you also want to compute the regression line Y = a X + b, add the following two lines:   ;a = CovXY / VX   ;b = EY - a * EX   (close "x-file")   (close "y-file")   ;; print whatever you want: r, a , b   return r)`

Now, if you want to compute the correlation between X and a function f of Y, just replace
within the "while" loop, by
yi = f (eval (readline "y-file"))

E.g., if X= Level and Y = Time (resp. Memory), you may want to check if there is a correlation between X and log(Y), corresponding to an exponential increase of computation times (resp. memory requirements) wrt to levels - or a correlation between X and Y^(1/4) corresponding to a polynomial (x^4) increase of computation times (resp. memory requirements) wrt to levels.

Such computations should always be interpreted with some care and should not be extrapolated beyond the range of the data. For instance, if the levels of all the puzzles in the collection have a range between 1 and 7, it is impossible to make a difference between an exponential behaviour and a polynomial behaviour in x^6 or x^7 (say for computation times or memory requirements).

The correlation results I've given in my previous post are valid for the nrczt levels from 1 to 7. This corresponds to 99.9 % of the random minimal puzzles and, likely, to normal human solving capabilities. I think they are interesting in this context.
Studying higher levels would require much larger random collections, in which sufficiently many puzzles at these higher levels would be present. Anyway, elementary statistics may not be a very appropriate tool for studying such extreme cases.
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

How many chains are there?

Given a partial chain of length n and of a given type (xy, nrc, hxyt, nrczt,...), it may in general be extended to the right into several chains of the same type and of length n+1. The number of chains of a given type may thus grow exponentially with the length. But, as there can't be chains of length > 81 (if we don't admit inner loops), there has to be a bound to the exponential growth.

The question of counting the chains is important in rating the puzzles, because the difficulty of finding useful chains is related to their length, of course, but also to how many (useless) chains of a given length there are.
Unfortunately, it is impossible to compute the number of chains a priori: worst case analysis suggests exponential growth, while a priori mean case analysis is impractical.
Here are the results of an experimental mean case analysis based on the 10,000 random minimal puzzles of the Sudogen0 collection, using the nrczt sets of rules and levels defined in the first post.

More precisely, here are the correlation coefficients between the nrczt-ratings and the number of chains (or the indicated function of it):

nrczt-rating vs number of chains: 0.54, no meaningful correlation

nrczt-rating vs log(number of chains): 0.96, very strong correlation

nrczt-rating vs sqr2(number of chains): 0.85
nrczt-rating vs sqr4(number of chains): 0.93
nrczt-rating vs sqr6(number of chains): 0.95
nrczt-rating vs sqr8(number of chains): 0.95

here, sqrn(x) is x^(1/n)

These results are valid for nrczt-levels ranging from 1 to 7 (i.e. with chains of length 1 to 7, enough to solve 99.9% of the minimal puzzles).
As explained in a previous post, in this range, it is impossible to discriminate between exponential or polynomial (x^6 or x^8) growth of the number of chains with length. But in any case, this shows a fast growth of the number of (useful or useless) chains with their length.
Remember that these results are only valid statistically and that there may be large differences between two puzzles at the same level.
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

Comments on my post: "How many chains are there?"

As I was interrupted while I was writing the previous post, I couldn't put the few comments its raw results may need. Here they are.

1) First, it is important to recall that these statistical results cover the whole range of puzzles that a human player can solve and they are likely to go much beyond normal human capabilities.

2) The NRCZT-rating is a natural one, based on a single and very simple parameter: the size of the hardest pattern (in the NRCZT set of rules) necessary to solve a puzzle. It satisfies all the good properties one can hope for a rating:
- purely logical definition, based on the concept of a resolution rule, whose conclusion can only be the assertion of a value or the elimination of candidates - i.e. the concrete elementary steps of any human solver;
- implementation independence (as a result of the previous property);
- invariance under elementary symmetries (permuting rows, columns, ...) and under logical supersymmetries;
- order independence (i.e. for any solver respecting the rules priorities, it doens't depend on the solver choices between rules at the same level); this is a direct consequence of the confluence property of the NRCZT-theories;
- it applies to almost all the puzzles;
- it could be extended to the few puzzles resisting nrczt-rules by adding a penalty for the intrusion of Trial and Error; I don't know yet what'd be the best form for this penalty, but it should introduce a clear discontinuity in the rating;
- finally, due to the fast growth in complexity with level, all the levels are clearly separated.

3) In the purely logical NRCZT-rating, apparently only the length of the longest pattern necessary to solve a puzzle is taken into account. But, as this rating guarantees that the puzzle can't be solved with shorter patterns, it implicitly takes all the shorter patterns into account. As a result, any conforming implementation will have to explore all the patterns of smaller length; it will thus implicitly take into account all the alternative possible choices at lower levels.

4) As the NRCZT-rating is reasonably well correlated with the other usual ratings (SER, SUEX9, SUEXT and GSF-Q1), these results also indicate that the length of the longest chain necessary to solve a puzzle is statistically the main parameter of its difficulty, whichever rating we consider (and whichever types of chains, among the current ones, it uses).
Of course, this result can't be extended to puzzles beyond the classification, such as the few minimal 0.01% which require T&E, nets or second order patterns.

5) Do these statistical results reflect the ratings of the puzzles aimed at human solvers, typically those published in the newspapers or on web forums?
I don't know how these puzzles are created/selected and I haven't made any statistical analysis on this, but if I was a puzzle creator, I would choose puzzles that need relatively long chains (because it is more fun) but for which the number of partial useless chains is limited (because finding the good ones would otherwise exceed human capabilities) - unless I chose to emphasise some specific pattern, e.g. fish for fish addicts. I would thus create very biased samples. I conjecture this is somehow what puzzle creators do. If I find a large sample of such puzzles created manually for a broad range of levels (and not emphasising any specific pattern), I'll test this conjecture (not in the near future, because I'm gonna be away from my beloved Mac for some time).

6) Can the NRCZT-rating be refined so as to take into account individual deviations from mean behaviour at each level? There is an obvious possibility: indicating the number of partial chains, e.g., taking examples from the first puzzles in Sudogen0: L1-412, L3-4014, L4-47155, L2-492, L6-141739, L2-1276, L2-1243, L4-42482, L1-494, L2-4064, L1-507, L1-477, L1-492, L4-11313,...

7) Do human solvers have to commit themselves to the rules priorities edicted by the NRCZT-rating? No. Due to the confluence property, even if they use unnecessarily long chains (just because they spot one before seeing a shorter one), they will in any case find a solution whenever the puzzle can be solved in nrczt.
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

Mike Barker wrote:Here are links to some of the codes for rating puzzles. In addition to Sudoku Explainer, suexratt, and GSF's solver, I've included Ruud's and Havard's solvers which also can also rate puzzles. I haven't used these, but for the other three I've shown the command line I've used. Note I've written code to reduce the full output to SE to just the ratings values. I'm not sure its necessary.

Thanks for these useful links and command lines.

Now that I've computed the NRCZT-ratings for all the 10.000 puzzles in Sudogen0, I could refine its correlation with SER by extending the sample from 1.000 to 10.000 puzzles.
I'm interested in the code you've written "to reduce the full output to SE to just the ratings values".
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

denis_berthier wrote:I'm interested in the code you've written "to reduce the full output to SE to just the ratings values".

gsf has done that & is downloadable. It has been in use for the patterns_game.

Just follow the link from "Serate" in the 1st post on that thread. The modified file outputs ER/EP/ED for each input puzzle.

tarek

tarek

Posts: 2756
Joined: 05 January 2006

denis_berthier wrote:I'm interested in the code you've written "to reduce the full output to SE to just the ratings values".

Using the gnu.org stream editor, the Windows command line (in the Command Prompt window) ...

type file.res | sed -e"/^[1-9H]/d" -e"s/Analyzing Sudoku #//" | sed -e :a -e"\$!N;s/\nDifficulty: /_/;ta" -e"P;D"

... yields output, e.g. ...

1_7.2

2_7.1

... meaning puzzle number 1 has rating 7.2, etc., etc. Much too unwieldy to type, so it needs to be in a command script.
ronk
2012 Supporter

Posts: 4764
Joined: 02 November 2005
Location: Southeastern USA

Tarek, Ronk,

The standard (Unix) command line is:
java -cp SudokuExplainer.jar diuf.sudoku.test.Tester sudokus.txt results.txt
I checked it works.

If I understand properly the explanations, the (Unix) command line I need to have only the ER rating should merely be:

java -cp SudokuExplainer.jar diuf.sudoku.test.serate --format=%r sudokus.txt results.txt

As this produces no error message and no result, I must be missing something.
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

I'm not sure if this helps:

1.Either open/read the output file using "/" as a field seperator, therfore the first column/field would be the ER
2. OR: decompile, change the output command to show only ER not ER/EP/ED then compile again.

tarek

tarek

Posts: 2756
Joined: 05 January 2006

I found the proper command using the above indications and some T&E:

Code: Select all
`java -cp SudokuExplainer.jar diuf.sudoku.test.serate --input=sudokus.txt --output=results.txt`

will output the sequence of SER ratings, one per line, into file "results.txt" for puzzles given in "sudokus.txt" (one per line).
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

Additional correlation results: NRCZT vs SER

With the full collection of 10.000 random minimal puzzles in Sudogen0, the correlation coefficient between the NRCZT-ratings and the SER-ratings is 0.895.
(it was 0.885 with the first 1.000; this difference is within normal bounds for such computations).
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

Ruud has ordered his top10000 list according to decreasing difficulty, based on tabling.
The question naturally arises: is this ordering correlated with other ratings of puzzles?
I computed the SER ratings of all the puzzles in this list: they vary between 4.6 and 9.3.
I then correlated this with the rank in the list (call it RuudRank).
The correlation coefficient I got is -0.206, which means that there is no meaningful correlation.

As the complexities of the puzzles could decrease sub-linearly with their rank in the list, I also tried to correlate SER
- with log(RuudRank) and I got -0.23
- with sqr(RuudRank) and I got -0.22

(All these coefficients are negative, which is normal as the complexity is supposed decreasing).

All these results mean that there is no meaningful correlation between RuudRank, based on tabling, and the SER ranking.
This is interesting, as we have seen that the usual other rankings are strongly correlated.
denis_berthier
2010 Supporter

Posts: 1261
Joined: 19 June 2007
Location: Paris

PreviousNext