The New Sudoku Players' Forum

Website · by **denis_berthier** » Sun Mar 30, 2008 10:01 am

RATING RULES / PUZZLES; ORDERING THE RULES

INTRODUCTION - MOTIVATIONS
Several ratings of rules or puzzles are known and they generally lead to different classifications of puzzles.
Comparison of different ratings can't therefore be done on a puzzle by puzzle basis. They can only be statistical.
I tried a first approach of this problem, based on a small sample (31 of the 33 "Unsolvables") for which 5 different ratings were available.
These results are still available on this forum, here: http://forum.enjoysudoku.com/viewtopic.php?t=4833&start=90
They suggested that it would be interesting to extend this study to larger collections of puzzles, preferably random ones.

The various ratings can also suggest different clasification of the rules.

The ratings initially considered are (this list can be extended if participants propose new ratings):
NRCZT-rating
SER-rating
SUEX9-rating
SUEXT-rating
GSF-Q1-rating

The purpose of this thread is thus multiple. It aims at least at:
- assembling in a single place a precise definition of each rating (or pointers to such definitions)
- giving references to where and how each of these ratings can be computed
- providing software so that anyone can compare for himself the various ratings on large collections of puzzles
- providing large collections of puzzles that can be used as references for such comparisons
- evaluating the statistical complexity of rules (mean value analysis) and, for rules depending on some parameters, how this complexity varies with these parameters
(examples of possible parameters: length for chains, maximum ALS size for chains with ALS, ...)
- analysing the consequences on rules classification

CORRELATION COEFFICIENT OF TWO RATINGS
The main and simplest statistical tool for comparing two ratings is their correlation coefficient.
Given a collection of N puzzles and several rating functions, each of these rating functions is a statistical variable and it can be seen as a vector in R^N.
The (linear) correlation coefficient between 2 statistical variables has a value between -1 and +1. It is the cosine of the angle between the associated vectors in R^N.

A correlation coefficient whose absolute value is greater than .8 is generally considered high.
A correlation coefficient whose absolute value is smaller than .5 is generally considered low (which means that there is no meaningful correlation between the two variables - the angle between the 2 vectors representing the 2 variables in R^N is > 45°).

In a forthcoming post, I'll provide pseudo-code for computing this coefficient (this is a very standard statistical tool).

MEAN DISTANCE OF A RATING FROM ALL THE OTHERS
Whereas the correlation coefficient allows comparisons between 2 ratings, the mean distance from a rating to all the others allows to evaluate its compatibilty with them.
Its geometrical meaning is a mean distance in space R^N between the unit vectors associated to each rating. (Lengths are discarded, everything is brougth to the unit sphere in R^N. This rescaling allows to eliminate meaningless differences between different ratings).

Website · by **denis_berthier** » Sun Mar 30, 2008 10:03 am

NRCZT-rating
Of course, I'll be concerned with the NRCZT-rating, among others, so let me remind its definition.
It is mainly based on nrczt-chains and lassos (see their definition here: http://forum.enjoysudoku.com/viewtopic.php?t=5591)
Rules are classified in levels according to the number of cells they rely on. There is thus one and only one parameter and the classification is very simple.
More precisely
L0: only elementary constraints propagation (no puzzle can be solved here)
L1: Naked and Hidden Singles, elementary interactions between rows/columns and blocks
L2: Naked, Hidden and Super-Hidden Pairs + NRCZT2
L3: Naked, Hidden and Super-Hidden Triplets + NRCZT3
L4: Naked, Hidden and Super-Hidden Quads + NRCZT4
L5: NRCZT5
For n>4: Ln: NRCZTn, where n is the length of the chain or lasso (number of cells, half the number of candidates).

Specialisations of NRCZT chains or lassos can be used in the solution paths (NRCT, NRCZ, NRC or any of their 2D counterparts, but the levels used here will be based only on the above 3D version)

Added 03/31/08:
As some posts show, there may be a misunderstanding about the NRCZT-rating; let me be more precise.
Let T(Ln) be the set of resolution rules defined above at levels from 0 to n.
T(Ln) is a set of resolution rules (i.e. a resolution theory - for these notions, see my book or this thread: http://forum.enjoysudoku.com/viewtopic.php?t=5600)
Formally, Ln is the set of puzzles for which there is a proof of their solution within T(Ln).
The Ln stratification is thus purely logical, by definition. It doesn't depend in any way on any solver.

What can be said of the T(Ln)?
- they are a sequence of logical theories of increasing strength;
- each of them is invariant under symmetry and supersymmetry; this entails that the Ln stratification is invariant under symmetry;
- each of them has the confluence property; i.e. any couple of resolution paths will lead to the same final state, for any puzzle (with 0, 1 or more solutions); this very important property has been proven in my book; in practice, this entails that you can superimpose on these theories any precedence ordering of the rules you like;
- none of them is complete; i.e. none of them can solve any valid puzzle with a unique solution; (notice that this is the case for any currently known set of resolution rules);
- nevertheless, L7 is enough to solve 99,99% of the minimal puzzles.

Now, a word about SudoRules.
It is a solver, based on AI techniques. It uses the CLIPS or the JESS inference engine. The "program" is just the transposition in the CLIPS syntax of the rules defined above. The rules priorities are defined so that Ln+1 has a lower priority than Ln.
For each n, SudoRules is complete wrt to Ln: if a puzzle is in Ln, SudoRules will find a resolution path in Ln.
When I want to compute the level of a puzzle, I use SudoRules, but the result is independent of this particular implementation of the nRCZT rules.

Website · by **denis_berthier** » Sun Mar 30, 2008 10:08 am

FIRST RESULTS
In the following results:
-- the Sudogen0 collection has been used.
It is available here: http://www.carva.org/denis.berthier/HLS/sudogen0.txt
It is a collection of 10.000 minimal puzzles, randomly generated with the suexg generator
-- I used only the first 1.000 puzzles
-- as #707 leads to a memory overflow problem for my SudoRules solver (as lots of physical memory remain available, I think the problem comes from the CLIPS inference engine I'm using), it has been eliminated from all computations. For a sample of 1.000 puzzles, this may entail no bias.
-- this is a joint work with Mike. He provided all the SER, SUEX9, SUEXT and GSF-Q1 ratings used for the comparisons.

Correlation coefficients:
NRCZT vs SER: 0.89
NRCZT vs SUEX9: 0.73
NRCZT vs SUEXT: 0.77
NRCZT vs GSF-Q1: 0.70

SER vs SUEX9: 0.70
SER vs SUEXT: 0.68
SER vs GSF-Q1: 0.69

SUEX9 vs SUEXT: 0.88
SUEX9 vs GSF-Q1: 0.75

SUEXT vs GSF-Q1: 0.64

Although the GSF-Q1 rating remains a little apart from the others, it is not as bad as the limited set Unsolvables33 suggested.

Mean distances:
This analysis can be completed by computing the mean distance of each rating to the other 4. The shorter the distance, the closer the rating to others:

NRCZT-rating: 0.678
SER-rating: 0.721
SUEX9-rating: 0.682
SUEXT-rating: 0.716
GSF-Q1-rating: 0.780

by **tarek** » Sun Mar 30, 2008 3:31 pm

I know that a random sample correlates best.....

The puzzles that we encounter unfortunately are not .........

why should I bother with how different solvers rate puzzles that are of SS difficylty or less ....

your random pool should be filtered down to a (SS unsolvables) or SE121>x then compare the results......

do you have some postings denis_berthier documenting how your solver performed against these lists.

if any solver has a ceiling to what it can handle, that also should be mentioned for clarity.

tarek

by **gsf** » Sun Mar 30, 2008 3:33 pm

a question about ordering affects

for each ply in the solution process does the rating take into account the first move in the order that works and then apply it
or does it take into account all moves with the same value?

if it only takes the first move that works in the order then there will be ordering effects

to test this pick a puzzle with many instances of higher level techniques in the solution
and correlate each rating against itself on 10K random permutations of one puzzle

here is one to try (its ok to leave SE out -- unless you have 4 days for it to rate 10K ~ER=9.0 puzzles)

Code: Select all: 100000006020500040003060700040800000000200000000315080007000300050009020600000001

you can generate a stream of 10K pseudo-random permutations using my solver
-r123 sets the pseudo-random seed so that anyone can reproduce the stream

Code: Select all: sudoku -r123 -qFN -f%#0v -J9999 100000006020500040003060700040800000000200000000315080007000300050009020600000001

it will be interesting to see how the suex* self-correlate because they attempt to
wash ordering effects by averaging runs over pseudo-random permutations of the input puzzle

Website · by **denis_berthier** » Sun Mar 30, 2008 6:37 pm

tarek wrote:why should I bother with how different solvers rate puzzles that are of SS difficylty or less ....
your random pool should be filtered down to a (SS unsolvables) or SE121>x then compare the results......

It's up to the participants to propose their own lists of puzzles.

tarek wrote:do you have some postings denis_berthier documenting how your solver performed against these lists.

The purpose of my post is not to evaluate solvers but to compare existing ratings of puzzles.
To be meaningful, such evaluation has to be done for all levels of puzzles, at least all levels of puzzles that a human player can solve.

tarek wrote:if any solver has a ceiling to what it can handle, that also should be mentioned for clarity.

How do you define the ceiling? Which rating do you use? That's the problem.

Website · by **denis_berthier** » Sun Mar 30, 2008 6:43 pm

gsf wrote:a question about ordering affects
for each ply in the solution process does the rating take into account the first move in the order that works and then apply it
or does it take into account all moves with the same value?

I can't answer for the other ratings, but NRCZT-rating is order independent.
Given a puzzle P, if you submit it to any legal permutation, thus otaining P', the solution path SudoRules will give you for P' may not be the transposition of the solution path for P, but the level you'll get for P and P' will be the same.
This is because no rule at level n can be applied as long as rules at lower levels are applicable.

NRCZT-rating doesn't depend on the solver.

by **tarek** » Sun Mar 30, 2008 8:38 pm

denis_berthier wrote:It's up to the participants to propose their own lists of puzzles.

& I guess those participants will then tell us if they want to hear opinions about their choices or not

denis_berthier wrote:The purpose of my post is not to evaluate solvers but to compare existing ratings of puzzles.

That was my intention. The solvers adopt different methods in solving. so by comparing solvers, aren't you comparing rating methods ?

denis_berthier wrote:
tarek wrote:if any solver has a ceiling to what it can handle, that also should be mentioned for clarity.

How do you define the ceiling? Which rating do you use? That's the problem.

true, but at least you can tell what is above that ceiling

tarek

by **Mike Barker** » Sun Mar 30, 2008 8:56 pm

Here are links to some of the codes for rating puzzles. In addition to Sudoku Explainer, suexratt, and GSF's solver, I've included Ruud's and Havard's solvers which also can also rate puzzles. I haven't used these, but for the other three I've shown the command line I've used. Note I've written code to reduce the full output to SE to just the ratings values. I'm not sure its necessary.

Nicolas Juillerat's Sudoku Explainer at http://diuf.unifr.ch/people/juillera/Sudoku/Sudoku.html
java -cp SudokuExplainer.jar diuf.sudoku.test.Tester sudoku.txt result.txt

Guenter Stertenbrink's (dukuso) http://magictour.free.fr/suexratt.exe
suexratt sudoku.txt 10000 2 > rate_ST.out

Glenn Fowler's (gsf) Solver at http://www.research.att.com/~gsf/sudoku/
gsf -H -q1 -f"%6r %v" sudokus.out > rate_GSF.out
or
gsf -q"FNBTHW-G" -Z"FNBTHWP*" -B -M- -R"P0*1000+P2+(P8>1?10000:0)" -f"%r" sudoku.txt > rate_GSF.out

Ruud's Sudocue at http://www.sudocue.net/download.php

Havard Graff's Sudoku Assistenten at http://www.sudoku.frihost.net/

Also here's a summary of how I believe SE rates puzzles. I've enclosed those ratings I haven't reproduced in "[]". There are probably some that are missing, for example, chains with more or fewer "nodes" (my term) than I've shown and I'm not sure of the "node" count or how "nodes" are calculated (they are a combination of the number of if,then statements and the number of chains). Note that UR's and BUG's probably are rated higher than they should be based on difficulty and a limitation of SE is that it doesn't used finned fish, grouped strong links, ALS, or AHS so some puzzles get rated higher than I would think they should be

Code: Select all: 1.0 Single 1.2 Hidden Single in box 1.5 Hidden Single in line 1.7 Direct Pointing 1.9 Direct Claiming 2.0 Direct Hidden Pair 2.3 Naked Single 2.5 Direct Hidden Triplet 2.6 Pointing 2.8 Claiming 3.0 Naked Pair 3.2 X-Wing 3.4 Hidden Pair 3.6 Naked Triplet 3.8 Swordfish 4.0 Hidden Triplet 4.2 XY-Wing 4.3 [Direct Hidden Quad] 4.4 XYZ-Wing 4.5 UR Types 1 or 2 or 4 or 3 w/ hidden pair 4.6 UR Type 3 w/ naked pair or hidden triplet UL Types 1 or 2 or 4 or 3 w/ hidden pair (6 cells) 4.69 UL Type 3 w/ a naked pair or hidden triplet (6 cells) 4.7 UR Type 3 w/ naked triplet or hidden quad UL Types 1 or 2 or 4 or 3 w/ hidden pair (8 cells) 4.8 UR Type 3 w/ naked quad UL Type 3 w/ naked triplet [or hidden quad] (6 cells) UL Type 3 w/ naked pair or hidden triplet (8 cells) 4.89 UL Type 3 w/ naked quad (6 cells) 4.9 [UL Type 3 w/ naked triplet or hidden quad (8 cells)] 5.0 Naked Quad or UL 1 or 2 or 4 (>=10 cells) 5.1 UL Type 3 w/ naked pair (>=10 cells) 5.2 Jellyfish 5.3 Unknown 5.4 Hidden Quad 5.5 Unknown 5.6 BUG Type 1 5.7 BUG Type 2 or 4 5.8 BUG Type 3 w/ naked pair 5.9 BUG Type 3 w/ naked triplet 6.0 BUG Type 3 w/ naked quad 6.1 BUG Type 3 w/ naked quint 6.2 Aligned Pair Exclusion 6.3 Unknown 6.4 Unknown 6.5 Bidirectional X-Cycle or Bidirectional Y-Cycle (1-4 nodes) 6.6 Turbot Fish Forcing X-chain or Bidirectional Y-Cycle (5-6 nodes) 6.69 Forcing X-Chain (7-8 nodes) 6.7 Bidirectional Y-cycle (7-8 nodes) 6.8 Forcing X-Chain or Bidirectional Y-cycle (9-12 nodes) 6.9 Forcing X-Chain or Bidirectional Y-cycle (13-16 nodes) 7.0 Bidirectional Y-cycle (17-24 nodes) Forcing Chain or Bidirectional Cycle (1-4 nodes) 7.1 Forcing Chain or Bidirectional Cycle (5-6 nodes) 7.2 Forcing Chain or Bidirectional Cycle (7-8 nodes) 7.3 Forcing Chain or Bidirectional Cycle (9-12 nodes) 7.4 Forcing Chain (13-16 nodes) 7.5 Forcing Chain (17-24 nodes) Aligned Triplet Exclusion 7.6 Forcing Chain (25-36 nodes) Nishio Forcing Chain (5-6 nodes) 7.7 Nishio Forcing Chain (7-8 nodes) 7.8 Nishio Forcing Chain (9-12 nodes) 7.9 Nishio Forcing Chain (13-16 nodes) 8.0 Nishio Forcing Chain (17-24 nodes) 8.1 Nishio Forcing Chain (25-36 nodes) 8.2 Multiple (7-8 nodes) Region Forcing Chains 8.3 Multiple (9-12 nodes) Cell/Region Forcing Chains 8.4 Multiple (13-16 nodes) Cell/Region Forcing Chains 8.5 Multiple (17-24 nodes) Cell/Region Forcing Chains 8.6 Multiple (25-36 nodes) Dynamic (5-6 nodes) Cell/Region Forcing Chains 8.7 Dynamic (7-8 nodes) Cell/Region Forcing Chains 8.8 Dynamic (9-12 nodes) CRCD Forcing Chains 8.9 Dynamic (13-16 nodes) CRCD Forcing Chains 9.0 Dynamic (17-24 nodes) CRCD Forcing Chains 9.1 Dynamic (25-36 nodes) CRCD Forcing Chains 9.2 Dynamic (37-48 nodes) CRCD Forcing Chains 9.3 Dynamic (49-72 nodes) Dynamic + (9-12 nodes) CRCD Forcing Chains 9.4 Dynamic (73-96 nodes) Dynamic + (13-16 nodes) CRCD Forcing Chains 9.5 Dynamic + (17-24 nodes) CRCD Forcing Chains 9.6 Dynamic + (25-36 nodes) CRCD Forcing Chains 9.7 Dynamic + (37-48 nodes) CRCD Forcing Chains 9.8 Dynamic + (49-72 nodes) CRCD Forcing Chains 9.9 Dynamic + (73-96 nodes) CRCD Forcing Chains 10.0 Dynamic + (97-144 nodes) Dynamic + Forcing Chains (17-24 nodes) CRCD Forcing Chains 10.1 Dynamic + (145-192 nodes) Dynamic + Forcing Chains (25-36 nodes) CRCD Forcing Chains 10.2 Dynamic + Forcing Chains (37-48 nodes) CRCD Forcing Chains 10.3 Dynamic + Forcing Chains (49-72 nodes) CRCD Forcing Chains 10.4 Dynamic + Forcing Chains (73-96 nodes) CRCD Forcing Chains 10.5 Dynamic + Forcing Chains (97-144 nodes) CRCD Forcing Chains 10.6 Dynamic + Forcing Chains (145-192 nodes) CRCD Forcing Chains 10.7 Dynamic + Forcing Chains (193-288 nodes) CRCD Forcing Chains 10.8 Dynamic + Forcing Chains (289-384 nodes) CRCD Forcing Chains 10.9 Dynamic + Multiple Forcing Chains (73-96 nodes) CRCD Forcing Chains 11.0 Dynamic + Multiple Forcing Chains (97-144 nodes) CRCD Forcing Chains 11.1 Dynamic + Multiple Forcing Chains (145-192 nodes) CRCD Forcing Chains 11.2 Dynamic + Multiple Forcing Chains (193-288 nodes) CRCD Forcing Chains 11.3 Dynamic + Multiple Forcing Chains (289-384 nodes) CRCD Forcing Chains 11.4 Dynamic + Multiple Forcing Chains (385-576 nodes) CRCD Forcing Chains 11.4 [Dynamic + Dynamic Forcing Chains (73-96 nodes) Region/Contradiction Forcing Chains] 11.5 [Dynamic + Dynamic Forcing Chains (97-144 nodes) Region Forcing Chains] 11.6 [Dynamic + Dynamic Forcing Chains (145-192 nodes) Cell Forcing Chains] 11.7 [Dynamic + Dynamic Forcing Chains (193-288 nodes) Double Forcing Chains] CRCD=Cell/Region/Contradiction/Double UL=Unique Loop UR=Unique Rectangle BUG=Bivalue Universal Grave

by **gsf** » Sun Mar 30, 2008 9:56 pm

denis_berthier wrote:
gsf wrote:a question about ordering affects
for each ply in the solution process does the rating take into account the first move in the order that works and then apply it
or does it take into account all moves with the same value?

I can't answer for the other ratings, but NRCZT-rating is order independent.
Given a puzzle P, if you submit it to any legal permutation, thus otaining P', the solution path SudoRules will give you for P' may not be the transposition of the solution path for P, but the level you'll get for P and P' will be the same.
This is because no rule at level n can be applied as long as rules at lower levels are applicable.

NRCZT-rating doesn't depend on the solver.

not the order dependence I was aiming at

suppose at a given ply (board position) the next move in the technique order has rating R
but suppose at this ply there are N moves with the same rating R
does the solver apply 1 of the moves of rating R (if so, which one)?
or all the moves of rating R?
what if the first move of equals selected canibalizes some or all of the others?
what if the order of applying any or all of the moves changes the active techniques in the next ply?

by **gsf** » Sun Mar 30, 2008 10:04 pm

the problem with SE is that it only rates the hardest of the moves at all plies
so a puzzle A with rating 10.0 could have one 10.0 move and the rest singles
but a puzzle B with rating 10.0 could require ten separate 10.0 moves to solve

a rating should take into account the ratings of all necessary moves within a pre-specified order

(the SE numeric order is ok, but just rating the top move can miss the difficulty mark)

by **Mauricio** » Sun Mar 30, 2008 10:39 pm

Is the coefficient correlation efficient to compare 2 different ratings?
What would be the correlation coefficient of a rating function F and 2^F?
F and 2^F are in fact related, but the correlation coefficient will not capture that so well.

Website · by **denis_berthier** » Mon Mar 31, 2008 6:06 am

Mauricio wrote:Is the coefficient correlation efficient to compare 2 different ratings?
What would be the correlation coefficient of a rating function F and 2^F?
F and 2^F are in fact related, but the correlation coefficient will not capture that so well.

There is some reason why it is called the LINEAR correlation coefficient.
But, given two variables X and Y, you can always compare X and 2^Y.

I think the correlation coefficient is not only efficient but it is the only elementary tool to compare two ratings. As I already argued, such comparison vcan only be statistical.

Website · by **denis_berthier** » Mon Mar 31, 2008 6:37 am

tarek wrote:
denis_berthier wrote:The purpose of my post is not to evaluate solvers but to compare existing ratings of puzzles.
That was my intention. The solvers adopt different methods in solving. so by comparing solvers, aren't you comparing rating methods ?

I mean to compare rating methods independently of solvers.
This can be done at least with the NRCZT-rating (see what I've added to my post above). I don't know yet for the others.
It is one of the purposes of this thread to clarify such matters: how does the ratings depend on particular (explicit or implicit) choices in the solvers used to compute them.

tarek wrote:
denis_berthier wrote:
tarek wrote:if any solver has a ceiling to what it can handle, that also should be mentioned for clarity.

How do you define the ceiling? Which rating do you use? That's the problem.
true, but at least you can tell what is above that ceiling

tarek

If you can't define a ceiling, how can you tell what's above?
This is not rhetoric. When I tried the nrczt-chains on Ruud's top10000, the classification results I obtained were uncorrelated with Ruud's classification.

Website · by **denis_berthier** » Mon Mar 31, 2008 6:43 am

gsf wrote:
denis_berthier wrote:
gsf wrote:a question about ordering affects
for each ply in the solution process does the rating take into account the first move in the order that works and then apply it
or does it take into account all moves with the same value?

I can't answer for the other ratings, but NRCZT-rating is order independent.
Given a puzzle P, if you submit it to any legal permutation, thus otaining P', the solution path SudoRules will give you for P' may not be the transposition of the solution path for P, but the level you'll get for P and P' will be the same.
This is because no rule at level n can be applied as long as rules at lower levels are applicable.

NRCZT-rating doesn't depend on the solver.

not the order dependence I was aiming at

suppose at a given ply (board position) the next move in the technique order has rating R
but suppose at this ply there are N moves with the same rating R
does the solver apply 1 of the moves of rating R (if so, which one)?
or all the moves of rating R?
what if the first move of equals selected canibalizes some or all of the others?
what if the order of applying any or all of the moves changes the active techniques in the next ply?

I completed my post about NRCZT-rating. I thinkthat will answer these questions. The key is the confluence property of all the resolution theories described there. The application of a rule can make another one non applicable, but then a simpler rule withthe same conclusion will become applicable.
In practice, SudoRules applies the first move that it can.

The New Sudoku Players' Forum

Rating rules / Puzzles. Ordering the rules

Rating rules / Puzzles. Ordering the rules