Rating rules / Puzzles. Ordering the rules

Advanced methods and approaches for solving Sudoku puzzles

Re: Rating rules / Puzzles. Ordering the rules

Postby JasonLion » Sat Mar 26, 2011 12:43 pm

Thanks to denis_berthier we now have PDF files of posts made to this topic between June 2009 and December 2009.

Page 10
Page 11
Page 12
Page 13
Page 14
Page 15
Page 16
Page 17
Page 18
Page 19
Page 20
Page 21
Page 22
Page 23
Page 24
Page 25
Page 26
Page 27
Page 28
Page 29
Page 30
User avatar
JasonLion
2017 Supporter
 
Posts: 621
Joined: 25 October 2007
Location: Silver Spring, MD, USA

The real distribution of the W and SER ratings (reminder)

Postby denis_berthier » Mon Aug 12, 2013 5:00 am




The real distribution of the W and SER ratings (reminder)


In preparation of my next post, I recall here a few old results.

On page 29 of the pdf here (posts lost in the 2010 crash): http://forum.enjoysudoku.com/rating-rules-puzzles-ordering-the-rules-t5995-143.html (Oct. 2009), I gave various results about the SER and the W classifications of minimal puzzles, based on the collection of 5,926,343 puzzles generated by the controlled-bias generator (for this generator, see the "real distribution" thread). (It used 279 full scans of the whole collection of complete grids.)
Part of these results is also available in:
- "Unbiased Statistics of a CSP - A Controlled-Bias Generator", International Joint Conferences on Computer, Information, Systems Sciences and Engineering (CISSE 09), December 4-12, 2009, Springer. Published as a chapter of the book Innovations in Computing Sciences and Software Engineering, Khaled Elleithy Editor, pp. 11-17, Springer, 2010, ISBN 97890481911133; pdf available on my website;
- chapter 6 of my book "Constraint Resolution Theories", Lulu.com Publishers (October 2011), ISBN 978-1-4478-6888-0;
- chapter 6 of my book "Pattern-Based Constraint Satisfaction and Logic Puzzles", Lulu.com Publishers (Novembre 2012), ISBN 978-1-291-20339-4 (pdf available here: http://arxiv.org/abs/1304.1628).

Remember that, for any fixed number of clues, the controlled-bias generator, when it uses an integer number of full scans of the whole set of complete grids (as obtained by gsf's compressed collection), is unbiased. As a result, each row of the the tables in sections 1 and 2 below gives both the controlled-bias and the real values for the n-clue SER or W rating. Only the global mean values and standard deviations have to be computed differently (without or with the correction coefficients associated with the controlled-bias generator).

In this post, I use my updated notations: W replaces the old NRCZT name.
As whips (or any of the other chain patterns I introduced) are meaningful in any Constraint Satisfaction Problem (CSP), with or with out any grid structure, NRCZT was too Sudoku-specific (nrc refers to number, row, column).
The W rating of a puzzle P is the smallest n such that P can be solved by whips of maximum length n.

In the next post, I'll show how the dependency of the SER or W mean for a fixed number of clues wrt the number of clues can be extended, using puzzles generated by Afmob for extreme (low or high) numbers of clues.
Notice that these extensions cannot have any impact on the global W or SER distributions recalled in section 3 (all numbers of clues confused), as they concern only an extremely low proportion of puzzles.


____________________________________________________________________

1) HOW THE SER REALLY DEPENDS ON THE NUMBER OF CLUES
____________________________________________________________________

In this section, I'm using only 1,380,962 puzzles (corresponding to 65 full scans of gsf's collection) of the total collection, because computing the SER is too long.

Code: Select all
#clues  #instances       mean(SER)       standard-deviation(SER)
19      0                                   
20      0                                   
21      41               3.56 (*)        2.01 (*)
22      1,526            3.15            2.16
23      25,884           3.35            2.24
24      163,694          3.61            2.36
25      422,451          3.96            2.47
26      467,047          4.40            2.54
27      234,963          4.93            2.53
28      57,615           5.47            2.44
29      7,243            6.07            2.19
30      481              6.76            1.71
31      16               5.79 (*)        2.34 (*)
32      1                7.3 (*)         (*)
all     1,380,962                     

(*) values based on small samples are not meaningful.


Which gives:

Code: Select all
controlled-bias mean(SER) = 4.29    controlled-bias standard-deviation(SER) = 2.48

real mean(SER) = 4.73               real standard-deviation(SER) = 2.49

correlation coefficient #clues vs SER = 0.19


The real correlation coefficient #clues vs SER (= 0.19) is a little higher than for the top-down generator (0.12) but it remains too small to allow any predictions of (SER) complexity given the number of clues.



____________________________________________________________________

2) HOW THE W RATING REALLY DEPENDS ON THE NUMBER OF CLUES
____________________________________________________________________

Code: Select all
#clues  #instances       mean(W)         standard-deviation(W)
19      0                                   
20      2                1.95 (*)        1.05 (*)         
21      164              1.52 (*)        0.93 (*)
22      6,651            1.64            1.12
23      110,103          1.73            1.18
24      704,089          1.86            1.25
25      1,814,413        2.04            1.32
26      2,002,349        2.27            1.38
27      1,007,700        2.55            1.42
28      247,259          2.85            1.42
29      31,449           3.16            1.37
30      2088             3.47            1.29
31      74               3.57            1.23 (*)
32      2                4.5 (*)         0.5 (*)
all     5,926,343                     

(*) values based on small samples are not meaningful.


Which gives:
Code: Select all
controlled-bias mean(W) = 2.217     controlled-bias standard-deviation(W) = 1.35

real mean(W) = 2.449                real standard-deviation(W) = 1.39

correlation coefficient #clues vs W = 0.19

The real correlation coefficient #clues vs W (= 0.19) is a little higher than for the top-down generator (0.115) but it remains too small to allow any predictions of (W) complexity given the number of clues.



____________________________________________________________________

3) THE REAL W COMPLEXITY DISTRIBUTION OF MINIMAL PUZZLES
____________________________________________________________________


The 5,926,343 minimal puzzles produced by the controlled-bias generator give the following statistics for the real W complexity distribution of minimal puzzles.

It is also interesting to compare them with the complexity distribution of minimal puzzles produced by different types of generators.
Code: Select all

                   W1_0     W1      W2      W3      W4      W5     W6     W7     W8      W9      W10      W11      W12       >W12
bottom-up          46.27    13.32   12.36   15.17   10.18   1.98   0.49   0.19   0.020   0.010   0 *      0.01 *   0 *       0 *
top-down           41.76    12.06   13.84   16.86   12.29   2.42   0.55   0.15   0.047   0.013   3.8e-3   1.5e-3   9.0e-4    2.0e-04 *
controlled-bias    35.08    9.82    13.05   20.03   17.37   3.56   0.79   0.21   0.055   0.015   4.4e-3   1.2e-3   3.2e-4    2.0e-04 *
real               29.17    8.44    12.61   22.26   21.39   4.67   1.07   0.29   0.072   0.020   5.5e-3   1.5e-3   3.4e-4    2.0e-04 *
_______________________________________________________________________________________________________________________________________
* values based on a small sub-sample are not reliable.



These distributions show very clearly the complexity bias of the three kinds of generators.
All these distributions have the same two modes, at W1_0 and at W3, as the real distribution.
It can be seen that when we move from bottom-up to top-down to controlled-bias to real, the mass of the distribution moves progressively to the right.
This displacement towards higher complexity occurs mainly at the first W-levels, after which it is only very slight.

In any cases:
- more than 98.5% of the puzzles can be solved with whips of maximal length 5;
- more than 99% of the puzzles can be solved with whips of maximal length 7;
- more than 99.9% of the puzzles can be solved with whips of maximal length 9;
- all the puzzles can be solved with whips.


A better presentation (with html tables) of these results (and much more) has been available on my website since that time http://carva.org/denis.berthier/Classification/index.html (see section 3).
There, I also wrote, in substance, that there seems to be a (non absolute) barrier of complexity such that, when the number of clues (n) increases:
- the n-clue mean (SER or W) rating increases;
- the proportion of puzzles with (SER or W) rating away from the n-clue mean (SER or W) rating increases;
but
- the proportion of puzzles with (SER or W) rating far below the n-clue mean (SER or W) rating increases;
- the proportion of puzzles with (SER or W) rating far above the n-clue mean (SER or W) rating decreases.
Graphically, the n-clue (SER or W) rating distribution looks like a wave; when n increases, the wave moves to the right, with a longer left tail and a steeper right front.
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

How the mean W and SER ratings vary for extreme numbers of c

Postby denis_berthier » Mon Aug 12, 2013 3:01 pm




How the mean W and SER ratings vary for extreme numbers of clues


As announced in the previous post, the above results can now be completed using puzzles generated by Afmob's fast implementation of Red Ed's subset/superset method *.
The status of the following results is different from the above: being computed from samples that are:
- correlated (due to the method),
- possibly biased (due to the number of complete grids used),
- relatively small.

In particular, the standard deviation is unknown.

Anyway, it is interesting to see how the trend that could be observed in the previous post seems to extend to the extreme values.

(*) For two sets of puzzles, the origin is different:
- the 17s are the whole collection of 49,712 17s (bias 0);
- the 38s are the whole collection of known 38s (unknown bias).


____________________________________________________________________

1) HOW THE MEAN SER VARIES WITH EXTREME NUMBERS OF CLUES
____________________________________________________________________


Code: Select all
#clues  #instances      mean(SER)       
17      49,152          2.60                   
18      814             2.64                 
19      59              2.59  (*)                 
20      1000            2.81                   
21      1001            2.95


31      1000            6.77     
32      1002            7.17     
33      1000            7.40     
34      91              7.44 (*)     

38      121038          7.98 (**)     
39      336             7.95       

(*) small sample, result not reliable
(**) ad hoc sample, result not reliable




____________________________________________________________________

2) HOW THE MEAN W RATING VARIES WITH EXTREME NUMBERS OF CLUES
____________________________________________________________________

Code: Select all
#clues  #instances      mean(W)       
17      49,152          1.30                   
18      814             1.41                 
19      59              1.45 (*)                   
20      1000            1.45                   
21      1001            1.50


31      1000            3.65     
32      1002            3.89     
33      1000            4.02     
34      91              4.49 (*)     

(*) small sample, result not reliable
Last edited by denis_berthier on Mon Aug 12, 2013 4:29 pm, edited 1 time in total.
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

Re: Rating rules / Puzzles. Ordering the rules

Postby denis_berthier » Mon Aug 12, 2013 3:02 pm

[Reserved]
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

Re: Rating rules / Puzzles. Ordering the rules

Postby enxio27 » Mon Aug 12, 2013 3:43 pm

Denis, it's nice to see you back!

I'm following this thread with some interest, although on a much lower level than the rest of you. My main concern is getting a consistent rating of the puzzles in my (now vast) collection, with a view to working my way up the difficulty scale. I'm a pencil-and-paper solver, although for puzzles that (for me) require pencil marks, I let the software do the pencil marks for me (otherwise I make too many fun-killing mistakes before I ever start the puzzle itself). I am hoping to use y'all's findings in this thread to help me order the rules in my puzzle-rating software (currently Ruud's Sudocue) to give me a fairly accurate ranking of my puzzle collection.

I know this is something of a can of worms, but I'm curious as to the general consensus on which techniques would fall into the category of "guessing" vs. pure (human) logic.
User avatar
enxio27
 
Posts: 296
Joined: 13 November 2007

Re: Rating rules / Puzzles. Ordering the rules

Postby denis_berthier » Mon Aug 12, 2013 4:40 pm

Hi Enxio27,

enxio27 wrote:I am hoping to use y'all's findings in this thread to help me order the rules in my puzzle-rating software (currently Ruud's Sudocue) to give me a fairly accurate ranking of my puzzle collection.

I'm not sure this thread can be of any use to rate puzzles according to Sudocue ranking. But I've never used Sudocue (it runs only on Windows).

enxio27 wrote:I know this is something of a can of worms, but I'm curious as to the general consensus on which techniques would fall into the category of "guessing" vs. pure (human) logic.

A can of worms and quite OOT here. In 3.5 words: there's no consensus.

In my view, rating can't be a goal in and of itself and there can't be any unique rating system.
You rate puzzles with some view of how you should solve them.
If this view is pattern-based, then you rate them according to some set of resolution rules.
Even in this case, there are many possible ratings, all logically grounded (e.g. based on whips, g-whips, Subsets, g-Subets, ...).
In this thread, I've shown how some of them are (statistically) related.
denis_berthier
2010 Supporter
 
Posts: 1253
Joined: 19 June 2007
Location: Paris

Previous

Return to Advanced solving techniques