The Sudoku grey zone

Advanced methods and approaches for solving Sudoku puzzles

Re: The Sudoku grey zone

Post by denis_berthier » Fri May 24, 2013 6:24 am

So, we now have a better idea of what's in the "potential hardest" list.

If this list is used as a starting point for a vicinity search for any pattern with a lower SER, say JExocet (or Exocet), it is more or less obvious that the result will be strongly biased.
There may be several ways of getting a (vague) idea of the possible bias:
- keep track of how many +1/-1 steps there are between the original puzzle and the final one and state the results as a function of this distance;
- keep track of the Exocets in the original puzzle and those in the final one and check how the results vary if we count all the Exocets in the final puzzles or only those that weren't already there in the original ones.
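Both checks can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the post: puzzles are 81-character strings with 0 for an empty cell, one +1/-1 step adds or removes a single clue, and each Exocet found by some pattern finder is represented by a hashable identifier (the function names and the "E1"-style ids are hypothetical). The distance ignores whether intermediate puzzles are valid or minimal, so it is a lower bound on the number of steps an actual vicinity search needs.

```python
def clues(puzzle: str) -> set[tuple[int, str]]:
    """Set of (cell index, digit) givens of an 81-character puzzle string."""
    return {(i, c) for i, c in enumerate(puzzle) if c not in "0."}


def step_distance(original: str, final: str) -> int:
    """Minimum number of +1 (add one clue) / -1 (remove one clue) steps
    turning `original` into `final`, ignoring whether intermediate puzzles
    are valid: each clue only in `original` is removed once, each clue only
    in `final` is added once."""
    a, b = clues(original), clues(final)
    return len(a - b) + len(b - a)


def new_patterns(original_patterns: set, final_patterns: set) -> set:
    """Patterns (e.g. Exocets, given as hypothetical hashable ids) found in
    the final puzzle but not already present in the original one."""
    return final_patterns - original_patterns


# Toy usage: moving a 6 from r1c1 to r1c2 takes one -1 and one +1 step.
a = "6" + "0" * 80
b = "0" + "6" + "0" * 79
print(step_distance(a, b))                        # 2
print(new_patterns({"E1", "E2"}, {"E2", "E3"}))   # {'E3'}
```

Results could then be grouped by `step_distance`, and counted once with all Exocets of the final puzzles and once with only those returned by `new_patterns`.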
denis_berthier
2010 Supporter
 
Posts: 3967
Joined: 19 June 2007
Location: Paris

Re: The Sudoku grey zone

Post by champagne » Tue May 28, 2013 1:32 pm

coloin wrote: Probably most of the puzzles with an SK-loop are known.

C


Hi coloin,

I am running a first test in the grey zone and I can already tell you that this is not correct.

Years ago, I studied gsf's taxonomy file and found many puzzles with a relatively low rating that have an SK loop,
like the one I used as an example on my website:

Sample1 Coloin 02805 in gsf list (as of 2008 02 26)
600000002090400050001000700050943000000105000000800040007000600030009080200000001
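For readers unfamiliar with the one-line format: the puzzle is given as 81 digits read row by row, with 0 for an empty cell. A short Python sketch (the helper names are illustrative, not from any existing tool) that prints it as a grid and counts its clues; for Sample1 it reports 22 clues.

```python
SAMPLE1 = ("600000002090400050001000700050943000000105000"
           "000800040007000600030009080200000001")


def to_rows(puzzle: str) -> list[str]:
    """Split an 81-character puzzle string into its 9 rows."""
    if len(puzzle) != 81:
        raise ValueError("expected 81 characters")
    return [puzzle[r * 9:(r + 1) * 9] for r in range(9)]


def clue_count(puzzle: str) -> int:
    """Number of givens; '0' (or '.') marks an empty cell."""
    return sum(c not in "0." for c in puzzle)


for row in to_rows(SAMPLE1):
    print(" ".join(c if c != "0" else "." for c in row))
print("clues:", clue_count(SAMPLE1))
```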

The first "V" loop not recognised by ronk as an SK loop was seen in a puzzle rated around SER 9.0.

It will be interesting to see how this frequency evolves as the average rating goes down.
champagne
2017 Supporter
 
Posts: 7335
Joined: 02 August 2007
Location: France Brittany

How frequent are the J-Exocets in the grey zone?

Post by denis_berthier » Mon Jun 03, 2013 3:51 pm



For the 5,926,343 puzzles in the controlled-bias collection produced by the controlled-bias generator (*), I had computed long ago:
- the SER for the first 3,037,717
- the W rating for the whole collection (at that time, instead of "W rating", I said "pure NRCZT rating" but it's the same thing).
(*) For details about this, see the "real distribution of minimal puzzles" thread.

Considering only the first 3,037,717:
- 5615 have SER >= 9.0
- 2353 have W >= 8
- 664 have W >= 9
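To put these counts in proportion to the 3,037,717 puzzles (raw figures, not unbiased ones), one division per criterion is enough. A minimal sketch; it gives roughly 0.18%, 0.077% and 0.022% respectively.

```python
FIRST_N = 3_037_717  # first part of the cb-collection, for which SER was computed

COUNTS = {"SER >= 9.0": 5615, "W >= 8": 2353, "W >= 9": 664}


def share_percent(k: int, n: int = FIRST_N) -> float:
    """Raw share of k puzzles among n, in percent."""
    return 100 * k / n


for label, k in COUNTS.items():
    print(f"{label}: {share_percent(k):.3f}%")
```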


I have looked for JExocets in the 664 W>=9 cases; W>=9 corresponds to a stricter definition of the lower bound of the grey zone than SER >= 9.0.

I activated JEs and the rules of SSTS - i.e. Whips[1] and (Naked, Hidden and Super-Hidden) Subset rules - and nothing else.
In order to avoid degenerate cases, JEs of any size were assigned lower priority than all the rules in SSTS.
By JE, I mean standard Jk-Exocets, with k = 2, 3, 4 or 5 (as defined in this post http://forum.enjoysudoku.com/pattern-based-classification-of-hard-puzzles-t30493-85.html). Franken or Blue's extensions were not taken into account (solely because they are not programmed in SudoRules), but I don't think they would lead to very different stats.

No JExocet was found in any of these puzzles, so the calculation of a rough estimate of the unbiased frequency in the grey zone shouldn't require a doctorate in statistics; I won't invest more personal time in this topic.


[Added: How frequent are sk-loops? None were found in this sample.]
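One possible way to turn "none found" into a rough number (not necessarily the calculation the post has in mind) is the standard upper confidence bound for a proportion when zero occurrences are observed in n draws. The sketch assumes the 664 W>=9 puzzles can be treated as independent draws and ignores the controlled-bias correction factors, so it bounds the raw frequency in this sample, not the unbiased one. Both the exact bound and the "rule of three" approximation come out at about 0.45%.

```python
def zero_count_upper_bound(n: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on a proportion p when a pattern was
    found 0 times in n independent puzzles: solves (1 - p)**n = 1 - confidence."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n = 664                    # W >= 9 puzzles searched, none with a JExocet or an sk-loop
exact = zero_count_upper_bound(n)
rule_of_three = 3 / n      # classical approximation of the 95% bound
print(f"95% upper bound: {exact:.3%} (rule of three: {rule_of_three:.3%})")
```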

Distribution of clues in the grey zone

Post by denis_berthier » Tue Jul 02, 2013 4:40 am



This question arose in another thread.

I haven't defined the grey zone very precisely. Depending on how I make it more precise (SER >= 9.0 or W >= 9), different calculations can be done but, as far as the number of clues is concerned, they don't lead to significantly different results.

If I consider the whole collection of 5,926,343 puzzles generated by the controlled-bias generator, 1258 have their W rating >= 9.
The raw distribution of clues for them is as follows:

nb-clues   nb-instances  %
19         0
20         0
21         0
22         0
23         22            1.7
24         106           8.4
25         306           24.3
26         415           33.0
27         288           22.9
28         102           8.1
29         17            1.4
30         2             0.2
31         0
32         0
33         0
34         0
35         0
mean= 25.97
standard-deviation= 1.20
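The mean and standard-deviation lines can be recomputed from the table. A minimal Python sketch, assuming the population standard deviation (dividing by n); with 1258 puzzles, dividing by n - 1 instead gives the same value to two decimals. It reproduces the total (1258), the mean (25.97), the standard deviation (1.20) and the % column.

```python
import math

# W >= 9 puzzles of the cb-collection: number of clues -> number of puzzles
W9_DIST = {23: 22, 24: 106, 25: 306, 26: 415, 27: 288, 28: 102, 29: 17, 30: 2}


def summarize(dist: dict[int, int]) -> tuple[int, float, float]:
    """Total, mean and population standard deviation of a clue distribution."""
    total = sum(dist.values())
    mean = sum(c * k for c, k in dist.items()) / total
    var = sum(k * (c - mean) ** 2 for c, k in dist.items()) / total
    return total, mean, math.sqrt(var)


total, mean, sd = summarize(W9_DIST)
print(total, f"{mean:.2f}", f"{sd:.2f}")
for c, k in sorted(W9_DIST.items()):
    print(c, k, f"{100 * k / total:.1f}")
```

Applied to the SER >= 9.0 distribution that follows, the same `summarize` function gives 26.05 and 1.15.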


If I consider only the first 3,037,717, for which I had computed the SER, 5615 have their SER >= 9.0. The raw distribution of clues for them is:

nb-clues   nb-instances    %
19         0
20         0
21         0
22         2               0.04
23         46              0.8
24         416             7.4
25         1319            23.5
26         1915            34.1
27         1380            24.6
28         440             7.8
29         90              1.6
30         7               0.1
31         0
32         0
33         0
34         0
35         0
mean= 26.05
standard-deviation= 1.15



For comparison, I recall the data for the whole cb-sample (see p.43 of the pdf in the "real distribution" thread):

nb-clues  nb-instances     %               
20        2                3.37e-05
21        164              0.0027         
22        6,651            0.1124         
23        110,103          1.858         
24        704,089          11.88       
25        1,814,413        30.62         
26        2,002,349        33.79         
27        1,007,700        17.00         
28        247,259          4.172         
29        31,449           0.531         
30        2,088            0.0352       
31        74               0.00125       
32        2                3.37e-05     
mean= 25.67
standard-deviation= 1.12


W and gW ratings in the grey zone

Post by denis_berthier » Fri Jul 26, 2013 7:33 am



Long ago, I showed that the W and gW ratings are rarely different when the W rating is finite and that, when they differ, the difference is small (W-gW = 1 or 2).
Indeed, for the 5,926,343 puzzles I generated with the controlled-bias generator (all of which have finite W), there was a difference in only 0.23% of cases.
In this post, I'll use raw stats for the cb-collection (I won't compute unbiased stats from them) - but this is OK as I'm only interested in orders of magnitude.

If I define the grey zone as the set of puzzles with W >= 9 (*), then the grey proportion in the above collection is only 0.021%. But this is still 1258 puzzles, enough to do some stats.
* a definition stricter (**) than what I first proposed in terms of SER >= 9, and more consistent with my approach.
** Strictly speaking, it is stricter on the lower side but broader on the upper side (no SER < 10.5 restriction here), but this will play no role here. In particular, this includes puzzles not in T&E(1), but these are still rarer (about 1 in 30,000,000 puzzles, i.e. about 0.016% of the grey puzzles) and they can therefore play no significant role in the global stats for the grey zone.
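The two proportions in this paragraph follow from simple division. A sketch, taking "about 1 in 30,000,000" as the rough rate given in the footnote; it prints about 0.021% and 0.016%, matching the text.

```python
TOTAL = 5_926_343       # puzzles in the cb-collection
GREY = 1258             # of which W >= 9
grey_share = GREY / TOTAL
print(f"grey share: {grey_share:.3%}")

# "about 1 in 30,000,000 puzzles" are not in T&E(1) (rough rate from the footnote)
non_te1_rate = 1 / 30_000_000
print(f"non-T&E(1) share of grey puzzles: {non_te1_rate / grey_share:.3%}")
```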

Now, we can ask: how often do the W and gW ratings differ in the grey zone? As the grey zone is a very small subset of the minimal puzzles, the result could be very different from that obtained for the whole collection.
And it is indeed very different: 31.8% (instead of 0.23%).
That it is larger is not really surprising: the possibilities for extending a partial chain with a right-linking g-candidate when it can't be extended with a candidate increase with the length of the partial chain. What is new is that the difference is now quantified.
What doesn't change is the maximum difference between W(P) and gW(P) for a puzzle P: 2.
Like any rating system, the W and gW ratings are meaningful only statistically, and a small difference in rating is not very significant. It follows that, even in the grey zone, the W rating remains, on average, a good estimate of the difficulty of a puzzle.
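How a figure such as 31.8% and the maximum difference of 2 are obtained can be illustrated with a hypothetical helper taking one (W, gW) pair per puzzle. The ratings themselves come from SudoRules; the pairs below are made up purely to show the computation (for scale, 31.8% of the 1258 grey puzzles is about 400 puzzles).

```python
def rating_difference_stats(ratings: list[tuple[int, int]]) -> tuple[float, int]:
    """Given (W, gW) pairs, return the fraction of puzzles whose two ratings
    differ and the largest observed W - gW."""
    if not ratings:
        raise ValueError("no ratings")
    diffs = [w - gw for w, gw in ratings]
    differing = sum(d != 0 for d in diffs)
    return differing / len(diffs), max(diffs)


# made-up (W, gW) pairs, for illustration only
sample = [(9, 9), (9, 8), (10, 10), (11, 9), (9, 9)]
frac, max_diff = rating_difference_stats(sample)
print(frac, max_diff)  # 0.4 2
```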

How does this result change if we consider a still stricter subset, i.e. W>=10 (0.0062% of the cb-collection)? We get 30.1%, not significantly different, considering the small size of the resulting sample.
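A rough check of "not significantly different": the W>=10 subset has about 0.0062% × 5,926,343, i.e. about 367 puzzles, so under a simple binomial model the standard error of 30.1% is about 2.4 points, and 31.8% lies less than one standard error away. This is only a sketch: the subset size is reconstructed from the rounded percentage, and the W>=10 puzzles are themselves part of the W>=9 set, so the two proportions are not independent.

```python
import math

TOTAL = 5_926_343
n_w10 = round(0.0062 / 100 * TOTAL)   # approx. size of the W >= 10 subset (from the rounded 0.0062%)
p_w9, p_w10 = 0.318, 0.301           # fractions with W != gW quoted in the post

se = math.sqrt(p_w10 * (1 - p_w10) / n_w10)   # binomial standard error of p_w10
z = (p_w9 - p_w10) / se
print(n_w10, f"SE = {se:.3f}", f"z = {z:.2f}")
```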
