The New Sudoku Players' Forum » The Sudoku grey zone

by **denis_berthier** » Tue Jul 02, 2013 4:40 am

Distribution of clues in the grey zone

This question arose in another thread.

I haven't defined the grey zone very precisely. Depending on how it is made more precise (SER >= 9.0 or W >= 9), different calculations can be done but, as far as the number of clues is concerned, they don't lead to significantly different results.
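As a rough illustration of how such clue distributions can be tallied, here is a minimal Python sketch. The record format (an 81-character puzzle string with '.' or '0' for empty cells, paired with a numeric rating) and the function names are hypothetical; they are not the format or tools actually used for the cb-collection.

```python
from collections import Counter

# Hypothetical record format: (81-character puzzle string, rating).
# Empty cells are assumed to be written as '.' or '0'.

def count_clues(puzzle: str) -> int:
    """Number of given digits (clues) in an 81-character puzzle string."""
    if len(puzzle) != 81:
        raise ValueError("expected an 81-character puzzle string")
    return sum(1 for ch in puzzle if ch in "123456789")

def clue_distribution(records, threshold):
    """Tally clue counts for the puzzles whose rating is >= threshold."""
    return Counter(count_clues(p) for p, rating in records if rating >= threshold)

# Toy strings (not real puzzles), only to exercise the functions.
demo = [("1" * 25 + "." * 56, 9), ("2" * 24 + "." * 57, 8), ("3" * 26 + "." * 55, 10)]
print(clue_distribution(demo, 9))
```

The same function covers either definition of the grey zone: pass SER values with a threshold of 9.0, or W values with a threshold of 9.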

If I consider the whole collection of 5,926,343 puzzles generated by the controlled-bias generator, 1258 of them have a W rating >= 9.
Their raw distribution of clues is as follows:

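| nb-clues | nb-instances | % |
|---|---|---|
| 19 | 0 | 0 |
| 20 | 0 | 0 |
| 21 | 0 | 0 |
| 22 | 0 | 0 |
| 23 | 22 | 1.7 |
| 24 | 106 | 8.4 |
| 25 | 306 | 24.3 |
| 26 | 415 | 33.0 |
| 27 | 288 | 22.9 |
| 28 | 102 | 8.1 |
| 29 | 17 | 1.4 |
| 30 | 2 | 0.2 |
| 31 | 0 | 0 |
| 32 | 0 | 0 |
| 33 | 0 | 0 |
| 34 | 0 | 0 |
| 35 | 0 | 0 |

mean = 25.97, standard deviation = 1.20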
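The mean and standard deviation under the table follow directly from the counts. A minimal sketch, assuming the quoted standard deviation is the population one (the sample version gives the same value to two decimals here); the function name is hypothetical:

```python
import math

# Counts copied from the W >= 9 table (clue counts with zero instances omitted).
w9_counts = {23: 22, 24: 106, 25: 306, 26: 415, 27: 288, 28: 102, 29: 17, 30: 2}

def summarize(counts):
    """Return (total, mean, population standard deviation) of a clue-count histogram."""
    total = sum(counts.values())
    mean = sum(k * n for k, n in counts.items()) / total
    var = sum(n * (k - mean) ** 2 for k, n in counts.items()) / total
    return total, mean, math.sqrt(var)

total, mean, sd = summarize(w9_counts)
print(total, round(mean, 2), round(sd, 2))
```

Applied to the SER >= 9.0 counts of the next table, the same function gives a mean of 26.05 and a standard deviation of 1.15.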

If I consider only the first 3,037,717 puzzles, for which I had computed the SER, 5615 of them have SER >= 9.0. Their raw distribution of clues is:

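| nb-clues | nb-instances | % |
|---|---|---|
| 19 | 0 | 0 |
| 20 | 0 | 0 |
| 21 | 0 | 0 |
| 22 | 2 | 0.04 |
| 23 | 46 | 0.8 |
| 24 | 416 | 7.4 |
| 25 | 1319 | 23.5 |
| 26 | 1915 | 34.1 |
| 27 | 1380 | 24.6 |
| 28 | 440 | 7.83 |
| 29 | 90 | 1.6 |
| 30 | 7 | 0.1 |
| 31 | 0 | 0 |
| 32 | 0 | 0 |
| 33 | 0 | 0 |
| 34 | 0 | 0 |
| 35 | 0 | 0 |

mean = 26.05, standard deviation = 1.15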

For comparison, here is a reminder of the data for the whole cb-sample (see p. 43 of the pdf in the "real distribution" thread):

| nb-clues | nb-instances | % |
|---|---|---|
| 20 | 2 | 3.37e-05 |
| 21 | 164 | 0.0027 |
| 22 | 6,651 | 0.1124 |
| 23 | 110,103 | 1.858 |
| 24 | 704,089 | 11.88 |
| 25 | 1,814,413 | 30.62 |
| 26 | 2,002,349 | 33.79 |
| 27 | 1,007,700 | 17.00 |
| 28 | 247,259 | 4.172 |
| 29 | 31,449 | 0.531 |
| 30 | 2,088 | 0.0352 |
| 31 | 74 | 0.00125 |
| 32 | 2 | 3.37e-05 |

mean = 25.67, standard deviation = 1.12
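The % column is 100·n/N, with N the total number of puzzles. A small sketch recomputing it and the mean from the whole-sample counts (function names are hypothetical); it also shows that the 20-clue and 32-clue rows, both with 2 instances, necessarily share the same percentage:

```python
# Whole cb-sample counts, copied from the table.
cb_counts = {20: 2, 21: 164, 22: 6_651, 23: 110_103, 24: 704_089, 25: 1_814_413,
             26: 2_002_349, 27: 1_007_700, 28: 247_259, 29: 31_449, 30: 2_088,
             31: 74, 32: 2}

def percentages(counts):
    """Percentage column: 100 * n / N for each clue count."""
    total = sum(counts.values())
    return {k: 100 * n / total for k, n in counts.items()}

def mean_clues(counts):
    """Mean number of clues of a clue-count histogram."""
    total = sum(counts.values())
    return sum(k * n for k, n in counts.items()) / total

pct = percentages(cb_counts)
print(f"N = {sum(cb_counts.values()):,}")
print(f"20 clues: {pct[20]:.3g}%   32 clues: {pct[32]:.3g}%")
print(f"mean = {mean_clues(cb_counts):.2f}")
```

Set against the W >= 9 mean of 25.97, this whole-sample mean of 25.67 is about 0.3 clues lower.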

by **denis_berthier** » Fri Jul 26, 2013 7:33 am

W and gW ratings in the grey zone

Long ago, I showed that the W and gW ratings rarely differ when the W rating is finite and that, when they do differ, the difference is small (W - gW = 1 or 2).
Indeed, for the 5,926,343 puzzles I generated with the controlled-bias generator (all of which have finite W), the two ratings differ in only 0.23% of cases.
In this post, I'll use raw stats for the cb-collection (I won't compute unbiased stats from them), but this is OK because I'm only interested in orders of magnitude.

If I define the grey zone as the set of puzzles with W >= 9 (*), then the grey proportion in the above collection is only 0.021%. But this is still 1258 puzzles, enough for some statistics.
(*) This definition is stricter (**) than the one I first proposed in terms of SER >= 9, and more consistent with my approach.
(**) Strictly speaking, it is stricter on the lower side but broader on the upper side (there is no SER < 10.5 restriction here); this makes no difference for what follows. In particular, it includes puzzles that are not in T&E(1), but these are even rarer (about 1 in 30,000,000 puzzles, i.e. about 0.016% of the grey puzzles) and can therefore play no significant role in the global statistics for the grey zone.
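Both proportions quoted above are simple ratios. A quick check in Python, using only the figures given in the text (the 1 in 30,000,000 figure is itself a rounded estimate, so the result is an order of magnitude, not an exact value):

```python
# Figures quoted in the text.
collection_size = 5_926_343   # puzzles from the controlled-bias generator
grey_count = 1_258            # puzzles with W >= 9

grey_fraction = grey_count / collection_size
print(f"grey proportion: {100 * grey_fraction:.3f}%")

# Puzzles not in T&E(1): about 1 in 30,000,000 (rounded figure from the text).
non_te1_fraction = 1 / 30_000_000
share_of_grey = non_te1_fraction / grey_fraction
print(f"non-T&E(1) share of grey puzzles: {100 * share_of_grey:.3f}%")
```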

Now, we can ask: how often do the W and gW ratings differ in the grey zone? As the grey zone is a very small subset of the minimal puzzles, the result could be very different from that obtained for the whole collection.
And it is indeed very different: 31.8% (instead of 0.23%).
That it is larger is not really surprising: when a partial chain can't be extended with a candidate, the possibilities for extending it with a right-linking g-candidate increase with the length of the partial chain. What is new is that the difference is now quantified.
What doesn't change is the maximum difference between W(P) and gW(P) for a puzzle P: it is still 2.
Like any rating system, the W and gW ratings are meaningful only statistically, and a small difference in rating is not very meaningful. This entails that, even in the grey zone, the W rating remains, on average, a good estimate of the difficulty of a puzzle.
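The two quantities discussed here, the proportion of puzzles whose W and gW ratings differ and the maximum value of W - gW, can be computed from per-puzzle rating pairs. A minimal sketch; the function name and the toy data are hypothetical, and it assumes gW <= W for every puzzle, as implied by the text (W - gW = 1 or 2 when they differ):

```python
from typing import Iterable, Tuple

def rating_gap_stats(ratings: Iterable[Tuple[int, int]]):
    """Given (W, gW) pairs, return (fraction of puzzles where they differ, max W - gW).

    Assumes gW <= W for every puzzle; raises ValueError otherwise.
    """
    pairs = list(ratings)
    if not pairs:
        raise ValueError("no ratings given")
    gaps = [w - gw for w, gw in pairs]
    if any(g < 0 for g in gaps):
        raise ValueError("expected gW <= W for every puzzle")
    differing = sum(1 for g in gaps if g > 0)
    return differing / len(pairs), max(gaps)

# Toy (W, gW) pairs, not real data.
print(rating_gap_stats([(9, 9), (9, 8), (10, 8), (11, 11)]))
```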

How does this result change if we consider an even stricter subset, i.e. W >= 10 (0.0062% of the cb-collection)? We get 30.1%, which is not significantly different, given the small size of the resulting sample.
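One rough way to see why 30.1% and 31.8% are not significantly different is the binomial standard error of a proportion. This is a sketch under stated assumptions: the W >= 10 count is reconstructed from the rounded 0.0062% figure (about 367 puzzles, not a count given in the text), and since the W >= 10 puzzles are a subset of the W >= 9 ones, this is only an order-of-magnitude check, not a formal two-sample test.

```python
import math

collection_size = 5_926_343
# Approximate W >= 10 count, reconstructed from the rounded 0.0062% figure.
n_w10 = round(0.0062 / 100 * collection_size)
n_w9 = 1_258

def std_error(p, n):
    """Binomial standard error of a proportion p estimated from n puzzles."""
    return math.sqrt(p * (1 - p) / n)

for label, p, n in [("W >= 9", 0.318, n_w9), ("W >= 10", 0.301, n_w10)]:
    print(f"{label}: {100 * p:.1f}% +/- {100 * std_error(p, n):.1f} points (n = {n})")
```

The gap of 1.7 points is smaller than the standard error of the W >= 10 estimate (about 2.4 points).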
