by denis_berthier » Fri Jul 26, 2013 7:33 am
W and gW ratings in the grey zone
Long ago, I showed that the W and gW ratings rarely differ when the W rating is finite and that, when they do differ, the difference is small (W - gW = 1 or 2).
Indeed, for the 5,926,343 puzzles I generated with the controlled-bias generator (all of which have finite W), there was a difference in only 0.23% of cases.
In this post, I'll use raw stats for the cb-collection (I won't compute unbiased stats from them) - but this is OK as I'm only interested in orders of magnitude.
If I define the grey zone as the set of puzzles with W >= 9 (*), then the grey proportion in the above collection is only 0.021%. But this is still 1258 puzzles, enough to do some stats.
* a definition stricter (**) than what I first proposed in terms of SER >= 9, and more consistent with my approach.
** Strictly speaking, it is stricter on the lower side but broader on the upper side (no SER < 10.5 restriction here), but this will play no role here. In particular, this includes puzzles not in T&E(1), but these are still rarer (about 1 in 30,000,000 puzzles, i.e. about 0.016% of the grey puzzles) and they can therefore play no significant role in the global stats for the grey zone.
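As a quick sanity check on the orders of magnitude quoted above, the two percentages can be recomputed from the raw counts. This is only an arithmetic sketch: the variable names are mine, and the "1 in 30,000,000" figure is taken as given from the text.

```python
# Raw counts quoted in the post (cb-collection, raw stats, no unbiasing).
TOTAL_PUZZLES = 5_926_343   # puzzles in the cb-collection
GREY_PUZZLES = 1_258        # puzzles with W >= 9

# Proportion of the collection that falls in the grey zone.
grey_fraction = GREY_PUZZLES / TOTAL_PUZZLES

# Puzzles not in T&E(1): about 1 in 30,000,000 puzzles (figure from the post).
non_te1_fraction = 1 / 30_000_000

# Share of the grey zone that such puzzles would represent.
share_of_grey = non_te1_fraction / grey_fraction

print(f"grey zone: {grey_fraction:.3%} of the collection")
print(f"not in T&E(1): {share_of_grey:.3%} of the grey puzzles")
```

Both values agree with the rounded figures given above (about 0.021% and about 0.016%).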
Now, we can ask: how often do the W and gW ratings differ in the grey zone? As the grey zone is a very small subset of the minimal puzzles, the result could be very different from that obtained for the whole collection.
And it is indeed very different: 31.8% (instead of 0.23%).
That it is larger is not really surprising: the longer a partial chain, the more possibilities there are for extending it with a right-linking g-candidate when it can't be extended with a candidate. What is new is that the difference is now quantified.
What doesn't change is the maximum difference between W(P) and gW(P) for a puzzle P: 2.
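To make explicit what is being counted, here is a minimal sketch of how the difference rate and the maximum difference could be computed from a list of (W, gW) pairs. The function name `grey_zone_stats` and the toy data are hypothetical illustrations, not the actual tooling or results; the only assumption about the ratings is that gW(P) <= W(P).

```python
from typing import Iterable, Tuple


def grey_zone_stats(
    ratings: Iterable[Tuple[int, int]], threshold: int = 9
) -> Tuple[int, float, int]:
    """Return (grey size, fraction with W != gW, max of W - gW) for W >= threshold."""
    # Keep only the puzzles in the grey zone, i.e. with W >= threshold.
    grey = [(w, gw) for w, gw in ratings if w >= threshold]
    if not grey:
        return 0, 0.0, 0
    diffs = [w - gw for w, gw in grey]
    differing = sum(1 for d in diffs if d != 0)
    return len(grey), differing / len(grey), max(diffs)


# Toy (W, gW) pairs, invented purely for illustration.
toy_ratings = [(5, 5), (9, 9), (9, 8), (10, 10), (11, 9), (12, 12)]
size, rate, max_diff = grey_zone_stats(toy_ratings)
print(size, rate, max_diff)
```

Calling the same function with `threshold=10` would give the corresponding figures for a stricter subset.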
Like any rating system, the W and gW ratings can be meaningful only statistically, and a small difference in rating is not very meaningful. This entails that, even in the grey zone, the W rating remains on average a good estimate of the difficulty of a puzzle.
How does this result change if we consider a still stricter subset, i.e. W >= 10 (0.0062% of the cb-collection)? We get 30.1%, which is not significantly different, considering the small size of the resulting sample.
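A rough way to see why 30.1% and 31.8% are not significantly different is to put a binomial standard error on the smaller sample. This is only a sketch under stated assumptions: the sample size is reconstructed from the rounded 0.0062% figure (so it is approximate), puzzles are treated as independent draws, and a normal approximation is used for the 95% interval.

```python
import math

TOTAL_PUZZLES = 5_926_343

# Approximate size of the W >= 10 subset, reconstructed from the rounded 0.0062%.
n_w10 = round(TOTAL_PUZZLES * 0.0062 / 100)

p_w10 = 0.301   # observed proportion with W != gW for W >= 10
p_w9 = 0.318    # observed proportion with W != gW for W >= 9

# Binomial standard error and a normal-approximation 95% interval.
std_err = math.sqrt(p_w10 * (1 - p_w10) / n_w10)
low, high = p_w10 - 1.96 * std_err, p_w10 + 1.96 * std_err

print(f"n = {n_w10}, standard error = {std_err:.3f}")
print(f"95% interval for W >= 10: [{low:.3f}, {high:.3f}]")
print("31.8% inside the interval:", low < p_w9 < high)
```

With a few hundred puzzles, the standard error is a little over two percentage points, so a gap of 1.7 points is well within sampling noise.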