STATISTICAL COMPARISONS OF VARIOUS RATINGS
In my book PBCS2, section 6.6, I wrote: In 10,000 puzzles tested, only 20 (0.2%) have different W and B ratings. Moreover, the maximum length of whips in a single resolution path using only loopless whips and obtained by the “simplest first” strategy is a good approximation of both the W and B ratings.
This post will make this old result more precise and extend it to more ratings.
As in the above results of [PBCS], whips and g-whips are assumed to be loopless (i.e. simpler than whips in which loops are allowed, which makes the results still more interesting). [Loopless would be a meaningless condition for braids.]
The collection used for the following results consists of the first 21,375 puzzles in my controlled-bias collection of 5,926,343 minimal puzzles. It is itself a controlled-bias collection. It is thus less biased than the top-down collection used for the comparison in [PBCS]. But the same results hold.
Details (in particular, the detailed lists of all the ratings and the detailed differences between them) can be found in the Examples/Sudoku-examples/cbg-000 folder of CSP-Rules-V2.1 on GitHub (https://github.com/denis-berthier/CSP-Rules-V2.1).
This cbg-000 collection can thus serve as a reference for quick statistical comparisons with any other ratings.
(The full collection of 5,926,343 minimal controlled-bias puzzles is also on GitHub, at https://github.com/denis-berthier/Controlled-bias_Sudoku_generator_and_collection, but it has only the W ratings.)
1) COMPARISONS INVOLVING NO SUBSETS
W vs B:
Note that one must always have W ≥ B.
In reality, there are only 65 differences, i.e. a proportion of 0.30%.
And there are only 4 cases with difference > 1 (0.019%) and none with difference > 2.
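Counts of this kind can be reproduced from two parallel lists of ratings. The following is a minimal Python sketch, assuming the ratings have already been loaded as integers, one per puzzle and in the same puzzle order. The function name `compare_ratings` and the toy data are purely illustrative; they are not part of CSP-Rules and are not taken from cbg-000.

```python
def compare_ratings(upper, lower):
    """Summarise the differences between two rating lists.

    `upper` is the rating expected to be >= `lower` for every puzzle
    (e.g. W and B). Both lists must be in the same puzzle order.
    """
    if len(upper) != len(lower):
        raise ValueError("rating lists must have the same length")
    diffs = [u - l for u, l in zip(upper, lower)]
    if any(d < 0 for d in diffs):
        raise ValueError("expected upper >= lower for every puzzle")
    n = len(diffs)
    n_diff = sum(1 for d in diffs if d != 0)
    return {
        "puzzles": n,
        "differences": n_diff,
        "proportion_pct": 100.0 * n_diff / n if n else 0.0,
        "diff_gt_1": sum(1 for d in diffs if d > 1),
        "diff_gt_2": sum(1 for d in diffs if d > 2),
    }


# Toy data (hypothetical, NOT taken from the cbg-000 files).
w_ratings = [3, 5, 7, 4, 9]
b_ratings = [3, 5, 6, 4, 6]
summary = compare_ratings(w_ratings, b_ratings)
print(summary)
```

For a pair with no a priori ordering (such as S+W vs gW), the sign check would have to be dropped and absolute differences used instead.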
W vs gW:
Note that one must always have W ≥ gW.
In reality, there are only 48 differences, i.e. a proportion of 0.22%.
And there are only 5 cases with difference > 1 (0.023%) and only one with difference > 2.
W vs FW:
Note that one must always have W ≥ FW.
In reality, there are only 7 differences, i.e. a proportion of 0.03%.
And there is only 1 case with difference 2 (0.005%) and no case with difference > 2.
2) COMPARISONS INVOLVING SUBSETS BUT NO FINNED FISH
W vs S+W:
Note that one must always have W ≥ S+W.
In reality, there are only 18 differences, i.e. a proportion of 0.084%.
And there are only 2 cases with difference > 1 (0.009%) and only one with difference > 2.
gW vs S+gW:
Note that one must always have gW ≥ S+gW.
In reality, there are only 16 differences, i.e. a proportion of 0.075%.
And there is only 1 case with difference > 1 (0.0047%) and none with difference > 2.
The puzzles that differ are the same as for W vs S+W, minus two cases (#41 and #2862).
For all 16 of these puzzles (i.e. not including the above 2), W = gW. The 'W vs gW' and 'W vs S+W' differences are thus "quasi-independent".
S+W vs gW:
Note that there is no a priori relation between S+W and gW.
In reality, there are only 62 differences, i.e. a proportion of 0.29%.
And there are only 6 cases with difference > 1 (0.028%) and only 2 with difference > 2.
S+W vs S+gW:
Note that one must always have S+W ≥ S+gW.
In reality, there are only 47 differences, i.e. a proportion of 0.22%.
And there are only 4 cases with difference > 1 (0.019%) and no case with difference > 2.
W vs S+gW:
Note that one must always have W ≥ S+gW.
In reality, there are only 62 differences, i.e. a proportion of 0.29%.
Writing delta(X, Y) for the number of puzzles on which ratings X and Y differ, notice that delta(W, gW) = 48, delta(gW, S+gW) = 16 and delta(W, S+gW) = 62.
delta(W, S+gW) is almost equal to delta(W, gW) + delta(gW, S+gW) = 64.
This confirms the quasi-independence of the S and g differences.
There are only 6 cases with difference > 1 (0.028%) and none with difference > 2.
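As a cross-check, the proportions quoted so far can be recomputed from the raw counts and the collection size of 21,375. The short Python sketch below does only that arithmetic; the dictionary layout and the helper name `percent` are my own illustrative choices. It also makes the additivity remark explicit: since W ≥ gW ≥ S+gW, a puzzle has W ≠ S+gW exactly when it differs in at least one of the two steps, so the gap between 64 and 62 is the number of puzzles that differ in both steps.

```python
N = 21375  # number of puzzles in the cbg-000 collection

# Number of puzzles with differing ratings, as reported in this post.
differences = {
    ("W", "B"): 65,
    ("W", "gW"): 48,
    ("W", "FW"): 7,
    ("W", "S+W"): 18,
    ("gW", "S+gW"): 16,
    ("S+W", "gW"): 62,
    ("S+W", "S+gW"): 47,
    ("W", "S+gW"): 62,
}


def percent(count, total=N):
    """Return count/total expressed as a percentage."""
    return 100.0 * count / total


for (a, b), count in differences.items():
    print(f"{a} vs {b}: {count} differences = {percent(count):.3f}%")

# Additivity check for the chain W >= gW >= S+gW.
additive = differences[("W", "gW")] + differences[("gW", "S+gW")]
observed = differences[("W", "S+gW")]
both_steps = additive - observed  # puzzles differing in both steps
print(f"sum = {additive}, observed = {observed}, in both = {both_steps}")
```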
3) COMPARISONS INVOLVING SUBSETS AND FINNED FISH
S+W vs SFin+W:
Note that one must always have S+W ≥ SFin+W
In reality, there are no differences.
S+gW vs SFin+gW:
Note that one must always have S+gW ≥ SFin+gW
In reality, there are no differences.
4) WARNING
Like any statistical results, the above ones are valid only for random collections with the same distribution as the controlled-bias one. I think they can be extended with little change to any unbiased collection.
However, I have proven in [PBCS] that there are 2.5477 × 10^25 non-isomorphic minimal Sudoku puzzles (with 0.065% relative error), a huge number that leaves plenty of room for exceptional cases with respect to any statistical result.
In particular, it is not difficult to find handmade collections of puzzles that have lots of Subsets and for which the proportion of differing W and S+W ratings is larger.
Similarly, it is not difficult to find collections of puzzles with much larger mean difficulty than in a random one. The longer the whips/braids necessary to solve a puzzle, the larger the proportion of differing W and B ratings.
Finally, it is not difficult to find exceptional puzzles where the difference between the ratings is much larger than 2.


