The New Sudoku Players' Forum

Website · by **denis_berthier** » Mon Feb 28, 2022 5:43 am

.
RELATIONSHIP BETWEEN THE DIFFICULTY OF A PUZZLE AND THE NUMBER OF CANDIDATES
PRELIMINARY REMARKS AND GENERAL CONTEXT

1) Number of clues vs number of candidates
I've shown long ago that there is no meaningful correlation between the number of clues of a minimal puzzle and its difficulty. (For more precise results, see the references.)
However, I've found recently that there is some larger (though still not very meaningful) correlation between the number of candidates and the difficulty of a puzzle. Let this remain vague for now, it's the purpose of this thread to make it clearer.
Notice that it is much more natural to study the impact of the number of candidates rather than of the number of clues, because candidates are the only things that play any role during resolution. By applying Singles during resolution, newly asserted values allow to immediately eliminate more candidates via direct contradictions, but once this is done they don't play any role any longer. Indeed, one could consider the Singles rule and the associated eliminations by direct contradictions as a single elimination rule with multiple targets.

2) Vagueness of the word "difficulty"
The title of this thread has some (more or less deliberate) vagueness: there's no unique measure of difficulty of a puzzle and there can't be any unique measure of difficulty. "Difficulty" depends on our goals (e.g. fast vs pattern-based solving, or in the latter case hardest step vs full path).
My personal analyses rely on difficulty interpreted as difficulty of the hardest step and measured by the SER or by the B/BpB ratings (B or BpB depending on whether the puzzle is in T1E(1) or T&E(2)).
But this thread is open to similar studies wrt to alternative ratings (e.g. q1, q2, ...), provided that they are well defined.
Vague impressions or rantings about the SER based on a few handfuls of special puzzles are not the topic of this thread.

3) "Number of candidates" has some vagueness also,
but that's easier to deal with. A priori, one could consider the number of candidates:
- either at the very start (i.e. keeping only those that are not in a direct contradiction with a given),
- or after applying all the available Singles,
- or after applying all the Singles and whips[1].
My analyses rely mainly on the 3rd option. Here is why.

4) Singles and g-Singles
Unless one concentrates on totally trivial puzzles, initial Singles should play no role in estimating the difficulty of a puzzle; as a result, candidates they allow to eliminate should play no role either.
Whips[1] could be considered as g-Singles and totally forgotten, by introducing more CSP-Variables than the rc, rn, cn and bn ones, namely CSP-Variables related to mini-rows and mini-columns. In all my developments, I could have chosen this option. I haven't because not choosing it:
- allows a simpler approach;
- allows making a clear distinction between whips and g-whips;
- generally speaking, allow cleaner developments;
- in some logic games (e.g.Kakuro), g-candidates can be very complex and it's much better not to have them at the start.
However, one should keep in mind that what whips[1] fundamentally do is direct eliminations associated to g-Singles and that it seems reasonable to focus on the number of candidates after they have been applied together with ordinary Singles.

5) No correlation between the number of clues and the number of candidates
With all the previous points stated, the first result is:
There is no meaningful correlation between the number of clues and the number of candidates after W1.
More precisely, for the controlled-bias cbg000 collection, the correlation coefficient is 0.15.
At first sight, this result may be surprising, but it may seem trivial if you consider that the number of clues can in no way predict how many Singles or whips[1] there will be at the start.
This result also justifies this thread, because it shows that knowing the effects of the number of clues doesn't provide much knowledge about the effects of the number of candidates.

6) References (all are available as pdf's on Researchgate and in my CSP-Rules software on GitHub):
[Berthier 2009]: BERTHIER D., Unbiased Statistics of a CSP - A Controlled-Bias Generator, International Joint Conferences on Computer, Information, Systems Sciences and Engineering (CISSE 09), December 4-12, 2009, Springer. Re-published as a chapter of "Innovations in Computing Sciences and Software Engineering", Khaled Elleithy Editor, pp. 11-17, Springer, 2010.
[Berthier 2011]: BERTHIER D., Constraint Resolution Theories, Lulu.com Publishers, November 2011.
[Berthier 2012]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (First Edition), Lulu.com Publishers, November 2012.
[Berthier 2015]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (Second Edition), Lulu.com Publishers, July 2015.
[Berthier 2015]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (Third Edition), Lulu.com Publishers, November 2021.

Website · by **denis_berthier** » Mon Feb 28, 2022 5:44 am

.
RELATIONSHIP BETWEEN THE DIFFICULTY OF A PUZZLE AND THE NUMBER OF CANDIDATES
RESULTS FOR THE CONTROLLED-BIAS COLLECTION

I've been busy with other things, but here are finally the results for the 21375 puzzles of the small cbg-000 part of controlled-bias collection.

Here, I've used the W rating instead of SER because there are fewer levels. I could have used slices of SER, such as 4 <= SER < 5, but that wouldn't change much the global result.

"sd" means standard deviation.
Precision is not very high, but I think that this is largely enough to show that there is a very clear general trend: in the controlled-bias collection, the mean number of candidates after Singles and whips[1] increases with the W rating.

Code: Select all: W= 9 nb-puzzles= 4 mean-nb-cands= 169.5 sd-nb-cands= 1.8 W= 8 nb-puzzles= 12 mean-nb-cands= 170.9 sd-nb-cands= 11.7 W= 7 nb-puzzles= 52 mean-nb-cands= 166.2 sd-nb-cands= 16.6 W= 6 nb-puzzles= 168 mean-nb-cands= 157.9 sd-nb-cands= 19.5 W= 5 nb-puzzles= 788 mean-nb-cands= 149.1 sd-nb-cands= 24.6 W= 4 nb-puzzles= 3690 mean-nb-cands= 128.3 sd-nb-cands= 31.0 W= 3 nb-puzzles= 4305 mean-nb-cands= 109.8 sd-nb-cands= 34.1 W= 2 nb-puzzles= 2771 mean-nb-cands= 109.3 sd-nb-cands= 38.2

Two remarks:
1) the standard deviation is very high, showing that this result is of little practical help for predicting the difficulty of a puzzle.
2) however, this may be very useful for checking statistical properties of a collection (see my next post).
.

Website · by **denis_berthier** » Mon Feb 28, 2022 5:45 am

.
RELATIONSHIP BETWEEN THE DIFFICULTY OF A PUZZLE AND THE NUMBER OF CANDIDATES
RESULTS FOR THE HARDEST COLLECTION
.
This is an adaptation of a post I made in the "hardest" thread.
The following refers to the current version of the hardest collection (ph_2010) and it doesn't take into account the new puzzles with very high ratings (essentially mith's). But that's irrelevant here.
Regarding the number of candidates, I have computed them after Singles and whips[1] (but doing it after only Singles wouldn't change much to the general conclusions).
(The precision in the table below is absurd, but I'm lazy to crop each result manually.)
"sd" means standard deviation.

Code: Select all: SER= 11.9 nb-puzzles= 9 mean-nb-cands= 234.555555555556 sd-nb-cands= 5.77564063038573 SER= 11.8 nb-puzzles= 61 mean-nb-cands= 236.032786885246 sd-nb-cands= 6.97763925768015 SER= 11.7 nb-puzzles= 138 mean-nb-cands= 235.376811594203 sd-nb-cands= 9.29640954421292 SER= 11.6 nb-puzzles= 393 mean-nb-cands= 224.735368956743 sd-nb-cands= 15.2200549339413 SER= 11.5 nb-puzzles= 1 071 mean-nb-cands= 225.515406162465 sd-nb-cands= 11.8234560820402 SER= 11.4 nb-puzzles= 2 612 mean-nb-cands= 224.942189892802 sd-nb-cands= 12.0760459024426 SER= 11.3 nb-puzzles= 15 127 mean-nb-cands= 218.818404177959 sd-nb-cands= 15.0062417464236 SER= 11.2 nb-puzzles= 28 384 mean-nb-cands= 220.340684892896 sd-nb-cands= 14.1528507406146 SER= 11.1 nb-puzzles= 184 734 mean-nb-cands= 197.189391232798 sd-nb-cands= 26.9153506659259 SER= 11.0 nb-puzzles= 932 949 mean-nb-cands= 176.137465177623 sd-nb-cands= 20.4984525042759 SER= 10.9 nb-puzzles= 274 400 mean-nb-cands= 219.148837463557 sd-nb-cands= 12.5406000027526 SER= 10.8 nb-puzzles= 269 562 mean-nb-cands= 218.315942158022 sd-nb-cands= 13.1106080080684 SER= 10.7 nb-puzzles= 77 364 mean-nb-cands= 226.410811230027 sd-nb-cands= 9.8060107074916 SER= 10.6 nb-puzzles= 321 092 mean-nb-cands= 224.069213807882 sd-nb-cands= 9.1219581130378

For SER ≥ 11.0, it shows some slow and irregular decrease in the number of candidates, compatible with the results of my previous post.

The above results for very high SER (≥ 11.0) confirm both the general trend and the conclusion that the number of candidates after W1 is not a good predictor of difficulty of a puzzle. As I have written previously in the "hardest" thread, in the search for the "hardest" puzzles, any pre-filter based on a high threshold for the number of candidates is a very bad idea.

As for the results for SER < 11.0, drastically contradicting the general trend, I think they are the result of a very strong bias in the collection and this part of the collection is not worth much with regard to the "hardest" qualification.

[Edit]:added the 10.6 line. I don't plan to go lower, because that's largely enough to show that the part below 11.0 is totally biased.
+ a few minor changes after adding real content to the previous post.

Website · by **denis_berthier** » Sat Mar 05, 2022 5:13 am

.
RELATIONSHIP BETWEEN THE DIFFICULTY OF A PUZZLE AND THE NUMBER OF CANDIDATES
ADDITIONAL RESULTS FOR THE CBG-000 COLLECTION

1) W-rating vs number of clues
In the first post of this thread, I said there's no correlation (for minimal puzzles) between the number of clues and the difficulty.
Until now, I had published only general correlation results.
Here is how the relation looks for the cbg-000 collection when difficulty is measured by the W rating. The absence of correlation couldn't be made clearer.

Code: Select all: W= 9 nb-puzzles= 4 mean-nb-clues= 25.50 sd-nb-clues= 1.12 W= 8 nb-puzzles= 12 mean-nb-clues= 25.67 sd-nb-clues= 0.94 W= 7 nb-puzzles= 52 mean-nb-clues= 25.85 sd-nb-clues= 1.01 W= 6 nb-puzzles= 168 mean-nb-clues= 26.15 sd-nb-clues= 1.21 W= 5 nb-puzzles= 788 mean-nb-clues= 25.99 sd-nb-clues= 1.15 W= 4 nb-puzzles= 3690 mean-nb-clues= 25.94 sd-nb-clues= 1.12 W= 3 nb-puzzles= 4305 mean-nb-clues= 25.84 sd-nb-clues= 1.09 W= 2 nb-puzzles= 2771 mean-nb-clues= 25.62 sd-nb-clues= 1.11 W= 1 nb-puzzles= 2093 mean-nb-clues= 25.47 sd-nb-clues= 1.08 W= 0 nb-puzzles= 7489 mean-nb-clues= 25.45 sd-nb-clues= 1.06

1) W-rating vs number of candidates after Singles
In the second post of this thread, I said using the number of candidates after BRT (i.e. Singles) instead of after W1 (i.e. Singles + whips[1]) wouldn't change "much" in the general trend.
Here is how the relation looks for the cbg-000 collection when difficulty is measured by the W rating:

Code: Select all: W= 9 nb-puzzles= 4 mean-nb-cands= 177.25 sd-nb-cands= 6.14 W= 8 nb-puzzles= 12 mean-nb-cands= 176.67 sd-nb-cands= 10.59 W= 7 nb-puzzles= 52 mean-nb-cands= 171.02 sd-nb-cands= 15.67 W= 6 nb-puzzles= 168 mean-nb-cands= 164.55 sd-nb-cands= 19.54 W= 5 nb-puzzles= 788 mean-nb-cands= 157.66 sd-nb-cands= 24.12 W= 4 nb-puzzles= 3690 mean-nb-cands= 138.89 sd-nb-cands= 31.36 W= 3 nb-puzzles= 4305 mean-nb-cands= 122.12 sd-nb-cands= 36.02 W= 2 nb-puzzles= 2771 mean-nb-cands= 121.61 sd-nb-cands= 40.19 W= 1 nb-puzzles= 2093 mean-nb-cands= 125.63 sd-nb-cands= 33.30

Of course, there are more candidates after Singles than after W1, but the general trend is very similar.
.

Website · by **denis_berthier** » Wed Aug 31, 2022 7:02 am

.
After the discovery that mith's Loki puzzle is not in T&E(2) and the subsequent explosion of the number of puzzles in T&E(3), one may wonder how the above results apply to them.
Based on mith's database of 63,137 min-expand puzzles in T&E(3), one has:

Code: Select all: Results after W1: mean-nb-cands= 185.14 sd-nb-cands= 11.16 max-nb-cands= 234 min-nb-cands= 136

which clearly puts mith's collection above the cbg-000 one and among the low part of the 11+ hardest.
Probably, this only reflects their SER distribution, as the upper part of the collection has a much higher number of candidates than this mean value.

What this also shows is, with respect to the number of candidates, these T&E(3) puzzles don't seem to have anything special among the hardest.
.

The New Sudoku Players' Forum

Relation between difficulty and number of candidates

Relation between difficulty and number of candidates

Re: Relation between difficulty and number of candidates

Re: Relation between difficulty and number of candidates

Re: Relation between difficulty and number of candidates

Re: Relation between difficulty and number of candidates