## Relation between difficulty and number of candidates

Everything about Sudoku that doesn't fit in one of the other sections

### Relation between difficulty and number of candidates

.
RELATIONSHIP BETWEEN THE DIFFICULTY OF A PUZZLE AND THE NUMBER OF CANDIDATES
PRELIMINARY REMARKS AND GENERAL CONTEXT

1) Number of clues vs number of candidates
I've shown long ago that there is no meaningful correlation between the number of clues of a minimal puzzle and its difficulty. (For more precise results, see the references.)
However, I've found recently that there is some larger (though still not very meaningful) correlation between the number of candidates and the difficulty of a puzzle. Let this remain vague for now, it's the purpose of this thread to make it clearer.
Notice that it is much more natural to study the impact of the number of candidates rather than of the number of clues, because candidates are the only things that play any role during resolution. By applying Singles during resolution, newly asserted values allow to immediately eliminate more candidates via direct contradictions, but once this is done they don't play any role any longer. Indeed, one could consider the Singles rule and the associated eliminations by direct contradictions as a single elimination rule with multiple targets.

2) Vagueness of the word "difficulty"
The title of this thread has some (more or less deliberate) vagueness: there's no unique measure of difficulty of a puzzle and there can't be any unique measure of difficulty. "Difficulty" depends on our goals (e.g. fast vs pattern-based solving, or in the latter case hardest step vs full path).
My personal analyses rely on difficulty interpreted as difficulty of the hardest step and measured by the SER or by the B/BpB ratings (B or BpB depending on whether the puzzle is in T1E(1) or T&E(2)).
But this thread is open to similar studies wrt to alternative ratings (e.g. q1, q2, ...), provided that they are well defined.
Vague impressions or rantings about the SER based on a few handfuls of special puzzles are not the topic of this thread.

3) "Number of candidates" has some vagueness also,
but that's easier to deal with. A priori, one could consider the number of candidates:
- either at the very start (i.e. keeping only those that are not in a direct contradiction with a given),
- or after applying all the available Singles,
- or after applying all the Singles and whips[1].
My analyses rely mainly on the 3rd option. Here is why.

4) Singles and g-Singles
Unless one concentrates on totally trivial puzzles, initial Singles should play no role in estimating the difficulty of a puzzle; as a result, candidates they allow to eliminate should play no role either.
Whips[1] could be considered as g-Singles and totally forgotten, by introducing more CSP-Variables than the rc, rn, cn and bn ones, namely CSP-Variables related to mini-rows and mini-columns. In all my developments, I could have chosen this option. I haven't because not choosing it:
- allows a simpler approach;
- allows making a clear distinction between whips and g-whips;
- generally speaking, allow cleaner developments;
- in some logic games (e.g.Kakuro), g-candidates can be very complex and it's much better not to have them at the start.
However, one should keep in mind that what whips[1] fundamentally do is direct eliminations associated to g-Singles and that it seems reasonable to focus on the number of candidates after they have been applied together with ordinary Singles.

5) No correlation between the number of clues and the number of candidates
With all the previous points stated, the first result is:
There is no meaningful correlation between the number of clues and the number of candidates after W1.
More precisely, for the controlled-bias cbg000 collection, the correlation coefficient is 0.15.
At first sight, this result may be surprising, but it may seem trivial if you consider that the number of clues can in no way predict how many Singles or whips[1] there will be at the start.
This result also justifies this thread, because it shows that knowing the effects of the number of clues doesn't provide much knowledge about the effects of the number of candidates.

6) References (all are available as pdf's on Researchgate and in my CSP-Rules software on GitHub):
[Berthier 2009]: BERTHIER D., Unbiased Statistics of a CSP - A Controlled-Bias Generator, International Joint Conferences on Computer, Information, Systems Sciences and Engineering (CISSE 09), December 4-12, 2009, Springer. Re-published as a chapter of "Innovations in Computing Sciences and Software Engineering", Khaled Elleithy Editor, pp. 11-17, Springer, 2010.
[Berthier 2011]: BERTHIER D., Constraint Resolution Theories, Lulu.com Publishers, November 2011.
[Berthier 2012]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (First Edition), Lulu.com Publishers, November 2012.
[Berthier 2015]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (Second Edition), Lulu.com Publishers, July 2015.
[Berthier 2015]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (Third Edition), Lulu.com Publishers, November 2021.
Last edited by denis_berthier on Mon Feb 28, 2022 5:07 pm, edited 1 time in total.
denis_berthier
2010 Supporter

Posts: 3536
Joined: 19 June 2007
Location: Paris

### Re: Relation between difficulty and number of candidates

.
RELATIONSHIP BETWEEN THE DIFFICULTY OF A PUZZLE AND THE NUMBER OF CANDIDATES
RESULTS FOR THE CONTROLLED-BIAS COLLECTION

I've been busy with other things, but here are finally the results for the 21375 puzzles of the small cbg-000 part of controlled-bias collection.

Here, I've used the W rating instead of SER because there are fewer levels. I could have used slices of SER, such as 4 <= SER < 5, but that wouldn't change much the global result.

"sd" means standard deviation.
Precision is not very high, but I think that this is largely enough to show that there is a very clear general trend: in the controlled-bias collection, the mean number of candidates after Singles and whips[1] increases with the W rating.

Code: Select all
`W= 9     nb-puzzles=    4     mean-nb-cands= 169.5   sd-nb-cands= 1.8W= 8     nb-puzzles=   12     mean-nb-cands= 170.9   sd-nb-cands= 11.7W= 7     nb-puzzles=   52     mean-nb-cands= 166.2   sd-nb-cands= 16.6W= 6     nb-puzzles=  168     mean-nb-cands= 157.9   sd-nb-cands= 19.5W= 5     nb-puzzles=  788     mean-nb-cands= 149.1   sd-nb-cands= 24.6W= 4     nb-puzzles= 3690     mean-nb-cands= 128.3   sd-nb-cands= 31.0W= 3     nb-puzzles= 4305     mean-nb-cands= 109.8   sd-nb-cands= 34.1W= 2     nb-puzzles= 2771     mean-nb-cands= 109.3   sd-nb-cands= 38.2`

Two remarks:
1) the standard deviation is very high, showing that this result is of little practical help for predicting the difficulty of a puzzle.
2) however, this may be very useful for checking statistical properties of a collection (see my next post).
.
Last edited by denis_berthier on Thu Mar 03, 2022 8:21 am, edited 2 times in total.
denis_berthier
2010 Supporter

Posts: 3536
Joined: 19 June 2007
Location: Paris

### Re: Relation between difficulty and number of candidates

.
RELATIONSHIP BETWEEN THE DIFFICULTY OF A PUZZLE AND THE NUMBER OF CANDIDATES
RESULTS FOR THE HARDEST COLLECTION
.
The following refers to the current version of the hardest collection (ph_2010) and it doesn't take into account the new puzzles with very high ratings (essentially mith's). But that's irrelevant here.
Regarding the number of candidates, I have computed them after Singles and whips[1] (but doing it after only Singles wouldn't change much to the general conclusions).
(The precision in the table below is absurd, but I'm lazy to crop each result manually.)
"sd" means standard deviation.

Code: Select all
`SER= 11.9     nb-puzzles=       9    mean-nb-cands= 234.555555555556   sd-nb-cands= 5.77564063038573SER= 11.8     nb-puzzles=      61    mean-nb-cands= 236.032786885246   sd-nb-cands= 6.97763925768015SER= 11.7     nb-puzzles=     138    mean-nb-cands= 235.376811594203   sd-nb-cands= 9.29640954421292SER= 11.6     nb-puzzles=     393    mean-nb-cands= 224.735368956743   sd-nb-cands= 15.2200549339413SER= 11.5     nb-puzzles=   1 071    mean-nb-cands= 225.515406162465   sd-nb-cands= 11.8234560820402SER= 11.4     nb-puzzles=   2 612    mean-nb-cands= 224.942189892802   sd-nb-cands= 12.0760459024426SER= 11.3     nb-puzzles=  15 127    mean-nb-cands= 218.818404177959   sd-nb-cands= 15.0062417464236SER= 11.2     nb-puzzles=  28 384    mean-nb-cands= 220.340684892896   sd-nb-cands= 14.1528507406146SER= 11.1     nb-puzzles= 184 734    mean-nb-cands= 197.189391232798   sd-nb-cands= 26.9153506659259SER= 11.0     nb-puzzles= 932 949    mean-nb-cands= 176.137465177623   sd-nb-cands= 20.4984525042759SER= 10.9     nb-puzzles= 274 400    mean-nb-cands= 219.148837463557   sd-nb-cands= 12.5406000027526SER= 10.8     nb-puzzles= 269 562    mean-nb-cands= 218.315942158022   sd-nb-cands= 13.1106080080684SER= 10.7     nb-puzzles=  77 364    mean-nb-cands= 226.410811230027   sd-nb-cands= 9.8060107074916SER= 10.6     nb-puzzles= 321 092    mean-nb-cands= 224.069213807882   sd-nb-cands= 9.1219581130378`

For SER ≥ 11.0, it shows some slow and irregular decrease in the number of candidates, compatible with the results of my previous post.

The above results for very high SER (≥ 11.0) confirm both the general trend and the conclusion that the number of candidates after W1 is not a good predictor of difficulty of a puzzle. As I have written previously in the "hardest" thread, in the search for the "hardest" puzzles, any pre-filter based on a high threshold for the number of candidates is a very bad idea.

As for the results for SER < 11.0, drastically contradicting the general trend, I think they are the result of a very strong bias in the collection and this part of the collection is not worth much with regard to the "hardest" qualification.

:added the 10.6 line. I don't plan to go lower, because that's largely enough to show that the part below 11.0 is totally biased.
+ a few minor changes after adding real content to the previous post.
Last edited by denis_berthier on Thu Mar 03, 2022 9:16 am, edited 5 times in total.
denis_berthier
2010 Supporter

Posts: 3536
Joined: 19 June 2007
Location: Paris

### Re: Relation between difficulty and number of candidates

.
RELATIONSHIP BETWEEN THE DIFFICULTY OF A PUZZLE AND THE NUMBER OF CANDIDATES
ADDITIONAL RESULTS FOR THE CBG-000 COLLECTION

1) W-rating vs number of clues
In the first post of this thread, I said there's no correlation (for minimal puzzles) between the number of clues and the difficulty.
Until now, I had published only general correlation results.
Here is how the relation looks for the cbg-000 collection when difficulty is measured by the W rating. The absence of correlation couldn't be made clearer.

Code: Select all
`W= 9     nb-puzzles=    4     mean-nb-clues= 25.50   sd-nb-clues= 1.12W= 8     nb-puzzles=   12     mean-nb-clues= 25.67   sd-nb-clues= 0.94W= 7     nb-puzzles=   52     mean-nb-clues= 25.85   sd-nb-clues= 1.01W= 6     nb-puzzles=  168     mean-nb-clues= 26.15   sd-nb-clues= 1.21W= 5     nb-puzzles=  788     mean-nb-clues= 25.99   sd-nb-clues= 1.15W= 4     nb-puzzles= 3690     mean-nb-clues= 25.94   sd-nb-clues= 1.12W= 3     nb-puzzles= 4305     mean-nb-clues= 25.84   sd-nb-clues= 1.09W= 2     nb-puzzles= 2771     mean-nb-clues= 25.62   sd-nb-clues= 1.11W= 1     nb-puzzles= 2093     mean-nb-clues= 25.47   sd-nb-clues= 1.08W= 0     nb-puzzles= 7489     mean-nb-clues= 25.45   sd-nb-clues= 1.06`

1) W-rating vs number of candidates after Singles
In the second post of this thread, I said using the number of candidates after BRT (i.e. Singles) instead of after W1 (i.e. Singles + whips[1]) wouldn't change "much" in the general trend.
Here is how the relation looks for the cbg-000 collection when difficulty is measured by the W rating:

Code: Select all
`W= 9     nb-puzzles=    4     mean-nb-cands= 177.25   sd-nb-cands= 6.14W= 8     nb-puzzles=   12     mean-nb-cands= 176.67   sd-nb-cands= 10.59W= 7     nb-puzzles=   52     mean-nb-cands= 171.02   sd-nb-cands= 15.67W= 6     nb-puzzles=  168     mean-nb-cands= 164.55   sd-nb-cands= 19.54W= 5     nb-puzzles=  788     mean-nb-cands= 157.66   sd-nb-cands= 24.12W= 4     nb-puzzles= 3690     mean-nb-cands= 138.89   sd-nb-cands= 31.36W= 3     nb-puzzles= 4305     mean-nb-cands= 122.12   sd-nb-cands= 36.02W= 2     nb-puzzles= 2771     mean-nb-cands= 121.61   sd-nb-cands= 40.19W= 1     nb-puzzles= 2093     mean-nb-cands= 125.63   sd-nb-cands= 33.30`

Of course, there are more candidates after Singles than after W1, but the general trend is very similar.
.
denis_berthier
2010 Supporter

Posts: 3536
Joined: 19 June 2007
Location: Paris

### Re: Relation between difficulty and number of candidates

.
After the discovery that mith's Loki puzzle is not in T&E(2) and the subsequent explosion of the number of puzzles in T&E(3), one may wonder how the above results apply to them.
Based on mith's database of 63,137 min-expand puzzles in T&E(3), one has:

Code: Select all
`Results after W1: mean-nb-cands= 185.14         sd-nb-cands= 11.16       max-nb-cands= 234          min-nb-cands= 136`

which clearly puts mith's collection above the cbg-000 one and among the low part of the 11+ hardest.
Probably, this only reflects their SER distribution, as the upper part of the collection has a much higher number of candidates than this mean value.

What this also shows is, with respect to the number of candidates, these T&E(3) puzzles don't seem to have anything special among the hardest.
.
denis_berthier
2010 Supporter

Posts: 3536
Joined: 19 June 2007
Location: Paris