.
RELATIONSHIP BETWEEN THE DIFFICULTY OF A PUZZLE AND THE NUMBER OF CANDIDATES
PRELIMINARY REMARKS AND GENERAL CONTEXT
1) Number of clues vs number of candidates
I've shown long ago that there is no meaningful correlation between the number of clues of a minimal puzzle and its difficulty. (For more precise results, see the references.)
However, I've found recently that there is some larger (though still not very meaningful) correlation between the number of candidates and the difficulty of a puzzle. Let this remain vague for now, it's the purpose of this thread to make it clearer.
Notice that it is much more natural to study the impact of the number of candidates rather than of the number of clues, because candidates are the only things that play any role during resolution. By applying Singles during resolution, newly asserted values allow to immediately eliminate more candidates via direct contradictions, but once this is done they don't play any role any longer. Indeed, one could consider the Singles rule and the associated eliminations by direct contradictions as a single elimination rule with multiple targets.
2) Vagueness of the word "difficulty"
The title of this thread has some (more or less deliberate) vagueness: there's no unique measure of difficulty of a puzzle and there can't be any unique measure of difficulty. "Difficulty" depends on our goals (e.g. fast vs pattern-based solving, or in the latter case hardest step vs full path).
My personal analyses rely on difficulty interpreted as difficulty of the hardest step and measured by the SER or by the B/BpB ratings (B or BpB depending on whether the puzzle is in T1E(1) or T&E(2)).
But this thread is open to similar studies wrt to alternative ratings (e.g. q1, q2, ...), provided that they are well defined.
Vague impressions or rantings about the SER based on a few handfuls of special puzzles are not the topic of this thread.
3) "Number of candidates" has some vagueness also,
but that's easier to deal with. A priori, one could consider the number of candidates:
- either at the very start (i.e. keeping only those that are not in a direct contradiction with a given),
- or after applying all the available Singles,
- or after applying all the Singles and whips[1].
My analyses rely mainly on the 3rd option. Here is why.
4) Singles and g-Singles
Unless one concentrates on totally trivial puzzles, initial Singles should play no role in estimating the difficulty of a puzzle; as a result, candidates they allow to eliminate should play no role either.
Whips[1] could be considered as g-Singles and totally forgotten, by introducing more CSP-Variables than the rc, rn, cn and bn ones, namely CSP-Variables related to mini-rows and mini-columns. In all my developments, I could have chosen this option. I haven't because not choosing it:
- allows a simpler approach;
- allows making a clear distinction between whips and g-whips;
- generally speaking, allow cleaner developments;
- in some logic games (e.g.Kakuro), g-candidates can be very complex and it's much better not to have them at the start.
However, one should keep in mind that what whips[1] fundamentally do is direct eliminations associated to g-Singles and that it seems reasonable to focus on the number of candidates after they have been applied together with ordinary Singles.
5) No correlation between the number of clues and the number of candidates
With all the previous points stated, the first result is:
There is no meaningful correlation between the number of clues and the number of candidates after W1.
More precisely, for the controlled-bias cbg000 collection, the correlation coefficient is 0.15.
At first sight, this result may be surprising, but it may seem trivial if you consider that the number of clues can in no way predict how many Singles or whips[1] there will be at the start.
This result also justifies this thread, because it shows that knowing the effects of the number of clues doesn't provide much knowledge about the effects of the number of candidates.
6) References (all are available as pdf's on Researchgate and in my CSP-Rules software on GitHub):
[Berthier 2009]: BERTHIER D., Unbiased Statistics of a CSP - A Controlled-Bias Generator, International Joint Conferences on Computer, Information, Systems Sciences and Engineering (CISSE 09), December 4-12, 2009, Springer. Re-published as a chapter of "Innovations in Computing Sciences and Software Engineering", Khaled Elleithy Editor, pp. 11-17, Springer, 2010.
[Berthier 2011]: BERTHIER D., Constraint Resolution Theories, Lulu.com Publishers, November 2011.
[Berthier 2012]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (First Edition), Lulu.com Publishers, November 2012.
[Berthier 2015]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (Second Edition), Lulu.com Publishers, July 2015.
[Berthier 2015]: BERTHIER D., Pattern-Based Constraint Satisfaction and Logic Puzzles (Third Edition), Lulu.com Publishers, November 2021.