Classification and frequency of patterns/rules

To illustrate how much the classification of patterns matters when one tries to estimate their frequency, I'll take the simplest example in the short list of JExocets provided by Champagne here: http://forum.enjoysudoku.com/jexocet-pattern-defintion-t31133-37.html

Here is the puzzle:

5..........46..9...8..4..1..2.9.6.....6.7.3.....3...2..1..2..5...9..37..........8

It is mentioned as having a "classical JExocet form" with:

- base cells r4c5 r6c5

- target cells r2c6 r8c4

- and 3 base digits 158

This JExocet can therefore (potentially) eliminate the 12 candidates for non-base digits in the two target cells (n2r2c6 n2r8c4 n3r2c6 n3r8c4 n4r2c6 n4r8c4 n6r2c6 n6r8c4 n7r2c6 n7r8c4 n9r2c6 n9r8c4). In practice, most of them are eliminated during initialisation, as being in direct contradiction with a given. Only the following 3 remain: n2r2c6 n4r8c4 n7r2c6.
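The count above can be checked mechanically. Here is a minimal Python sketch (not SudoRules code; the function names are mine and purely illustrative) that lists the potential target eliminations for the given base digits and keeps only those still present after initialisation, i.e. not in direct contradiction with a given in the same row, column or block:

```python
PUZZLE = ("5..........46..9...8..4..1..2.9.6.....6.7.3..."
          "..3...2..1..2..5...9..37..........8")

def given_digits_seen(grid, r, c):
    """Givens in the row, column and block of cell (r, c); r and c are 1-based."""
    seen = set()
    br, bc = 3 * ((r - 1) // 3), 3 * ((c - 1) // 3)  # top-left corner of the block
    for i in range(9):
        seen.add(grid[(r - 1) * 9 + i])                    # same row
        seen.add(grid[i * 9 + (c - 1)])                    # same column
        seen.add(grid[(br + i // 3) * 9 + (bc + i % 3)])   # same block
    seen.discard(".")
    return {int(d) for d in seen}

def initial_candidates(grid, r, c):
    """Candidates of an unsolved cell after initialisation only."""
    return set(range(1, 10)) - given_digits_seen(grid, r, c)

def target_eliminations(grid, targets, base_digits):
    """Return (potential, effective) eliminations of non-base digits in the targets."""
    potential, effective = [], []
    for digit in range(1, 10):
        if digit in base_digits:
            continue
        for r, c in targets:
            label = f"n{digit}r{r}c{c}"
            potential.append(label)
            if digit in initial_candidates(grid, r, c):
                effective.append(label)
    return potential, effective

potential, effective = target_eliminations(PUZZLE, [(2, 6), (8, 4)], {1, 5, 8})
print(len(potential), effective)
```

On this puzzle it reports 12 potential eliminations, of which n2r2c6, n4r8c4 and n7r2c6 survive initialisation. It also confirms that both base cells r4c5 and r6c5 start with candidates 1, 5, 8. Note that it only filters against the givens; it does not detect the JExocet itself.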

If we don't take the JExocet into account, we get the following resolution path (in W4):

Note that finned-x-wings, swordfish, biv-chains, ... are special cases of whips[<= 3].

Considering the recent post in which I showed that a JExocet involves at least 13 or 16 CSP-variables (depending on whether it has 3 or 4 base digits), we can already conclude that it would never be found in this puzzle if JExocet were classified in the rules hierarchy at a place consistent with this high count: the puzzle is already solved in W4, i.e. with whips of length at most 4.

However, one may still want to see what happens if we use it anyway. So, let's give it the highest priority and apply it right at the start. Do we get a simpler solution? NO. Here is the new resolution path.

As JExocet is currently not programmed in SudoRules, I use a special rule that allows me to make "simulated eliminations" of any list of candidates. Only the effective ones are displayed (here only 3 in the list of 12 potential ones).

As you can see, there's almost no difference. Not only is the W rating unchanged, but the resolution path itself is almost unchanged (differences are shown by the ";;; <<<<<<<<<<<<<<" sign).

What can we conclude? When one tries to estimate the frequency of a pattern or a resolution rule, at least three conditions of the estimate should be clearly stated:

- on which collection of puzzles it is based

- how the pattern/rule is classified in some hierarchy of patterns/rules (i.e. which rules are applied before it)

- whether the estimate concerns the pattern or the rule, i.e. whether the impact of the mere presence of the pattern on the rating of a puzzle is taken into account (e.g. is the puzzle still considered as "having the pattern" if applying the associated rule has no impact or no significant impact?)

You think all this is obvious? I fully agree. Why then is it (almost) never applied?