RW wrote:I noticed that the lack of 19s seems to be related to the total amount of 2-digit unavoidable sets. There's 4 grids so far, the amount of 2-digit unavoidables: 68, 70, 73, 74. The average on random grids created with gsf's program is 55. The largest amount in a known grid with a 17 is 63. It's possible that I started examining the wrong thing when I looked at the individual pairs instead of the 2-digit unavoidables as a total. The amount of 2-permutable pairs varies very much between the different grids without 19s, but as you can see they have in common a very high amount of total 2-digit unavoidables. Of course the amount of small 3-digit unavoidables also must be important.
I've recently been playing with Diff(G,2), which is related to the number of 2-digit unavoidables as follows. Think of a grid as being made up of nine templates, with template D being those cells filled in with digit D. We define Diff(G,k) to be the number of grids that differ from G by exactly k templates. If there are u(d,e) minimal unavoidables on digits (d,e) then Diff(G,2) = 2^u(1,2) + 2^u(1,3) + ... + 2^u(8,9) - 36. By contrast, you appear to be using Unav(G,2) = u(1,2) + u(1,3) + ... + u(8,9). Both are very good discriminators of grids containing a 17 versus grids without, with Diff(G,2) apparently slightly better than Unav(G,2) as I will now explain.
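For anyone who wants to experiment, here is a minimal Python sketch of the two statistics defined above. It is not the code behind the figures in this post. It assumes that the minimal unavoidables on a digit pair (d,e) are the connected components of the 18 cells holding d or e, where two cells are linked if they share a row, column or box. The names u_counts, unav2, diff2 and the example GRID are my own illustrative choices.

```python
from itertools import combinations

def u_counts(grid):
    """Return {(d, e): u(d, e)} for a solved grid given as 9 strings of 9 digits."""
    counts = {}
    for d, e in combinations("123456789", 2):
        cells = [(r, c) for r in range(9) for c in range(9) if grid[r][c] in (d, e)]
        parent = {cell: cell for cell in cells}

        def find(x):
            # Union-find root lookup with path halving.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        # Link any two of the 18 cells that share a row, column or box.
        for a, b in combinations(cells, 2):
            same_box = (a[0] // 3, a[1] // 3) == (b[0] // 3, b[1] // 3)
            if a[0] == b[0] or a[1] == b[1] or same_box:
                parent[find(a)] = find(b)

        # Assumption: each connected component is one minimal unavoidable.
        counts[(int(d), int(e))] = len({find(cell) for cell in cells})
    return counts

def unav2(u):
    """Unav(G,2) = u(1,2) + u(1,3) + ... + u(8,9)."""
    return sum(u.values())

def diff2(u):
    """Diff(G,2) = 2^u(1,2) + 2^u(1,3) + ... + 2^u(8,9) - 36."""
    return sum(2 ** n for n in u.values()) - len(u)

# Illustrative grid only (a simple shifted pattern), not one from this thread.
GRID = [
    "123456789", "456789123", "789123456",
    "231564897", "564897231", "897231564",
    "312645978", "645978312", "978312645",
]

if __name__ == "__main__":
    u = u_counts(GRID)
    print("Unav(G,2) =", unav2(u), " Diff(G,2) =", diff2(u))
```

Each pair contributes 2^u(d,e) - 1 to Diff(G,2), since swapping d and e on any non-empty selection of that pair's unavoidables gives a different grid, which is where the -36 comes from.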
I've been evaluating different grid statistics, e.g. Diff or Unav, with a "good-ranking score", which is equal to the probability that a random grid containing a 17 beats (has a better statistic than) any old random grid. Ties (equal stats) are resolved by flipping a coin. I calculate this score by comparing the list of 34108 grids-with-a-17 with a separate list of 34108 grids generated at random. Of course this only gives me an approximation to the 17-beats-random probability, and I could improve that approximation by using more random grids, but it feels good enough.
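The score described above can be sketched roughly as follows. This is my own illustration, not the actual code used: it compares every grid-with-a-17 against every random grid, and counts a tie as half a win, which is the expected value of the coin flip. The function name and the lower_is_better flag are assumptions. For Diff and Unav I am assuming that a lower value counts as "better".

```python
from bisect import bisect_left, bisect_right

def good_ranking_score(stats_17, stats_random, lower_is_better=True):
    """Estimate P(random 17-grid beats random grid), with ties worth 1/2."""
    # Flip signs if needed so that "bigger" always means "better".
    sign = -1 if lower_is_better else 1
    ref = sorted(sign * s for s in stats_random)

    total = 0.0
    for s in stats_17:
        v = sign * s
        beaten = bisect_left(ref, v)            # random grids strictly beaten
        ties = bisect_right(ref, v) - beaten    # equal stats: coin flip
        total += beaten + 0.5 * ties
    return total / (len(stats_17) * len(ref))

if __name__ == "__main__":
    # Toy numbers only, not real grid statistics.
    print(good_ranking_score([1, 2], [2, 3]))
```

Sorting the random list once and using binary search keeps this practical for two lists of 34108 values, where a naive double loop would need over a billion comparisons. A statistic with no discriminating power scores about 50%.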
The good-ranking scores for Diff(G,2) and Unav(G,2) are 81.2% and 80.6% respectively, which may or may not be a statistically significant difference. The SF grid is ranked very highly by both statistics, as you'd hope, with SFB coming top overall with perfect Diff and Unav stats as mentioned in earlier posts. You can do slightly better than Diff and Unav, at least on my limited sample of random grids, by using UnavSquared(G,2) = u(1,2)^2 + u(1,3)^2 + ... + u(8,9)^2, which comes in at 81.4%. You might also expect an improvement by considering three digits instead of just two, but not so: Diff(G,3) scores only 78.2%.
Of the other stats that I tried with a substantially different theme, only one scored anywhere near these figures: that was Wolfgang's 3-rookeries statistic which, when you count the number of rookeries rather than the number of cell pairs (his 30 & 60 vice his 2628 & 5049), scores 79.0%. Every other method that I coded up scored c. 70% or lower.