To avoid endless and fruitless discussions, I am opening a new thread to collect some results on ratings below the potential hardest.
Intensive work has clearly been done over time to find and analyse the hardest puzzles.
There were two main reasons to focus on such puzzles:
- People like challenging targets,
- There was no volume issue for such puzzles.
It quickly appeared that puzzles classified as “potential hardest” have specific properties that sometimes give a short and easy solution.
The Exocet pattern can be found in more than 75% of these puzzles, and in total more than 80% of them have at least one of the “exotic” properties.
For several reasons, the main one being the volume issue, no significant collection of puzzles existed for lower ratings.
Recently, the question of the frequency of the “exotic” patterns at lower ratings was raised; it appeared that we lacked the information needed to even attempt an answer.
On the other hand, some players were interested in getting lower-rated puzzles offering some of the “exotic” patterns and, if possible, patterns differing from those found in the potential hardest file.
In the end, two actions were started:
a) Eleven produced a “pseudo random” sample of puzzles. As the price to pay for puzzles with significant ratings is very high, his file remains in the lowest part of the rating range under study.
b) I launched a ±3 vicinity search on the potential hardest file, keeping all generated puzzles (a minimal sketch of the process is given below).
The corresponding sample file will be the basis for the deeper investigations described in this thread.
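For readers who have never run one, here is a minimal sketch of what such a ±3 vicinity search does. This is illustrative only, not my actual code: the puzzle is assumed to be an 81-character string with '.' for empty cells, and has_unique_solution is an assumed helper (any brute-force solver can play that role).

```python
from itertools import combinations, product

DIGITS = "123456789"

def vicinity_3_3(seed, has_unique_solution):
    """Enumerate the {-3;+3} neighbourhood of `seed`: clear 3 givens,
    place 3 givens elsewhere (trying every digit), and keep the
    candidates that still have exactly one solution."""
    givens = [i for i, c in enumerate(seed) if c != '.']
    for removed in combinations(givens, 3):
        base = list(seed)
        for cell in removed:
            base[cell] = '.'
        empties = [i for i, c in enumerate(base) if c == '.']
        for added in combinations(empties, 3):
            for digits in product(DIGITS, repeat=3):
                cand = base[:]
                for cell, d in zip(added, digits):
                    cand[cell] = d
                puzzle = "".join(cand)
                if has_unique_solution(puzzle):
                    yield puzzle
```

The loop is combinatorially expensive, which is why the raw output is so big and why the generated puzzles must afterwards be deduplicated and checked against the potential hardest database.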
Eleven’s file has 282588 puzzles with skfr ratings from 8.6 to 9.3, of which 2380 (less than 1% of the sample) rate 9.1 (2219), 9.2 (158) and 9.3 (3).
The vicinity search produced 209.7 million puzzles rated above 7.5 that are not in the potential hardest database, and 56.9 million puzzles rated between 6.2 and 7.5.
One interesting result is the ratio JExocet/number of puzzles in eleven’s sample.
The ratio is 0.03% (raw figure; an adjusted 0.018% has been worked out, but this does not change the conclusion).
In the file of potential hardest, the same ratio is 76%.
If we assume that this ratio does not depend on the rating, we can expect the same ratio for ratings over 10.3 (the entire file of potential hardest) as in eleven’s sample.
Then we should expect the full population of puzzles rated over 10.3 to be at least 4000 times bigger than the current file.
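To make the arithmetic behind that figure explicit (a back-of-the-envelope computation only, with the current file size normalised to 1):

```python
known_je     = 0.76         # ~76% of the known >10.3 file shows a JExocet
sample_ratio = 0.018 / 100  # adjusted JE ratio measured in eleven's sample

# If the JE ratio did not depend on the rating, then even assuming every
# JE puzzle rated over 10.3 is already in the known file, the whole
# population over 10.3 would have to be at least this many times bigger:
print(round(known_je / sample_ratio))   # -> 4222
```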
Anybody who has worked in that field will tell you that this is not credible. The best we can expect is for the complete file of puzzles with 20-25 clues to be 2 to 10 times bigger than the current file.
So we must assume that the JE/number of puzzles ratio depends on the rating, which leads to eleven’s assumption that the ratio declines sharply from the highest ratings down to the sample area (8.6 to 9.0).
The sample coming out of the vicinity search should give an upper limit for that tendency; the ongoing rating of the entire sample is a necessary step to establish that upper limit.
The next posts will describe in detail the results of the tests done on the two samples and where to find the corresponding data.
Regarding the frequency of the JE (and other Exocets), I decided to keep the non-truncated process here.
Additional selections will follow to meet David’s expectations; this will give more homogeneity in the data sets.
Storing the results
We have to share huge volumes of data, which is not easy.
Some of the results will be located in the skfr project, where we have an allocation of 4 GB.
Google does not allocate storage space to new projects, so I'll use the new facility, something like storage in the cloud, to host the raw results of the samples.
The link to that storage place is here.
Having no experience with that new tool, I started with a readme file and the list of puzzles in eleven's sample having a rank 0 logic seen by my solver.
Pieces of the big sample should come after the rating is over.