This has some differences from usual rating schemes, some of which are pros, some of which are cons, some of which could go either way depending on what you're looking for, and some of which are just not understood, at least not by me:
- ratings are fast
- ratings are largely independent of any body of prior knowledge or practice of sudoku
- ratings are not based on the hardest single step, but may instead be roughly proportional to the number of non-basic inferences or learned conflict clauses produced along the way.
- ratings need to be averaged over many random permutations, so they are estimations that are not entirely reproducible.
- ratings are influenced by minisat's heuristics, but probably to a lesser degree than they would be for backtrack ratings of solvers that don't use CDCL.
- pigeonhole inferences are hard for minisat and uniqueness inferences are impossible, so if these techniques really simplify a puzzle's solution then the puzzle may be over-rated by minisat.
- Code: Select all
Distribution of Minisat-Hardness of Unbiased Puzzles by SE Range
SE=1 SE=2 SE=3 SE=4 SE=5 SE=6 SE=7 SE=8 SE=9
-------------------------------------------------------------------------------
Lowest 0 0.0 1.0 0.9 0.9 0.9 1.0 1.9 4.2
25th Percentile 0 0.0 1.9 1.4 1.0 1.8 2.5 4.9 7.8
Median 0 0.0 2.5 1.9 1.2 2.5 3.4 6.0 9.2
75th Percentile 0 0.0 3.4 2.7 1.9 3.5 4.5 7.3 10.8
Highest 0 12.7 13.4 13.6 12.3 16.0 20.2 20.1 20.5
It's interesting to note that puzzles in the SE4/5 range tend to be easier than SE3 puzzles as far as minisat is concerned. But after that the relationship trends as expected. Here is an extension of this relationship using puzzles from the hardest DB instead of the controlled-bias generator since this samples few very hard puzzles (all of these were evaluated over 1000 puzzle permutations):
- Code: Select all
Distribution of Minisat-Hardness of Hardest DB Puzzles by SE Range
SE10.5-11.0 SE11.0-11.5 SE11.5+
-------------------------------------------------------------------
Lowest 19.2 20.8 52.6
25th Percentile 45.5 47.9 73.8
Median 52.4 56.3 78.7
75th Percentile 60.1 65.4 86.1
Highest 117.5 125.9 123.9
So there is at the same time significant correlation between SE rating and minisat rating, but also significant variation. At the moment I have the hunches in the bullets above to explain the differences, but no great ideas for how to validate them.
Note that the highest minisat-rated puzzle in the hardest DB has a rating of 125.9 as shown above, but it's not terribly hard to find puzzles with still-higher minisat ratings. For example, here are some exemplars from minisat-hard clusters:
- Code: Select all
Minisat SER
38.1.....16...8.....2..........4..9....5.13.........458..6.35......2..79.5....... 126.1 11.1/1.2/1.2
..........839......69..3.......5..41...2......96..85......7...4.5.8..3......2..1. 127.0 9.8/9.8/2.6
98.76....7...9.6....6..5...5..9..........4.3...865.9...2......4..754.8.........1. 130.2 11.3/11.3/10.5
.....8..7....1..9....6...28.19....4.6.........54...93....2.6....95.3........5..1. 150.0 11.2/11.2/2.6
It's possible that at these extremes I'm just selecting puzzles that interact really badly with minisat's heuristics. But it would be interesting to hear what expert puzzle analysts have to say about them!