The New Sudoku Players' Forum

by **tdillon** » Wed Jul 22, 2020 10:13 am

As discussed a bit in another thread I've experimented with using minisat for rating difficulty when generating puzzles. Specifically, I've been computing the minisat rating as the average number of decisions that minisat makes given an augmented SAT encoding that represents hidden singles and locked candidates and taking the average over many random row/col/band/stack/digit permutations of the puzzle. Given this, any puzzle that can be solved only with singles and locked candidates has a minisat rating of zero. Really hard sudoku can have a rating over 100. Really hard pencilmark sudoku can have minisat ratings of tens or hundreds of thousands.

This has some differences from usual rating schemes, some of which are pros, some of which are cons, some of which could go either way depending on what you're looking for, and some of which are just not understood, at least not by me:

ratings are fast
ratings are largely independent of any body of prior knowledge or practice of sudoku
ratings are not based on the hardest single step, but may instead be roughly proportional to the number of non-basic inferences or learned conflict clauses produced along the way.
ratings need to be averaged over many random permutations, so they are estimations that are not entirely reproducible.
ratings are influenced by minisat's heuristics, but probably to a lesser degree than they would be for backtrack ratings of solvers that don't use CDCL.
pigeonhole inferences are hard for minisat and uniqueness inferences are impossible, so if these techniques really simplify a puzzle's solution then the puzzle may be over-rated by minisat.

For a rough comparison to traditional ratings, here are distributions of minisat-hardness for a set of 1 million puzzles sampled using a controlled-bias generator, grouped by floor(SE rating):

Code: Select all: Distribution of Minisat-Hardness of Unbiased Puzzles by SE Range SE=1 SE=2 SE=3 SE=4 SE=5 SE=6 SE=7 SE=8 SE=9 ------------------------------------------------------------------------------- Lowest 0 0.0 1.0 0.9 0.9 0.9 1.0 1.9 4.2 25th Percentile 0 0.0 1.9 1.4 1.0 1.8 2.5 4.9 7.8 Median 0 0.0 2.5 1.9 1.2 2.5 3.4 6.0 9.2 75th Percentile 0 0.0 3.4 2.7 1.9 3.5 4.5 7.3 10.8 Highest 0 12.7 13.4 13.6 12.3 16.0 20.2 20.1 20.5

It's interesting to note that puzzles in the SE4/5 range tend to be easier than SE3 puzzles as far as minisat is concerned. But after that the relationship trends as expected. Here is an extension of this relationship using puzzles from the hardest DB instead of the controlled-bias generator since this samples few very hard puzzles (all of these were evaluated over 1000 puzzle permutations):

Code: Select all: Distribution of Minisat-Hardness of Hardest DB Puzzles by SE Range SE10.5-11.0 SE11.0-11.5 SE11.5+ ------------------------------------------------------------------- Lowest 19.2 20.8 52.6 25th Percentile 45.5 47.9 73.8 Median 52.4 56.3 78.7 75th Percentile 60.1 65.4 86.1 Highest 117.5 125.9 123.9

So there is at the same time significant correlation between SE rating and minisat rating, but also significant variation. At the moment I have the hunches in the bullets above to explain the differences, but no great ideas for how to validate them.

Note that the highest minisat-rated puzzle in the hardest DB has a rating of 125.9 as shown above, but it's not terribly hard to find puzzles with still-higher minisat ratings. For example, here are some exemplars from minisat-hard clusters:

Code: Select all: Minisat SER 38.1.....16...8.....2..........4..9....5.13.........458..6.35......2..79.5....... 126.1 11.1/1.2/1.2 ..........839......69..3.......5..41...2......96..85......7...4.5.8..3......2..1. 127.0 9.8/9.8/2.6 98.76....7...9.6....6..5...5..9..........4.3...865.9...2......4..754.8.........1. 130.2 11.3/11.3/10.5 .....8..7....1..9....6...28.19....4.6.........54...93....2.6....95.3........5..1. 150.0 11.2/11.2/2.6

It's possible that at these extremes I'm just selecting puzzles that interact really badly with minisat's heuristics. But it would be interesting to hear what expert puzzle analysts have to say about them!

by **tarek** » Wed Jul 22, 2020 1:27 pm

You are correct that with Uniqueness techniques inclusion in most resolution rules, that there will be puzzles that appear hard for mini-sat which fall to what appears to be much simpler technique.

You got got correctly as well when you said that there is (despite the above) there is a correlation between you mini-sat hardness and SER.

If our goal is (Which it has been for the past 14 years) is to defeat SER as we consider it the current "yardstick" to rate puzzle difficulty, then mini-sat will be a fresh & new angle take on this.

suxrat9, suxratt, q1,q2 were all different ways of attacking the problem and would usually screen a large group of candidates prior to rating them by SE with a great degree of success.

The observations that made us get closer were certainly over the years allowing us to find a different way to find these monsters. From a high Peral rating to a high diamond rating, passing through to diagonal clue layout & finally Mladen's observation which I mentioned in the other thread which eleven found success with. All of these puzzles will need a thorough neighborhood search as well as we have encountered several related puzzles with higher ratings in the vicinity.

Not very methodical but gives you an idea of what has been done to this date. With your success with pencilmark sudokus, I'm hoping that by combining some of the previous ideas/observations with your mini-sat approach that we would have a fresh batch of hardest.

tarek

by **dobrichev** » Wed Jul 22, 2020 1:32 pm

Being not an expert puzzle analyst I see a contradiction in

ratings are fast
and
ratings need to be averaged over many random permutations.

Surely Minisat can be used as a fast filter in hard puzzles searching but it isn't capable to be used as reference due to the reasons you mentioned.

Imagine e new improved version of Minisat appears. All accumulated ratings become useless.

The learned conflict clauses should be covered by the sudoku solving techniques but I lost interest in this area due to the existing aggressive and self-reproductive mess in the terminology.

by **tdillon** » Wed Jul 22, 2020 5:15 pm

dobrichev wrote:I see a contradiction between ratings being fast and needing lots of averaging

Agree there's a trade-off between precision and performance. I've generally used a small number of rating iterations during search where some noise is OK, and I've turned it up to 1000 iterations for re-rating interesting candidates. That said, even with 1000 iterations it's still a lot faster than SE for hard puzzles. For example, it took a few days on a laptop to rate all of the hardest DB puzzles at 1000 iterations, while this would have taken months using SE.

dobrichev wrote:Isn't capable of being used as a reference ... Imagine a new version of Minisat appears ...

Agree that non-determinism stands in the way of using the minisat rating as a reference, though I suppose you could first canonicalize the puzzle and then generate a sequence of permutations from a fixed seed. Maybe that's not so bad. Any rating system will have some arbitrary element; any rating system will encourage trophy hunting; and any rating system will have inertia once it's been widely used to rate and compare puzzles.

dobrichev wrote:Learned conflict clauses should be covered by the sudoku solving techniques

I'd put this the other way around: the backtracking of any DPLL or CDCL-based solver can be rewritten from the bottom up as a resolution-refutation proof, so in this sense any set of sudoku resolution rules is subsumed by the one resolution rule of propositional logic. This is one reason why I share your lack of interest in wading into nomenclature.

Overall I guess my aim with this post was not really to recommend minisat rating as a reference scheme, but rather (1) to point out that it's been useful to me in fast puzzle search, (2) to give some stats for context, and (3) to raise a subjective question that may be difficult to answer: is a rating scheme that's independent of our priors and not based on the hardest step likely to find satisfying puzzles in a fresh way? (e.g., puzzles that present new challenges, diverse challenges, are not one-trick ponies, etc.)

by **coloin** » Thu Jul 23, 2020 11:10 pm

I could be completely wrong but this puzzle has the highest suexratt rating .....

Code: Select all: . . 3 | . . . | . . . 4 . . | . 8 . | . 3 6 . . 8 | . . . | 1 . . -------+-------+------ . 4 . | . 6 . | . 7 3 . . . | 9 . . | . . . . . . | . . 2 | . . 5 -------+-------+------ . . 4 | . 7 . | . 6 8 6 . . | . . . | . . . 7 . . | 6 . . | 5 . .

I Wonder what you make of it !!!!

by **tdillon** » Fri Jul 24, 2020 12:06 am

coloin wrote:I could be completely wrong but this puzzle has the highest suexratt rating .....
Code: Select all
. . 3 | . . . | . . . 4 . . | . 8 . | . 3 6 . . 8 | . . . | 1 . . -------+-------+------ . 4 . | . 6 . | . 7 3 . . . | 9 . . | . . . . . . | . . 2 | . . 5 -------+-------+------ . . 4 | . 7 . | . 6 8 6 . . | . . . | . . . 7 . . | 6 . . | 5 . .

I Wonder what you make of it !!!!

It's not the most extreme by minisat, but certainly in rarefied air:

Code: Select all: tdoku$ build/run_benchmark -a -b -n1000 -sminisat_augmented \ <(echo ..3......4...8..36..8...1...4..6..73...9..........2..5..4.7..686........7..6..5..) 105.4

by **RSW** » Fri Jul 24, 2020 6:35 am

coloin wrote:I could be completely wrong but this puzzle has the highest suexratt rating .....
Code: Select all
. . 3 | . . . | . . . 4 . . | . 8 . | . 3 6 . . 8 | . . . | 1 . . -------+-------+------ . 4 . | . 6 . | . 7 3 . . . | 9 . . | . . . . . . | . . 2 | . . 5 -------+-------+------ . . 4 | . 7 . | . 6 8 6 . . | . . . | . . . 7 . . | 6 . . | 5 . .

I Wonder what you make of it !!!!

Nice puzzle. Solvable with 3 (or perhaps fewer?) steps:
JE2: (1259)R13C1, r4c3 r7c2 => -3r789c2 -7r56c3
JE2: (1259)R89C3, r2c2 r4c1 => -3r56c1 -7r123c2 -8r4c1
Sashimi X-Wing: (9):r38c125 => -9r1c2
stte

<Edit>
Explanation of JE2 eliminations (ref rule no.s from David P. Bird's compendium):

Code: Select all: +--------------------+----------------------+-------------------+ | 1259 125679 3 | 12457 12459 145679 | 24789 24589 2479 | | 4 12579 12579 | 1257 8 1579 | 279 3 6 | | 259 25679 8 | 23457 23459 345679 | 1 2459 2479 | +--------------------+----------------------+-------------------+ | 12589 4 1259 | 158 6 158 | 289 7 3 | | 12358 12357 12567 | 9 1345 134578 | 2468 1248 124 | | 1389 1379 1679 | 13478 134 2 | 4689 1489 5 | +--------------------+----------------------+-------------------+ | 12359 12359 4 | 1235 7 1359 | 239 6 8 | | 6 123589 1259 | 123458 123459 134589 | 23479 1249 12479 | | 7 12389 129 | 6 12349 13489 | 5 1249 1249 | +--------------------+----------------------+-------------------+

JE2: (1259)R13C1, r4c3 r7c2 => -3r789c2 -7r56c3
Rule 3: Nonbase candidate 3 is invalid in target cells => -3r7c2.
Rule 9: Mirror node 1 has locked nonbase digit 8. All other nonbase digits can be eliminated from the mirror node => -3r89c2 -7r56c3

JE2: (1259)R89C3, r2c2 r4c1 => -3r56c1 -7r123c2 -8r4c1
Rule 3: Nonbase candidates 7, 8 are invalid in target cells => -7r2c2 -8r4c1
Rule 9: Mirror node 1 has locked nonbase digit 8. All other nonbase digits can be eliminated from the mirror node => -3r56c1 -7r13c2

by **SpAce** » Fri Jul 24, 2020 4:33 pm

coloin wrote:I could be completely wrong but this puzzle has the highest suexratt rating .....
I Wonder what you make of it !!!!

Well... Don't ask me to justify this too hard. Let's just pretend it makes sense.

Code: Select all: .------------------------------.------------------------------.------------------------. | a1259 125679 3 | 47-125 12459 467-159 | 478-29 24589 2479 | | 4 bg1259-7 be7-1259 | cf1257* 8 cf1579* | cf279* 3 6 | | a259 25679 8 | 347-25 23459 3467-59 | 1 2459 2479 | :------------------------------+------------------------------+------------------------: | bg1259-8 4 ce1259 | cf158* 6 cf158* | cf289* 7 3 | | 12358 37-125 12567 | 9 1345 3478-15 | 468-2 1248 124 | | 1389 37-19 1679 | 3478-1 134 2 | 468-9 1489 5 | :------------------------------+------------------------------+------------------------: | be3-1259 ce1259-3 4 | cf1235* 7 cf1359* | cf239* 6 8 | | 6 123589 d1259 | 348-125 123459 348-159 | 347-29 1249 12479 | | 7 12389 d129 | 6 12349 348-19 | 5 1249 1249 | '------------------------------'------------------------------'------------------------'

Step 1. A fishy AALS-loop (same caveats as with an SK-loop):

1259 = wxyz

(w|x=yz)r13c1 - (y|z)r2c23|r47c1 = {(yz)R247 \ (yz)c467#(2+2) 4n3 7n2} - (y|z=wx)r89c3 - (w|x)r7c12|r24c3 = {(wx)R247 \ (wx)c467#(2+2) 2n2 4n1} - loop
=> -wxyz(1259) c467(~r247), r2c3, r56c2, r7c1, -3 r7c2, -7 r2c2, -8 r4c1 (43 eliminations, basics)

Step 2: RP(29)r1c9,r9c3 => -29 r9c9; stte

--
Edit. Corrected typo: r13c13 -> r13c1

by **yzfwsf** » Sun Jul 26, 2020 10:53 am

My solver found a double JE.

: Double JE.png (43.42 KiB) Viewed 1301 times

by **SpAce** » Fri Jul 31, 2020 9:25 pm

Hi yzfwsf,

yzfwsf wrote:My solver found a double JE.

I think it also found some non-standard JE4 eliminations: -7r2c46, -8r4c46, -3rc46. Nice! They looked a bit out of place as direct eliminations at first, but I think they're actually quite clever. What was the exact justification it used for them? Does it see that (125)r247 and (159)r247 get locked as hidden triples because there has to be 2x1259 in r247c467 (and r247c7 can only hold 1x29, locking the other 6 candidates in r247c46)? Or something else?

In any case, that's a whopping 57 eliminations in a single step. Do you have examples of even more?

PS. Of course this is purely academic, as the 5-6 basic target+mirror eliminations of either JE2 trivialize the puzzle on their own. (Thus RSW's solution only needs two steps, too. One of the JE2s is redundant.)

by **yzfwsf** » Mon Sep 07, 2020 9:03 am

Do you have examples of even more?

Code: Select all: ........1..1..2..3.4..1..5...3.6.5.7.6....3..7....4.6..3..4..7..5..76.3.8..9..... ........1.....2.3..45.3.6.....7.......8.9.3.64.9.8.5....6.......5.9..8..8...5..63 ........1.....2.....3.4.25...26..74..3......89...7......4..756..7.46....3...5....

by **Leren** » Tue Sep 08, 2020 7:12 am

RSW wrote : Rule 9: Mirror node 1 has locked nonbase digit 8. All other nonbase digits can be eliminated from the mirror node => -3r89c2 -7r56c3

As written that looks a bit confusing. I think what you meant to say was something like :

Rule 9: Mirror node 1 has locked nonbase digit 6. All other nonbase digits can be eliminated from the mirror node => -7r56c3
Rule 9: Mirror node 2 has locked nonbase digit 8. All other nonbase digits can be eliminated from the mirror node => -3r89c2

Hope this helps. Leren

by **SpAce** » Wed Sep 09, 2020 2:46 am

Leren wrote:
RSW wrote : Rule 9: Mirror node 1 has locked nonbase digit 8. All other nonbase digits can be eliminated from the mirror node => -3r89c2 -7r56c3

As written that looks a bit confusing. I think what you meant to say was something like :

Rule 9: Mirror node 1 has locked nonbase digit 6. All other nonbase digits can be eliminated from the mirror node => -7r56c3
Rule 9: Mirror node 2 has locked nonbase digit 8. All other nonbase digits can be eliminated from the mirror node => -3r89c2

I concur. The same is true for the other pair of mirror cells as well. Glad you caught that. When I said "perfect" (in another thread), I was just happy to see the correct rule references but didn't check they made complete sense.

Personally I think a picture is always better than words anyway. Thus I'd be happiest to see a grid with relevant markings. Then the simplest kinds of eliminations (like here) don't necessarily need any verbal explanations, as long as the different types are separated:

Code: Select all: .-------------------------------.--------------------------.----------------------. | b1259 M1259.6-7 3 | 12457 12459 145679 | 24789 24589 2479 | | 4 T1259-7 12579 | s1257 8 s1579 | s279 3 6 | | b259 M259.6-7 8 | 23457 23459 345679 | 1 2459 2479 | :-------------------------------+--------------------------+----------------------: | T1259-8 4 t1259 | s158 6 s158 | s289 7 3 | | M125.8-3 12357 m125.6-7 | 9 1345 134578 | 2468 1248 124 | | M19.8-3 1379 m19.6-7 | 13478 134 2 | 4689 1489 5 | :-------------------------------+--------------------------+----------------------: | 12359 t1259-3 4 | s1235 7 s1359 | s239 6 8 | | 6 m1259.8-3 B1259 | 123458 123459 134589 | 23479 1249 12479 | | 7 m129.8-3 B129 | 6 12349 13489 | 5 1249 1249 | '-------------------------------'--------------------------'----------------------'

Code: Select all: JE2:(1259)r13c1,r4c3,r7c2 => -3r7c2; -7r56c3, -3r89c2 JE2:(1259)r79c3,r2c2,r4c1 => -7r2c2, -8r4c1; -7r13c2, -3r56c1

Or a bit more verbose:

Code: Select all: JE2:(1259)r13c1,r4c3,r7c2 => -3r7c2 (target); -7r56c3, -3r89c2 (mirrors with locked 6, 8) JE2:(1259)r79c3,r2c2,r4c1 => -7r2c2, -8r4c1 (targets); -7r13c2, -3r56c1 (mirrors with locked 6, 8)

(The marked grid should also make it obvious that the two could be combined into a JE4 with many more eliminations.)

by **Leren** » Wed Sep 09, 2020 9:36 am

RSW wrote : Nice puzzle. Solvable with 3 (or perhaps fewer?) steps:

JE2: (1259)R13C1, r4c3 r7c2 => -3r789c2 -7r56c3
JE2: (1259)R89C3, r2c2 r4c1 => -3r56c1 -7r123c2 -8r4c1
Sashimi X-Wing: (9):r38c125 => -9r1c2

Hopefully I'm not getting too off topic here but either of the JE2's can be replaced by some simple basics, 4 HP's and 3 intersections. This often happens when you have 2 locked non-base digit mirror node eliminations. Leren

The New Sudoku Players' Forum

Minisat hardness

Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness

Re: Minisat hardness