Minisat hardness

Everything about Sudoku that doesn't fit in one of the other sections

Minisat hardness

Postby tdillon » Wed Jul 22, 2020 10:13 am

As discussed a bit in another thread I've experimented with using minisat for rating difficulty when generating puzzles. Specifically, I've been computing the minisat rating as the average number of decisions that minisat makes given an augmented SAT encoding that represents hidden singles and locked candidates and taking the average over many random row/col/band/stack/digit permutations of the puzzle. Given this, any puzzle that can be solved only with singles and locked candidates has a minisat rating of zero. Really hard sudoku can have a rating over 100. Really hard pencilmark sudoku can have minisat ratings of tens or hundreds of thousands.

This has some differences from usual rating schemes, some of which are pros, some of which are cons, some of which could go either way depending on what you're looking for, and some of which are just not understood, at least not by me:

  • ratings are fast
  • ratings are largely independent of any body of prior knowledge or practice of sudoku
  • ratings are not based on the hardest single step, but may instead be roughly proportional to the number of non-basic inferences or learned conflict clauses produced along the way.
  • ratings need to be averaged over many random permutations, so they are estimations that are not entirely reproducible.
  • ratings are influenced by minisat's heuristics, but probably to a lesser degree than they would be for backtrack ratings of solvers that don't use CDCL.
  • pigeonhole inferences are hard for minisat and uniqueness inferences are impossible, so if these techniques really simplify a puzzle's solution then the puzzle may be over-rated by minisat.
For a rough comparison to traditional ratings, here are distributions of minisat-hardness for a set of 1 million puzzles sampled using a controlled-bias generator, grouped by floor(SE rating):

Code: Select all
Distribution of Minisat-Hardness of Unbiased Puzzles by SE Range
                  SE=1   SE=2   SE=3   SE=4   SE=5   SE=6   SE=7   SE=8   SE=9
-------------------------------------------------------------------------------
Lowest               0    0.0    1.0    0.9    0.9    0.9    1.0    1.9    4.2
25th Percentile      0    0.0    1.9    1.4    1.0    1.8    2.5    4.9    7.8
Median               0    0.0    2.5    1.9    1.2    2.5    3.4    6.0    9.2
75th Percentile      0    0.0    3.4    2.7    1.9    3.5    4.5    7.3   10.8
Highest              0   12.7   13.4   13.6   12.3   16.0   20.2   20.1   20.5

It's interesting to note that puzzles in the SE4/5 range tend to be easier than SE3 puzzles as far as minisat is concerned. But after that the relationship trends as expected. Here is an extension of this relationship using puzzles from the hardest DB instead of the controlled-bias generator since this samples few very hard puzzles (all of these were evaluated over 1000 puzzle permutations):

Code: Select all
Distribution of Minisat-Hardness of Hardest DB Puzzles by SE Range
                          SE10.5-11.0     SE11.0-11.5       SE11.5+
-------------------------------------------------------------------
Lowest                        19.2            20.8            52.6
25th Percentile               45.5            47.9            73.8
Median                        52.4            56.3            78.7
75th Percentile               60.1            65.4            86.1
Highest                      117.5           125.9           123.9

So there is at the same time significant correlation between SE rating and minisat rating, but also significant variation. At the moment I have the hunches in the bullets above to explain the differences, but no great ideas for how to validate them.

Note that the highest minisat-rated puzzle in the hardest DB has a rating of 125.9 as shown above, but it's not terribly hard to find puzzles with still-higher minisat ratings. For example, here are some exemplars from minisat-hard clusters:

Code: Select all
                                                                                    Minisat       SER
38.1.....16...8.....2..........4..9....5.13.........458..6.35......2..79.5.......     126.1   11.1/1.2/1.2
..........839......69..3.......5..41...2......96..85......7...4.5.8..3......2..1.     127.0   9.8/9.8/2.6
98.76....7...9.6....6..5...5..9..........4.3...865.9...2......4..754.8.........1.     130.2   11.3/11.3/10.5
.....8..7....1..9....6...28.19....4.6.........54...93....2.6....95.3........5..1.     150.0   11.2/11.2/2.6

It's possible that at these extremes I'm just selecting puzzles that interact really badly with minisat's heuristics. But it would be interesting to hear what expert puzzle analysts have to say about them!
tdillon
 
Posts: 49
Joined: 14 June 2019

Re: Minisat hardness

Postby tarek » Wed Jul 22, 2020 1:27 pm

You are correct that with Uniqueness techniques inclusion in most resolution rules, that there will be puzzles that appear hard for mini-sat which fall to what appears to be much simpler technique.

You got got correctly as well when you said that there is (despite the above) there is a correlation between you mini-sat hardness and SER.

If our goal is (Which it has been for the past 14 years) is to defeat SER as we consider it the current "yardstick" to rate puzzle difficulty, then mini-sat will be a fresh & new angle take on this.

suxrat9, suxratt, q1,q2 were all different ways of attacking the problem and would usually screen a large group of candidates prior to rating them by SE with a great degree of success.

The observations that made us get closer were certainly over the years allowing us to find a different way to find these monsters. From a high Peral rating to a high diamond rating, passing through to diagonal clue layout & finally Mladen's observation which I mentioned in the other thread which eleven found success with. All of these puzzles will need a thorough neighborhood search as well as we have encountered several related puzzles with higher ratings in the vicinity.

Not very methodical but gives you an idea of what has been done to this date. With your success with pencilmark sudokus, I'm hoping that by combining some of the previous ideas/observations with your mini-sat approach that we would have a fresh batch of hardest.

tarek
User avatar
tarek
 
Posts: 3726
Joined: 05 January 2006

Re: Minisat hardness

Postby dobrichev » Wed Jul 22, 2020 1:32 pm

Being not an expert puzzle analyst I see a contradiction in
  • ratings are fast
    and
  • ratings need to be averaged over many random permutations.

Surely Minisat can be used as a fast filter in hard puzzles searching but it isn't capable to be used as reference due to the reasons you mentioned.

Imagine e new improved version of Minisat appears. All accumulated ratings become useless.

The learned conflict clauses should be covered by the sudoku solving techniques but I lost interest in this area due to the existing aggressive and self-reproductive mess in the terminology.
dobrichev
2016 Supporter
 
Posts: 1784
Joined: 24 May 2010

Re: Minisat hardness

Postby tdillon » Wed Jul 22, 2020 5:15 pm

dobrichev wrote:I see a contradiction between ratings being fast and needing lots of averaging

Agree there's a trade-off between precision and performance. I've generally used a small number of rating iterations during search where some noise is OK, and I've turned it up to 1000 iterations for re-rating interesting candidates. That said, even with 1000 iterations it's still a lot faster than SE for hard puzzles. For example, it took a few days on a laptop to rate all of the hardest DB puzzles at 1000 iterations, while this would have taken months using SE.

dobrichev wrote:Isn't capable of being used as a reference ... Imagine a new version of Minisat appears ...

Agree that non-determinism stands in the way of using the minisat rating as a reference, though I suppose you could first canonicalize the puzzle and then generate a sequence of permutations from a fixed seed. Maybe that's not so bad. Any rating system will have some arbitrary element; any rating system will encourage trophy hunting; and any rating system will have inertia once it's been widely used to rate and compare puzzles.

dobrichev wrote:Learned conflict clauses should be covered by the sudoku solving techniques

I'd put this the other way around: the backtracking of any DPLL or CDCL-based solver can be rewritten from the bottom up as a resolution-refutation proof, so in this sense any set of sudoku resolution rules is subsumed by the one resolution rule of propositional logic. This is one reason why I share your lack of interest in wading into nomenclature.

Overall I guess my aim with this post was not really to recommend minisat rating as a reference scheme, but rather (1) to point out that it's been useful to me in fast puzzle search, (2) to give some stats for context, and (3) to raise a subjective question that may be difficult to answer: is a rating scheme that's independent of our priors and not based on the hardest step likely to find satisfying puzzles in a fresh way? (e.g., puzzles that present new challenges, diverse challenges, are not one-trick ponies, etc.)
tdillon
 
Posts: 49
Joined: 14 June 2019

Re: Minisat hardness

Postby coloin » Thu Jul 23, 2020 11:10 pm

I could be completely wrong but this puzzle has the highest suexratt rating .....
Code: Select all
 
 . . 3 | . . . | . . . 
 4 . . | . 8 . | . 3 6 
 . . 8 | . . . | 1 . . 
-------+-------+------
 . 4 . | . 6 . | . 7 3 
 . . . | 9 . . | . . . 
 . . . | . . 2 | . . 5 
-------+-------+------
 . . 4 | . 7 . | . 6 8 
 6 . . | . . . | . . . 
 7 . . | 6 . . | 5 . .

I Wonder what you make of it !!!!
coloin
 
Posts: 1906
Joined: 05 May 2005

Re: Minisat hardness

Postby tdillon » Fri Jul 24, 2020 12:06 am

coloin wrote:I could be completely wrong but this puzzle has the highest suexratt rating .....
Code: Select all
 
 . . 3 | . . . | . . . 
 4 . . | . 8 . | . 3 6 
 . . 8 | . . . | 1 . . 
-------+-------+------
 . 4 . | . 6 . | . 7 3 
 . . . | 9 . . | . . . 
 . . . | . . 2 | . . 5 
-------+-------+------
 . . 4 | . 7 . | . 6 8 
 6 . . | . . . | . . . 
 7 . . | 6 . . | 5 . .

I Wonder what you make of it !!!!

It's not the most extreme by minisat, but certainly in rarefied air:
Code: Select all
tdoku$ build/run_benchmark -a -b -n1000 -sminisat_augmented \
  <(echo ..3......4...8..36..8...1...4..6..73...9..........2..5..4.7..686........7..6..5..)
       105.4   
tdillon
 
Posts: 49
Joined: 14 June 2019

Re: Minisat hardness

Postby RSW » Fri Jul 24, 2020 6:35 am

coloin wrote:I could be completely wrong but this puzzle has the highest suexratt rating .....
Code: Select all
 
 . . 3 | . . . | . . . 
 4 . . | . 8 . | . 3 6 
 . . 8 | . . . | 1 . . 
-------+-------+------
 . 4 . | . 6 . | . 7 3 
 . . . | 9 . . | . . . 
 . . . | . . 2 | . . 5 
-------+-------+------
 . . 4 | . 7 . | . 6 8 
 6 . . | . . . | . . . 
 7 . . | 6 . . | 5 . .

I Wonder what you make of it !!!!


Nice puzzle. Solvable with 3 (or perhaps fewer?) steps:
JE2: (1259)R13C1, r4c3 r7c2 => -3r789c2 -7r56c3
JE2: (1259)R89C3, r2c2 r4c1 => -3r56c1 -7r123c2 -8r4c1
Sashimi X-Wing: (9):r38c125 => -9r1c2
stte
RSW
 
Posts: 57
Joined: 01 December 2018
Location: Western Canada

Re: Minisat hardness

Postby SpAce » Fri Jul 24, 2020 4:33 pm

coloin wrote:I could be completely wrong but this puzzle has the highest suexratt rating .....
I Wonder what you make of it !!!!

Well... Don't ask me to justify this too hard. Let's just pretend it makes sense.

Code: Select all
.------------------------------.------------------------------.------------------------.
|  a1259      125679    3      |   47-125   12459     467-159 |   478-29  24589  2479  |
|   4       bg1259-7  be7-1259 | cf1257*    8       cf1579*   | cf279*    3      6     |
|  a259       25679     8      |   347-25   23459     3467-59 |   1       2459   2479  |
:------------------------------+------------------------------+------------------------:
| bg1259-8    4       ce1259   | cf158*     6       cf158*    | cf289*    7      3     |
|   12358     37-125    12567  |   9        1345      3478-15 |   468-2   1248   124   |
|   1389      37-19     1679   |   3478-1   134       2       |   468-9   1489   5     |
:------------------------------+------------------------------+------------------------:
| be3-1259  ce1259-3    4      | cf1235*    7       cf1359*   | cf239*    6      8     |
|   6         123589   d1259   |   348-125  123459    348-159 |   347-29  1249   12479 |
|   7         12389    d129    |   6        12349     348-19  |   5       1249   1249  |
'------------------------------'------------------------------'------------------------'

Step 1. A fishy AALS-loop (same caveats as with an SK-loop):

1259 = wxyz

(w|x=yz)r13c1 - (y|z)r2c23|r47c1 = {(yz)R247 \ (yz)c467#(2+2) 4n3 7n2} - (y|z=wx)r89c3 - (w|x)r7c12|r24c3 = {(wx)R247 \ (wx)c467#(2+2) 2n2 4n1} - loop
=> -wxyz(1259) c467(~r247), r2c3, r56c2, r7c1, -3 r7c2, -7 r2c2, -8 r4c1 (43 eliminations, basics)

Step 2: RP(29)r1c9,r9c3 => -29 r9c9; stte

--
Edit. Corrected typo: r13c13 -> r13c1
Last edited by SpAce on Fri Jul 31, 2020 7:55 pm, edited 1 time in total.
-SpAce-: Show
Code: Select all
   *             |    |               |    |    *
        *        |=()=|    /  _  \    |=()=|               *
            *    |    |   |-=( )=-|   |    |      *
     *                     \  ¯  /                   *   

"If one is to understand the great mystery, one must study all its aspects, not just the dogmatic narrow view of the Jedi."
User avatar
SpAce
 
Posts: 2354
Joined: 22 May 2017

Re: Minisat hardness

Postby yzfwsf » Sun Jul 26, 2020 10:53 am

My solver found a double JE.
Double JE.png
Double JE.png (43.42 KiB) Viewed 87 times
yzfwsf
 
Posts: 210
Joined: 16 April 2019

Re: Minisat hardness

Postby SpAce » Fri Jul 31, 2020 9:25 pm

Hi yzfwsf,

yzfwsf wrote:My solver found a double JE.

I think it also found some non-standard JE4 eliminations: -7r2c46, -8r4c46, -3rc46. Nice! They looked a bit out of place as direct eliminations at first, but I think they're actually quite clever. What was the exact justification it used for them? Does it see that (125)r247 and (159)r247 get locked as hidden triples because there has to be 2x1259 in r247c467 (and r247c7 can only hold 1x29, locking the other 6 candidates in r247c46)? Or something else?

In any case, that's a whopping 57 eliminations in a single step. Do you have examples of even more?

PS. Of course this is purely academic, as the 5-6 basic target+mirror eliminations of either JE2 trivialize the puzzle on their own. (Thus RSW's solution only needs two steps, too. One of the JE2s is redundant.)
User avatar
SpAce
 
Posts: 2354
Joined: 22 May 2017


Return to General