## SudokuP - Min Clue Project

For fans of Killer Sudoku, Samurai Sudoku and other variants

### Re: SudokuP - Min Clue Project

Mathimagics wrote:I had hoped to piggy-back the enumeration of 11-clue puzzles along with the 10-clue testing, but that now looks to be an impossible dream.

It might be possible ... read on.

If this "profile" is typical of an 11-clue puzzle, and it does appear to be so, then that's the death sentence for exhaustive enumeration!

It may be typical of a grid with an 11-clue puzzle, but I don't think it's typical of randomly selected grids.
I think you'll see a lot of cases with no hitting sets at all -- even with your "MCN" filtering in place.

The largest set of UA's that I could assemble (using GridChecker, as my simplistic UA generator proved to be inadequate) was 124, from which I obtained:

• 1,130 HSets of size 8
• 375,331 HSets of size 9
• 27,266,402 Hsers of size 10
• 620,690,909 HSets of size 11

These are "net" figures, ie the HSets of size 9 do not include any repeats of size 8, etc.

The first 3 cases need to be multiplied by the "free cell multiplier", for size 8 this is C(73, 3) = 62,196, for size 9 it's C(72,2) = 2,556, and for size 10 it's 71.

So the total number of 11-clue puzzle candidates that need to to be tested is (1130 x 62196) + (375331 x 2556) + (27266402 x 71) + 620690909 = 3,586,232,967 .

You can improve that number substantially ... reducing "free cell multiplier" numbers, by using the "dead clues" concept.
I remember that they were discussed in McGuire's paper, but I don't remember how much detail was covered.

Thier code used it in three ways:
• In selecting which UA to cover "next".
• To reduce the number of cells that need to be considered for "hits" in that set.
• To reduce what you've called the "free cell multiplier" numbers.
If your code is guaranteeing that the "620,690,909 HSets of size 11" are all unique, then you must be doing something similar.

For the "free cell multiplier" connection: if you reach a point where, say, 9 clues have been placed, and all of the UA sets that the code is dealing with, have been hit ... then in the double loop that adds two more clues, you can skip any cells that are (currently) flagged as "dead".

If any of that doesn't "click", don't hesitate to ask ...
blue

Posts: 909
Joined: 11 March 2013

### Re: SudokuP - Min Clue Project

Thanks for that update .... boggling

I also think the 3rd method - is also too many puzzles to test....

maybe there a 4th method ....in the search for 11-clue puzzles... is this what you tried ?

as there is 11 clues there must always be 2 clues min in a box ... and in many of the published puzzles there are 3 clues in one of the boxes.
Its generally quicker to take away clues than to add clues ....
so maybe a way would be take a valid puzzle .... fill up the box with 3 clues to 9 clues - and search around that for other 9plus8 valid sudokuP puzzles - there may well be quite a few though.....
if you can remove 6 clues from the full box and still have a valid puzzle - you will get an 11-puzzle.

Perhaps if one starts a fresh - and still end up with the same puzzles - that might be significant
coloin

Posts: 2216
Joined: 05 May 2005
Location: Devon

### Re: SudokuP - Min Clue Project

Hi blue, to illustrate what the new SudokuP move was, I'll illustrate it on E(M(B4)) 1....................3.26..................4.....1....7.8......4..............2.9

Code: Select all
`*------------------------------------*| 1  b36 9  | 56 8  4  |a35  2   7   || 2   5  36 | 7  69 1  | 8   9-3 4   || 8   7  4  | 3  59 2  | 6   59  1   ||-----------+----------+-------------|| 36  4  1  | 8  7  35 | 9   356 2   || 36  8  7  | 56 2  9  | 1   4   35  || 9   2  5  | 4  1  36 | 7  d36  8   ||-----------+----------+-------------|| 7   9  8  | 2  35 56 | 4   1   36  || 4   1  2  | 9  36 8  | 35  7   356 || 5  c36 36 | 1  4  7  | 2   8   9   |*------------------------------------*`

Row/P House Skyscraper (3) r1c7 = r1c2 - r9c2 = (3) r6c8 => - 3 r2c2; stte

The trick is that P House 7 only has two 3's, so there is a strong link on 3 between cells c and d (impossible in an ordinary Sudoku).

As it turns out, Row/P House Skyscrapers can be used to complete the solution for all members of the F group of M(B4)). (I used M(B4) rather than B4 because there was an extraneous Skyscraper in that group, which didn't look so good).

I have no doubt that if a similar problem turns up for other puzzles, some SudokuP analog of an ordinary Sudoku move will fix the problem in an analogous way.

Leren
Leren

Posts: 4338
Joined: 03 June 2012

### Re: SudokuP - Min Clue Project

Greetings Leren!

Leren wrote:Not quite sure what you mean by that - if you manage to enumerate all 11 clue puzzles haven't you also solved the 10 clue question ? It seems to me that all 10 clue puzzles must have a number of 11 clue extensions, with the 11th clue being redundant.

Now assume that you have enumerated all 11 clue puzzles ...

And there you have it. That little assumption ... we will have resolved the 10-clue question long before any enumeration of 11-clue puzzles could possibly be completed. The 10-clue question is many, many times easier than the enumeration of 11-clue puzzles. See the comparison above. There is really just one definitive (exhaustive search) algorithm available in either case, and that's the Hitting Set approach.

leren wrote:Even if the 10 clue puzzle problem is not resolved, the enumeration of all 11 clue puzzles is, to me, an equally meritorious achievement, and should be completed.

I'm gratified that you feel that way, really! And I can certainly share the work around - I could provide participants with a small set of grids that need to be searched, plus software to do the job. If anybody else volunteers I will seriously consider that. Could be fun!

The jobs I am running at the moment will eliminate 80-90% of grids from consideration, hopefully even more than that (indeed it is absolutely critical that we eliminate at least 95%).

I will have a clearer picture when my "Phase 2" jobs finish (in perhaps a week).

Cheers!

Mathimagics
2017 Supporter

Posts: 1901
Joined: 27 May 2015
Location: Canberra

### Re: SudokuP - Min Clue Project

I can see that there is a lot of parallel thinking and posting going on here. This SudokuP 10/11 clue thing has really got us all fired up - probably the most interesting thing to happen in Sudoku in years.

I just saw coloin's last post where he says that for an 11 clue puzzle there must always be a box with at least two clues. I have been thinking the same thing, but think I can say more.

We know what and where the 2 clues are (or so I think). AFAIK you only need to consider two cases for these 2 clues.

Code: Select all
` 1   2   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . 1   .   .  | .  .  .  | .  .  . .   2   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .`

How do I (think) I know that. Well, it's simply a matter of, for some 11 clue puzzle, morphing the clue set in such a way that it lines up with one of these cases.

What about the case where he says that a box has 3 clues ? I think the answer is

Code: Select all
` 1   2   3  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . 1   2   .  | .  .  .  | .  .  . 3   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . 1   2   .  | .  .  .  | .  .  . .   .   3  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .------------+----------+--------- .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  . .   .   .  | .  .  .  | .  .  .`

If no box has 3 clues, then at least 2 boxes must have 2 clues, but I won't proceed further until someone checks what I've done and says that it has no errors and/or is not complete rubbish.

Leren
Leren

Posts: 4338
Joined: 03 June 2012

### Re: SudokuP - Min Clue Project

.
I have found a rather novel method for finding UA's!

For the 11-clue test case above I started with just 39 UA's from my simple cycle method (), and had to run the GridChecker tool separately to extend this to 124 (and these 124 were used to produce the HS stats quote above).

My new UA generator finds over 700 UA sets (not an exhaustive list), and all in just a few seconds. GridChecker took 75 seconds to identify 3663 UA's for Sudoku, of which only 124 were valid for SudokuP.

Perhaps we probably can't use that many UA's effectively, but it does give us choice, and with choice the ability to tune the HS generation process.

For example, deploying the first 250 UA's greatly reduces the HSets enumerated for the test case:

The figures given above reduce to:

• 0 HSets of size 8
• 1,259 HSets of size 9
• 624,151 Hsets of size 10
• 72,263,743 Hsets of size 11

Applying the same free-cell multipliers, that reduces the number of 11-clue puzzles to be tested from 3.6 billion, to around 120 million. HS generation (as yet unoptimised) time goes up, but overall job time is reduced by a factor of 6.

PS: I see that there are several new posts that have been made while I was busy with this! In particular blue has made some observations on the HS algorithm. I will review these shortly.

Mathimagics
2017 Supporter

Posts: 1901
Joined: 27 May 2015
Location: Canberra

### Re: SudokuP - Min Clue Project

Hi blue,

Thanks mate, another very useful contribution.

Yes, I am using a "dead-clue" list along the lines of McGuire & co, but I had separated the HS-generation process from the solving process while building a test rig, and so I hadn't made the connection between dead-cells and free-cells. Nice one!

blue wrote:It may be typical of a grid with an 11-clue puzzle, but I don't think it's typical of randomly selected grids.
I think you'll see a lot of cases with no hitting sets at all -- even with your "MCN" filtering in place.

I chose the 11-clue to test because it seemed to be typical of low-MCN cases, which should be the hardest grids for the HS generator. There may of course be even harder cases to come, but no hitting sets at all? I wonder ...

The impact of having a relatively large pool of UA's available (as we now have) for any grid is yet to be explored, but I will use the dead-clue list to re-work my examples.

Viability of 11-Clue enumeration does seem to have made a recovery ("I'm not dead!!") ... the numbers are becoming less boggling! (hi coloin!)

=================================================================

Ok, I have recalculated the figures for the 11-clue test case. The table following compares job sizes (actual number of 10-clue and 11-clue puzzles that need to be tested) resulting from use of UA collections of size 124 (the GridChecker set), 170 and 250:

Code: Select all
`                 NUA = 124                        NUA = 170                     NUA = 250 HS         NHS      NP10       NP11 |       NHS      NP10       NP11 |      NHS    NP10     NP11 -------------------------------------------------------------------------------------------------  8        1130    961522   13168967 |         5      4284      57400 |        0  9      375331  13999060  261806637 |     15377    597302   11584912 |     1259   48688    940486 10    27266402  27266402  923015446 |   3479156   3479156  122134755 |   624151  624151  21899969 11   620690909       n/a  620690909 | 208718929       n/a  208718929 | 72263743     n/a  72263743 -------------------------------------------------------------------------------------------------                 42226984 1818081959               4080742  342495996             672839  95104198`

HS is Hitting Set size, NHS is # of HSets of that size, NP10 and NP11 are the number of 10-clue/11-clue puzzles to be tested.

NP10 and NP11 are calculated individually for each HSet, ie: the # of ways to choose the additional cells from the applicable free cell list for that HS.

Note that the most time consuming part of these jobs is the counting/enumeration of 11-clue HSets. This took 12 hours for the 124 UA case, and 4 hours for the 170 and 250 cases. The HS enumerator is unoptimised, so faster times are probably available, but not dramatically so.

Note that enumeration of the 11-clue HSets in particular takes much longer than the puzzle-solving, ie testing the puzzles. So despite the 11-clue problem with 250 UA's seeming to be 20 times smaller than the 124 UA case, in terms of total time it is perhaps only 4 to 6 times smaller.
Last edited by Mathimagics on Mon Mar 12, 2018 6:16 pm, edited 2 times in total.

Mathimagics
2017 Supporter

Posts: 1901
Joined: 27 May 2015
Location: Canberra

### UA Generation - Monte Carlo

.
While considering an alternative approach to UA generation alluded to earlier, ie: generating grids that are more than one "Pittenger-step" (cycle-change) distant from the starting grid, I came up with this:

UA Generation (Monte Carlo)

• remove random cells from complete grid until we get multiple solutions
• if that number of solutions is 2, then compare the 2nd solution grid with the first, the list of cells that differ is a UA
• repeat until desired number of distinct UA's is obtained

• a fast solver means fast UA generation
• the method works for any variant, just plug in the appropriate solver
• no grid analysis is required

The last point is crucial - grid pattern analysis, eg as done by GridChecker, gets more expensive as we look for UA's involving more than 3 different cell values, there are more combinations of values and more templates that need to be checked.

This method turns up UA's involving multiple cell values - my test case had UA's involving 5, 6 and 7 values.

Another potential failing of grid analysis is that it can miss UA's that are a union of invalid individual cycles. For every cycle in our list, it needs to check all cycle combinations that can (and often do) produce a valid grid. This all takes more time ...

Mathimagics
2017 Supporter

Posts: 1901
Joined: 27 May 2015
Location: Canberra

### Re: SudokuP - Min Clue Project

.
We have abandoned phase 2 as described above. The clique detection is simply too slow, and not eliminating sufficient candidates for the 11-clue problem to justify it.

So we restart using the 25,906,034 survivors of Phase 1, and with Phase 2 now calculating MCN from the size of the smallest Hitting Sets. We don't enumerate them HSets, we just find the smallest.

As expected, this eliminates most of the remaining grids (or will do in time), but it's currently a little too expensive for comfort.

The HS algorithm is slow when we use too many UA's (it never rains but it pours). I'm looking at a method that uses an incremental approach to try and arrive at the result more quickly.

The idea is that if we have a HSet for the first N UA's, then we can add another set of N (or any increment), and if we save the HSets, then we can test each one for validity ie. hitting, of the the additional UA's. If we repeat with suitable increments it might be faster.

Now this involves the enumeration of HSets, but only for sizes up to 10, it just has to decide whether an 11 exists. If not we eliminate the candidate. It might just work, the pool of HSets of size < 11 will probably fit into memory comfortably.

Meanwhile, I've also decided to split the project into two distinct sub-processes, the 10-clue decision problem and the 11-clue enumeration problem. Since 10-clue candidate elimination is very much easier, so we can make much better progress there.

Mathimagics
2017 Supporter

Posts: 1901
Joined: 27 May 2015
Location: Canberra

### Re: SudokuP - Min Clue Project

.
The incremental HSet idea seems to be a lost cause, too many fish in the HSet pool, in a nutshell.

So we are proceeding with the existing HS algorithms, and have cranked up Phase 3 for the 10-clue problem. This just tests candidates not yet eliminated for the existence of HSets of size 10 or less, writes these to a file for use in Phase 4 (the money phase).

As expected, not many candidates survive this process, and the jobs do complete in a reasonable time (20-25 minutes).

So, with 5367 jobs to do, 4 processes running, say 1/2 an hour per job, we are still looking at 700 hours, that's about a month.

The 11-clue enumeration is deferred until we complete the 10-clue problem. Estimates of Phase 3 time for 11-clue mode are too horrible to contemplate just now ...

The 5367 figure refers to the original division of the problem into 10K grid jobs. But for Phase 3 in 10-clue mode we only actually do about 2200 grids per job, since only survivng candidates with MCN < 11 need to be tested.

Mathimagics
2017 Supporter

Posts: 1901
Joined: 27 May 2015
Location: Canberra

### 10-Clue Question: Progress Update

.
Phase 3 has completed 10% of the grids. From the first 5,380,000 grids, just 20,162 (0.04%) are marked for further consideration as 10-clue candidates.

We are only using a small (50) set of UA's at this stage, so this number is very likely to be reduced even further by a repeat dose of Phase 3 with more UA's.

Mathimagics
2017 Supporter

Posts: 1901
Joined: 27 May 2015
Location: Canberra

### SudokuP: 10-clue progress

.
Update time!

First of all, I have re-labelled the phases, so what was referred to above as "Phase 3" is now "Phase 2". This phase involves multiple passes (2.0, 2.1 etc) using increasing numbers of UA's (50, 150, 250 etc) to reduce the candidate pool by eliminating candidate grids with no HS of size 10 or less.

Phase 1 left us with 11,800,147 grids in the candidate pool:
Code: Select all
`Phase 1:  MCN <= 10:  11800147   23%  MCN  = 11:  14105887   26%  MCN >= 12:  27760655   51%  --------------------------  Total:      53666689  100%  Time/grid:     0.021s`

Phase 2.0 is now complete, reducing the pool to 5,195,763 grids:
Code: Select all
`Phase 2.0 (# of UAs = 50):  MCN <= 10:   5195763    44%  MCN >= 11:   6704384    56%  --------------------  Total:      11800147      Time/grid      1.028s`

Phase 2.1 is underway, and is expected to reduce the pool to around 800K grids:
Code: Select all
`Phase 2.1 (# of UAs = 150):  Pool:        5195763    MCN <= 10:     31984        MCN >= 11:    168940  --------------------  Total:        200924    4%  Time/grid      4.259s`

The time per grid gives us an idea of how long Phase 2.1 will take. At 4.25s per grid we will need around 22 Ms (millions of seconds). My processing capacity is about 0.69 Ms per day, which means we will need at least 32 days.
Last edited by Mathimagics on Sun Apr 08, 2018 9:27 am, edited 1 time in total.

Mathimagics
2017 Supporter

Posts: 1901
Joined: 27 May 2015
Location: Canberra

### Re: SudokuP - Min Clue Project

i am sorry, that i don't have any cpu capacities at all. seems, that you have to do it alone - like champagne with his 17 clue project.
eleven

Posts: 2805
Joined: 10 February 2008

### SudokuP: 10-clue progress

.
Phase 2.1 is now completed, and the grid pool is now 1,986,949:

Code: Select all
`Phase 2.1 (# of UAs = 150):  Pool:        5195763    MCN <= 10:   1986949   38%   MCN >= 11:   3208814   62%  Time/grid      4.163s`

Now running Phase 2.2 (# of UA's = 300). Grid time looks to be 20-25s, and so we need 40-50 Ms (60 - 70 days est). On the plus side, the elimination rate should be noticably higher with 300 UA's.

Also, coloin has offered a spare PC to assist, so we might be able to reduce this time.

Mathimagics
2017 Supporter

Posts: 1901
Joined: 27 May 2015
Location: Canberra

### Re: SudokuP - Min Clue Project

.
For coloin, here is the Phase 2.2 executable. Unzip to the same location where you unzipped the data files that I emailed you.
Attachments
SudokuP22.zip
Updated 8 May 2018