The New Sudoku Players' Forum

by **Mathimagics** » Wed Jun 26, 2019 2:56 am

I have recently been giving some thought to this issue, prodded by a comment from coloin made recently (in the High clue tamagotchis thread).

coloin wrote:... maybe one in three solution grids have an 18 clue puzzle
... almost all solution grids have at least one 19
… the MC grid doesnt have a 19 and only has one ED 20-puzzle

Closely related to the MNC (Minimum Number of Clues) problem is what I call, for want of a better term, the Low Clue Threshold - the smallest value of N for which every solution grid has an N-clue puzzle.

From the observations above, it looks very much like LCT = 20 for standard Sudoku, but of course we would like proof of the fact. Having a grid with no 19-clue puzzle, if we were to produce evidence of 20-clue or better puzzles for all ED grids, this would establish LCT = 20 as fact.

(I see from old threads going back to 2005/2006 that the occurrence of 19C and 20C puzzles was discussed in some detail, eg Grid containing a 20 but no 19)

With faster CPU's and algorithms (mostly), proof that LCT = 20 looks feasible to me. Perhaps measurable in "CPU months" whereas the "No 16C" problem involved "CPU decades" …

It might also make feasible a subsequent identification of all grids with no 19C …

by **Mathimagics** » Wed Jun 26, 2019 3:02 am

There is also the companion problem, the High Clue Threshold - the largest of N for which every solution grid has a minimal N-clue puzzle.

From what I can gather, for standard Sudoku HCT is possibly 36 or 37?

This is a much tougher nut to crack than LCT, I think ...

by **qiuyanzhe** » Wed Jun 26, 2019 3:50 am

I didn't find it easy to prove that
all N's for a specific solution grid, such that it has a minimal N-clue puzzle
forms a region(i.e. a consecutive set of numbers).
Though it is very likely to be true, there are counterexamples for non-sudoku structures. So I would rather express HCT as
Find the largest of N for which every solution grid has a minimal puzzle with at least N clues.

Admittedly, this would still be tougher than LCT drastically, and even more complex than the previous version..

by **Mathimagics** » Wed Jun 26, 2019 8:48 am

Good point! :cool:

Ok, HCT = N should mean that every grid has a minimal puzzle with at least N clues …

by **blue** » Wed Jun 26, 2019 5:19 pm

New info, possibly : Grids containing a 21 but no 20 ... awaiting 2nd party verification.

by **Mathimagics** » Wed Jun 26, 2019 10:37 pm

Hi blue!

Well done!

So we now have LCT > 20

dobrichev wrote:Confirming non-existence of 20-clue puzzles in these 4 grids.
Code: Select all
123456789456789123789123456214897365365214897897365214541632978632978541978541632 puz20=0 puz21= 7488 123456789456789123789132465218967534564213978937548216391875642645321897872694351 puz20=0 puz21= 3138 123456789457189326689327154216534897745891632938672541361245978574918263892763415 puz20=0 puz21= 3894 123456789457189326689327154216534897745891632938672541392765418574918263861243975 puz20=0 puz21=19682

by **coloin** » Mon Jul 01, 2019 7:39 pm

Indeed well done - grids with automorphism tend to have less ED puzzles - of course !

so the LCT situation could be approx -

17 clues - 46300 ED grid solutions
18 clues - 1e9 grids
19 clues - 4e9 grids
20 clues - 100000 grids ? [including MC and PT grids - only 1 ED 20-puzzle in each]
21 clues - [only ?] 4 grids

the HCT is just a little bit more problematic to confirm

35 clues ? any
36 clues - the MC grid [ it did not have a 37 found in dobrichev 's files]
37 clues
38 clues
39 clues
40 clues - 2 puzzles 1 grid solution

by **dobrichev** » Mon Jul 01, 2019 8:45 pm

40 clues - 1 grid

Code: Select all: 123456789456789123798132564231675498584923617679841235367518942815294376942367851 1..4....9.5..8..2..981.2..4..1..5.9.58.92...7.798412.5.67518.4281.294.76......... 1..4....9.5..89.2..981.2..4..1..5.9.58..2...7.798412.5.67518.4281.294.76.........

by **dobrichev** » Mon Jul 01, 2019 9:29 pm

36 clues - ? grids, 205 GB zip puzzles, small portion found
37 clues - ? grids, 423 774 106 puzzles
38 clues - 1 491 898 grids, 2 123 548 puzzles, significant portion found
39 clues - 2291 grids, 2651 puzzles, significant portion found

by **Mathimagics** » Tue Jul 02, 2019 12:49 am

Thanks Mladen.

@coloin: I believe that the 17C grid count is currently 46300

by **Mathimagics** » Mon Jul 08, 2019 4:21 am

I've made a start on Project LCT. The first stage is "LCT-20", which will look at the existence of 20-clue puzzles. LCT-20 started with a copy of the ED grid catalog. When we find a 20C puzzles on an ED grid, we replace the grid with this representative puzzle. A status code identifies the number of clues in the puzzle, and whether this number is known to be a minimum.

The very first update to the LCT catalog replaced all the known 17C grid entries (46,300) with a 17C puzzle and marked with status = "17X", indicating that a 17C puzzle exists, and we know that no 16 puzzle exists. When we find a 20C puzzle for a grid we don't already have a result for, it is assigned status code "20C", which means this puzzle exists (obviously) but is not necessarily the minimum clue count possible for this grid.

We also have, thanks to blue (and confirmed by dobrichev), 4 grids that have known 21C, but have been proven to have no 20C puzzles. These 4 are registered with status "21X".

The objective of LCT-20 is to identify for ALL unresolved grids whether or not a 20C puzzle exists, to register the puzzle exemplars for all those found to have one, and finally to identify the lowest number of clues for the (hopefully few) grids that remain unresolved. If 21C puzzles can be found for all of these then the hypothesised result of "LCT = 21" is proven, and we will have the evidence to back that up.

We will also have what I hope will be a useful resource in the form of the LCT catalog itself.

Finding 20C Puzzles

This can be done in 3 ways that I know of:

Morphing: this is the fastest method of finding 20C puzzles, by morphing existing ones. A "new grid" is identified when a morphed puzzle solution yields a CF form that is new to the catalog. Very effective, but diminishing returns apply, as the yield ( the % of morph results that find new grids) degrades over time.

Reduction: this tests individual grids, by repeated random reduction of the solution grid to a minimal puzzle state, until a puzzle of the desired size is found. For 20C/21C puzzles this typically takes, on my system, a few seconds per grid.

HSE: Hitting Set Enumeration, this is the slowest method of testing a specific grid, but has the advantage of rigorousness - it can definitively prove whether or not a 20C puzzle exists. Method of last resort for individual grid status resolution.

Current Status

I have 4 x worker processes running on each of my 2 PC's. Each worker uses morphing to find as many 20C grids/puzzles as possible. These are grouped together in batches, each batch containing a set of 20C ED grids+puzzles, sorted and with duplicate grids removed. The worker doesn't actually check the catalog, so these are not necessarily new grids. The new grid yield only becomes apparent when batches are processed by the catalog update process.

The update process is BAND driven - the catalog has a separate file for each band, and the largest band can be wholly loaded into available RAM, so the method I have adopted is to process a set of batch files (currently 64) in parallel, loading/updating each band in sequence. (The update process is probably sub-optimal, and I am still trying to improve it, but optimising IO-bound processes is a tricky proposition, particularly so on Windows).

Currently one update pass takes about 3-4 hours. For a database that is effectively 420Gb, that's probably reasonable enough. Unfortunately the process takes much longer when worker processes are running. Currently the workers are doing lots of IO themselves, and this seems to seriously affect the time taken for the update process to write the new band contents. I'm working on a new worker process that will use much less IO, but meanwhile I have to stop the workers whenever I want to do an update.

Nevertheless, despite these problems I am getting yields of 100 million new grids identified each day (initially it was around 150 million), and at the last update had found 20C puzzles for 2.27 billion grids (41.5%).

A batch set (64 batches) typically has up to 300 million grids, of which on average 230 million are distinct within the set. The workers, happily, deliver mostly distinct grids wrt each other.

The net yield at update time for a batch set is currently 45%. But this is coming down with each set, of course. 5 days ago (7 sets) it was 67%.

by **dobrichev** » Mon Jul 08, 2019 7:06 am

Hi Mathimagics,

For a particular size N, you need a file with one bit per grid. This is for the worst case "flat" model where neither almost all grids are expected to be set to "false" nor "true". (size 18?)
All you need is to obtain each grid "catalogue number" and update the bit, which can be done on single pass over catalogue, your ordered set, and the result.
While accumulating your set, a similar single pass merges can be done on the smaller but already ordered sets.

Of course for the 17s and 21s it is better to keep directly the ordered grid lists, either "by value" or "by reference".

Key phrases: "sort -mu", "comm -13".
On windows use gsort since the built-in stupid sort isn't easy to remove.

by **coloin** » Mon Jul 08, 2019 10:38 am

Well .... I knew this exercise might be coming ....
However .... I think most of the Ed grids will have a 19 ( or an 18 )
It might be more worthwhile to generate and register the 19 grids first
And then individually check the missing ? 50k solution grids

And of course in a true morphing process.... if the pattern is kept the same ..... all puzzles generated will be from different solution grids ( excluding 8/9 clue swops ) ( excluding automorphic )

by **champagne** » Mon Jul 08, 2019 11:09 am

coloin wrote:Well .... I knew this exercise might be coming ....
However .... I think most of the Ed grids will have a 19 ( or an 18 )
It might be more worthwhile to generate and register the 19 grids first
And then individually check the missing ? 50k solution grids

And of course in a true morphing process.... if the pattern is kept the same ..... all puzzles generated will be from different solution grids ( excluding 8/9 clue swops ) ( excluding automorphic )

and the "best process" described by our friend is very expensive in the range 18/19 clues.
If one wants really qualify a solution grid for the lowest possible number of clues, IMO, the start must be 18 clues (assuming that all 17s are known)

by **Mathimagics** » Mon Jul 08, 2019 2:22 pm

Thanks to everybody for those observations! 8-)

Mladen's comments will be addressed separately.

Regarding coloin's and champagne's points:

the objective here is not to record the absolute minimum number of clues for each and every grid, just to identify the best known result for each grid. For any 17C case, we can give it status 17X (indicating absolutely the minimum possible) because all grids have previously been rigorously 16C-checked (via HSE method) and no 16C puzzle exists, therefore any 17C is automatically a 17X.

But for any other number of clues, we can only prove absolutely that it has no smaller puzzles by HSE-testing. So we would need to check all the 18C cases (assuming we had found them) via HSE-testing in order to determine whether it's really 18X, or is a new 17X. This is a daunting task if there are billions, and all evidence suggests there are perhaps 1.8 billion or so (I think roughly 1/3 of grids is the current estimate?).

both 19C/20C categories are more easily handled, since most grids have a 19C puzzle, so the existence question can most likely be answered for all grids in reasonable time (where anything less than 6 months might be "reasonable", but anything requiring decades (or worse!) is definitely "unreasonable").

coloin

the number of grids that do NOT have 20C is likely to be very small (perhaps those found by blue are the complete set). This means that the number of grids I have to HSE-test will be very small (perhaps 0). HSE-testing is notoriously expensive, and I plan to do it as little as possible!

working back from 20C to 19C in a separate "LCT 19" phase is relatively simple. Many of the 20C puzzles registered actually have redundant clues already. Either way the grids that will require HSE-testing for 19-cue existence is the same.

Cheers!
MM

The New Sudoku Players' Forum

Low/Hi Clue Thresholds

Low/Hi Clue Thresholds

Hi Clue Threshold

Re: Low/Hi Clue Thresholds

Hi Clue Threshold

Re: Low/Hi Clue Thresholds

Re: Low/Hi Clue Thresholds

Re: Low/Hi Clue Thresholds

Re: Low/Hi Clue Thresholds

Re: Low/Hi Clue Thresholds

Re: Low/Hi Clue Thresholds

Low Clue Threshold Project

Re: Low/Hi Clue Thresholds

Re: Low/Hi Clue Thresholds

Re: Low/Hi Clue Thresholds

Re: Low/Hi Clue Thresholds