Low/Hi Clue Thresholds

Everything about Sudoku that doesn't fit in one of the other sections

Re: Low/Hi Clue Thresholds

Postby eleven » Mon Nov 11, 2019 12:00 am

The crowd is getting nervous. blue's lower limit almost reached. And 72 hours to go. Can it blow up the upper limit? Overtime now in the betting shops.
(a dollar on no)
eleven
 
Posts: 3235
Joined: 10 February 2008

Re: LCT Project Update

Postby eleven » Sat May 30, 2020 6:21 pm

Mathimagics wrote:[list][*]PC "JACK" (16-core) is currently being used to assist with champagne's 17C search process.

Good decision. Unfortunately i don't have any free resources (and no windows).
eleven
 
Posts: 3235
Joined: 10 February 2008

Re: Low/Hi Clue Thresholds

Postby m_b_metcalf » Mon Jul 08, 2019 4:00 pm

No sooner said, ...
Attachments
20clues.txt
(52.85 KiB) Downloaded 231 times
19clues.txt
(15.48 KiB) Downloaded 211 times
User avatar
m_b_metcalf
2017 Supporter
 
Posts: 13647
Joined: 15 May 2006
Location: Berlin

Re: Low/Hi Clue Thresholds

Postby m_b_metcalf » Tue Jul 09, 2019 6:32 pm

Mathimagics wrote:BTW guys, anybody who is in possession of puzzle collections of 20C or less can contribute to this project by simply sending me those puzzles.

18C or 19C collections would be particularly valuable, for obvious reasons! 8-)

Here are two small collections of 18-clue puzzles. (Are you hoping for hundreds or for zillions?) The first is derived (using a stochastic-walk vicinity search) from the 19-clue set. Many are symmetrical.

The second set is that posted by olimpia here, followed by some neighbours (obtained using the same method). They are all symmetrical.

Regards,

Mike
Attachments
18clues_2.txt
Symmetric
(28.81 KiB) Downloaded 203 times
18clues_1.txt
From 19s
(18.04 KiB) Downloaded 203 times
User avatar
m_b_metcalf
2017 Supporter
 
Posts: 13647
Joined: 15 May 2006
Location: Berlin

Low/Hi Clue Thresholds

Postby Mathimagics » Wed Jun 26, 2019 2:56 am

I have recently been giving some thought to this issue, prodded by a comment from coloin made recently (in the High clue tamagotchis thread).

coloin wrote:... maybe one in three solution grids have an 18 clue puzzle
... almost all solution grids have at least one 19
… the MC grid doesnt have a 19 and only has one ED 20-puzzle


Closely related to the MNC (Minimum Number of Clues) problem is what I call, for want of a better term, the Low Clue Threshold - the smallest value of N for which every solution grid has an N-clue puzzle.

From the observations above, it looks very much like LCT = 20 for standard Sudoku, but of course we would like proof of the fact. Having a grid with no 19-clue puzzle, if we were to produce evidence of 20-clue or better puzzles for all ED grids, this would establish LCT = 20 as fact.

(I see from old threads going back to 2005/2006 that the occurrence of 19C and 20C puzzles was discussed in some detail, eg Grid containing a 20 but no 19)

With faster CPU's and algorithms (mostly), proof that LCT = 20 looks feasible to me. Perhaps measurable in "CPU months" whereas the "No 16C" problem involved "CPU decades" …

It might also make feasible a subsequent identification of all grids with no 19C …
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Hi Clue Threshold

Postby Mathimagics » Wed Jun 26, 2019 3:02 am

There is also the companion problem, the High Clue Threshold - the largest of N for which every solution grid has a minimal N-clue puzzle.

From what I can gather, for standard Sudoku HCT is possibly 36 or 37?

This is a much tougher nut to crack than LCT, I think ...
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Hi Clue Threshold

Postby Mathimagics » Wed Jun 26, 2019 8:48 am

Good point! :cool:

Ok, HCT = N should mean that every grid has a minimal puzzle with at least N clues …
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Re: Low/Hi Clue Thresholds

Postby Mathimagics » Wed Jun 26, 2019 10:37 pm

Hi blue!

Well done!

So we now have LCT > 20

dobrichev wrote:Confirming non-existence of 20-clue puzzles in these 4 grids.
Code: Select all
123456789456789123789123456214897365365214897897365214541632978632978541978541632 puz20=0 puz21= 7488
123456789456789123789132465218967534564213978937548216391875642645321897872694351 puz20=0 puz21= 3138
123456789457189326689327154216534897745891632938672541361245978574918263892763415 puz20=0 puz21= 3894
123456789457189326689327154216534897745891632938672541392765418574918263861243975 puz20=0 puz21=19682

User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Re: Low/Hi Clue Thresholds

Postby Mathimagics » Tue Jul 02, 2019 12:49 am

Thanks Mladen.

@coloin: I believe that the 17C grid count is currently 46300
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Low Clue Threshold Project

Postby Mathimagics » Mon Jul 08, 2019 4:21 am

I've made a start on Project LCT. The first stage is "LCT-20", which will look at the existence of 20-clue puzzles. LCT-20 started with a copy of the ED grid catalog. When we find a 20C puzzles on an ED grid, we replace the grid with this representative puzzle. A status code identifies the number of clues in the puzzle, and whether this number is known to be a minimum.

The very first update to the LCT catalog replaced all the known 17C grid entries (46,300) with a 17C puzzle and marked with status = "17X", indicating that a 17C puzzle exists, and we know that no 16 puzzle exists. When we find a 20C puzzle for a grid we don't already have a result for, it is assigned status code "20C", which means this puzzle exists (obviously) but is not necessarily the minimum clue count possible for this grid.

We also have, thanks to blue (and confirmed by dobrichev), 4 grids that have known 21C, but have been proven to have no 20C puzzles. These 4 are registered with status "21X".

The objective of LCT-20 is to identify for ALL unresolved grids whether or not a 20C puzzle exists, to register the puzzle exemplars for all those found to have one, and finally to identify the lowest number of clues for the (hopefully few) grids that remain unresolved. If 21C puzzles can be found for all of these then the hypothesised result of "LCT = 21" is proven, and we will have the evidence to back that up.

We will also have what I hope will be a useful resource in the form of the LCT catalog itself.

Finding 20C Puzzles

This can be done in 3 ways that I know of:

  • Morphing: this is the fastest method of finding 20C puzzles, by morphing existing ones. A "new grid" is identified when a morphed puzzle solution yields a CF form that is new to the catalog. Very effective, but diminishing returns apply, as the yield ( the % of morph results that find new grids) degrades over time.
  • Reduction: this tests individual grids, by repeated random reduction of the solution grid to a minimal puzzle state, until a puzzle of the desired size is found. For 20C/21C puzzles this typically takes, on my system, a few seconds per grid.
  • HSE: Hitting Set Enumeration, this is the slowest method of testing a specific grid, but has the advantage of rigorousness - it can definitively prove whether or not a 20C puzzle exists. Method of last resort for individual grid status resolution.

Current Status

I have 4 x worker processes running on each of my 2 PC's. Each worker uses morphing to find as many 20C grids/puzzles as possible. These are grouped together in batches, each batch containing a set of 20C ED grids+puzzles, sorted and with duplicate grids removed. The worker doesn't actually check the catalog, so these are not necessarily new grids. The new grid yield only becomes apparent when batches are processed by the catalog update process.

The update process is BAND driven - the catalog has a separate file for each band, and the largest band can be wholly loaded into available RAM, so the method I have adopted is to process a set of batch files (currently 64) in parallel, loading/updating each band in sequence. (The update process is probably sub-optimal, and I am still trying to improve it, but optimising IO-bound processes is a tricky proposition, particularly so on Windows).

Currently one update pass takes about 3-4 hours. For a database that is effectively 420Gb, that's probably reasonable enough. Unfortunately the process takes much longer when worker processes are running. Currently the workers are doing lots of IO themselves, and this seems to seriously affect the time taken for the update process to write the new band contents. I'm working on a new worker process that will use much less IO, but meanwhile I have to stop the workers whenever I want to do an update.

Nevertheless, despite these problems I am getting yields of 100 million new grids identified each day (initially it was around 150 million), and at the last update had found 20C puzzles for 2.27 billion grids (41.5%).

A batch set (64 batches) typically has up to 300 million grids, of which on average 230 million are distinct within the set. The workers, happily, deliver mostly distinct grids wrt each other.

The net yield at update time for a batch set is currently 45%. But this is coming down with each set, of course. 5 days ago (7 sets) it was 67%.
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Re: Low/Hi Clue Thresholds

Postby Mathimagics » Mon Jul 08, 2019 2:22 pm

Thanks to everybody for those observations! 8-)

Mladen's comments will be addressed separately.

Regarding coloin's and champagne's points:

  • the objective here is not to record the absolute minimum number of clues for each and every grid, just to identify the best known result for each grid. For any 17C case, we can give it status 17X (indicating absolutely the minimum possible) because all grids have previously been rigorously 16C-checked (via HSE method) and no 16C puzzle exists, therefore any 17C is automatically a 17X.

    But for any other number of clues, we can only prove absolutely that it has no smaller puzzles by HSE-testing. So we would need to check all the 18C cases (assuming we had found them) via HSE-testing in order to determine whether it's really 18X, or is a new 17X. This is a daunting task if there are billions, and all evidence suggests there are perhaps 1.8 billion or so (I think roughly 1/3 of grids is the current estimate?).
  • both 19C/20C categories are more easily handled, since most grids have a 19C puzzle, so the existence question can most likely be answered for all grids in reasonable time (where anything less than 6 months might be "reasonable", but anything requiring decades (or worse!) is definitely "unreasonable").
    coloin is correct in pointing out that 19C might have been a better choice for the initial build, and I considered doing so, but chose 20C initially are these:

    • the number of grids that do NOT have 20C is likely to be very small (perhaps those found by blue are the complete set). This means that the number of grids I have to HSE-test will be very small (perhaps 0). HSE-testing is notoriously expensive, and I plan to do it as little as possible!
    • working back from 20C to 19C in a separate "LCT 19" phase is relatively simple. Many of the 20C puzzles registered actually have redundant clues already. Either way the grids that will require HSE-testing for 19-cue existence is the same.

Cheers!
MM
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Re: Low/Hi Clue Thresholds

Postby Mathimagics » Mon Jul 08, 2019 2:44 pm

coloin wrote:.... if the pattern is kept the same ..... all puzzles generated will be from different solution grids ( excluding 8/9 clue swops ) ( excluding automorphic )


True, and the worker processes use both same-pattern morphing and {+1, -1} morphing methods to find new grids. Unfortunately, using ONLY the same-pattern method doesn't succeed in enough cases to provide a sufficient supply of new "seeds" for the process. So we use both methods ...
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Re: Low/Hi Clue Thresholds

Postby Mathimagics » Mon Jul 08, 2019 2:48 pm

BTW guys, anybody who is in possession of puzzle collections of 20C or less can contribute to this project by simply sending me those puzzles.

18C or 19C collections would be particularly valuable, for obvious reasons! 8-)
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Re: Low/Hi Clue Thresholds

Postby Mathimagics » Mon Jul 08, 2019 4:32 pm

Thanks Mike! 8-)

PS: coloin, champagne: I have responded to the points you raised, see last post of the previous page ...
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

Re: Low/Hi Clue Thresholds

Postby Mathimagics » Tue Jul 09, 2019 6:40 am

Hi Mladen!

I think your model makes sense, but would be more appropriate if I did not want to actually keep track of a sample puzzle for each grid. A flat bit-mask would be more efficient otherwise. If I have misunderstood your outline, I'm sure you'll let me know!

dobrichev wrote:Key phrases: "sort -mu", "comm -13".
On windows use gsort since the built-in stupid sort isn't easy to remove.


The worker processes that generate the candidate puzzles do have to sort their results, and remove duplicates. Currently they invoke the builtin Windoze sort via system("sort …."). The duplicates are then removed by the program. Both sort and dup-removal steps are slow particularly on the final pass for each batch produced (typically 30 million records, reducing to 3 or 4 million).

I have a fast sort utility (XSM, from http://www.hhns.fr/xsm/en/) which can do both sort & dup removal in one go, and has excellent performance, but I can't use it here because the free version, while fully functional, has a "Press Enter to continue" fudge to foil automation. The package is geared towards high-end commercial users, so it is not cheap. But I do use it often for manual big sort/merge tasks so can't complain, it has been of great use.

I'm working on a new worker which will use the qsort function, and that should speed things quite up a bit.

But I don't know anything about "gsort", what is that? And your "key phrases" mean what?

Cheers
MM
User avatar
Mathimagics
2017 Supporter
 
Posts: 1926
Joined: 27 May 2015
Location: Canberra

PreviousNext

Return to General