scan solution grids for 17 clues as of blue

Programs which generate, solve, and analyze Sudoku puzzles

Re: scan solution grids for 17 clues as of blue

Fresh news on this project, with as usual both good and bad news.

First of all an amazing good surprise. Blue wrote several times that he had been surprised by some unexpected good results. This was the case with the tailor made brute force called during the processing of the bands 3.

As far as I can see, the global effect is to cut by 5 to 8 the previous run time. Say 5 for "easy cases" and 8 for "bad cases".

For the 50 "bad cases" shown in the post 2, the new run time is around 8 times faster than before.

For the sample of 10 000 ED bands 1+2 the new run time is in the range 5-6 times faster than the previous code.

Through private exchanges, mladen Dobrichev warned me against a possible bug. This is the permanent threat in such work. However, the secret seems to be in the new brute force tailor made algorithm.

The way guesses are done produces many new GUAs (gangster UAs having all cells located in one "pseudo mini row"). These GUAs can be reused in many ways. This is something claimed in blue's wording, but with missing details and code to understand how he did it.

So the 10 000 ED bands 1+2 sample with an average 13 bands per entry is now solved in 1.07 second per entry, less than 0.1 second per solution grid for this pass 2 a (clues distribution 656 or 566)

This is likely equivalent to 0.2 second per solution grid for the whole search of 17s (to compare to more than 3 seconds per solution grid in the Gary McGuire proof that no 16 clues exists)

On the bad side, I thought that the 10 000 ED bands 1+2 sample was very close to a full exploration of the sample. In fact, a full exploration requires 20 632 ED bands 1+2. Missing ED are at the bottom of the sample, with a small number of attached bands 3, but many more XY to explore in average. At the end, the full sample had an average run time of 1.7 second per entry. This gives an average 2.5-3 seconds per entry for the second part of the sample and lets expect an average 5-10 seconds per entry for the bottom part of the file

The new bad cases have now a run time in the range 10-15 seconds per entry and I added in post 2 a list of the corresponding items out of the sample.

The second relatively bad news is the multi cores penalty. I run the same sample with 4 cores active on tests of the same program in other areas. The run time was higher by about 30%.

The current version of the program passed the test on the known 17, but this is not enough to assess that the program is correct. The test has a big bias, it is done with a unique band 3 attached to the band 1+2. The next test (ongoing with an outdated version of the code) is to run the program in an area where 17s with the adequate distribution are known.

This will be done with the band 1 of rank 9 (band n° 48) where 8 17s are expected (btw my sample test delivers one 17). Band of rank 9 should have an average low time per ED band 1+2 but between 2 and 3 millions ED, so this is a relatively long test. I have next week off and I'll try to run the test during that week.

In parallel, I have to update the comments . The revised version of the comments will be uploaded without notice; Next post on my side will comment the test on the band of rank 9.

The current code has been released on my website. The release can be downloaded at the following place here

This code has a frame different from my working file. The changes were transferred in the released code but I still have a minor difference in the 2 versions. This likely reflects a small variation in the place where calls to the brute force are done in band 3. As I have my week off soon, I did not want to postpone the release.

More on the tailor made brute force.

The general process is very similar to what is done for bands 1+2.

Produce fresh GUAs when a call to brute force is done if the puzzle has a multiple solution
Re use the fresh GUAs in the vicinity using FIFO tables
Re use the GUAs as "primary GUAs" in the next steps where the GUAs are put in vector mode (2X2Y steps, processing band 1 and band 2 valid sub puzzles having the same first 2 cells).

The guess process try to start the false guess with 2 stacks in band 3 set to the "true value". Doing so, in most cases, the first "false" solution is GUA pattern.

To be re used later, the false solution is switched to the gangster equivalent (band 1+2 GUA plus socket id).

As we have 81 + 81 sockets, and may calls to the brute force in band 3, this generate plenty of GUAs. Even using smallest GUAs only, the total number of GUAs goes up sharply and they must be kept to have benefit of the work done in the brute force.

The current frame has only one dimension for the FIFO tables. The dimension has been pushed to 64 and the "false" produced in band 3 are split in three groups {small; medium; big). Only the "small" are carried in the next steps.

The GUAs tables (pairs and triplets sockets) have been extended to a possible 2000 GUAs per table and the limit for the GUA vectors pushed to 20x64 = 1280 GUAs in a 2X2Y step.

These parameters work well for the 50 bad case file available in the post 2, More tests are needed to be sure to be in the right area.

EDIT 1 processors performance.

I am using 3 different processor, all 8 cores with 8 GO of ram.

Code: Select all
`processor performance indexAMD FX-8320E  7511I7 -2600      8214i7 -4770      9309`

This is far from the performance index of the best processor. Run times given here have been usually done on the best i7.
I made benchmarks with the program searching 17s and I have results well correlated with the performance index.
champagne
2017 Supporter

Posts: 5620
Joined: 02 August 2007
Location: France Brittany

Re: scan solution grids for 17 clues as of blue

coloin wrote:B-grids which have all bands 5 or less

With the B-grids - what about only taking 15 clues with a {555,555} pattern - and then adding two clues generally ?

Maybe this doesn't fit well in the model.
several microseconds for +2 x huge number of tests = ages.
From other hand, at this point the process has collected sufficient constraints to discard say 99.9999999% of them in few CPU cycles.
dobrichev
2016 Supporter

Posts: 1311
Joined: 24 May 2010

Re: scan solution grids for 17 clues as of blue

dobrichev wrote:
coloin wrote:

Maybe this doesn't fit well in the model.
several microseconds for +2 x huge number of tests = ages.
From other hand, at this point the process has collected sufficient constraints to discard say 99.9999999% of them in few CPU cycles.

I agree with you. When blue started with valid band1 x valid band 2, he made a very simple act nearly impossible to beat.

As I wrote in the comments, an optimized filter to clear 9…..% of the cases is a single “and” native instruction.

To beat that, an unseen property of a Sudoku has to be explored.

To be honest, the second key in blue’s process is the GUAs 2/3 intensive use and optimization.

This is something we (blue and me) discovered in the XY27 search, but he really did the best out of it.

BTW, I checked the ED bands 1+2 situation for bands of low rank, Band 9 (index 8) has more ED bands 12 than average. Reversely, the average run time in that area is better than I expected. So at the end, if I am not too bad in the batch control (and without power failure) , I should have results for the top 10 bands 1 when I am back from the week off (say October 16th)
champagne
2017 Supporter

Posts: 5620
Joined: 02 August 2007
Location: France Brittany

Re: scan solution grids for 17 clues as of blue

..... what about only taking 15 clues with a {555,555} pattern - and then adding two clues generally ?

Yes i take your point about it not fitting with the model - and i think it might be flawed as not all 6 clues which solve a band in a puzzle - can originate from a 5 clue which solves a band , plus 1 more clue added.

Code: Select all
`+---+---+---+|..5|...|...||28.|...|...||...|...|93.|+---+---+---+|123|456|789||764|298|513||598|137|624|+---+---+---+|657|823|491||931|745|268||842|961|375|+---+---+---+  this band 1 is solvable in 5 clues+---+---+---+|..5|.1.|..2||...|3..|1..||4..|...|...|+---+---+---+|123|456|789||764|298|513||598|137|624|+---+---+---+|657|823|491||931|745|268||842|961|375|+---+---+---+  but all 6 clues in the band 1 are necessary here [ same solution grid]`

another angle

search for18s {6,6,6} {6,6,6} and check for a 17
splitting it up into 3 different runs where there is maximum clues in a box of 2,3 or 4 clues

the 2 search would involve all boxes with 2 clues - has advantage that 1 pass only is needed
the 3 search would involve one or more bands being 321 or 330 { this would affect the perpendicular bands]
the 4 search would involve one or more bands being 420 or 411

maybe this fits with the model better ?

Code: Select all
`222222222  one way321213132321222123330123213   330033303    ways with max 3 in a box   [other isomorphic permutations too ]411132123   420123123   420132114   420222024 411141114    ways  with max 4 in a box510123033  510132024510141015510114042600033033 600042024600051015`
coloin

Posts: 1629
Joined: 05 May 2005

Re: scan solution grids for 17 clues as of blue

another angle

search for18s {6,6,6} {6,6,6} and check for a 17
splitting it up into 3 different runs where there is maximum clues in a box of 2,3 or 4 clues

the 2 search would involve all boxes with 2 clues - has advantage that 1 pass only is needed
the 3 search would involve one or more bands being 321 or 330 { this would affect the perpendicular bands]
the 4 search would involve one or more bands being 420 or 411

maybe this fits with the model better ?

Hi coloin,

The direct search of a 6,6,5 (in bands) is possible with the released program with minor changes.

The problem foreseen by blue with this case is that the number of valid bands 1+2 grows sharply and to use it’s wording, this could be the show stopper.

Reading your post, I am wondering whether with the last feature (the tailor made brute force and the carry over of the fresh GUAs ) the threat is still there. I could have to re think the option to have a separate pass2 with another entry file.

But this does not change another key option of the blue’s strategy: the sharing of the band 1+2 process with all attached bands 3. Here is a 10 factor (more in the pass 1) that has to be replaced by something else if another strategy is used.

Regarding the “add” strategy, I had long discussions with blue not so far from what you have in mind. I expected good results using only the set of valid minimal bands, then adding clues if necessary. Blue was so strongly convinced that this was not a good strategy that he had difficulty to just read properly what I proposed. When I compared the corresponding implementation with the optimized version of blue’s approach, I stopped the alternative way.

Just to say that any alternative solution has to be nearly implemented to say whether it has a chance or not to beat the released one.
champagne
2017 Supporter

Posts: 5620
Joined: 02 August 2007
Location: France Brittany

Re: scan solution grids for 17 clues as of blue

champagne wrote:Just to say that any alternative solution has to be nearly implemented to say whether it has a chance or not to beat the released one.

of course ...
splitting it up into sections though might well be faster .... and its better to get it sorted before hitting the button !

i think filling the bands with clues {5,5,5} case needs more thought - i re-evaluated this [ as you may well have done}
I think it only works when all the {3} bands each have 5 disjoint unavoidables - and that will be a very rare solution grid. [MCN-15]
The main thing to note about that search is that you have to know the MCN for each of the 416 individual bands - there has to be a clue in each disjoint unavoidable - but that is another add-on for mladin !

now getting back to yours and blue's {6,6,5} search - with each ED band1and2 extrapolate the 6+6 which solve the individual bands
remove the {6+6} which don't solve the complete band1band2
complete with 5 clues from band 3 {3 options per cell} {gangster} { these are known}
it looks like the ~ x10 improvement comes from the 416/44 ratio and effectively multiple grids are simultaneously searched - is that correct ?

I just wonder if with your method the max 2 clues in a box could be implemented .... and done rapidly ?
There is only one pass in this process....
reduce the options to test further by excluding any {6+6} which has more than 2 clues per box and only attempt to add the band3 {gangster} solutions which have 2 clues in each box, adding 6 clues.
this finds 18s with {6,6,6}

we could then find all 222222222 18s and therefore all 17s with 222222221 box distribution
coloin

Posts: 1629
Joined: 05 May 2005

Re: scan solution grids for 17 clues as of blue

i think filling the bands with clues {5,5,5} case needs more thought - i re-evaluated this [ as you may well have done}
I think it only works when all the {3} bands each have 5 disjoint unavoidables - and that will be a very rare solution grid. [MCN-15]
The main thing to note about that search is that you have to know the MCN for each of the 416 individual bands - there has to be a clue in each disjoint unavoidable - but that is another add-on for mladen !

I am not sure to have a clear view about your implementation of this. Following blue’s view, we already have all the valid bands with 5 clues and all minimal solutions with 6 clues. Here we don’t use the disjoints UAs (somehow it has been done building the valid solutions) BTW finding the MCN per band is not a problem and you even can think of finding higher degree UAs.

now getting back to yours and blue's {6,6,5} search - with each ED band1and2 extrapolate the 6+6 which solve the individual bands
remove the {6+6} which don't solve the complete band1band2
complete with 5 clues from band 3 {3 options per cell} {gangster} { these are known}
it looks like the ~ x10 improvement comes from the 416/44 ratio and effectively multiple grids are simultaneously searched - is that correct ?

Small adjustment, in blue’s approach, band 3 has always 6 clues. And you are right; the x10 is the average number of grids attached to a given band 1+2 in the pass2. In the pass1 it is around 17 bands 3 per ED band 1+2.
It’s difficult to tell what is the true effect of an average 10 band 3 per ED band 1+2 (in fact 9), but with the last code, each band 3 contributes to a better filtration of bands 1+2 to send to “part 2”

Code: Select all
`I just wonder if with your method the max 2 clues in a box could be implemented .... and done rapidly ?There is only one pass in this process....reduce the options to test further by excluding any {6+6} which has more than 2 clues per box and only attempt to add the band3 {gangster} solutions which have 2 clues in each box, adding 6 clues.this finds 18s with {6,6,6} we could then find all 222222222 18s and therefore all 17s with 222222221 box distribution`

A very simple way to go in that direction would be to store only valid bands having no box with more than 2 clues. So I would say “yes it is easy and it would go very fast”.
champagne
2017 Supporter

Posts: 5620
Joined: 02 August 2007
Location: France Brittany

Re: scan solution grids for 17 clues as of blue

Yes the 54-plus-n completions inherently uses the UAs in a band - and include minimal and non-minimals....

champagne wrote:... all possibilities to speed up the process limiting stacks to 6 clues are used.

so this would mean you are already optimizing the band 3 additions...
although maybe a gain in not searching

222
221
xxx

since

321
122
222

puzzle would still be found [?]

incidentally - i am not sure how you make the band1and2 {6,5} reductions - {50000,4000} is a big product !
surely it cant be quick enough to just - solve and discard if sol >1

choosing band 3 to have 6 clues
Pros
- the number of {6,6} is too large ?
Cons
- only one pass is needed at searching a {6,6,5}
- band 3 search is quicker with 5 clues

and just out of interest ...
for a single band1and2 approx how many {5,6} and {6,5} completions are there per band 3 gangster representative ? and how many valid {6,6}

how do you make sure that you consider every ED band in band 3 gangster - adding a GUA to band 3 excludes some ED bands. Is this your worry ?
coloin

Posts: 1629
Joined: 05 May 2005

Re: scan solution grids for 17 clues as of blue

Hi coloin,

I am travelling with irregular links to the net. Some quick answers

champagne wrote:... all possibilities to speed up the process limiting stacks to 6 clues are used.
so this would mean you are already optimizing the band 3 additions...
although maybe a gain in not searching

222
221
xxx

since

321
122
222

puzzle would still be found [?]

In your example, one box has more than 2 clues. This is assumed filtered in my answer

incidentally - I am not sure how you make the band1and2 {6,5} reductions - {50000,4000} is a big product !
surely it cant be quick enough to just - solve and discard if sol >1

I answer as if you were here in the general search problem. My rough estimate is an average number of XY over 1 billion as written in the comments. The first reduction is done using UAs located in bands 1+2 and more precisely bit field vectors already settled for each band. This is explained with many details in the comments and a simple “and” on 2 64 bits fields already clear more than 90% of the {valid band1} x {valid band2} XY

choosing band 3 to have 6 clues
Pros
- the number of {6,6} is too large ?
Cons
- only one pass is needed at searching a {6,6,5}
- band 3 search is quicker with 5 clues

In the 6 6 5 (5 clues in band3) the threat is not only the bigger number of XY but more the intrinsic bigger number of finally valid Bands 1+2 (toward the double gangster). As you wrote, the brute force remains a costly process, especially when the solution is unique. I am hesitating to-day because the filter on GUAs is now so efficient that, applied just after the previous one, it can save most of the calls to the uniqueness check for bands 1+2. This can make the 665 search better than a 656 search with another entry file.

and just out of interest ...
for a single band1and2 approx how many {5,6} and {6,5} completions are there per band 3 gangster representative ? and how many valid {6,6}

I keep this for later but just a remark. Having a given ED band 1+2, the gangster in band 3 is unique. The number of attached bands 3 vary.

how do you make sure that you consider every ED band in band 3 gangster - adding a GUA to band 3 excludes some ED bands. Is this your worry ?

Again, I have some problems with the wording. The entry is an ED band 1+2. The gangster in band 3 is just the consequence of missing digits in each column, and the attached bands 3 are all possible fills of the band 3 gangster filtered to avoid redundancy (the total number of bands 3 is here equal to the number of ED solution grids).

In this context, a band 3 is considered if the band 1+2 has been checked valid and if known GUAs did not pass the limits (band 3 number of clues or stack limit).

Then, a brute force for the entire grid can produce a new UA with a corresponding GUA.

And yes, such GUAs added to known ones can filter other ED bands 1+2.

Is this the answer to you point ?? If yes, it’s not a worry, it’s a chance to find so many such GUAs
champagne
2017 Supporter

Posts: 5620
Joined: 02 August 2007
Location: France Brittany

Re: scan solution grids for 17 clues as of blue

The results of the test run during my week off are there.

First of all, I identified a bug in the code but the test was started before I could fix it, so many batches failed on the short in one table dimension.
I have seen another small bug, affecting the performance but not the accuracy of the search
And I have still a small deviation to investigate between 2 versions of the code.

I got in the batches 15 17 clues puzzles, all known.

I should here restart the test with the fixed code, but I'll first try several things :

Explain the deviation between the 2 codes (and fix the code to have the same print)
Open the code to the case with 5 clues in band 3
Create an interim activation of vectors on the "more UAs GUAs" at the end of a 656; 566 case.

The last "to do" is expected to have a significant effect on the performance, especially if the 665 case is done here,in the first steps 2X2Y

All this could take one full week.
champagne
2017 Supporter

Posts: 5620
Joined: 02 August 2007
Location: France Brittany

Re: scan solution grids for 17 clues as of blue

I updated the zip file after bugs fix and small improvements

The documentation has not changed since the previous post. (URL in the first post)

The zip file contains

. the source code in the directory src
. the project definition used on the platform microsoft visual studio
. sk17p2a.exe executable code under windows
. _zbad50.txt a list of 20 puzzles giving very bad run times in pass 2.

The _zbad50.txt file is a list of entries with some weeks ago a run time over 50 seconds and now an average run time of 5.4 seconds per entry on my PC.
The _zbad50 file is also available in the post 2 of this thread.to have in mind the output of that phase.

Next step is to include the 665 pass (5 clues in band 3) and to switch to the vectors mode for the "more" at each of the 566 656 666 cases

The current code has not been checked with known 17 clues. This will be done after he next change
champagne
2017 Supporter

Posts: 5620
Joined: 02 August 2007
Location: France Brittany

Previous