The New Sudoku Players' Forum

by **champagne** » Sun Jun 26, 2016 3:37 pm

I open that thread on an old topic: The search of 17 clues in the catalog.

mladen Dobrichev recently published the negative result of the depth 3 vicinity exploration, this is grate, but can not answer to the final question : do we have the entire list of 17.
I made partial exploration in X+Y+27 mode (the 27+6+3 is ongoing) but with no reasonable chance to cover the field using the existing process.
Here, we are back on an attempt to scan the catalog with a new process.

The corresponding work is ongoing and the feasibility of the process proposed has still to be proven.
But the first test show enough potential to share some ideas and, why not, to have some contributions from other members of the forum.

The subject has already discussed in private with mladen Dobrichev (and Blue, although not recently).
I start with a document made of several web pages describing the basic ideas
.
That document starts here.

Next steps will be to publish the results of the first test, what should come within the next 2 or 3 weeks,
then the results of a feasibility study made on 1/10^6 of the catalog, and if the feasibility study is convincing, a test made on 1/1000 of the catalog.

I lock several posts at the start to have them available before any discussion post (in case)

by **champagne** » Sun Jun 26, 2016 3:37 pm

locked

by **champagne** » Sun Jun 26, 2016 3:37 pm

locked

by **champagne** » Sun Jun 26, 2016 3:38 pm

locked

by **dobrichev** » Thu Jun 30, 2016 8:50 pm

Hi Gerard,

Reading your document I found too many open issues. And the eventual success of this approach significantly depends on them.
To me it is obvious that more UA sets are required for efficient scanning, more than those of size 12 used for 16-clue search. I have some ideas for extremely fast UA collection, maybe similar to McGuire's process up to some level.

Concerning counting the bits in a bitfield, there is machine instruction popcount (population count).

Do you have statistics how the existing 17s deviate from the minimal number of clues required for band completion for the band with highest minimal clues (the criterion which you used to reorder the catalogue)?
Such statistics is easy to collect, and if for half of the 17s the number of clues match, then this order can help. If almost all of the 17s have "redundant" clues in this band I see no reason to prefer it as a startpoint.
Or, at least, correlation between the preferred band and the band with maximal clues in the 17s can be investigated.

Also it is still unclear how the loops should be arranged so that as much as possible of the collected data remain reusable for the next grids. Isn't it?

The cell selection strategy is somehow controversial. What is the benefit of faster reduction of the unhit UA list against the opposite - leaving as much as possible unhit UA on order impossibility of covering all of them to be detected earlier? Gridchecker uses the second strategy in some indirect form.

I will be happy to see progress in this complex project and hope you will receive support from the community.

by **champagne** » Fri Jul 01, 2016 6:51 am

Hi mladen,

I see key questions in your post, and minor point.
Let me answer here for short points, in a separate post for question referring to the basic strategy, and in a third post to questions where the strategy as I see it, play a role in the answer.

dobrichev wrote:Concerning counting the bits in a bitfield, there is machine instruction popcount (population count).

All intrinsic instruction can be added and I use __popcnt64 in the code, I did not mention it in the document for two reasons
- in the current code, the size of UAs does not play a key role, contrary to the bitscanforward instruction,
- if the strategy applied works well (priority to cells hitting more UAs), I could just kill the size information. In most cases, the only required information is "is there still active cells in that UA"

dobrichev wrote:Do you have statistics how the existing 17s deviate from the minimal number of clues required for band completion
for the band with highest minimal clues (the criterion which you used to reorder the catalogue)?
Such statistics is easy to collect, and if for half of the 17s the number of clues match, then this order can help.

The answer is "no" although, as you say, it would not be too hard to produce it.
The only statistic I have (chapter B.1) is the distribution of the minimum number of given in the band 3 for morphed known 17.

In the way I see the problem, such a statistic would not be relevant

by **champagne** » Fri Jul 01, 2016 6:54 am

As a preliminary answer to mladen remarks, I feel necessary to summarize the project in that way

We start from a solution grid

Ux is the exhaustive list of UAs located in bands 1+2
Uy is the exhaustive list of UAs located in band 3
Uxy is the exhaustive list of UAs not in Ux and not in Uy (touching both band 3 and bands 1+2)

Ny is the minimum number of clues required to have a valid solution in band 3 (solving G3)

a valid 17 solution has the following properties

All UAs in Ux are hit with a set of given of maximum size Nx=17-Ny
All UAs in Uxy are hit with Nx2;Ny2 given with a new constraint Y=(Ny2>=Ny)?0:(Ny-Ny2)
the maximum size for Nx2 is {17 - Y - Ny2}
All UAs in Uy are hit

Assuming the problem solved in that order, The first step is a pure 64 bits problem.
The target is to find SG=all sets of given of size 17-Ny giving a minimal and unique solution for the bands 1+2
The best strategy to do it is an open question, and one could think of using a reduced version of the McGary program.
Here, the current strategy applied is to kill in priority as many UAs as possible

The SG size is not dependent of the strategy used to find them, it is a property of the bands 1+2.
The validation test should show that different strategies give the same SG.
This my current task, having also to check that all "given sets" are minimal.

The SG is valid for any band 3 having the same [bands 1+2}. Unhappily, the 2 next steps are specific, at least partially to each band3.
But as we have in average 7 bands 3 for the same bands 1+2, this is an optimization issue to study in due time.

by **champagne** » Fri Jul 01, 2016 7:01 am

And now, detailed answers to mladen

dobrichev wrote:
To me it is obvious that more UA sets are required for efficient scanning, more than those of size 12 used for 16-clue search.
I have some ideas for extremely fast UA collection, maybe similar to McGuire's process up to some level.

I share your views, although this is not a clear point in McGuire's article.
I did not work that much on the optimization of the data collection, I use the brute force when the list of UAs is exhausted, which is a reasonable start in terms of performances.
The code itself has to be improved if the number of calls is too big.

One problem is that the numbers of UAs of size>12 grows very fast, so there is here an optimization issue:

more UAs added-> more work to optimize the cells priority (in that process)
less UAs added -> more calls later

Another optimization factor is "how far upstream do you enter the added UAs"

One thing I noticed in the test is that sometimes(late in the process), one of the added UA can be "dead", killing the branch.

dobrichev wrote:
Do you have statistics how the existing 17s deviate from the minimal number of clues required for band completion
for the band with highest minimal clues (the criterion which you used to reorder the catalogue)?
Such statistics is easy to collect, and if for half of the 17s the number of clues match, then this order can help.
If almost all of the 17s have "redundant" clues in this band I see no reason to prefer it as a startpoint.
Or, at least, correlation between the preferred band and the band with maximal clues in the 17s can be investigated.

Just another answer to that to explain why I think that this is nor relevant.
The true question IMO, considering the planned 3 steps strategy, is "how many sets of given" solving bands 1+2 wild be below the limit.
I have a provisional answer out of my ongoing test.

8% of SG(sets of given) require one more clue than the minimum
2% require 2 more clues

the rest is peanut (3 and even 4more clues)
in that example, the 17 is part of the 8%.

dobrichev wrote:
Also it is still unclear how the loops should be arranged so that as much as possible of the collected data remain reusable for the next grids. Isn't it?

IMO, this is clearly how your call steps 2 and 3 for each solution of {Bands 1+2}
It is too early for me to enter in that task, a preliminary feasibility is to have some timing on the phase 1, the production of valid bands 1+2.
A grid producing a 17 is generally a worst case, I intend to apply as soon as possible the phase 1 to the sample 1/10^6 of the catalog.

dobrichev wrote:

The cell selection strategy is somehow controversial. What is the benefit of faster reduction of the unhit UA list against the opposite
- leaving as much as possible unhit UA on order impossibility of covering all of them to be detected earlier?

This is an open point. In my case, the expected bonus are the following

- shrinking the table is faster. Just a remark : having a 64 bits description of a UA, copying UAs still active in the new table is likely the faster way to go ahead. Revising the cells order is another question and could be changed.
- having more dead cells, the early filter to cancel a branch should work better. This is an alternative view to the Higher degree UAs in the McGuire process
- Going fast to an empty list of UA's, we should have (again in the strategy followed here) less implementations of "look for more UAs"

by **dobrichev** » Fri Jul 01, 2016 7:24 am

Thank you for the explanations. They answered my questions.

champagne wrote:One problem is that the numbers of UAs of size>12 grows very fast, so there is here an optimization issue...

This is not quite true. Actually, the complete number of UA grows, but we can choose to use only these larger UA which are more common and/or are sufficiently easy to identify. This is a key point.

by **champagne** » Fri Jul 01, 2016 12:36 pm

dobrichev wrote:Thank you for the explanations. They answered my questions.

champagne wrote:One problem is that the numbers of UAs of size>12 grows very fast, so there is here an optimization issue...

This is not quite true. Actually, the complete number of UA grows, but we can choose to use only these larger UA which are more common and/or are sufficiently easy to identify. This is a key point.

I am not sure that I catch your point which are more common and/or are sufficiently easy to identify.

Currently, I add UAs in about 2 steps:
in a first step, in the limit of N UAs, UAs of size <= X (say X=24 for example) when the primary list of UAs is exhausted
In a second step, all remaining UAs in the local context, normally, around the limit for a 17 search (more than 10 given).

IMO, checking that the solution for bands 1+2 is unique is compulsory, unless you have earlier loaded the entire set of UAs in the local context (after about 10 given have been selected). But the following remark let me think that we can accept a very late final addition of UAs.

I also checked that the combination of cells does not guaranty that the set of given is minimal. Clearing sets not minimal has 2 qualities:

it is likely quicker than looking for more given, knowing that in most cases it can not be the final solution
Such a set will anyway be redundant with the minimal set

But that point has to be checked carefully.

by **dobrichev** » Fri Jul 01, 2016 9:47 pm

several ua of size 13 with only 3 different digits exist. this means that each such ua exists in one or more well known 3-templates. each grid consist of 84 3-templates which are easy to identify. then predefined for each of these templates ua can be easily remapped to the grid. all this doesnt require even a single solver call. the same is applicable for 4-templates, 126 in a grid, but the essentially different 4-templates are too much. nevertheless for some of them the short ua can be included in a precalculated catalog.
the same is applicable for any other grid subsets that are easy to transform to a canonical form - the bands, any combination of 3 or 4 boxes, etc.
again, it isnt necessary to build full catalog. partial catalog will work for its elements. which subsets and which ua for a particular subset shoud be included, and the choice where to stop collection - after full catalog scan or after some amount of ua is accumulated, is matter of experimenting.
additional potential exists in the generation of the composite ua - those consisting of 2 or more disjoint minimal ua. the ua obtained from disjoint grid subsets are disjoint. for example examining 2-templates for digit pairs (1,2) and (3,4) will give only disjoint ua because the two templates dont intersect and all 2-digit ua within the same template are disjoint too.

by **champagne** » Sat Jul 02, 2016 6:48 am

Thanks mladen, I see now what you mean.

Your remark about higher degree UAs is somehow off topic in the process I am testing, but could be re used later.

Regarding the family of UAs you describe, I don't see clearly yet how you would produce them. In the McGary process, you just apply blue prints to the grid, here you seem to have in mind to do it by another way, piling experience.

Some practical things, in the scope of my current research,I am focusing on the bands 1+2 of the appropriate morph of one solution grid.

At the start, we have

the minimal set of UAs solving separately band 1 and band 2
all UAs (or nearly all) of size <=12 covering band 1+band 2

in my test 17, this is 119 UAs.

I have no doubt that adding at the start the missing smallest UAs up to say 256 UAs would be a bonus and change the strategy to act later in a branch.
So my question would be : do you have the adequate process to do so without reference to any other solution (knowing that this can be done for an average 7 solution grids having the same Bands 1+2)

by **champagne** » Tue Jul 12, 2016 7:19 am

as explained here I change the start to use more UAs.

My current test is done with the 2560 smallest UAs for bands 1+2 having 6 or less digits.
This change completely the critical path. The tested process is derived from some techniques used in the Mc Gary program for higher degree UAs.

The UAs table is shown in vector mode (one vector per cell where bits are set to 1 if the corresponding ua contains the cell).

After n1 steps (currently 4) the UAs table is reduced and the vectors rebuilt consequently.

When the number of still valid UAs is <=64, the table is reduced again for the final step.

The main point, out of my tests, is that now when a set of given appears, it is a valid solution for bands 1 + 2 for nearly any set.

As the reduction of the table is costly, I am back to a priority of small UAs to have a limited number of such reductions.

So far, I did not consider higher degree AUs

I am looking for an optimization of the parameters in 64 bits mode, keeping aside a switch to the 128 bits registers later.

The New Sudoku Players' Forum

16_17 clues search in the catalog

16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog

Re: 16_17 clues search in the catalog