The New Sudoku Players' Forum

by **blue** » Wed Oct 19, 2016 12:30 pm

champagne wrote:PS the number of UAs of any size with no subset is very big in a solution grid. I suspect several thousands in average, but may be somebody have a better approach of the count.

Here are some "average # per grid" estimates (with error estimates), from long ago.
The "(known) grids with 17s" list, was slightly smaller, at the time.

Code: Select all: UA size | all grids | grids with 17s --------+--------------------+------------------- 4 | 1.1580e+1 ( 0.04%) | 7.1160e+0 ( 0.03%) 6 | 1.5338e+1 ( 0.06%) | 1.4746e+1 ( 0.03%) 8 | 2.2038e+1 ( 0.07%) | 2.2646e+1 ( 0.04%) 9 | 6.0455e+0 ( 0.16%) | 7.2594e+0 ( 0.08%) 10 | 5.8380e+1 ( 0.08%) | 6.7124e+1 ( 0.04%) 11 | 2.9499e+1 ( 0.12%) | 3.8499e+1 ( 0.06%) 12 | 1.3667e+2 ( 0.09%) | 1.6819e+2 ( 0.05%) 13 | 1.0483e+2 ( 0.11%) | 1.3118e+2 ( 0.06%) 14 | 3.6172e+2 ( 0.09%) | 4.4636e+2 ( 0.05%) 15 | 3.9133e+2 ( 0.11%) | 5.0261e+2 ( 0.05%) 16 | 1.0498e+3 ( 0.10%) | 1.3534e+3 ( 0.05%) 17 | 1.4800e+3 ( 0.10%) | 1.9305e+3 ( 0.05%) 18 | 3.4658e+3 ( 0.10%) | 4.5559e+3 ( 0.05%) 19 | 5.5821e+3 ( 0.10%) | 7.4088e+3 ( 0.05%) 20 | 1.1778e+4 ( 0.10%) | 1.5759e+4 ( 0.05%) 21 | 2.0773e+4 ( 0.11%) | 2.8016e+4 ( 0.05%) 22 | 4.1781e+4 ( 0.11%) | 5.6895e+4 ( 0.06%) 23 | 7.7262e+4 ( 0.11%) | 1.0601e+5 ( 0.06%) 24 | 1.5191e+5 ( 0.12%) | 2.1048e+5 ( 0.06%) 25 | 2.8748e+5 ( 0.12%) | 4.0185e+5 ( 0.06%) 26 | 5.5831e+5 ( 0.13%) | 7.8740e+5 ( 0.07%) 27 | 1.0649e+6 ( 0.13%) | 1.5137e+6 ( 0.07%) 28 | 2.0469e+6 ( 0.14%) | 2.9377e+6 ( 0.07%) 29 | 3.8959e+6 ( 0.15%) | 5.6359e+6 ( 0.08%) 30 | 7.3881e+6 ( 0.16%) | 1.0791e+7 ( 0.08%) 31 | 1.3891e+7 ( 0.17%) | 2.0447e+7 ( 0.09%) 32 | 2.5872e+7 ( 0.18%) | 3.8398e+7 ( 0.09%) 33 | 4.7497e+7 ( 0.19%) | 7.1136e+7 ( 0.10%) 34 | 8.5945e+7 ( 0.21%) | 1.2982e+8 ( 0.11%) 35 | 1.5226e+8 ( 0.23%) | 2.3212e+8 ( 0.12%) 36 | 2.6388e+8 ( 0.25%) | 4.0553e+8 ( 0.13%) 37 | 4.4514e+8 ( 0.28%) | 6.9067e+8 ( 0.14%) 38 | 7.2814e+8 ( 0.32%) | 1.1396e+9 ( 0.16%) 39 | 1.1460e+9 ( 0.37%) | 1.8111e+9 ( 0.18%) 40 | 1.7355e+9 ( 0.43%) | 2.7683e+9 ( 0.21%) 41 | 2.5057e+9 ( 0.53%) | 4.0162e+9 ( 0.25%) 42 | 3.4120e+9 ( 0.68%) | 5.5462e+9 ( 0.32%) 43 | 4.3680e+9 ( 0.94%) | 7.1899e+9 ( 0.43%) 44 | 5.2410e+9 ( 1.36%) | 8.6178e+9 ( 0.62%) 45 | 5.8689e+9 ( 2.19%) | 9.8323e+9 ( 0.97%) 46 | 5.8592e+9 ( 3.86%) | 9.8494e+9 ( 1.70%) 47 | 5.2146e+9 ( 7.47%) | 8.7982e+9 ( 3.34%) 48 | 3.9547e+9 (17.54%) | 7.3022e+9 ( 7.37%) 49 | 3.3898e+9 (40.82%) | 6.0837e+9 (17.76%)

The grids with 17's, have fewer UA4's and UA6's than average.
The situation is reversed, though, for UA8's and larger ... through size 49, at least.

by **champagne** » Wed Oct 19, 2016 2:19 pm

Hi blue,

nice to see you back. Your data are far above my expectations, although I was sure to be on the safe side with "thousands".

Just a question

Here are some "average # per grid" estimates (with error estimates), from long ago.
The "(known) grids with 17s" list, was slightly smaller, at the time.

Did you in that task filter the UAs having a subset? Seen the huge number of UAs of size 49, I suspect that this is without any filter.

by **blue** » Wed Oct 19, 2016 4:47 pm

Hi champagne,

Did you in that task filter the UAs having a subset? Seen the huge number of UAs of size 49, I suspect that this is without any filter.

Yes I did, if by that you mean that it only counted "minimal UA's": UA's having no proper subset that is also UA.
I remember being amazed by the numbers, too.

The code used a method that Red Ed invented, for estimating the average #'s of minimal puzzles of a given size.
It worked like this:

Choose a random grid and a size S subset of that grid.
Find all minimal UA's in that subset, and tally the results by size.
Do (1) and (2), some large number of times ... N.
Divide the final counts by N, and then multiply them by a size-dependent scale factor, (1 / P), where P is the probablity that a (UA) set of that size would be a subset of a randomly chosen subgrid of size S.

For the "all grids" column, N was 10,320,000 and S was 55.
For the size 49 entry, only 28 UA's of that size, were actually found.
For sizes 45 & 46, where the peak is, it was 13598 and 3771 sets found, respectively.

by **dobrichev** » Wed Oct 19, 2016 9:19 pm

champagne wrote:
coloin wrote:Indeed ... but I am thinking more like if there are more UAs - then there is more chance of a 17 [paradoxically]

I have no evidence of that...

I see no paradox. Directly from definition of UA, it is the difference between the grid in question and another valid grid. So, every solution grid has exactly 6,670,903,752,021,072,936,959 minimal + non-minimal UA sets. Each small-sized UA makes many larger ones non-minimal, being a subset of them. Assuming the majority of 17-clue puzzles are already discovered :lol:

it was proven that their solution grids trend to have less number of UA4 and UA6 (click). Ignoring the higher order effects (how large UA sets invalidate larger ones) we can assume that the grids with small number of short UA sets have large total number of minimal UA sets.

So far the bands don't help in discovery of new 17s.

A different approach is to partition grids in other dimension, checking whether N-digit UA sets regardless of size can help. Entire set of 2, 3, and 4-digit UA sets can be generated analysing N-digit templates, see http://forum.enjoysudoku.com/post211112.html#p211112 which suggests 2648603 subgrids for 4-digit UA analysis and eventual remapping to 119503485 4-templates. Statistics for 17 vs all solution grids can be collected even w/o UA generation, on the rookery/template basis.

by **dobrichev** » Sun Oct 30, 2016 7:36 pm

Hi Blue,

blue wrote:Here are some "average # per grid" estimates (with error estimates), from long ago.

Could you do another run, this time counting the number of different values involved in the UA and respectively adding 8 columns with counters for 2 to 9 values respectively.
Of practical interest are UA with up to 20 cells and their distribution for up to 5 different values, so you can save time by generating less grids/UA than in previous exercise.
I think an exhaustive catalogue with 2 to 4-values UA is achievable and for particular grid all UA from the catalogue could be found fairly fast.
Another option is somehow to generate partial catalogue for short UA with 5 values.
This way testing a grid for 17-given puzzle could start with few thousands of UA for hitting, all obtained in fast and deterministic way.

by **champagne** » Mon Oct 31, 2016 6:30 am

dobrichev wrote:Hi Blue,
blue wrote:Here are some "average # per grid" estimates (with error estimates), from long ago.

Could you do another run, this time counting the number of different values involved in the UA and respectively adding 8 columns with counters for 2 to 9 values respectively.
Of practical interest are UA with up to 20 cells and their distribution for up to 5 different values, so you can save time by generating less grids/UA than in previous exercise.
I think an exhaustive catalogue with 2 to 4-values UA is achievable and for particular grid all UA from the catalogue could be found fairly fast.
Another option is somehow to generate partial catalogue for short UA with 5 values.
This way testing a grid for 17-given puzzle could start with few thousands of UA for hitting, all obtained in fast and deterministic way.

Hi mladen,

I see that we are looking for similar new ways to search 17s.

I assume that "values" means digits 1 to 9.

Looking at the table prepared by blue, I was thinking of a cut off at the size 18, pushing to six possible digits. I am drafting a special brute force with such limitations to see if it is possible in some milliseconds (accepting if necessary missing UAs and/or not seen subsets).

Next is how to do the search. Here again, I have something in mind to try, using the bands properties.

by **dobrichev** » Tue Nov 01, 2016 4:31 pm

Yes, values == digits from 1 to 9.
I am waiting for good news from you when ready.

by **champagne** » Tue Nov 01, 2016 8:45 pm

dobrichev wrote:Yes, values == digits from 1 to 9.
I am waiting for good news from you when ready.

May be some indications after the first tests.

Finding the full mc Garry set of UAs (<=12 cells) takes about 15 ms in my test (~280 UAs for each of the 2 puzzles)

Findings all "up to 4 digits" UAs takes about 7 ms for the same puzzles and gives around 700 UAs of size <= 18.

Looking for "5 digits" UAs is already much more expensive with the process I tested (more than 100 ms) and, compared to the Mc Garry set, we are still missing some 6 digits UAs of size 12. Depending on the shortcuts used, the number of UAs was in the range 1200_3500.

The test I made earlier did not pass the 3 digits UAs in addition to the Mc Garry set, and the total number of UAs was in the range 500-768 UAs. This was enough to influence significantly the process.

Improving the process for 5 digits can be done. Contrary to my previous statement, I think that the 6 digits analysis is of null interest. Using the Mc Garyy generation for the size 12 will be better. (normally, the 4 digits produces all UAs of smaller size).

The next test will be done using the 4 digits generation as a basis and adding the McGarry missing Uas of size >10

by **blue** » Tue Nov 01, 2016 9:31 pm

dobrichev wrote:Hi Blue,
blue wrote:Here are some "average # per grid" estimates (with error estimates), from long ago.

Could you do another run, this time counting the number of different values involved in the UA and respectively adding 8 columns with counters for 2 to 9 values respectively.
Of practical interest are UA with up to 20 cells and their distribution for up to 5 different values, so you can save time by generating less grids/UA than in previous exercise.

Here's the output from a short run.
The numbers aren't very precise, but they're probably OK for your interests (?)
[ The % numbers are +/- 95% confidence intervals. ]

Code: Select all: sz total 2d 3d 4d 5d ------------------------------------------------------------------------------------ 4 11.58( 0.02%) 11.58( 0.02%) 6 15.34( 0.04%) 6.72( 0.06%) 8.62( 0.05%) 8 22.03( 0.09%) 4.07( 0.20%) 9.11( 0.14%) 8.85( 0.14%) 9 6.06( 0.28%) 2.38( 0.44%) 3.68( 0.36%) 10 58.32( 0.16%) 3.52( 0.58%) 11.34( 0.34%) 26.86( 0.23%) 16.61( 0.29%) 11 29.50( 0.36%) 4.19( 0.94%) 13.80( 0.53%) 11.51( 0.57%) 12 136.74( 0.30%) 3.65( 1.67%) 13.18( 0.91%) 44.82( 0.50%) 55.22( 0.46%) 13 104.85( 0.57%) 7.44( 2.11%) 31.69( 1.03%) 45.33( 0.87%) 14 362.55( 0.55%) 5.99( 4.04%) 18.15( 2.37%) 78.87( 1.15%) 138.83( 0.88%) 15 387.98( 0.94%) 12.63( 5.16%) 75.79( 2.11%) 153.75( 1.48%) 16 1048.25( 1.05%) 21.87( 7.02%) 144.45( 2.77%) 362.92( 1.75%) 17 1483.89( 1.64%) 16.89(14.94%) 163.34( 4.81%) 505.76( 2.77%) 18 3417.05( 2.03%) 18.45(26.35%) 42.61(18.04%) 263.03( 7.17%) 958.85( 3.76%) 19 5596.17( 3.05%) 27.35(41.72%) 293.42(12.92%) 1380.08( 6.09%) 20 12010.91( 4.13%) 81.90(52.81%) 525.15(19.61%) 2168.03( 9.62%) ------------------------------------------------------------------------------------ sz 6d 7d 8d 9d ------------------------------------------------------------------- 4 6 8 9 10 11 12 19.88( 0.76%) 13 20.37( 1.29%) 14 94.31( 1.07%) 26.42( 2.02%) 15 115.44( 1.72%) 30.37( 3.34%) 16 355.64( 1.79%) 145.91( 2.80%) 17.46( 7.94%) 17 536.62( 2.68%) 230.42( 4.11%) 30.86(10.99%) 18 1248.05( 3.33%) 717.30( 4.37%) 152.32( 9.46%) 16.44(27.90%) 19 2198.19( 4.84%) 1351.49( 6.08%) 329.48(12.79%) 16.16(54.22%) 20 4634.78( 6.51%) 3550.76( 7.47%) 929.85(14.25%) 120.45(40.72%) -------------------------------------------------------------------

by **champagne** » Wed Nov 02, 2016 6:40 am

blue wrote:The numbers aren't very precise, but they're probably OK for your interests (?)

Hi Blue,

I let mladen answer for its own interest. On my side, the trade off is the following :

In the search of low clues puzzles in a grid, you suffer a high penalty when you get short in UAs. (no more UA to hit in your table);

Adding UAs at the start and processing them in the same way as Mc Garry processed UAs of higher degree reduces the count for such situations.

So the target is not to add all UAs of size <= "x" but to select UAs not to costly to add having at the end a positive effect on the process.

From the test I run, I am thinking of a relevant table in the range 1000-2000 UAs

by **coloin** » Wed Nov 02, 2016 11:35 am

Interesting ....
Is there value in distinguishing UAs which envolve all 9 boxes
depending on methods .... UAs whch just envolve 6 boxes / 2 bands might be less relevant ...

by **champagne** » Thu Nov 03, 2016 7:03 am

coloin wrote:Interesting ....
depending on methods .... UAs whch just envolve 6 boxes / 2 bands might be less relevant ...

Hi coloin,

depending on methods is the right word.
I start from scratch (but with many pieces of code) a new test on that solution grid

Code: Select all: 123456789456789123789123456214368597867945231935217864348672915591834672672591348 ......7.........23.891.....2..........79......3.....64..8...91.....34............ existing 17

I intend to do the following

solve bands 1+2
then solve stacks 1+2
then ??? depending on the results, but possibly
solve band 3
solve stack 3
solve the full grid.

here, the band 3 is the band with a possibility of 2 clues, but we know that for a 17, the minimum is 3 clues,
so the solution for bands 1+2 can not have more than 17-3= 14 clues.
Generally speaking, the maximum is in the range 17-3=14 to 17-6=11 (morphing the solution grid, we can always find a 4 clues minimum)

Applying Mc Garry blue prints, I find 277 UAs of size <= 12.
adding all 2-4 digits UAs of size <=18 and 5 digits UAs of size <=18 for bands 12, I find 879 UAs with the following split

Code: Select all: Mc Garry <=12 prepared set Bands 12 53 139 stacks 12 42 67 (not in bands 12) band 3 18 18 (not in bands 12 "could use the full list") stack 3 11 11 (id) others 153 644 box9 not in band/stack 3 277 879

That task was done in 13 milliseconds, giving room for more UAs if necessary (and for optimization of the solution grid through morphing)
In that case, the deeper search is done for bands 1-2
And as you can see, most of the UAs have cells in box 9 + {boxes 1245}, but this is likely specific to that grid

by **coloin** » Fri Nov 04, 2016 8:02 pm

champagne wrote:That task was done in 13 milliseconds,]

Sounds promising
However ... I realized that my comment on a UA envolving 9 boxes , trivially needs minimum 18 clues ...and these are 2-clue UAs
Maybe UAs from a specific 6 box patterns would focus things eg B124689 x6 ?

I dont think there can be a UA which solely comes from B12459 !!!!

C

by **champagne** » Sat Nov 05, 2016 5:47 am

Hi coloin,

some data that could help for your topic

champagne wrote:And as you can see, most of the UAs have cells in box 9 + {boxes 1245}, but this is likely specific to that grid

This does not seem correct. I made a quick test to see if morphing the puzzle I could reduce significantly the count, Nothing came. The reason is likely (partially) in the table here below where you find the distribution of the UAs of my test solution grid per number of boxes hit.

Code: Select all: 2 19 3 93 4 111 5 199 6 215 7 159 8 48 9 35

I made a second analysis summarized in that table

Code: Select all: 0 436 217 653 1 457 203 660 2 483 187 670 3 523 157 680 4 519 155 674 5 561 132 693 6 524 104 628 7 545 86 631 8 571 73 644

Here, the last item is the split of the 644 "others" in a better way

first column is the box index (assuming the grid morphed to have it as box9)
the second item is the count of UAs of that lot hitting box9
the third column is the count of UAs hitting both band 3 and stack 3, but not box9

The last column is just the sum of columns 2 and 3, giving the expected count for the lot previously called "others".

Morphing would have a small effect on that side, but could be of interest on 2 points:

Having a higher minimum of clues for band 3
Having a better lot of UAs to solve bands 1+2 (still to investigate)

by **dobrichev** » Sun Nov 06, 2016 9:11 am

blue wrote:Here's the output from a short run.

Thank you, Blue!
Below is my reading of your results.
If you choose to generate UA of size up to M cells and N different digits, then on average grid you will work with the given below number of UA sets.
The percentage is the obtained UA compared to all UA of size below or equal to the given number of cells. I.e. the difference comes from the UA sets of same or less size but with more digits involved.

Code: Select all: cells\digits Total <=2 <=3 <=4 <=5 <=6 <=7 <=8 <=4 12 12 (100%) <=6 27 18 (68%) 27 (100%) <=8 49 22 (46%) 40 (82%) 49 (100%) <=9 55 22 (41%) 42 (77%) 55 (100%) <=10 113 26 (23%) 57 (51%) 97 (85%) 113 (100%) <=11 143 26 (18%) 62 (43%) 115 (80%) 143 (100%) <=12 280 30 (11%) 78 (28%) 176 (63%) 260 (93%) 280 (100%) <=13 384 30 (8%) 86 (22%) 216 (56%) 344 (90%) 384 (100%) <=14 747 36 (5%) 110 (15%) 319 (43%) 586 (78%) 721 (96%) 747 (100%) <=15 1135 36 (3%) 123 (11%) 407 (36%) 828 (73%) 1078 (95%) 1135 (100%) <=16 2183 36 (2%) 144 (7%) 573 (26%) 1357 (62%) 1963 (90%) 2166 (99%) 2183 (100%) <=17 3667 36 (1%) 161 (4%) 753 (21%) 2043 (56%) 3186 (87%) 3619 (99%) 3667 (100%) <=18 7084 54 (1%) 222 (3%) 1078 (15%) 3326 (47%) 5717 (81%) 6867 (97%) 7068 (100%) <=19 12680 54 (0%) 250 (2%) 1398 (11%) 5027 (40%) 9616 (76%) 12118 (96%) 12648 (100%) <=20 24691 54 (0%) 332 (1%) 2005 (8%) 7802 (32%) 17026 (69%) 23078 (93%) 24538 (99%)

To be closer to the reality in the context of low-clue puzzle generation, one should weight the UA depending of their size. For example, working with 1078 UA having cells <=18 and digits <= 4, you work with only 15% of all UA of size <= 18, but these 15% include significant part of the most valuable short-sized UA.

Thinking about earlier dead-end branch identification, the UA of size 4 and 6 are hit with the first few clues and are out of interest.(Is this true for an average grid or the number of exceptions is ~100% ?)
To identify a dead-end branch when placing 13th clue in 17s generation, you must check whether it hits all alive UA composites consisting of 5 disjoint UA. In order to have chance not to be hit with the first 13 clues, their size must be at most 81-13=68 cells, or they must be assembled from 5 UA of average size of 13.6 cells. So, in such a composite, all UA of size >= 14 must be compensated by one or more shorter UA of size 8 to 13.
For 14th clue the average UA in the largest composite is 67/4 = 16.75 cells; for 15th and 16th they are 22 and 32.5 cells respectively.

The New Sudoku Players' Forum

Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles

Re: Bands and low-clue puzzles