Skewed distribution of initial values

Everything about Sudoku that doesn't fit in one of the other sections

Skewed distribution of initial values

Most sudokus have all of the values 1 to 9 present in the starting clues. Recently, someone commented that he found sudokus with a missing value subjectively more difficult. Looking among the 36000-odd puzzles in the 17-clue file, nearly half have a value missing from the clues. Further, most of those have, additionally, one value represented only once. There are even three puzzles that have: one value missing, three others with only one occurence each, and two other values with only two occurrences each (meaning the remaining three values share 10 clues among them):

Code: Select all
`000401200800600010300000500040100700200030000000000000000020080010000000000000030040030000000000052000000670080000300000200000000700000203000100700502000000080000600300000000000081000000000018000050005600000000703000200010000400000600060000300`

My programs don't reach this level of refinement, but one of them can churn out puzzles with one missing value and three occurring only once each, at the rate of four per minute, with between 22 and 26 clues. They come out on the moderately difficult side, and in the selection below, the SE rating is given. In fact, for the first one, SE reported (if I copied it correctly):

2 x Forcing X-chain (Turbot Fish)
2 x Forcing X-chain
1 x XYZ-Wing
6 x Forcing chain
2 x Bi-directional X-cycle
1 x Swordfish

The puzzles all have the value 1 missing and the values 2, 3 and 4 only once each.

I haven't seen this aspect of sudokus discussed here before (apologies if I've missed something), and post this simply for information. Comments welcome.

Regards,

Mike Metcalf

Code: Select all
`  0  5  0  0  0  0  0  0  0  SE 7.2  7  0  2  0  0  9  0  0  6  6  0  0  8  0  0  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0  6  0  9  7  0  8  0  5  9  0  0  0  0  9  0  0  0  7  8  0  6  0  8  0  6  0  0  0  5  0  0  0  0  0  0  0  0  4  0  0  0  0  0  6  0  0  8  0  2  SE 6.6  0  7  3  0  0  0  0  0  5  6  0  0  0  0  0  0  9  0  5  8  0  0  0  0  7  0  0  0  0  0  9  0  0  0  0  0  9  6  0  0  0  0  0  0  0  0  5  0  7  9  0  0  6  8  0  0  0  0  5  0  0  0  0  4  0  0  0  8  0  0  7  0  0  0  0  6  0  3  0  0  0  SE 4.5  7  0  0  0  5  0  6  0  9  0  0  0  0  0  0  5  0  0  9  0  0  0  0  0  0  0  8  0  0  0  9  7  0  0  5  0  6  0  0  0  8  0  0  0  0  0  7  0  8  9  6  0  0  5  8  0  0  5  4  0  0  0  0  0  9  0  0  0  0  2  0  0  2  7  0  9  0  6  0  0  0  SE 6.7  0  0  9  0  0  0  0  5  0  0  8  0  0  0  0  0  0  0  0  0  5  0  9  0  8  4  0  0  0  0  0  0  0  5  6  0  0  0  0  0  7  0  0  0  0  9  0  8  5  0  0  7  0  0  0  0  3  0  6  0  0  0  5  7  0  0  0  8  0  9  0  0    0  0  0  9  0  0  8  0  6  0SE 6.6  0  0  9  6  2  0  0  7  0  5  0  0  0  0  0  0  0  0  0  0  6  8  7  0  0  5  9  0  0  0  0  0  5  0  0  0  0  0  0  0  9  0  0  0  0  0  5  8  0  0  0  0  3  0  0  4  7  0  0  6  0  0  0  0  0  0  0  0  0  9  8  0   8  0  0  0  6  7  0  0  0  SE 7.1  4  0  0  0  5  0  0  9  6  0  0  0  9  0  0  8  0  0  0  0  0  0  7  0  0  0  5  0  5  0  0  0  0  9  7  0  0  6  0  0  0  0  0  0  0  0  7  9  5  0  0  0  3  0  0  0  0  0  8  0  0  5  9  0  2  0  0  0  0  0  0  0  5  0  0  0  0  0  6  0  4  SE 6.6  0  2  0  0  0  5  0  0  8  0  0  0  0  0  0  0  0  0  0  6  0  0  0  9  0  0  5  0  0  0  8  0  0  9  3  0  7  0  0  0  0  0  0  0  0  0  0  7  9  0  0  0  0  0  0  5  0  0  0  8  0  0  6  6  9  0  0  0  0  5  7  0  0  0  6  0  0  0  0  0  0  SE 7.1  7  0  9  3  0  0  6  0  0  0  8  0  0  0  0  9  0  0  6  0  4  0  5  8  0  7  0  0  0  0  0  0  6  0  5  0  0  0  0  0  0  0  0  9  0  8  0  7  0  9  0  0  6  0  5  0  0  6  0  0  0  0  7  0  0  0  0  2  0  8  0  0  9  0  6  0  0  0  0  8  0  SE 4.6  0  0  0  0  0  7  4  0  0  5  3  0  0  0  0  0  0  0  0  9  0  6  0  0  0  0  0  2  7  0  0  8  5  0  0  0  6  0  0  0  7  0  8  0  0  0  0  0  7  0  0  0  0  0  0  0  0  0  0  9  0  6  5  8  5  0  0  0  0  0  9  0`
[/code]

m_b_metcalf
2017 Supporter

Posts: 10482
Joined: 15 May 2006
Location: Berlin

I don't have SE, so I don't have any idea if these are appropriate for this thread.

Code: Select all
`6..2....8..8.6..349.7....2..9...5..3..5.8.4..3..6...5..7....3.685..4.2..2....7..5..4.....59..4...7..25.6.9...7.15.......9.2.......46.8...2.7.41..6...1..78.....2....2..8.5.5...16...8..9...1.3.1.....6.5.1.4.2.4.....5.1.3...2..5...54...3.6.8..4......71.32...56.49.......6....91....41...3...97....62....4.......17.94...35.61....6.9.5.7...2.9....6.....4.98..75.9..42.......99..3.68..3..4...8......5.6...2.8.9.786....3...4.6....2..2.84..61...6.42...9.4.1...34.1...84..82.5..5....1.....34...9197....5.2...195..7.5...79....97...4...74698...1...879..9.5...7.7..986...6.5.7...9`
daj95376
2014 Supporter

Posts: 2624
Joined: 15 May 2006

Re: Skewed distribution of initial values

here's the frequency distribution for gordon's 17s
the second column number is nine digits, one for each clue, representing the
number of occurrences for that clue, in occurrence order -- not clue label
e.g., first digit 0 means there was one clue that occurred 0 times
Code: Select all
`    8 011122334  203 011123333  253 011222234 6785 011222333   14 012222224 9279 012222233  787 022222223   11 111113333  100 111122234 2505 111122333   16 11122222412262 111222233 4400 112222223    5 122222222`
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Re: Skewed distribution of initial values

gsf wrote:here's the frequency distribution for gordon's 17s
the second column number is nine digits, one for each clue, representing the
number of occurrences for that clue, in occurrence order -- not clue label
e.g., first digit 0 means there was one clue that occurred 0 times
Code: Select all
`    8 011122334  203 011123333  253 011222234 6785 011222333   14 012222224 9279 012222233  787 022222223   11 111113333  100 111122234 2505 111122333   16 11122222412262 111222233 4400 112222223    5 122222222`

Interesting stats

0<=x1<=x2<=...<=x9<=9 and x1+x2+...+x9=17
has 207 solutions

With x2>0 (8 digits must be present), there are still 50 solutions.

With x2>0 and x9<5 => 21 solutions.

With x2>0, x8<4 , x9<5 => 16 solutions :
Code: Select all
`1    0 1 1 1 1 3 3 3 4   x2    0 1 1 1 2 2 3 3 4   3    0 1 1 1 2 3 3 3 3   4    0 1 1 2 2 2 2 3 4   5    0 1 1 2 2 2 3 3 3   6    0 1 2 2 2 2 2 2 4   7    0 1 2 2 2 2 2 3 3   8    0 2 2 2 2 2 2 2 3   9    1 1 1 1 1 2 3 3 4   x   10   1 1 1 1 1 3 3 3 3   11   1 1 1 1 2 2 2 3 4   12   1 1 1 1 2 2 3 3 3   13   1 1 1 2 2 2 2 2 4   14   1 1 1 2 2 2 2 3 3   15   1 1 2 2 2 2 2 2 3   16   1 2 2 2 2 2 2 2 2`

but lines 1, 9 don't exist for the gordon's list.
[edit : error in the lines number]
JPF
Last edited by JPF on Thu Sep 21, 2006 3:58 am, edited 1 time in total.
JPF
2017 Supporter

Posts: 3765
Joined: 06 December 2005
Location: Paris, France

Re: Skewed distribution of initial values

gsf wrote:here's the frequency distribution for gordon's 17s
the second column number is nine digits, one for each clue, representing the
number of occurrences for that clue, in occurrence order -- not clue label
e.g., first digit 0 means there was one clue that occurred 0 times
Code: Select all
`    8 011122334  `

Whoops, I clearly slipped up when filtering the file. Here is the correct list:

Code: Select all
`000000130000080005420000000600000020005010000000000400070402000000600200000000001000031600200000007000000100050200040036000000001000000800710000000000020000000300000401200800600010300000500040100700200030000000000000000020080010000000000000030040030000000000052000000670080000300000200000000700000203000100700502000000080000160500000000008700500000000450100060007000300000020000000000051000073000000000000500000740080000050000100000030006000000050200000047000705000000000200003200000000600300000000000081000000000018000050005600000000703000200010000400000600060000300830020000200500100000000000070000082000601000000000000020080030001000600400000000`

Sorry,

Mike Metcalf

m_b_metcalf
2017 Supporter

Posts: 10482
Joined: 15 May 2006
Location: Berlin

So it it logical to suggest that these puzzles might be the "hardest" of the 17s.

They might also be the puzzles which have the best "mega-clue" !

E.g.

Remove a clue from one of the SF 17s and you get

Code: Select all
`+---+---+---+|...|24.|7..||.8.|...|.9.||.1.|...|...|+---+---+---+|...|8..|..6||7..|...|...||4..|...|2..|+---+---+---+|3..|.7.|...||...|...|..2||...|..6|.18|+---+---+---+  remove the 7 in r7c5+---+---+---+|...|24.|7..||.8.|...|.9.||.1.|...|...|+---+---+---+|...|8..|..6||7..|...|...||4..|...|2..|+---+---+---+|3..|...|...||...|...|..2||...|..6|.18|+---+---+---+      94763 sol. `

A megaclue !

But can it live up to its name ?

But what is the best one ?

Yes it can ! ....taking your first grid and removing the 1 at r5c5
Code: Select all
`+---+---+---+|...|...|13.||...|.8.|..5||42.|...|...|+---+---+---+|6..|...|.2.||..5|...|...||...|...|4..|+---+---+---+|.7.|4.2|...||...|6..|2..||...|...|..1|+---+---+---+   1197093 sol.   I didnt expect that !`

All the counts look heavy compared to the SF, but not a difficult puzzle...
Code: Select all
`961 sol.   1220 sol.313384 sol.980911 sol.122669 sol.543512 sol.35016 sol.586378 sol.249591 sol.1197093 sol.90288 sol.1578 sol.201335 sol.437590 sol.67867 sol.93308 sol.347610 sol.`
coloin

Posts: 1857
Joined: 05 May 2005

Re: Skewed distribution of initial values

m_b_metcalf wrote:.... I haven't seen this aspect of sudokus discussed here before (apologies if I've missed something), ....

The discussion "A different sort of minimalization", which I started on March 19, 2006, touches on this subject. It received a few interesting replies before it died on March 20. It's in the "General/puzzle" forum, on about page 11 (as of this moment), I believe. Check it out.

Bill Smythe
Smythe Dakota

Posts: 563
Joined: 11 February 2006

coloin wrote:... remove the 7 in r7c5 ... 94763 sol.

A megaclue !

But can it live up to its name ?

But what is the best one ?

Yes it can ! ....taking your first grid and removing the 1 at r5c5
(...)
1197093 sol. I didnt expect that !

Two questions for the combinatorics guys (or the unavoidable experts).

1. Does this imply: This solution grid contains at least log2(94763), at least 17, unavoidable sets where the 7 in r7c5 is a member, and no other clue cells are members? [for the other grid/r5c5: at least log2(1197093), or at least 21]

2. What is the maximum possible number of (minimal) unavoidable sets in a subgrid with N cells, where one specific cell is part of all unavoidable sets in that subgrid? If we call this number UMAX, is then 2^(UMAX) an upper limit for the number of solutions we can have when removing one clue from a valid sudoku with 82-N clues?
Ocean

Posts: 442
Joined: 29 August 2005

re: "A different sort of minimalization "

Smythe Dakota wrote:The discussion "A different sort of minimalization", which I started on March 19, 2006, touches on this subject. It received a few interesting replies before it died on March 20. It's in the "General/puzzle" forum, on about page 11 (as of this moment)

here's the link - A different sort of minimalization

Pat

Posts: 3879
Joined: 18 July 2005

Re: re: "A different sort of minimalization "

Pat wrote:
Smythe Dakota wrote:The discussion "A different sort of minimalization", which I started on March 19, 2006, touches on this subject. It received a few interesting replies before it died on March 20. It's in the "General/puzzle" forum, on about page 11 (as of this moment)

here's the link - A different sort of minimalization

Thanks to you both for the reference. Interesting. In the meantime, I have had a few more short runs, and came up with a Sudoku Explainer level 8.9 with many high-level steps (as before, the value 1 missing, and 2, 3, and 4 once only):

Code: Select all
`  0  0  2  0  0  0  6  0  0 SE 8.9 (2 x Dynamic Region Forcing Chains and much more)   0  0  0  9  0  0  0  0  0   0  0  0  0  8  0  5  7  0  0  4  0  0  0  0  0  0  0  0  7  0  0  9  6  0  5  0  6  0  0  0  5  0  8  9  0  5  0  0  6  0  0  0  0  0  0  0  0  0  0  0  7  0  3  8  9  0  0  7  0  0  0  0`

I then tried to combine this approach with the diagonal stripe pattern that is known to give harder puzzles. Here, I could generate only puzzles with the value 1 missing and 2 once only:

Code: Select all
`  3  0  0  2  0  0  9  0  0  SE 8.3 and much more  0  7  0  0  0  0  0  4  0    0  0  0  0  0  3  0  0  8  0  0  0  4  0  0  0  0  0  0  0  0  0  7  0  0  5  0  0  0  8  0  0  6  0  0  3  9  0  0  3  0  0  8  0  0  0  4  0  0  5  0  0  6  0  0  0  6  0  0  9  0  0  7    7  0  0  0  0  0  8  0  0  SE 8.4 and more  0  0  0  0  7  0  0  3  0    0  0  0  0  0  3  0  0  4  8  0  0  9  0  0  6  0  0  0  4  0  0  0  0  0  5  0  0  0  3  0  0  5  0  0  7  2  0  0  6  0  0  7  0  0  0  8  0  0  0  0  0  4  0  0  0  4  0  0  9  0  0  5`

Finally, I allowed some non-diagonal clues, but came up with only (1 missing, 2 once):

Code: Select all
`  3  0  0  4  0  0  0  0  0  SE 7.7  0  4  0  0  0  0  0  3  2    0  0  0  0  0  8  0  9  6  6  0  0  7  0  0  0  0  0  0  7  0  0  8  0  0  4  0  0  0  9  0  0  0  0  0  0  5  0  0  6  0  0  7  0  0  0  0  6  0  9  0  0  5  0  0  0  0  0  0  7  8  0  0`

I think I'll stop now, but this does seem to be a productive way to generate hard puzzles (and with an auxilliary program of less than 400 lines that I wrote as a test for 9x9 only!).

Regards,

Mike Metcalf

m_b_metcalf
2017 Supporter

Posts: 10482
Joined: 15 May 2006
Location: Berlin

Ocean wrote:1. Does this imply: This solution grid contains at least log2(94763), at least 17, unavoidable sets where the 7 in r7c5 is a member, and no other clue cells are members? [size=9][for the other grid/r5c5: at least log2(1197093), or at least 21]

We could easily rearrange the grid so that our megaclue was at position 11 - checker will give us the sets - and we could cross off the rest.

99 out of 537 unavoidables [of size 14 or less] covered by clue at position 11
Code: Select all
`+---+---+---+|1..|..5|...||...|6..|.2.||...|...|4..|+---+---+---+|...|...|13.||8..|...|..5||...|42.|...|+---+---+---+|.42|.7.|...||.6.|...|2..||...|...|..1|+---+---+---+ rearranged megaclue at position 11`

Clue numbers
megaclue = 11

11,16,24,28,37,47,48,51,59,64,65,72,73,75,82,87,99

Unavoidables not covered
Code: Select all
`{11,12,13,15,21,22,25,41,42,43}{11,12,14,15,16,17,21,22,25,27,31,32,34,36}{11,12,14,15,18,22,29,31,34,35,38,39}{11,12,14,15,21,22,23,25,31,32,34,91,92,93}{11,12,14,31,32,33,34,61,62,63}{11,12,15,17,18,21,22,25,29,67,68,69}{11,12,19,22,29,32,33,39,61,62,63}{11,13,34,36,41,43,54,56,71,74,83,86}{11,13,41,43,54,55,56,71,74,75,83,86}{11,14,15,16,17,21,25,26,27,94,95,96}{11,14,15,21,25,26,31,36,94,95,96}{11,15,18,19,32,33,35,38,39,61,62,63}{11,15,19,22,25,32,33,39,61,62,63}{11,15,22,23,25,81,83,84,92,94}{11,15,22,25,53,56,62,66,81,83}   {11,12,18,23,25,29,34,38,55,56,67,68,71,74,79,83,84,86,92,97} and others ?`

Of course the number of grid solutions is determined by the number/size/iterations of the unavoidable sets in the other grids solutions. Our megaclue hits all the remaining sets in the solution grid.

The high number of grid solutions doesnt preclude a grid completion with one clue....it just makes it less likely.

Ocean wrote:2. What is the maximum possible number of (minimal) unavoidable sets in a subgrid with N cells, where one specific cell is part of all unavoidable sets in that subgrid? If we call this number UMAX, is then 2^(UMAX) an upper limit for the number of solutions we can have when removing one clue from a valid sudoku with 82-N clues?

Isnt this what the megaclue does ? The megaclue is only good in combination with the other [not so good] clues which together define and solve the grid. The subgrid would be the megaclue plus the 64 other non-given or insertable clues then ?

C
coloin

Posts: 1857
Joined: 05 May 2005