Skewed distribution of initial values

Everything about Sudoku that doesn't fit in one of the other sections

Skewed distribution of initial values

Postby m_b_metcalf » Wed Sep 20, 2006 1:27 am

Most sudokus have all of the values 1 to 9 present in the starting clues. Recently, someone commented that he found sudokus with a missing value subjectively more difficult. Looking among the 36000-odd puzzles in the 17-clue file, nearly half have a value missing from the clues. Further, most of those have, additionally, one value represented only once. There are even three puzzles that have: one value missing, three others with only one occurence each, and two other values with only two occurrences each (meaning the remaining three values share 10 clues among them):

Code: Select all
000401200800600010300000500040100700200030000000000000000020080010000000000000030
040030000000000052000000670080000300000200000000700000203000100700502000000080000
600300000000000081000000000018000050005600000000703000200010000400000600060000300


My programs don't reach this level of refinement, but one of them can churn out puzzles with one missing value and three occurring only once each, at the rate of four per minute, with between 22 and 26 clues. They come out on the moderately difficult side, and in the selection below, the SE rating is given. In fact, for the first one, SE reported (if I copied it correctly):

2 x Forcing X-chain (Turbot Fish)
2 x Forcing X-chain
1 x XYZ-Wing
6 x Forcing chain
2 x Bi-directional X-cycle
1 x Swordfish

The puzzles all have the value 1 missing and the values 2, 3 and 4 only once each.

I haven't seen this aspect of sudokus discussed here before (apologies if I've missed something), and post this simply for information. Comments welcome.

Regards,

Mike Metcalf

Code: Select all
  0  5  0  0  0  0  0  0  0  SE 7.2
  7  0  2  0  0  9  0  0  6
  6  0  0  8  0  0  0  0  0
  0  0  0  3  0  0  0  0  0
  0  0  0  0  0  6  0  9  7
  0  8  0  5  9  0  0  0  0
  9  0  0  0  7  8  0  6  0
  8  0  6  0  0  0  5  0  0
  0  0  0  0  0  0  4  0  0

  0  0  0  6  0  0  8  0  2  SE 6.6
  0  7  3  0  0  0  0  0  5
  6  0  0  0  0  0  0  9  0
  5  8  0  0  0  0  7  0  0
  0  0  0  9  0  0  0  0  0
  9  6  0  0  0  0  0  0  0
  0  5  0  7  9  0  0  6  8
  0  0  0  0  5  0  0  0  0
  4  0  0  0  8  0  0  7  0

  0  0  0  6  0  3  0  0  0  SE 4.5
  7  0  0  0  5  0  6  0  9
  0  0  0  0  0  0  5  0  0
  9  0  0  0  0  0  0  0  8
  0  0  0  9  7  0  0  5  0
  6  0  0  0  8  0  0  0  0
  0  7  0  8  9  6  0  0  5
  8  0  0  5  4  0  0  0  0
  0  9  0  0  0  0  2  0  0

  2  7  0  9  0  6  0  0  0  SE 6.7
  0  0  9  0  0  0  0  5  0
  0  8  0  0  0  0  0  0  0
  0  0  5  0  9  0  8  4  0
  0  0  0  0  0  0  5  6  0
  0  0  0  0  7  0  0  0  0
  9  0  8  5  0  0  7  0  0
  0  0  3  0  6  0  0  0  5
  7  0  0  0  8  0  9  0  0
 
  0  0  0  9  0  0  8  0  6  0SE 6.6
  0  0  9  6  2  0  0  7  0
  5  0  0  0  0  0  0  0  0
  0  0  6  8  7  0  0  5  9
  0  0  0  0  0  5  0  0  0
  0  0  0  0  9  0  0  0  0
  0  5  8  0  0  0  0  3  0
  0  4  7  0  0  6  0  0  0
  0  0  0  0  0  0  9  8  0
 
  8  0  0  0  6  7  0  0  0  SE 7.1
  4  0  0  0  5  0  0  9  6
  0  0  0  9  0  0  8  0  0
  0  0  0  0  7  0  0  0  5
  0  5  0  0  0  0  9  7  0
  0  6  0  0  0  0  0  0  0
  0  7  9  5  0  0  0  3  0
  0  0  0  0  8  0  0  5  9
  0  2  0  0  0  0  0  0  0

  5  0  0  0  0  0  6  0  4  SE 6.6
  0  2  0  0  0  5  0  0  8
  0  0  0  0  0  0  0  0  0
  0  6  0  0  0  9  0  0  5
  0  0  0  8  0  0  9  3  0
  7  0  0  0  0  0  0  0  0
  0  0  7  9  0  0  0  0  0
  0  5  0  0  0  8  0  0  6
  6  9  0  0  0  0  5  7  0

  0  0  6  0  0  0  0  0  0  SE 7.1
  7  0  9  3  0  0  6  0  0
  0  8  0  0  0  0  9  0  0
  6  0  4  0  5  8  0  7  0
  0  0  0  0  0  6  0  5  0
  0  0  0  0  0  0  0  9  0
  8  0  7  0  9  0  0  6  0
  5  0  0  6  0  0  0  0  7
  0  0  0  0  2  0  8  0  0

  9  0  6  0  0  0  0  8  0  SE 4.6
  0  0  0  0  0  7  4  0  0
  5  3  0  0  0  0  0  0  0
  0  9  0  6  0  0  0  0  0
  2  7  0  0  8  5  0  0  0
  6  0  0  0  7  0  8  0  0
  0  0  0  7  0  0  0  0  0
  0  0  0  0  0  9  0  6  5
  8  5  0  0  0  0  0  9  0
[/code]
User avatar
m_b_metcalf
2017 Supporter
 
Posts: 13584
Joined: 15 May 2006
Location: Berlin

Postby daj95376 » Wed Sep 20, 2006 4:35 am

I don't have SE, so I don't have any idea if these are appropriate for this thread.

Code: Select all
6..2....8..8.6..349.7....2..9...5..3..5.8.4..3..6...5..7....3.685..4.2..2....7..5
..4.....59..4...7..25.6.9...7.15.......9.2.......46.8...2.7.41..6...1..78.....2..
..2..8.5.5...16...8..9...1.3.1.....6.5.1.4.2.4.....5.1.3...2..5...54...3.6.8..4..
....71.32...56.49.......6....91....41...3...97....62....4.......17.94...35.61....
6.9.5.7...2.9....6.....4.98..75.9..42.......99..3.68..3..4...8......5.6...2.8.9.7
86....3...4.6....2..2.84..61...6.42...9.4.1...34.1...84..82.5..5....1.....34...91
97....5.2...195..7.5...79....97...4...74698...1...879..9.5...7.7..986...6.5.7...9
daj95376
2014 Supporter
 
Posts: 2624
Joined: 15 May 2006

Re: Skewed distribution of initial values

Postby gsf » Wed Sep 20, 2006 3:17 pm

here's the frequency distribution for gordon's 17s
the second column number is nine digits, one for each clue, representing the
number of occurrences for that clue, in occurrence order -- not clue label
e.g., first digit 0 means there was one clue that occurred 0 times
Code: Select all
    8 011122334
  203 011123333
  253 011222234
 6785 011222333
   14 012222224
 9279 012222233
  787 022222223
   11 111113333
  100 111122234
 2505 111122333
   16 111222224
12262 111222233
 4400 112222223
    5 122222222
gsf
2014 Supporter
 
Posts: 7306
Joined: 21 September 2005
Location: NJ USA

Re: Skewed distribution of initial values

Postby JPF » Wed Sep 20, 2006 6:37 pm

gsf wrote:here's the frequency distribution for gordon's 17s
the second column number is nine digits, one for each clue, representing the
number of occurrences for that clue, in occurrence order -- not clue label
e.g., first digit 0 means there was one clue that occurred 0 times
Code: Select all
    8 011122334
  203 011123333
  253 011222234
 6785 011222333
   14 012222224
 9279 012222233
  787 022222223
   11 111113333
  100 111122234
 2505 111122333
   16 111222224
12262 111222233
 4400 112222223
    5 122222222

Interesting stats

0<=x1<=x2<=...<=x9<=9 and x1+x2+...+x9=17
has 207 solutions

With x2>0 (8 digits must be present), there are still 50 solutions.

With x2>0 and x9<5 => 21 solutions.

With x2>0, x8<4 , x9<5 => 16 solutions :
Code: Select all
1    0 1 1 1 1 3 3 3 4   x
2    0 1 1 1 2 2 3 3 4   
3    0 1 1 1 2 3 3 3 3   
4    0 1 1 2 2 2 2 3 4   
5    0 1 1 2 2 2 3 3 3   
6    0 1 2 2 2 2 2 2 4   
7    0 1 2 2 2 2 2 3 3   
8    0 2 2 2 2 2 2 2 3   
9    1 1 1 1 1 2 3 3 4   x   
10   1 1 1 1 1 3 3 3 3   
11   1 1 1 1 2 2 2 3 4   
12   1 1 1 1 2 2 3 3 3   
13   1 1 1 2 2 2 2 2 4   
14   1 1 1 2 2 2 2 3 3   
15   1 1 2 2 2 2 2 2 3   
16   1 2 2 2 2 2 2 2 2

but lines 1, 9 don't exist for the gordon's list.
[edit : error in the lines number]
JPF
Last edited by JPF on Thu Sep 21, 2006 3:58 am, edited 1 time in total.
JPF
2017 Supporter
 
Posts: 6127
Joined: 06 December 2005
Location: Paris, France

Re: Skewed distribution of initial values

Postby m_b_metcalf » Wed Sep 20, 2006 9:05 pm

gsf wrote:here's the frequency distribution for gordon's 17s
the second column number is nine digits, one for each clue, representing the
number of occurrences for that clue, in occurrence order -- not clue label
e.g., first digit 0 means there was one clue that occurred 0 times
Code: Select all
    8 011122334
 


Whoops, I clearly slipped up when filtering the file. Here is the correct list:

Code: Select all
000000130000080005420000000600000020005010000000000400070402000000600200000000001
000031600200000007000000100050200040036000000001000000800710000000000020000000300
000401200800600010300000500040100700200030000000000000000020080010000000000000030
040030000000000052000000670080000300000200000000700000203000100700502000000080000
160500000000008700500000000450100060007000300000020000000000051000073000000000000
500000740080000050000100000030006000000050200000047000705000000000200003200000000
600300000000000081000000000018000050005600000000703000200010000400000600060000300
830020000200500100000000000070000082000601000000000000020080030001000600400000000


Sorry,

Mike Metcalf
User avatar
m_b_metcalf
2017 Supporter
 
Posts: 13584
Joined: 15 May 2006
Location: Berlin

Postby coloin » Thu Sep 21, 2006 12:44 am

So it it logical to suggest that these puzzles might be the "hardest" of the 17s.

They might also be the puzzles which have the best "mega-clue" !

E.g.

Remove a clue from one of the SF 17s and you get

Code: Select all
+---+---+---+
|...|24.|7..|
|.8.|...|.9.|
|.1.|...|...|
+---+---+---+
|...|8..|..6|
|7..|...|...|
|4..|...|2..|
+---+---+---+
|3..|.7.|...|
|...|...|..2|
|...|..6|.18|
+---+---+---+  remove the 7 in r7c5

+---+---+---+
|...|24.|7..|
|.8.|...|.9.|
|.1.|...|...|
+---+---+---+
|...|8..|..6|
|7..|...|...|
|4..|...|2..|
+---+---+---+
|3..|...|...|
|...|...|..2|
|...|..6|.18|
+---+---+---+      94763 sol.


A megaclue !

But can it live up to its name ?

But what is the best one ?

Yes it can ! ....taking your first grid and removing the 1 at r5c5
Code: Select all
+---+---+---+
|...|...|13.|
|...|.8.|..5|
|42.|...|...|
+---+---+---+
|6..|...|.2.|
|..5|...|...|
|...|...|4..|
+---+---+---+
|.7.|4.2|...|
|...|6..|2..|
|...|...|..1|
+---+---+---+   1197093 sol.   I didnt expect that !


All the counts look heavy compared to the SF, but not a difficult puzzle...
Code: Select all
961 sol.   
1220 sol.
313384 sol.
980911 sol.
122669 sol.
543512 sol.
35016 sol.
586378 sol.
249591 sol.
1197093 sol.
90288 sol.
1578 sol.
201335 sol.
437590 sol.
67867 sol.
93308 sol.
347610 sol.
coloin
 
Posts: 2384
Joined: 05 May 2005
Location: Devon

Re: Skewed distribution of initial values

Postby Smythe Dakota » Thu Sep 21, 2006 11:12 am

m_b_metcalf wrote:.... I haven't seen this aspect of sudokus discussed here before (apologies if I've missed something), ....

The discussion "A different sort of minimalization", which I started on March 19, 2006, touches on this subject. It received a few interesting replies before it died on March 20. It's in the "General/puzzle" forum, on about page 11 (as of this moment), I believe. Check it out.

Bill Smythe
Smythe Dakota
 
Posts: 564
Joined: 11 February 2006

Postby Ocean » Thu Sep 21, 2006 11:52 am

coloin wrote:... remove the 7 in r7c5 ... 94763 sol.

A megaclue !

But can it live up to its name ?

But what is the best one ?

Yes it can ! ....taking your first grid and removing the 1 at r5c5
(...)
1197093 sol. I didnt expect that !

Two questions for the combinatorics guys (or the unavoidable experts).

1. Does this imply: This solution grid contains at least log2(94763), at least 17, unavoidable sets where the 7 in r7c5 is a member, and no other clue cells are members? [for the other grid/r5c5: at least log2(1197093), or at least 21]

2. What is the maximum possible number of (minimal) unavoidable sets in a subgrid with N cells, where one specific cell is part of all unavoidable sets in that subgrid? If we call this number UMAX, is then 2^(UMAX) an upper limit for the number of solutions we can have when removing one clue from a valid sudoku with 82-N clues?
Ocean
 
Posts: 442
Joined: 29 August 2005

re: "A different sort of minimalization "

Postby Pat » Thu Sep 21, 2006 12:32 pm

Smythe Dakota wrote:The discussion "A different sort of minimalization", which I started on March 19, 2006, touches on this subject. It received a few interesting replies before it died on March 20. It's in the "General/puzzle" forum, on about page 11 (as of this moment)

here's the link - A different sort of minimalization
User avatar
Pat
 
Posts: 4056
Joined: 18 July 2005

Re: re: "A different sort of minimalization "

Postby m_b_metcalf » Thu Sep 21, 2006 11:45 pm

Pat wrote:
Smythe Dakota wrote:The discussion "A different sort of minimalization", which I started on March 19, 2006, touches on this subject. It received a few interesting replies before it died on March 20. It's in the "General/puzzle" forum, on about page 11 (as of this moment)

here's the link - A different sort of minimalization


Thanks to you both for the reference. Interesting. In the meantime, I have had a few more short runs, and came up with a Sudoku Explainer level 8.9 with many high-level steps (as before, the value 1 missing, and 2, 3, and 4 once only):

Code: Select all
  0  0  2  0  0  0  6  0  0 SE 8.9 (2 x Dynamic Region Forcing Chains and much more)
  0  0  0  9  0  0  0  0  0
  0  0  0  0  8  0  5  7  0
  0  4  0  0  0  0  0  0  0
  0  7  0  0  9  6  0  5  0
  6  0  0  0  5  0  8  9  0
  5  0  0  6  0  0  0  0  0
  0  0  0  0  0  0  7  0  3
  8  9  0  0  7  0  0  0  0


I then tried to combine this approach with the diagonal stripe pattern that is known to give harder puzzles. Here, I could generate only puzzles with the value 1 missing and 2 once only:

Code: Select all
  3  0  0  2  0  0  9  0  0  SE 8.3 and much more
  0  7  0  0  0  0  0  4  0 
  0  0  0  0  0  3  0  0  8
  0  0  0  4  0  0  0  0  0
  0  0  0  0  7  0  0  5  0
  0  0  8  0  0  6  0  0  3
  9  0  0  3  0  0  8  0  0
  0  4  0  0  5  0  0  6  0
  0  0  6  0  0  9  0  0  7
 
  7  0  0  0  0  0  8  0  0  SE 8.4 and more
  0  0  0  0  7  0  0  3  0 
  0  0  0  0  0  3  0  0  4
  8  0  0  9  0  0  6  0  0
  0  4  0  0  0  0  0  5  0
  0  0  3  0  0  5  0  0  7
  2  0  0  6  0  0  7  0  0
  0  8  0  0  0  0  0  4  0
  0  0  4  0  0  9  0  0  5


Finally, I allowed some non-diagonal clues, but came up with only (1 missing, 2 once):

Code: Select all
  3  0  0  4  0  0  0  0  0  SE 7.7
  0  4  0  0  0  0  0  3  2 
  0  0  0  0  0  8  0  9  6
  6  0  0  7  0  0  0  0  0
  0  7  0  0  8  0  0  4  0
  0  0  9  0  0  0  0  0  0
  5  0  0  6  0  0  7  0  0
  0  0  6  0  9  0  0  5  0
  0  0  0  0  0  7  8  0  0


I think I'll stop now, but this does seem to be a productive way to generate hard puzzles (and with an auxilliary program of less than 400 lines that I wrote as a test for 9x9 only!).

Regards,

Mike Metcalf
User avatar
m_b_metcalf
2017 Supporter
 
Posts: 13584
Joined: 15 May 2006
Location: Berlin

Postby coloin » Fri Sep 22, 2006 5:33 pm

Ocean wrote:1. Does this imply: This solution grid contains at least log2(94763), at least 17, unavoidable sets where the 7 in r7c5 is a member, and no other clue cells are members? [size=9][for the other grid/r5c5: at least log2(1197093), or at least 21]


We could easily rearrange the grid so that our megaclue was at position 11 - checker will give us the sets - and we could cross off the rest.

99 out of 537 unavoidables [of size 14 or less] covered by clue at position 11
Code: Select all
+---+---+---+
|1..|..5|...|
|...|6..|.2.|
|...|...|4..|
+---+---+---+
|...|...|13.|
|8..|...|..5|
|...|42.|...|
+---+---+---+
|.42|.7.|...|
|.6.|...|2..|
|...|...|..1|
+---+---+---+ rearranged megaclue at position 11


Clue numbers
megaclue = 11

11,16,24,28,37,47,48,51,59,64,65,72,73,75,82,87,99

Unavoidables not covered
Code: Select all
{11,12,13,15,21,22,25,41,42,43}
{11,12,14,15,16,17,21,22,25,27,31,32,34,36}
{11,12,14,15,18,22,29,31,34,35,38,39}
{11,12,14,15,21,22,23,25,31,32,34,91,92,93}
{11,12,14,31,32,33,34,61,62,63}
{11,12,15,17,18,21,22,25,29,67,68,69}
{11,12,19,22,29,32,33,39,61,62,63}
{11,13,34,36,41,43,54,56,71,74,83,86}
{11,13,41,43,54,55,56,71,74,75,83,86}
{11,14,15,16,17,21,25,26,27,94,95,96}
{11,14,15,21,25,26,31,36,94,95,96}
{11,15,18,19,32,33,35,38,39,61,62,63}
{11,15,19,22,25,32,33,39,61,62,63}
{11,15,22,23,25,81,83,84,92,94}
{11,15,22,25,53,56,62,66,81,83}   

{11,12,18,23,25,29,34,38,55,56,67,68,71,74,79,83,84,86,92,97} and others ?



Of course the number of grid solutions is determined by the number/size/iterations of the unavoidable sets in the other grids solutions. Our megaclue hits all the remaining sets in the solution grid.

The high number of grid solutions doesnt preclude a grid completion with one clue....it just makes it less likely.

Ocean wrote:2. What is the maximum possible number of (minimal) unavoidable sets in a subgrid with N cells, where one specific cell is part of all unavoidable sets in that subgrid? If we call this number UMAX, is then 2^(UMAX) an upper limit for the number of solutions we can have when removing one clue from a valid sudoku with 82-N clues?


Isnt this what the megaclue does ? The megaclue is only good in combination with the other [not so good] clues which together define and solve the grid. The subgrid would be the megaclue plus the 64 other non-given or insertable clues then ?

C
coloin
 
Posts: 2384
Joined: 05 May 2005
Location: Devon


Return to General