17-clue and 18-clue Sudoku update

Everything about Sudoku that doesn't fit in one of the other sections
There has been some discussion about defining "distances" in the Sudoku space thread and also on earlier pages here (see e.g. Oceans's interesting alternative definition). For me it turned out, that there is no satisfying definition for all purposes, so as you said, it is a matter of point of view, which one fits better.
Red Ed wrote:Now a challenge:
• The biggest distance I've seen between a pair of 17s is 16.
• The smallest similarity I've seen between a pair of 17s is 8.
Can anyone do better? (I only checked a smallish random sample.)

I made some 10000 comparison's, but did not find better ones. A distance of 16 seems to be rather common for "exotic" puzzles outside the big 17-cluster, so maybe comparisons between those "island puzzles" have good chances to improve it.
There is a much better way: it's called the Hungarian (or Munkres') Assignment Algorithm.
Thanks for the link. At the first glance i did something similar, but i suppose it should be faster than my method.
ravel

Posts: 998
Joined: 21 February 2006

ravel wrote:There has been some discussion about defining "distances" ... (see e.g. Oceans's interesting alternative definition).
fwiw, I came up with a very similar definition nearly a year earlier, but had (and still have) no clue about how to do any useful calculations with it.
Red Ed

Posts: 633
Joined: 06 June 2005

Hello all,

It's really fascinating to follow the excellent discussion about 17 and 18 clue Sudokus! This is clearly the most advanced and spectacular discussion in the Science of Sudoku! What is the total number of 17-clue Sudokus? Is there a 16? How are these 17-clue Sudokus related? Recently I have been involved in the Sudoku Architect project that Håvard manages, and I have been privileged to be able to find new 17-clue Sudokus every day: thanks to Intel Compiler Team!

In order to estimate how many 17-clue Sudokus are left, I did a set of 10 isolated experiments. Results are:
Code: Select all
`          20   20+1-1     19     18   18+1-1   17  n17 n17+2-2   #1   1914   34593   37256   5480   23363   207   2   2   #2   3701   65420   68885   9070   36202   193   8  13   #3   3191   55411   58159   7281   28596   190   5   5   #4   2822   49995   52729   6250   24252   182   6   7   #5   2687   46949   48740   5658   21236   107   8  12   #6   6531  114346  120800  14640   56054   416  12  26   #7   3006   51873   54749   7211   28964   209   4   6   #8   2665   47653   50587   6573   24758   150   2   3   #9   3543   63262   69287   8861   33489   169   6   8   #10  3360   60654   69596   9154   35519   312   5   6   Sum 33420  590156  630788  80178  312433  2135  58  88`

Each experiment was based on isolated random sets of 20-clue Sudokus, and I followed the technique 20+1-1+1-2+1-2+1-1+1-2+2-2=17. In the table, n17 denotes the new Sudokus that was added to Gordon's list today. Even though all experiments are based on random generation of 20-clue Sudokus, I am not certain that this may lead to some intelligible statistical conclusions. Please feel free to comment and conclude…

Best Regards,

Lars Petter Endresen
Lars Petter Endresen

Posts: 7
Joined: 03 June 2007
Location: NORWAY

BTW: in my Sudoku experiment I suddenly came across this family...

Code: Select all
`..57....9.12.......4..3.........42...........3....2..68..57...................84...57....9.12.......4..3.........42...........3....2..78..57...................84...67....9.12.......4..3.........42...........3....2..58..57...................84...67....9.12.......4..3.........42...........3....2..78..57...................84...57....9.12.......4..3.........42...........3....2..68..56...................84...57....9.12.......4..3.........42...........3....2..68..59...................84...56....7.12.......4..3.........42...........3....2..68..57...................84...5.6...7.12.......4..3.........42...........3....2..68..57...................84...5.6...9.12.......4..3.........42...........3....2..78..57...................84...56....9.12.......4..3.........42...........3....2..68..97...................84...5.6...9.12.......4..3.........42...........3....2..68..97...................84...57....9.12.......4.3..........42...........3....2..68..67...................84...6.7...9.12.......4..3.........42...........3....2..68..59...................84...76....9.12.......4..3.........42...........3....2..68..59...................84...7.6...9.12.......4..3.........42...........3....2..68..59...................84...67....5.12.......4..3.........42...........3....2..68..59...................84.`

I guess they all have the same grandpa...

Lars Petter Endresen

Posts: 7
Joined: 03 June 2007
Location: NORWAY

About guessing the number of 17s.

What we are doing is like fishing in a lake...(catch, tag and release the fish)

Can we say that you generated 2135 17s and that only 88 were "new".
The success rate is 88/2135=4%

I don't remember, what is the MLE of the number of fishes in that case .

Can we say that 4% of say 42000 are still to be discovered ~ 1680-1700.
But of course, the fishing is not perfect and some strange animals are hidden and more difficult to cath than others.

In the good days, I get a success rate of 3% - 4% starting with a 19.

JPF
JPF
2017 Supporter

Posts: 3754
Joined: 06 December 2005
Location: Paris, France

Red Ed wrote:PS: regarding this ...
gsf wrote:I'm hoping there's a better way than running p! combinations on p matching positions,
but I don't see it right now
There is a much better way: it's called the Hungarian (or Munkres') Assignment Algorithm.

thanks for the reference
p! (a bad knee jerk reaction) down to 2**p (second attempt) down to p**3 (Hungarian modulo <= 9 clues mapping to possibly > 9 positions)
distance/similarity in my solver (not yet posted) are now in agreement with Red Ed and ravel
gsf
2014 Supporter

Posts: 7306
Joined: 21 September 2005
Location: NJ USA

gsf wrote:thanks for the reference
p! (a bad knee jerk reaction) down to 2**p (second attempt) down to p**3 (Hungarian modulo <= 9 clues mapping to possibly > 9 positions)
distance/similarity in my solver (not yet posted) are now in agreement with Red Ed and ravel

It would be a good idea that one could assign different point schemes for the Hamming distance. For example, the original Hamming distance gives 2 points for each clue in different position and 1 point for a clue in the same position and different value (am I right?), but if we assign 10 points for each clue in different position and 1 for same position and different value, we encourage more similar patterns, and so on. Could this be implemented?
Mauricio

Posts: 1174
Joined: 22 March 2006

Mauricio wrote:For example, the original Hamming distance gives 2 points for each clue in different position and 1 point for a clue in the same position and different value (am I right?)

Hamming distance between two strings is defined to be the number of places in which the two strings differ.

For example, the distance between

Code: Select all
`100200200020`

is three, because they are different in 3 places.

Code: Select all
`100200200020^  ^^`

As you noticed, this has the effect that a clue that is "moved" will add 2 to the distance, while one that is changed will add 1 to the distance, but this is a consequence of the definition and not really a free choice.

Gordon
gfroyle

Posts: 214
Joined: 21 June 2005

gfroyle wrote:Hamming distance between two strings is defined to be the number of places in which the two strings differ.

For example, the distance between

Code: Select all
`100200200020`

is three.

I see, but it depends how it is implemented. For example, if we take the above puzzle as the target puzzle, and if for each nonzero cell of the below puzzle we count:
• 0 points if the clue above it is the same,
• 1 point if the clue above it is not the same and nonzero,
• 2 points if te clue above it is zero,
the the count is three to, and it always gives the hamming distance (wrong, see edit).
If the hamming distance is implemented this way, then we could change the values.

Edit: I see, that only works if the puzzles have the same number of clues. Nevermind.
Mauricio

Posts: 1174
Joined: 22 March 2006

re: Mauricio distance

Mauricio wrote:It would be a good idea that one could assign different point schemes for the Hamming distance.

For example, the original Hamming distance gives 2 points for each clue in different position and 1 point for a clue in the same position and different value (am I right?),
but if we assign 10 points for each clue in different position and 1 for same position and different value, we encourage more similar patterns, and so on.

Could this be implemented?

as gfroyle pointed out, this would no longer be the Hamming distance.
but gsf could choose to implement this anyway,
and call it the Mauricio distance.

Pat

Posts: 3710
Joined: 18 July 2005

re: Lars Petter Endresen's new 17s

JPF wrote:
Lars Petter Endresen wrote:
Code: Select all
`          20   20+1-1     19     18   18+1-1   17  n17 n17+2-2   #1   1914   34593   37256   5480   23363   207   2   2   #2   3701   65420   68885   9070   36202   193   8  13   #3   3191   55411   58159   7281   28596   190   5   5   #4   2822   49995   52729   6250   24252   182   6   7   #5   2687   46949   48740   5658   21236   107   8  12   #6   6531  114346  120800  14640   56054   416  12  26   #7   3006   51873   54749   7211   28964   209   4   6   #8   2665   47653   50587   6573   24758   150   2   3   #9   3543   63262   69287   8861   33489   169   6   8   #10  3360   60654   69596   9154   35519   312   5   6   Sum 33420  590156  630788  80178  312433  2135  58  88`

I followed the technique 20 +1-1 +1-2 +1-2 +1-1 +1-2 +2-2 =17

In the table, n17 denotes the new Sudokus that was added to Gordon's list today

Can we say that you generated 2135 17s and that only 88 were "new".
The success rate is 88/2135=4%

my interpretation of Lars Petter Endresen's report, is that the experiment produced 58 + 88 = 146 new 17s;

but it is unclear to me if these 146 come from a total of 2135 17s
or are we perhaps missing a column for 17+2-2 puzzles produced

~ Pat

Pat

Posts: 3710
Joined: 18 July 2005

I interpreted the LPE table as ...

Code: Select all
`From ten runs: 33420          (20s) generated =>590156          (20s) generated =>630788          (19s) generated => 80178          (18s) generated =>312443          (18s) generated =>  2135 existing (17s) +    58 new      (17s) generated =>    88 existing (17s)`
daj95376
2014 Supporter

Posts: 2624
Joined: 15 May 2006

OK, I'll try this guessing game too. Starting with 33,420 20s Lars generated 2,135 17s of which 58 were new. Then a 2-off/2-on run on these 58 yielded an additional 30 new 17s ... for a total of 88.
ronk
2012 Supporter

Posts: 4764
Joined: 02 November 2005
Location: Southeastern USA

ronk wrote:OK, I'll try this guessing game too. Starting with 33,420 20s Lars generated 2,135 17s of which 58 were new. Then a 2-off/2-on run on these 58 yielded an additional 30 new 17s ... for a total of 88.

It was my interpretation

JPF
JPF
2017 Supporter

Posts: 3754
Joined: 06 December 2005
Location: Paris, France

JPF wrote:It was my interpretation

Sorry, I forgot you already reached that conclusion for your "success rate" post.
ronk
2012 Supporter

Posts: 4764
Joined: 02 November 2005
Location: Southeastern USA

PreviousNext