The New Sudoku Players' Forum

by **ravel** » Sun Aug 12, 2007 12:21 pm

There has been some discussion about defining "distances" in the Sudoku space thread and also on earlier pages here (see e.g. Oceans's interesting alternative definition). For me it turned out that no single definition is satisfying for all purposes, so, as you said, which one fits better is a matter of point of view.

Red Ed wrote:Now a challenge:
The biggest distance I've seen between a pair of 17s is 16.
The smallest similarity I've seen between a pair of 17s is 8.
Can anyone do better? (I only checked a smallish random sample.)

I made some 10000 comparisons, but did not find better ones. A distance of 16 seems to be rather common for "exotic" puzzles outside the big 17-cluster, so comparisons between those "island puzzles" may have a good chance of improving it.

Red Ed wrote:There is a much better way: it's called the Hungarian (or Munkres') Assignment Algorithm.

Thanks for the link. At first glance I did something similar, but I suppose it should be faster than my method.

by **Red Ed** » Sun Aug 12, 2007 6:09 pm

ravel wrote:There has been some discussion about defining "distances" ... (see e.g. Oceans's interesting alternative definition).

fwiw, I came up with a very similar definition nearly a year earlier, but had (and still have) no clue about how to do any useful calculations with it.

by **Lars Petter Endresen** » Mon Aug 13, 2007 10:00 pm

Hello all,

It's really fascinating to follow the excellent discussion about 17 and 18 clue Sudokus! This is clearly the most advanced and spectacular discussion in the Science of Sudoku! What is the total number of 17-clue Sudokus? Is there a 16? How are these 17-clue Sudokus related? Recently I have been involved in the Sudoku Architect project that Håvard manages, and I have been privileged to be able to find new 17-clue Sudokus every day, thanks to the Intel Compiler Team!

In order to estimate how many 17-clue Sudokus are left, I did a set of 10 isolated experiments. Results are:

Code: Select all
        20  20+1-1      19     18  18+1-1    17  n17  n17+2-2
#1    1914   34593   37256   5480   23363   207    2        2
#2    3701   65420   68885   9070   36202   193    8       13
#3    3191   55411   58159   7281   28596   190    5        5
#4    2822   49995   52729   6250   24252   182    6        7
#5    2687   46949   48740   5658   21236   107    8       12
#6    6531  114346  120800  14640   56054   416   12       26
#7    3006   51873   54749   7211   28964   209    4        6
#8    2665   47653   50587   6573   24758   150    2        3
#9    3543   63262   69287   8861   33489   169    6        8
#10   3360   60654   69596   9154   35519   312    5        6
Sum  33420  590156  630788  80178  312433  2135   58       88

Each experiment was based on isolated random sets of 20-clue Sudokus, and I followed the technique 20+1-1+1-2+1-2+1-1+1-2+2-2=17. In the table, n17 denotes the new Sudokus that were added to Gordon's list today. Even though all experiments are based on random generation of 20-clue Sudokus, I am not certain whether this can lead to any intelligible statistical conclusions. Please feel free to comment and conclude…

Best Regards,

Lars Petter Endresen

by **Lars Petter Endresen** » Mon Aug 13, 2007 10:35 pm

BTW: in my Sudoku experiment I suddenly came across this family...

Code: Select all
..57....9.12.......4..3.........42...........3....2..68..57...................84.
..57....9.12.......4..3.........42...........3....2..78..57...................84.
..67....9.12.......4..3.........42...........3....2..58..57...................84.
..67....9.12.......4..3.........42...........3....2..78..57...................84.
..57....9.12.......4..3.........42...........3....2..68..56...................84.
..57....9.12.......4..3.........42...........3....2..68..59...................84.
..56....7.12.......4..3.........42...........3....2..68..57...................84.
..5.6...7.12.......4..3.........42...........3....2..68..57...................84.
..5.6...9.12.......4..3.........42...........3....2..78..57...................84.
..56....9.12.......4..3.........42...........3....2..68..97...................84.
..5.6...9.12.......4..3.........42...........3....2..68..97...................84.
..57....9.12.......4.3..........42...........3....2..68..67...................84.
..6.7...9.12.......4..3.........42...........3....2..68..59...................84.
..76....9.12.......4..3.........42...........3....2..68..59...................84.
..7.6...9.12.......4..3.........42...........3....2..68..59...................84.
..67....5.12.......4..3.........42...........3....2..68..59...................84.

I guess they all have the same grandpa...

by **JPF** » Mon Aug 13, 2007 11:03 pm

About guessing the number of 17s.

What we are doing is like fishing in a lake...(catch, tag and release the fish)

Can we say that you generated 2135 17s and that only 88 were "new"?
The success rate is 88/2135 ≈ 4%.

I don't remember what the MLE of the number of fish is in that case.

Can we say that 4% of, say, 42000 are still to be discovered, i.e. ~1680-1700?
But of course, the fishing is not perfect, and some strange animals are hidden and more difficult to catch than others.

In the good days, I get a success rate of 3% - 4% starting with a 19.

JPF
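The fishing analogy above can be made concrete with a rough capture-recapture calculation. The sketch below is only an illustration: it takes JPF's figures (2135 caught, 88 new, and the "say 42000" known list, which is an assumed round number) and compares the back-of-envelope "new fraction times known total" with the classical Lincoln-Petersen estimate. Both assume every 17 is equally likely to be caught, which, as noted above, is probably not true. The function names are made up for this example.

```python
def simple_estimate(known, caught, new):
    """Back-of-envelope figure: apply the 'new' fraction to the known total."""
    return known * new / caught


def lincoln_petersen_unseen(known, caught, new):
    """Lincoln-Petersen: total ~ known * caught / recaptured.

    Returns the estimated number of items not yet on the known list.
    """
    recaptured = caught - new          # catches that were already tagged
    total = known * caught / recaptured
    return total - known


KNOWN = 42000   # assumed size of the known list (JPF's "say 42000")
CAUGHT = 2135   # 17s generated in the ten runs
NEW = 88        # of which not already on the list (JPF's reading of the table)

print(round(simple_estimate(KNOWN, CAUGHT, NEW)))          # about 1731
print(round(lincoln_petersen_unseen(KNOWN, CAUGHT, NEW)))  # about 1806
```

The two figures differ because the simple version treats the known list as the whole lake, while Lincoln-Petersen treats it as only the tagged part of the lake.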

by **gsf** » Tue Aug 14, 2007 5:43 am

Red Ed wrote:PS: regarding this ...
gsf wrote:I'm hoping there's a better way than running p! combinations on p matching positions,
but I don't see it right now
There is a much better way: it's called the Hungarian (or Munkres') Assignment Algorithm.

thanks for the reference
p! (a bad knee jerk reaction) down to 2**p (second attempt) down to p**3 (Hungarian modulo <= 9 clues mapping to possibly > 9 positions)
distance/similarity in my solver (not yet posted) are now in agreement with Red Ed and ravel
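For readers who have not met the assignment problem: the sketch below shows the p! approach on a tiny, made-up score matrix. It is not gsf's or Red Ed's code, and the meaning of the matrix entries (how many clues agree if item i of one puzzle is paired with item j of the other) is an assumption for illustration only. The Hungarian algorithm finds the same optimum in roughly p**3 steps; a library implementation exists as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def best_assignment(score):
    """Brute force over all p! pairings of row i with column perm[i].

    Returns (best_total, best_perm) maximising the summed score.
    """
    p = len(score)
    best_total, best_perm = None, None
    for perm in permutations(range(p)):
        total = sum(score[i][perm[i]] for i in range(p))
        if best_total is None or total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


# Hypothetical 3x3 matrix: score[i][j] = agreeing clues if i is paired with j.
score = [
    [2, 0, 1],
    [0, 3, 0],
    [1, 0, 2],
]

total, perm = best_assignment(score)
print(total, perm)  # 7 (0, 1, 2)
```

With p = 9 the loop already runs 362880 times per pair of puzzles, which is why a polynomial method matters when comparing thousands of pairs.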

by **Mauricio** » Tue Aug 14, 2007 5:54 am

gsf wrote:thanks for the reference
p! (a bad knee jerk reaction) down to 2**p (second attempt) down to p**3 (Hungarian modulo <= 9 clues mapping to possibly > 9 positions)
distance/similarity in my solver (not yet posted) are now in agreement with Red Ed and ravel

It would be a good idea that one could assign different point schemes for the Hamming distance. For example, the original Hamming distance gives 2 points for each clue in different position and 1 point for a clue in the same position and different value (am I right?), but if we assign 10 points for each clue in different position and 1 for same position and different value, we encourage more similar patterns, and so on. Could this be implemented?

by **gfroyle** » Tue Aug 14, 2007 1:29 pm

Mauricio wrote:For example, the original Hamming distance gives 2 points for each clue in different position and 1 point for a clue in the same position and different value (am I right?)

Hamming distance between two strings is defined to be the number of places in which the two strings differ.

For example, the distance between

Code: Select all
100200
200020

is three, because they are different in 3 places.

Code: Select all
100200
200020
^  ^^

As you noticed, this has the effect that a clue that is "moved" will add 2 to the distance, while one that is changed will add 1 to the distance, but this is a consequence of the definition and not really a free choice.
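A minimal sketch of the definition, using the strings from the example (the function name `hamming` is just chosen here):

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming("100200", "200020"))  # 3

# A "moved" clue differs in two places, a changed clue in one.
print(hamming("100000", "010000"))  # 2
print(hamming("100000", "200000"))  # 1
```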

Gordon

by **Mauricio** » Tue Aug 14, 2007 4:30 pm

gfroyle wrote:Hamming distance between two strings is defined to be the number of places in which the two strings differ.

For example, the distance between

Code: Select all
100200
200020

is three.

I see, but it depends on how it is implemented. For example, if we take the upper puzzle as the target puzzle, and for each nonzero cell of the lower puzzle we count:

0 points if the clue above it is the same,
1 point if the clue above it is not the same and nonzero,
2 points if the clue above it is zero,

then the count is three too, and it always gives the Hamming distance (wrong, see edit).
If the Hamming distance is implemented this way, then we could change the values.

Edit: I see, that only works if the puzzles have the same number of clues. Never mind.
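The counting scheme and the caveat in the edit can be checked with a short sketch. The function and parameter names (`one_sided_score`, `changed`, `moved`) are invented for this illustration; the default weights 1 and 2 are the ones listed above, and 10 is the alternative weight suggested earlier in the thread.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def one_sided_score(target, other, changed=1, moved=2):
    """Score each nonzero cell of `other` against the cell above it in `target`."""
    total = 0
    for t, o in zip(target, other):
        if o == "0" or o == t:
            continue                      # empty cell or same clue: 0 points
        total += moved if t == "0" else changed
    return total


# Same number of clues: the count agrees with the Hamming distance.
print(one_sided_score("100200", "200020"), hamming("100200", "200020"))  # 3 3

# Different number of clues: the two disagree.
print(one_sided_score("100200", "200000"), hamming("100200", "200000"))  # 1 2

# Heavier penalty for a clue in a different position.
print(one_sided_score("100200", "200020", moved=10))  # 11
```

With the weights changed, the result is of course no longer the Hamming distance, only a related score.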

by **Pat** » Wed Aug 15, 2007 10:29 am

Mauricio wrote:It would be a good idea that one could assign different point schemes for the Hamming distance.

For example, the original Hamming distance gives 2 points for each clue in different position and 1 point for a clue in the same position and different value (am I right?),
but if we assign 10 points for each clue in different position and 1 for same position and different value, we encourage more similar patterns, and so on.

Could this be implemented?

as gfroyle pointed out, this would no longer be the Hamming distance.


by **Pat** » Wed Aug 15, 2007 10:44 am

JPF wrote:
Lars Petter Endresen wrote:
Code: Select all
        20  20+1-1      19     18  18+1-1    17  n17  n17+2-2
#1    1914   34593   37256   5480   23363   207    2        2
#2    3701   65420   68885   9070   36202   193    8       13
#3    3191   55411   58159   7281   28596   190    5        5
#4    2822   49995   52729   6250   24252   182    6        7
#5    2687   46949   48740   5658   21236   107    8       12
#6    6531  114346  120800  14640   56054   416   12       26
#7    3006   51873   54749   7211   28964   209    4        6
#8    2665   47653   50587   6573   24758   150    2        3
#9    3543   63262   69287   8861   33489   169    6        8
#10   3360   60654   69596   9154   35519   312    5        6
Sum  33420  590156  630788  80178  312433  2135   58       88

I followed the technique 20 +1-1 +1-2 +1-2 +1-1 +1-2 +2-2 =17

In the table, n17 denotes the new Sudokus that were added to Gordon's list today

Can we say that you generated 2135 17s and that only 88 were "new"?
The success rate is 88/2135 ≈ 4%.

my interpretation of Lars Petter Endresen's report is that the experiment produced 58 + 88 = 146 new 17s;

but it is unclear to me if these 146 come from a total of 2135 17s
or are we perhaps missing a column for 17+2-2 puzzles produced

~ Pat

by **daj95376** » Wed Aug 15, 2007 12:52 pm

I interpreted the LPE table as ...

Code: Select all
From ten runs:
 33420 (20s) generated =>
590156 (20s) generated =>
630788 (19s) generated =>
 80178 (18s) generated =>
312433 (18s) generated =>
  2135 existing (17s) + 58 new (17s) generated =>
    88 existing (17s)

by **ronk** » Wed Aug 15, 2007 1:32 pm

OK, I'll try this guessing game too.

Starting with 33,420 20s Lars generated 2,135 17s of which 58 were new. Then a 2-off/2-on run on these 58 yielded an additional 30 new 17s ... for a total of 88.

by **JPF** » Wed Aug 15, 2007 2:32 pm

ronk wrote:OK, I'll try this guessing game too. Starting with 33,420 20s Lars generated 2,135 17s of which 58 were new. Then a 2-off/2-on run on these 58 yielded an additional 30 new 17s ... for a total of 88.

It was my interpretation

JPF

by **ronk** » Wed Aug 15, 2007 3:22 pm

JPF wrote:It was my interpretation

Sorry, I forgot you already reached that conclusion for your "success rate" post.


17-clue and 18-clue Sudoku update

re: Mauricio distance

re: Lars Petter Endresen's new 17s