The New Sudoku Players' Forum

by **gsf** » Thu Aug 09, 2007 3:12 am

ronk wrote:
gsf wrote:
ronk wrote:
gsf wrote:using two different algorithms I just confirmed that G, |G|=41347, is closed under {-5+5}

So unknown 17s have Hamming distance d > 10 from all members of (code) set G ...

... if/when we are able to compute the {-5+5} closure of G

Color me confused, as I thought that's what your first statement above said.

aha
I had to rewrite a few times before sending -- and your confusion confirms I didn't get it quite right
there's too many closures in scope

the computation I did was to take G (the known 17s), do a {-5} (5-off) to derive all 12 clue subgrids,
and then did a closure based on the subgrids using a connected components and disjoint set union algorithms
this particular closure does not generate new 17sbut instead determines paths from any known 17 to any other known 17 via the 12s
this closure yielded one equivalence class
which means that all of the known 17s are in the {-5+5} closure
I think it was coloin that mentioned the possibility of this computation a few posts back

its neat that we can assert this without computing the entire {-5+5} closure

so my previous reply was based on this interpretation of unknown:
since we haven't confirmed the {-5+5} closure of G, some unknowns could be in that closure
there could also be other unknowns outside the closure or the {-5+5} closure could contain all 17s

by **gfroyle** » Thu Aug 09, 2007 6:09 am

gsf wrote:I had to rewrite a few times before sending -- and your confusion confirms I didn't get it quite right
there's too many closures in scope

Indeed...

Basically the class is CLOSED if it contains *all* 17s that can be obtained from any of the others by a "move" (in your case {-5+5} but the concept is general).

The class is CONNECTED if for any two 17s you can move from one to the other via an entire *sequence* of moves using other 17s as intermediate steps.

What we know from Glenn's work is that this graph is connected, even though we may not know the entire closure... basically at the moment we can view our existing collection of 17s as the "visible" portion of a possibly much larger "submerged" graph with new vertices waiting to be found. Doing a full {-5+5} on any vertex (impossible I know) is equivalent to completely hauling THAT VERTEX and all of its neighbours out of the water, thus possibly revealing new vertices.

What *I* want to know is more detail about the structure of the graph on the 17s... what are the degrees of the vertices (i.e. the number of neighbours that each puzzle/vertex has), which puzzles have the highest degree etc. (Could it even be that this graph is a scale-free graph in the Barabasi et al sense?)

I can see that at some stage I will have to compute this darn graph myself and just confirm your numbers....

Cheers

Gordon

by **ravel** » Thu Aug 09, 2007 8:27 am

gsf wrote:
Code: Select all
distance 20 .......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6... ...2...1.4...3.6...............5.3....1....7.6...4....5.....4...8.7........1...2.

Can you explain ? I thought it would be 15:

Code: Select all: .......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6... ....7...54..2..3...............5...78.....4....1.9....3..4......5......9...8..2..

by **gfroyle** » Thu Aug 09, 2007 9:10 am

ravel wrote:Can you explain ? I thought it would be 15:
Code: Select all
.......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6... ....7...54..2..3...............5...78.....4....1.9....3..4......5......9...8..2..

Ravel is correct.. the Hamming distance is (at most) 15...

If I recall correctly, when gsf posted his original description of his Hamming distance implementation it was a two-step process of choosing a n equivalent puzzle in such a way as to:

- maximise overlap of positions, then
- maximise similarity of symbols in overlapping positions

What Ravel's example shows us is that a smaller overall distance can be obtained WITHOUT MAXIMUM OVERLAP provided you can make up for it with greater symbol similarity... noticed that the first representative overlaps in 11 positions, while the 2nd overlaps in only 10, but almost all of those 10 are "same symbol".

Great example...

Cheers

Gordon

by **gsf** » Thu Aug 09, 2007 2:04 pm

gfroyle wrote:What *I* want to know is more detail about the structure of the graph on the 17s... what are the degrees of the vertices (i.e. the number of neighbours that each puzzle/vertex has), which puzzles have the highest degree etc. (Could it even be that this graph is a scale-free graph in the Barabasi et al sense?)

thanks for the closure clarifications

could you provide a ref with a description of scale-free graph
there's a bunch of refs to scale-free, and its not clear which of those actually defines it

I can see that at some stage I will have to compute this darn graph myself and just confirm your numbers....

you can use my solver as a starting point
for the {-5} (5-off) initial data:

Code: Select all: sudoku -goce{-5} -Fc'%05#in %#kc' g.dat > g-5-1

where g.dat contains the 17s
g-5-1 contains lines with two fields <g-ordinal> <12-clue-minlex-subgrid>
where 12=17-5
this takes ~1h45 min to generate 3.2Ghz

g-5-1 may contain dups (typically too much data for my solver to uniq in core), so

Code: Select all: sort -u g-5-1 > g-5-2

I do one more pass on the data to convert the <12-clue-minlex-subgrid> fields to ordinals
and then use the connected-components or disjoint-set-union algs
g-5-2 is 1.5Gi, a bit to large to post

by **gsf** » Thu Aug 09, 2007 2:08 pm

gfroyle wrote:
ravel wrote:Can you explain ? I thought it would be 15:
Code: Select all
.......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6... ....7...54..2..3...............5...78.....4....1.9....3..4......5......9...8..2..

Ravel is correct.. the Hamming distance is (at most) 15...

If I recall correctly, when gsf posted his original description of his Hamming distance implementation it was a two-step process of choosing a n equivalent puzzle in such a way as to:

- maximise overlap of positions, then
- maximise similarity of symbols in overlapping positions

you recall better than I do -- any of my code >6months old looks like someone else's
(why did they do it that way --- oops, they was me)
the code, or at least the docs, will be corrected
thanks

by **gfroyle** » Fri Aug 10, 2007 12:36 am

gsf wrote:you recall better than I do -- any of my code >6months old looks like someone else's

All that I remembered was a vague comment - when I searched for it, I found the following from a post you made on 25 May.

gsf wrote:also added a hamming-style distance based on the sudoku space thread discussions
the distance is biased towards minimizing clue position differences first, then minimizing cell value differences

Cheers

g

by **gsf** » Fri Aug 10, 2007 2:11 am

gfroyle wrote:
All that I remembered was a vague comment - when I searched for it, I found the following from a post you made on 25 May.

gsf wrote:also added a hamming-style distance based on the sudoku space thread discussions
the distance is biased towards minimizing clue position differences first, then minimizing cell value differences

after looking at the code again I know why I did it that way
matching positions is much easier than matching clue values
I'm hoping there's a better way than running p! combinations on p matching positions,
but I don't see it right now

by **ravel** » Fri Aug 10, 2007 8:41 am

My program needs about the double time than yours.
I go through all (2*6^8) equivalent patterns of one puzzle and - if the number of common cells is small enough for allowing a smaller distance than the best one so far - i try to map as much numbers as possible.

For your other sample i get distance 15:

Code: Select all: ......6.4....81.........7...5.6..8...4...7......2.....2.7....1.1...3.......5..... ......7.4.6..8.....3.9......516...9..4....2...........2.7....1.............5.4...

by **gsf** » Fri Aug 10, 2007 9:52 am

ravel wrote:My program needs about the double time than yours.
I go through all (2*6^8) equivalent patterns of one puzzle and - if the number of common cells is small enough for allowing a smaller distance than the best one so far - i try to map as much numbers as possible.

right, the first part looks like a canonicalization loop
but what does i try to map as much numbers as possible look like?
if there are p common positions between the two, how many clue assignment combinations must be tested?

by **ravel** » Fri Aug 10, 2007 12:18 pm

I only try placements, that definitely increase the number of common clues. In the above sample this were between [edit:] 208 and 957 single number placements leading to 96-416 mappings (with posssibly less than 9 numbers, then the others can be filled in arbitrarily), when i checked 518 puzzles with 9-11 common cells.

Of course this is much slower for high clue puzzles. When comparing these 38's

Code: Select all: .........9..26...44.6.8...3....5..1.14..985.25.9.124382.5..6.416.4.213.531.5....6 .7.......9.1.87.4.4.6.91.871......2..4.9.2.5.5...1.4..2.5.7...4614.29.7578.15.26.

i got a distance of 30:

Code: Select all: .3...5..99..21..3.1.......5.......6.3...685.28.6.524.3..3..69.16.9.2135441.5.9.86

after checking 67162 puzzles with between 102 and 14261 single placements and 0 to 7342 possible mappings then.

by **gsf** » Fri Aug 10, 2007 1:36 pm

thanks for the details

by **gsf** » Fri Aug 10, 2007 6:39 pm

I tweaked my sudoku17 query script to include per-discoverer counts and rates based on the
time of first and last post
the overall rate since #40095 is 1 new entry every 1h22m

Code: Select all: Håvard Graff 354 1h57m Gordon Royle 339 5h14m JPF 161 5h21m gsf 132 3h49m Kohei Noshita 83 17h38m Lars Petter Endresen 67 3h56m coloin 46 14h05m Kevin Burley 44 10h21m Jim Wilson 29 14h53m RW 17 9h52m Glenn Fowler 11 4d21h Anonymous 9 1w00d Mauricio Chacón 8 1d18h Colin Colville 4 1d18h 3 1w06d ravel 1 - jim wilson 1 - Helene Foliguet 1 - total 1310 1h22m

by **gfroyle** » Sat Aug 11, 2007 11:58 am

Scale-free just means that the distribution of vertex degrees roughly follows a "power law" distribution..

So, lots of vertices of degree 1, fewer of degree 2, and a teeny number of high degree vertices.

Basically try to fit a function of the form

N(k)= A k^-B

to describe the number of vertices of degree k.

by **Red Ed** » Sat Aug 11, 2007 5:30 pm

ravel wrote:When comparing these 38's
Code: Select all
.........9..26...44.6.8...3....5..1.14..985.25.9.124382.5..6.416.4.213.531.5....6 .7.......9.1.87.4.4.6.91.871......2..4.9.2.5.5...1.4..2.5.7...4614.29.7578.15.26.
i got a distance of 30

I think that similarity is a more interesting measure than distance. Now wait! That's not just a "subtract from 81"-type comment. The two measures are quite different creatures (perhaps this has already been mentioned). Let ...

E

e

S

s

D

d

T

t

... so E + S + D + T = 81. You've been working with distance = D + T. Now I can see that that's interesting from an edit-distance / neighbourhood tour point of view, but I say that you should also be interested in the measure similarity = S.

Here's why. We can regard a puzzle as a graph with each vertex = a value in a cell and each edge = relationship between them (e.g. same value, same band; or different value, same row, different stack). Viewed that way, two puzzles are isomorphic iff their graphs are equal (modulo labelling) -- in fact this is how Gordon said that he checks whether or not a 17 is new. If two puzzles are not isomorphic then we should be interested in the size of the largest subgraph (= sub-puzzle) in common to both. That's "similarity" as defined above.

For example, comparing ravel's two 38s:

Code: Select all: Target: .........9..26...44.6.8...3....5..1.14..985.25.9.124382.5..6.416.4.213.531.5....6 Permed (new): .3...5..99..21..3.1.......5.......6.3...685.28.6.524.3..3..69.16.9.2135441.5.9.86 distance = 30 *** similarity = 16 Target: .........9..26...44.6.8...3....5..1.14..985.25.9.124382.5..6.416.4.213.531.5....6 Permed (new): ....6..9.94.2...8...6.8.2...6835.91...5.98..24.9612.585.3.46.2.6.4.21.35...5..... distance = 31 similarity = 20 ***

The distance measure in this case says that you can get from one puzzle to (an isomorph of) the other by editing 30 cells; whereas the similarity measure says that, up to (a different) isomorphism, the largest sub-puzzle in common has 20 clues.

Both measures are interesting: we shouldn't just limit ourselves to "distance".

:!:

Now a challenge:

The biggest distance I've seen between a pair of 17s is 16.
The smallest similarity I've seen between a pair of 17s is 8.

Can anyone do better? (I only checked a smallish random sample.)

~~~

PS: regarding this ...

gsf wrote:I'm hoping there's a better way than running p! combinations on p matching positions,
but I don't see it right now

There is a much better way: it's called the Hungarian (or Munkres') Assignment Algorithm.

The New Sudoku Players' Forum

17-clue and 18-clue Sudoku update