ravel wrote:When comparing these 38's

- Code: Select all
`.........9..26...44.6.8...3....5..1.14..985.25.9.124382.5..6.416.4.213.531.5....6`

.7.......9.1.87.4.4.6.91.871......2..4.9.2.5.5...1.4..2.5.7...4614.29.7578.15.26.

i got a distance of 30

I think that similarity is a more interesting measure than distance. Now wait! That's not just a "subtract from 81"-type comment. The two measures are quite different creatures (perhaps this has already been mentioned). Let ...

E = nr cells that are empty in both puzzles

S = nr occupied cells that are the same in both puzzles

D = nr occupied cells that are different in both puzzles

T = nr cells that have different type (occupied/empty) in the two puzzles

... so

E +

S +

D +

T = 81. You've been working with

distance = D + T. Now I can see that that's interesting from an edit-distance / neighbourhood tour point of view, but I say that you should also be interested in the measure

similarity = S.

Here's why. We can regard a puzzle as a graph with each vertex = a value in a cell and each edge = relationship between them (e.g. same value, same band; or different value, same row, different stack). Viewed that way, two puzzles are isomorphic iff their graphs are equal (modulo labelling) -- in fact this is how Gordon said that he checks whether or not a 17 is new. If two puzzles are not isomorphic then we should be interested in the size of the largest subgraph (= sub-puzzle) in common to both. That's "similarity" as defined above.

For example, comparing ravel's two 38s:

- Code: Select all
` Target: .........9..26...44.6.8...3....5..1.14..985.25.9.124382.5..6.416.4.213.531.5....6`

Permed (new): .3...5..99..21..3.1.......5.......6.3...685.28.6.524.3..3..69.16.9.2135441.5.9.86

distance = 30 ***

similarity = 16

Target: .........9..26...44.6.8...3....5..1.14..985.25.9.124382.5..6.416.4.213.531.5....6

Permed (new): ....6..9.94.2...8...6.8.2...6835.91...5.98..24.9612.585.3.46.2.6.4.21.35...5.....

distance = 31

similarity = 20 ***

The distance measure in this case says that you can get from one puzzle to (an isomorph of) the other by editing 30 cells; whereas the similarity measure says that, up to (a different) isomorphism, the largest sub-puzzle in common has 20 clues.

Both measures are interesting: we shouldn't just limit ourselves to "distance".

Now a challenge:

- The biggest distance I've seen between a pair of 17s is 16.
- The smallest similarity I've seen between a pair of 17s is 8.

Can anyone do better? (I only checked a smallish random sample.)

~~~

PS: regarding this ...

gsf wrote:I'm hoping there's a better way than running p! combinations on p matching positions,

but I don't see it right now

There is a

much better way: it's called the Hungarian (or Munkres')

Assignment Algorithm.