rep'nA wrote: I'm not sure I see an inherent problem with that. If everybody uses the same normalization, then there isn't any problem at all. If we don't, then it becomes a question of finding transformation laws that tell you how distances might change when you change the normalization.
we're seeing the difference between visual distance and normalized distance
given a canonical form for one puzzle, a few mutable clues can be changed to generate a
family of puzzles that look related visually
however, if the normalization were reapplied to the generated puzzles, that relationship may be lost
because although they may seem similar visually, they may be very different once normalized
we've seen this with the minlex canonical representation
somehow the clue pattern must come into play
so computing a distance would have two stages:
first, map one puzzle to match the clue pattern of the other
then permute the clue values and rows/columns, while maintaining the common pattern,
until the minimum number of differences is reached
there could be a different weight for mismatched clues (clue @ r,c in puzzle one, no clue @ r,c in puzzle two)
vs. different clue values at the same position in each puzzle
I haven't thought about the complexity of any of the steps
or even whether this would be a true distance metric
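
to make the weighting concrete, here's a minimal python sketch (my own made-up names and weight values, not anybody's actual implementation), assuming puzzles are 81-character strings with '.' for empty cells and the two puzzles are already aligned cell-for-cell:

Code:
# minimal sketch of the weighted difference count for two puzzles that are
# already aligned (same cell mapping); puzzles are 81-char strings with '.'
# for empty cells, and the weight values here are just placeholders
W_PATTERN = 2.0  # clue @ (r,c) in one puzzle, no clue there in the other
W_VALUE = 1.0    # clues @ (r,c) in both puzzles but with different digits

def weighted_diff(p1: str, p2: str) -> float:
    total = 0.0
    for a, b in zip(p1, p2):
        if (a == '.') != (b == '.'):  # pattern mismatch
            total += W_PATTERN
        elif a != '.' and a != b:     # value mismatch at a shared clue cell
            total += W_VALUE
    return total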
the reason for the disparity between visual and normalized distance is that normal forms
like the minlex canonical form do not take clue placement into account (no physical geometry)
geometric sudoku problem statements (like generating valid sudokus that match a given pattern)
tend to introduce another level of complexity
so maybe looking for a normal form for distance is overkill
"all" that is needed is a map from one puzzle to another that minimizes (possibly weighted) differences
i.e., use one of the puzzles as the normal form
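
for what it's worth, here's a rough python sketch of that idea, again with made-up names and placeholder weights: hold puzzle one fixed as the "normal form", brute-force the validity-preserving grid maps of puzzle two, and for each map pick the best digit relabelling with an assignment solve

Code:
# rough sketch of "use one puzzle as the normal form": brute-force the
# validity-preserving grid maps (optional transpose, band/stack permutations,
# row/column permutations within bands/stacks) applied to puzzle two, pick
# the best digit relabelling for each map via an assignment solve, and keep
# the minimum weighted difference
from itertools import permutations, product

import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

W_PATTERN, W_VALUE = 2.0, 1.0  # same placeholder weights as above

def line_orders():
    # all 6 * 6^3 = 1296 row (or column) orders: permute the three bands
    # (stacks), then the three lines inside each band (stack)
    for outer in permutations(range(3)):
        for inner in product(permutations(range(3)), repeat=3):
            yield [3 * b + r for i, b in enumerate(outer) for r in inner[i]]

def min_weighted_distance(p1: str, p2: str) -> float:
    # warning: 2 * 1296 * 1296 = 3,359,232 cell maps -- fine as a demo,
    # far too slow for bulk use
    g1 = np.array([0 if c == '.' else int(c) for c in p1]).reshape(9, 9)
    g2 = np.array([0 if c == '.' else int(c) for c in p2]).reshape(9, 9)
    orders = list(line_orders())
    best = float('inf')
    for g in (g2, g2.T):  # without and with transpose
        for rows in orders:
            for cols in orders:
                t = g[np.ix_(rows, cols)]         # remapped copy of puzzle two
                c1, c2 = g1 > 0, t > 0
                mismatch = int(np.sum(c1 != c2))  # clue in one grid only
                both = c1 & c2
                # count[i][j] = shared clue cells where puzzle one shows digit
                # i+1 and remapped puzzle two shows digit j+1; the optimal
                # relabelling is a maximum assignment on this 9x9 matrix
                count = np.zeros((9, 9), dtype=int)
                np.add.at(count, (g1[both] - 1, t[both] - 1), 1)
                ri, ci = linear_sum_assignment(-count)
                matches = count[ri, ci].sum()
                score = (W_PATTERN * mismatch
                         + W_VALUE * (int(both.sum()) - matches))
                best = min(best, score)
    return best

the nice part is that once the cell map is fixed, the digit relabelling drops out as a 9x9 assignment problem, since a relabelling only changes how many of the shared clue cells match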