rep'nA wrote: I'm not sure I see an inherent problem with that. If everybody uses the same normalization, then there isn't any problem at all. If we don't, then it becomes a question of finding transformation laws that tell you how distances might change when you change the normalization.
we're seeing the difference between visual distance and normalized distance
given a canonical form for one puzzle, a few mutable clues can be changed to generate a
family of puzzles that look related visually
however, if the normalization were reapplied to the generated puzzles, that relationship may be lost
because although they may seem similar visually, they may be very different once normalized
we've seen this with the minlex canonical representation
somehow the clue pattern must come into play
so computing a distance would have two stages:
first, map one puzzle to match the clue pattern of the other
then permute the clue values and rows/columns, while maintaining the common pattern,
until the minimum number of differences is reached
there could be a different weight for mismatched clues (clue @ r,c in puzzle one, no clue @ r,c in puzzle two)
vs. different clue values at the same position in each puzzle
I haven't thought about the complexity of any of the steps
or even whether this would be a true distance metric
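
to make the weighting concrete, here's a minimal python sketch (my own made-up names and weight values, not anybody's actual implementation), assuming puzzles are 81-character strings with '.' for empty cells and the two puzzles are already aligned cell-for-cell:

Code:
# minimal sketch of the weighted difference count for two puzzles that are
# already aligned (same cell mapping); puzzles are 81-char strings with '.'
# for empty cells, and the weight values here are just placeholders
W_PATTERN = 2.0  # clue @ (r,c) in one puzzle, no clue there in the other
W_VALUE = 1.0    # clues @ (r,c) in both puzzles but with different digits

def weighted_diff(p1: str, p2: str) -> float:
    total = 0.0
    for a, b in zip(p1, p2):
        if (a == '.') != (b == '.'):  # pattern mismatch
            total += W_PATTERN
        elif a != '.' and a != b:     # value mismatch at a shared clue cell
            total += W_VALUE
    return total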
the reason for the disparity between visual and normalized distance is that normal forms
like the minlex canonical form do not take clue placement into account (no physical geometry)
geometric sudoku problem statements (like generating valid sudokus that match a given pattern)
tend to introduce another level of complexity
so maybe looking for a normal form for distance is overkill
"all" that is needed is a map from one puzzle to another that minimizes (possibly weighted) differences
i.e., use one of the puzzles as the normal form
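
for what it's worth, here's a rough python sketch of that idea, again with made-up names and placeholder weights: hold puzzle one fixed as the "normal form", brute-force the validity-preserving grid maps of puzzle two, and for each map pick the best digit relabelling with an assignment solve

Code:
# rough sketch of "use one puzzle as the normal form": brute-force the
# validity-preserving grid maps (optional transpose, band/stack permutations,
# row/column permutations within bands/stacks) applied to puzzle two, pick
# the best digit relabelling for each map via an assignment solve, and keep
# the minimum weighted difference
from itertools import permutations, product

import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

W_PATTERN, W_VALUE = 2.0, 1.0  # same placeholder weights as above

def line_orders():
    # all 6 * 6^3 = 1296 row (or column) orders: permute the three bands
    # (stacks), then the three lines inside each band (stack)
    for outer in permutations(range(3)):
        for inner in product(permutations(range(3)), repeat=3):
            yield [3 * b + r for i, b in enumerate(outer) for r in inner[i]]

def min_weighted_distance(p1: str, p2: str) -> float:
    # warning: 2 * 1296 * 1296 = 3,359,232 cell maps -- fine as a demo,
    # far too slow for bulk use
    g1 = np.array([0 if c == '.' else int(c) for c in p1]).reshape(9, 9)
    g2 = np.array([0 if c == '.' else int(c) for c in p2]).reshape(9, 9)
    orders = list(line_orders())
    best = float('inf')
    for g in (g2, g2.T):  # without and with transpose
        for rows in orders:
            for cols in orders:
                t = g[np.ix_(rows, cols)]         # remapped copy of puzzle two
                c1, c2 = g1 > 0, t > 0
                mismatch = int(np.sum(c1 != c2))  # clue in one grid only
                both = c1 & c2
                # count[i][j] = shared clue cells where puzzle one shows digit
                # i+1 and remapped puzzle two shows digit j+1; the optimal
                # relabelling is a maximum assignment on this 9x9 matrix
                count = np.zeros((9, 9), dtype=int)
                np.add.at(count, (g1[both] - 1, t[both] - 1), 1)
                ri, ci = linear_sum_assignment(-count)
                matches = count[ri, ci].sum()
                score = (W_PATTERN * mismatch
                         + W_VALUE * (int(both.sum()) - matches))
                best = min(best, score)
    return best

the nice part is that once the cell map is fixed, the digit relabelling drops out as a 9x9 assignment problem, since a relabelling only changes how many of the shared clue cells match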