David P Bird wrote: It is indeed surprising to me that your generation method, firstly, works and, secondly, never gets trapped in an infinite loop.
Indeed, I cannot prove that it always works, but I have never encountered such a situation.
David P Bird wrote: Your generator works completely unintelligently, but your solver reduces the trial and error by applying basic solving methods first.
I am not sure what you mean by "solver" here. There is nothing intelligent in this part either. Starting from the solution grid, I simply remove a randomly chosen clue and test whether the puzzle can still be solved uniquely. If so, another clue is removed, and so on. Once the puzzle is minimal, that is, once no further clue can be removed, the result is usually a relatively simple puzzle, often solvable with naked/hidden singles only, even if the SAT method is allowed during the clue elimination process. It might indeed be worth experimenting with a more intelligent approach, in which the choice of which clue to remove depends on the number of candidates left for that cell.
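The clue elimination loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the post's own uniqueness test uses basic solving methods plus a SAT method, whereas here a plain backtracking solution counter is swapped in, and the names `candidates`, `count_solutions` and `minimize` are hypothetical. The grid is a flat list of 81 integers with 0 for an empty cell. A single pass over the cells in random order is enough to reach a minimal puzzle, because removing more clues can only add solutions, so a clue that had to be kept once can never become removable later.

```python
import random

def candidates(grid, i):
    """Digits not yet used in the row, column and box of cell i (flat 81-cell grid, 0 = empty)."""
    r, c = divmod(i, 9)
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used = set(grid[r * 9:(r + 1) * 9])  # row
    used.update(grid[c::9])              # column
    for rr in range(br, br + 3):         # box
        used.update(grid[rr * 9 + bc:rr * 9 + bc + 3])
    return [d for d in range(1, 10) if d not in used]

def count_solutions(grid, limit=2):
    """Backtracking solution count (stand-in for the real solver); stops once `limit` is reached."""
    empties = [i for i, v in enumerate(grid) if v == 0]
    if not empties:
        return 1
    # Branch on the empty cell with the fewest candidates.
    best = min(empties, key=lambda i: len(candidates(grid, i)))
    total = 0
    for d in candidates(grid, best):
        grid[best] = d
        total += count_solutions(grid, limit - total)
        grid[best] = 0  # undo, so the caller's grid is left unchanged
        if total >= limit:
            break
    return total

def minimize(solution, rng=random):
    """Remove clues in random order, keeping a removal only if the solution stays unique."""
    puzzle = list(solution)
    cells = list(range(81))
    rng.shuffle(cells)
    for i in cells:
        saved = puzzle[i]
        puzzle[i] = 0
        if count_solutions(puzzle) != 1:
            puzzle[i] = saved  # removal broke uniqueness: put the clue back
    return puzzle
```

For example, `minimize(solution, random.Random(1))` turns any valid solution grid into a minimal puzzle with a unique solution; a smarter variant could replace the random `shuffle` with an ordering based on candidate counts, as suggested above.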
blue wrote: How do you define the number of wrong numbers?
For example: for c1, is it the number that matches the value in (r,c1)?
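For the two columns c1 and c2 and the boxes b1 and b2, I count, for each number from 1 to 9, its occurrences in that column or box. Whenever a number does not occur exactly once, the error count is incremented. With this definition an error of 1 can never occur: each column or box has exactly nine cells, so if one number does not occur exactly once, the counts can only still add up to nine if at least one other number also does not occur exactly once.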
blue wrote: If (r,c1) and (r,c2) are in the same "block"/box... then what? Count twice?
Good question! I never thought about this, but my current implementation does indeed count twice. I will check whether the behaviour changes if I count only once in this case.
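The error count defined above can be sketched like this, assuming a 9x9 grid stored as a list of rows and boxes numbered 0 to 8 left to right, top to bottom. The function names and the `count_shared_box_once` flag are hypothetical; the flag only illustrates the two options just discussed, with counting twice as the default to match the current behaviour described in the reply.

```python
from collections import Counter

def unit_error(values):
    """Count the digits 1..9 that do not occur exactly once among the nine values of one unit."""
    counts = Counter(values)
    return sum(1 for d in range(1, 10) if counts[d] != 1)

def column(grid, c):
    return [grid[r][c] for r in range(9)]

def box(grid, b):
    # Boxes are numbered 0..8 left to right, top to bottom.
    br, bc = 3 * (b // 3), 3 * (b % 3)
    return [grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]

def error_count(grid, r, c1, c2, count_shared_box_once=False):
    """Error count of columns c1, c2 and boxes b1, b2 containing cells (r, c1) and (r, c2).

    By default a box containing both cells is counted twice;
    count_shared_box_once=True counts it only once.
    """
    b1 = 3 * (r // 3) + c1 // 3
    b2 = 3 * (r // 3) + c2 // 3
    boxes = [b1] if (count_shared_box_once and b1 == b2) else [b1, b2]
    return (unit_error(column(grid, c1)) + unit_error(column(grid, c2))
            + sum(unit_error(box(grid, b)) for b in boxes))
```

Note that swapping two cells inside the same box only permutes that box, so its own error is unchanged by the swap; double counting therefore just weights any error already present in that box twice.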
blue wrote: From what I've seen, generators that actually do produce a uniform distribution often use strange and unexpected methods.
Are you basing the claim that I underlined on a mathematical proof?
You are right; I cannot claim this, because I cannot prove it. But if we reverse the process, taking a valid grid and doing, let's say, 10 random cell swaps within rows, I just cannot imagine that the error count distribution depends on the grid we start with.
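The reversed experiment can at least be checked empirically, although that is of course not a proof. The sketch below assumes that "error count" here means the total over all nine columns and all nine boxes (rows stay valid under swaps within a row); that interpretation and the names `total_error` and `error_histogram` are hypothetical. Running it from several essentially different starting grids and comparing the histograms would support or refute the intuition. Relabelling the digits of a grid cannot change the result, so such a comparison needs grids that differ in more than their digit labels.

```python
import random
from collections import Counter

def total_error(grid):
    """Sum over all columns and boxes of the digits that do not occur exactly once."""
    units = [[grid[r][c] for r in range(9)] for c in range(9)]
    units += [[grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
              for br in (0, 3, 6) for bc in (0, 3, 6)]
    err = 0
    for unit in units:
        counts = Counter(unit)
        err += sum(1 for d in range(1, 10) if counts[d] != 1)
    return err

def error_histogram(solution, swaps=10, trials=1000, rng=random):
    """Histogram of total_error after `swaps` random within-row swaps, repeated `trials` times."""
    hist = Counter()
    for _ in range(trials):
        grid = [row[:] for row in solution]  # work on a copy of the valid grid
        for _ in range(swaps):
            r = rng.randrange(9)
            c1, c2 = rng.sample(range(9), 2)  # two distinct cells in row r
            grid[r][c1], grid[r][c2] = grid[r][c2], grid[r][c1]
        hist[total_error(grid)] += 1
    return hist
```

With a single swap the total error is always 4 (both cells in the same box) or 8 (different boxes), which is a quick sanity check before running longer swap sequences.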