Hi, I'm new to the forum (and relatively new to sudoku!) but I have some ideas how to go about searching for 16's.
I think that a random search is probably not going to find it, as the large list of 17's show. I would guess that this indicates that 16's have special attributes (e.g. clues going in certain squares) that a random search is unlikely to run across.
So, I would propose using a computer program to gradually build the layout, rather than randomly trying grids and then attempting to solve them. Add the clues one at a time, at each step determining the clue that most reduces the solution space.
But, counting the number of boards containing a set of clues is computationally expensive! Second, it seems to not be a great indicator of how close the clues are to forming a complete puzzle, as eliminating one clue could lead to anywhere from 2 to 200000 solutions! So, I would suggest simply counting the possibilities for each square: there are only at most 729 possibilities that need to be considered. At each step, add a clue that eliminates the most possibilities. If the solver is fast enough, hopefully you can look ahead to determine which clues are best in the long run. You can also experiment with different ways of comparing numbers of possibilities, e.g. whether a square with 8 possibilities is worse, better, or equal to two squares with 4 possibilities each.
I threw together a quick program to do this, scoring each board by the sum of the number of possibilities for each square. At each stage I choose the clue that reduces the possibilities the most. However my solver is not fast enough. Although it will solve any board with a unique solution instantly, it seems to be very slow on boards with many possiblities (or no possibilities!). (All I programmed was basic elimination--it checks each row, column, block, and square for unique possibilities. Then it resorts to look-ahead.) Boards with few clues take about 1-5 seconds to process all possibilities, so it takes a long time to determine which clues lead to the fewest possibilities. But I think it shouldn't take more than a millisecond to determine whether a board is solvable, so even one second seems slow to check all possibilities for each square... Maybe someone with a fast solver would like to consider working on this method? Or has this method already been fruitless? After making a few boards I tend to get around 20 clues, with no 17's yet.
Anyway, I am hopeful that a 16 is found. Sudoku has so many variables that mathematically, I think a proof of nonexistence is unlikely for more than 12-14 clues, so we can only hope that 16's in fact exist
Hi Luke.
I'm also a newbie contributor, although I have been lurking for a while reading the collected wisdom here. I agree with your basic idea, and I've been working on something along those lines myself. A few comments:
Your method is a greedy algorithm (see Gordon's post). I think this would require a good deal of luck to succeed, although I agree that there are much better heuristics than the number of solutions, and yours looks promising. I don't think there is any "best" way to combine the scores from different cells; the most you can do is try different ways and use the one that experiment shows is best. My idea is to keep a collection of the best configurations (say those that score more than x% of the best score, where x is close to 100).
To do it my way, it's essential to prune the symmetric equivalents right from the word go. There are 9! * 6^8 * 2 = 1,218,998,108,160 ways of permuting a complete Sudoku grid without changing its underlying structure (reference: Gordon's web page), and many of these will be non-trivial permutations even for an incomplete grid. So you run the risk of generating the same problem over and over disguised in thousands or millions of permutations unless you do something to weed out the symmetries. Some recent posts here discuss how to recognise symmetric equivalence. Avoiding this is certainly one of the attractions of the 100% greedy approach.
A solver like you have described, that resorts to exhaustive recursive search after applying only single inferences (what you call basic elimination), will really struggle to perform on the more difficult problems. I wrote such a solver as my first attempt, choosing the clue with fewest alternatives for my guesses at each recursion, as the book says you should. The worst case performance was 50 times slower than I'm getting with my new solver. (My new solver can handle any inferences that lie in a single plane of the 3-D solution space, including box-row/column intersection, tuples, "fishy cycles": X-wings, swordfish, etc, and nishio, before resorting to recursion).
You can improve performance on some zero-solution problems by checking for inconsistencies as part of your singular inference processing. Any cell with no possibilities left, or any attempt to wipe out a possibility that is actually the setting for a clue cell, is a sure sign of inconsistency (or a solver bug!
). But there are some inconsistencies that will only become apparent at a later stage; this also is where sharper inference techniques are useful, but I conjecture that some inconsistencies may defeat these too and require recursion to detect them.
In general, the multiple solution problems should take less time than the really tough unique solutions. This is particularly true for small numbers of clues, and for a solver like yours that just gets on with it instead of intellectualising about the problem
. Are you aborting the search as soon as two solutions are found?
My solver, running in Python on a laptop, can solve really easy problems in about 25msec. The hardest ones take a couple of seconds, but there aren't many like that in Gordon's 17 clue collection of 450. A fast solver is desirable, but a good heuristic or three for pruning the search is even more important.
Jeff