Unbiased Grid Generation using Band SignaturesAfter a period of inactivity, I recently learned of the big system crash and was disappointed to find that all of my posts on the above topic were lost. This disaster, however, gives me an opportunity to present the topic in a more cohesive fashion.
Ed Russell was kind enough to test my method using his "Z-score" set of statistical diagnostics. I sent Ed the source code and a very big pre-computed data file. He generated 10,000,000 grids for analysis. The
challenge of this thread was fulfilled when Ed declared the results to be
unbiased. Regretably, these results were lost in the crash.
The `C' routines generate about
43,000 grids per second on my home computer (Intel Core 2 Quad CPU Q8200 @ 2.33GHz; 64-bit OS). A large data file (115,934 KB) is also needed.
Band-1 GenerationBand-1 is chosen from a precomputed set of 416 class representatives, where row-1 is fixed to be 1,2,3,4,5,6,7,8,9. Each representative (together with its "spins") covers a non-overlapping portion of the space of all possible bands. The class choices are not equally probable, but are directly proportional to the number of grid completions. See ideas presented in
"Enumerating possible Sudoku grids" by Felgenhauer and Jarvis. Also, see
"Summary of method and results."- Code: Select all
1 2 3 4 5 6 7 8 9 Sample band-1 class representative.
6 7 8 3 9 1 5 2 4
9 5 4 2 8 7 3 6 1
Row Order RestrictionsTo simplify generation, some row order restrictions are enforced. Let R1, ..., R9 represent the first digit in each of the nine rows, respectively. The restrictions: R1 < R2 < R3, R4 < R5 < R6, R7 < R8 < R9, and R1 < R4 < R7.
For the sample band-1, this is means that R4-R5-R6 must be one of the following ten choices: 234, 235, 237, 238, 245, 247, 248, 257, 258, 278.
Band Signature DefinedBand-2 is generated based upon band "signatures" and the number of grid completions for each signature. The "signature" of a band is a 9-long array of digits which appear in each column. For convenience, the three digits in each column are concatenated in ascending order and listed as a single quantity. The following is one of the many band-2 choices consistent with the above band-1 representative:
- Code: Select all
2 6 9 5 4 3 8 1 7 Sample band-2.
3 4 7 8 1 2 6 9 5
8 1 5 6 7 9 4 3 2
Band-1 signature = { 169, 257, 348, 234, 589, 167, 357, 268, 149 }
Band-2 signature = { 238, 146, 579, 568, 147, 239, 468, 139, 257 }
Band-3 signature = { 457, 389, 126, 179, 236, 458, 129, 457, 368 } (implied)
Band Signatures and Grid CompletionsFor every band-1 choice, there are exactly 87808 band-2 signatures which follow the row order restrictions. File "Signature.bin" holds all 36,528,128 (= 416 * 87808) signatures, stored in
compressed form with 26-bits per signature, for a total of 118,716,416 bytes.
Let's count the number of grid completions for all band-1/signature-2 combinations. For a particular combination, we know the implied band-3 signature and must count the number ways the puzzle can be finished which satisfy the row ordering restrictions. Some of you may already know that the number of completions is limited to sixteen values: 16, 20, 24, 26, 28, 30, 32, 36, 38, 44, 46, 48, 86, 96, 144 and 288.
Before storage, the signatures are sorted by class number (1-416) and number of completions, making subsequent probability calculations and look-up much simpler.
Band-2 and Band-3 GenerationGiven our band-1 representative, according to a weighted distribution (grid completions), choose one of the 87808 band-2 signatures. The number of permutation triples for this signature is already known, having been previously computed and stored. Suppose you randomly choose the i'th one; now, begin constructing fills and stop when the i'th fill is reached. This is the most time-consuming routine.
Compute the band-3 signature and follow the same procedure. (Actually, band-2 and band-3 construction could be done in parallel -- if available.)
Spin and Re-labelThe final step is to randomly interchange rows within bands, columns within chutes, row bands and column chutes. Digits are also randomly re-labeled.
This is the short-version of how it works. There were a number of interesting discoveries as the algortihm developed. More later.