by Smythe Dakota » Mon Feb 12, 2007 5:02 am
Here's a way to represent a Sudoku grid (including candidates) in about 60 bytes, on the average.
First, form a 9-digit number as follows: The first digit is the number of 1's among the givens, the next digit is the number of 2's, etc. For example, the 9-digit number 431232243 would mean that there are four 1's, three 2's, one 3, etc. among the givens.
A nine-digit number can be represented in 30 bits. Call this number the "header", and start your string with these 30 bits.
Next, we deal with the location of the givens. The first given 1 can be in any of the 81 cells, so it takes 7 bits to specify the location. The next 1 can be in any of 64 cells (delete the row and column the first 1 is in, never mind about boxes), so it takes 6 bits to specify this location. The locations of the remaining 1's can be specified in 6 bits (49 possible cells), 6 bits (36), 5 bits (25), 4 bits (16), 4 bits (9), 2 bits (4), and 1 bit (1). In our example, there are only four 1's, so all we need is 7+6+6+6, or 25 bits, to specify the locations of all the given 1's. Similarly, we then need only 19 bits to specify the locations of the three 2's, 7 bits for the one 3, etc. In our example, with a header of 431232243, it adds up to only 153 bits for all the givens.
Next come the candidates. These need be specified only for the remaining (empty) cells, and the scheme above knows which cells those are.
Theoretically, it would take 9 bits per cell to specify whether each digit (1 to 9) is a candidate in that cell. In practice, however, we can skip the "automatic non-candidates". By this I mean that, for example, the digit 4 is an automatic non-candidate for r5c6 if there is a 4 anywhere in either r5 and/or c6 and/or the center box. By now, which digits are automatic non-candidates is known to whatever goofy software has been devised to grok this scheme, so many bits can be saved. If, for example, the digits 4, 7, 9 are automatic non-candidates for the first empty cell, then the candidates for this cell can be represented in just 6 bits (one each for the digits 1, 2, 3, 5, 6, 8) instead of 9. Typically, an initial-position Sudoku will average about 4 automatic non-candidates per empty cell, so that only 5 bits will be required for each of the 60 or so empty cells. This comes out to 300 bits.
So we have a complete representation in 30+153+300, or 483 bits, which is 61 bytes.
This scheme will even work for illegal (multi-solution) grids, though the length of the string may be greater. A completely blank grid, for example, would require 759 bits, or 95 bytes.
Bill Smythe