I just discovered these fantastic puzzles last week, they are truly amazing. I have a doctorate in Communications Theory, and the similarity to erasure correcting codes is striking. I was floored by how few clues are required to solve the puzzles, due to the high level of mutual information encoded in the permuations. In particular, it struck me that that the hardest puzzles probably approach the Singleton bound (more on this in a minute).

This afternoon I started analyzing the mathematics, in particular, trying to figure out how to create the puzzles, and determine the cardinality.

I coded up a recursive puzzle solver, and quickly realized that determining the number of Sudoku 3x3 grids is not exactly trivial.

Naturally, I looked up the Wikipedia article to learn more, and was pleasantly surprised to find the in-depth analysis posted there (which apparently is gleaned from this forum). The analysis of the cardinality is impressive, and it's great to hear that the results have been confirmed independently. However, I was very surprised to read that the minimum number of required clues hadn't been solved yet.

I think I've got a solution, and it's remarkably simple and satisfying. But since I'm new to this, I may have missed something. I initially conjectured when I was figuring out how to encode the puzzles that a reasonable strategy is to start with the 3 matricies on the diagonal (or the anti-diagonal), with 9!^3 possible permuations, which would seem to be an independent basis for forming (in some way) a valid grid. Does anyone know of a counter-example?

It is not surprising that very few of these permuations would have a unique solution without filling in more of the squares. There are probably at least a few permutations that are uniquely determined by *only* these 3*9=27 squares, but this isn't that important, so long as

there is at least one solution.

Here's where the theory of Erasure-Correcting Codes comes in.

It boils down to "intrinsic" vs. "extrinsic" information. These terms come from Turbo codes, which in fact share a lot of parallels to these Sudoku puzzles, with one of the key differences being that the measurements are typically assumed to be corrupted by additive Gaussian noise rather than being erased. The transmitter sends an uncoded bit-stream along with two convolutionally-coded bit-streams, where the second bit-stream first passes through an interleaver to scramble it up. The decoding is accomplished iteratively using two MAP (Maximum Aposteriori Probability) decoders that operate in parallel, given an estimate of the noise variance, to make "soft-decisions" (typically a log-likelihood ratio) about how likely it is that each bit is a 0 or a 1. Each decoder ideally has an independent sequence of noise samples due to the way that the interleavers work. The details are somewhat complicated, but in essence, you can think of the two coded bit streams as providing an indepdendent "row" and "column". The trick to making the iterative feedback work is that the two MAP decoders (the "rows" and "columns") need to separate out the "intrinsic" information and only exchange "extrinsic" information.

In Sudoku, the only "intrinsic" information seems to be contained in

the 27 entries along either the diagonal (or anti-diagonal). In other words, these are the only cells where you have complete freedom to choose any of the 9! permutations without affecting eachother.

Making any further entries in any of the other 81-27=54 squares creates mutual information between the rows and columns, and is thus "extrinsic".

You can think of Sudoku as a non-binary rate (N,K) Erasure-Correcting Block code that uses a base B=9 alphabet. There are N=81 total grid positions (9^81 codewords, most are *not* Sudoku grids), and of them, only K=27 convey the intrinsic information. The "intrinsic information" consists of the 9!^3 possible permuations along either the diagonal or anti-diagonal, since these are the only permutations that can be chosen independently. Since we have a base-9 alphabet, the total amount of intrinisic information is K = ceil[Log_9(9!^3)]=18. You can think of the remaining 54 grid positions as containing parity symbols with "extrinsic" mutual information.

The theory of Error (or Erasure) Correcting codes tends to be rather arcane. But the most fundamental bounds (most originally derived by Shannon) are extremely simple and elegant, somewhat akin to Maxwell's equations. For *any* rate (N,K) linear block code, the minimum distance is *upper* bounded by the Singleton bound:

Dmin <= N-K+1

For a binary code, B=2 and the "distance" between any pair of the 2^K codewords is measured by the Hamming weight, which is the total number of bits that differ. The minimum distance is the smallest possible distance between any pair of 2^K codewords. And the Singleton bound is the largest possible value of the minimum distance. Codes that yield equality, i.e. Dmin = N-K+1, are called MDS (Minimum-Distance Seperable). Reed-Solomon codes are MDS, and in fact also have some interesting connections to Latin squares. In Sudoku, we aren't that lucky, otherwise every puzzle would be soluble given only 17 clues, which is clearly an exceptional circumstance.

The minimum distance corresponds to the number of "Erasures" that a code can correct. The number of "Errors" that can be corrected is *half* the minimum distance, because some of the information is unreliable. Imagine how much harder Sudoku would be if some of the clues were wrong, and we had to figure out which ones had to be fixed! We are much more lucky in Sudoku, because the missing numbers are simply erased, and we have complete confidence that every clue is correct.

Of the N=81 total positions, we wish to find Cmin, the minimum possible number of clues that allow us to still solve the puzzle. Since up to Dmin "Erasures" can be corrected, this means that

Dmin=N-Cmin,

i.e., we can fix up to 81-Cmin erasures in the best case.

Substituting this into the left-hand side of the Singleton bound,

N-Cmin <= N-K+1

-Cmin <=-K+1

+Cmin >= K-1,

where we multiplied by -1 and thus had to reverse the inequality.

Substituting for K,

Cmin >= Ceil[ Log_9(9!^3) ]-1

Cmin >= Ceil[ 17.479 ] -1

Therefore, Cmin >= 17.

Q.E.D.

Dr. Brandon Zeidler