A while back I was exploring some things that called for enumerating and sampling Sudoku grids, so I added some tools to the tdoku project that may be of general interest. I'm aware from an old thread here that there exist tools to work with a 54GB (5.7GB compressed) catalog of all the solution grids, and that the sequential access rate (albeit on 2010 hardware) was in the range of 100-200k grids/second. The approach I've taken is a little different, and has some advantages in terms of table size and streaming access rate, though I'm not sure how it compares on random access, or on the utility of its order of streaming access.

Briefly, I compute a table that stores, for each of the 36288^2 = 1316818944 essentially different patterns of x's and y's shown below, the number of solutions completing the grid. Each entry is two bytes, for a total of 2.5GB, and the table sums to 3546146300288, which is the full grid count after factoring out 9! * 72^2 for digit/row/col/block permutations.

- Code: Select all
1 2 3|x x x|x x x
4 5 6|x x x|x x x
7 8 9|x x x|x x x
-----+-----+-----
y y y|. . .|. . .
y y y|. . .|. . .
y y y|. . .|. . .
-----+-----+-----
y y y|. . .|. . .
y y y|. . .|. . .
y y y|. . .|. . .
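As a sanity check, the figures above are consistent with each other and with the known count of 6670903752021072936960 complete Sudoku grids:

```python
import math

# Sanity-check the arithmetic above: table entries, table size, and the
# relation between the table's sum and the known full-grid count.
patterns = 36288 ** 2                     # essentially different x/y patterns
assert patterns == 1316818944
table_bytes = 2 * patterns                # two bytes per entry -> ~2.5GB
assert table_bytes == 2633637888
table_sum = 3546146300288                 # sum of all table entries
symmetries = math.factorial(9) * 72 ** 2  # digit/row/col/block permutations
# Multiplying back recovers the classic count of all Sudoku solution grids.
assert table_sum * symmetries == 6670903752021072936960
```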

Additionally, I compute a much smaller index that maps every millionth grid to the index of the pattern for which it's a solution and the offset of that solution given tdoku's order of enumeration (which, while stable for a given version of tdoku, is admittedly not in any way canonical). To find a particular grid we jump into the index, scan forward a short way to determine the correct pattern and offset, and then ask tdoku to find the nth solution for that pattern. For sequential access we just return every solution the solver finds, which is really fast.
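In pseudocode, the random-access path reads roughly as follows. This is only a sketch of the scheme described above; `index_entries`, `solution_count`, and `nth_solution` are hypothetical stand-ins, not the actual grid_tools API:

```python
STRIDE = 1_000_000  # the index records every millionth grid

def find_grid(n, index_entries, solution_count, nth_solution):
    """Locate grid n: jump via the index, scan forward, then solve."""
    # The nearest preceding index entry gives a pattern and an offset into it.
    pattern, offset = index_entries[n // STRIDE]
    remaining = n % STRIDE + offset
    # Scan forward until the remaining offset falls inside a pattern.
    while remaining >= solution_count(pattern):
        remaining -= solution_count(pattern)
        pattern += 1
    # Ask the solver for that pattern's (remaining)-th solution.
    return nth_solution(pattern, remaining)
```

Sequential access, by contrast, simply streams every solution the solver emits, pattern by pattern, which is why it is so much faster.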

The key numbers are:

* uncompressed table size: 2.5 GB

* compressed table size: 450 MB

* sequential grid enumeration: 2M grids / sec / core

* random grid sampling: 130 grids / sec / core
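For scale, the numbers above imply that streaming the entire reduced catalog takes on the order of three core-weeks:

```python
# Rough time to stream all 3546146300288 essentially different solutions
# at 2 million grids/sec on one core (figures from the list above).
table_sum = 3546146300288
rate = 2_000_000            # grids per second per core
seconds = table_sum / rate  # ~1.77 million seconds
days = seconds / 86400
assert 20 < days < 21       # roughly three core-weeks
```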

If this sounds useful to you, you can find the code here and the tables here.

- Code: Select all
# (1) uncompress tables.tar.xz in the tdoku project directory
# (2) build:
./BUILD.sh
# (3) list or sample grids
build/grid_tools list_grids 0 10    # list the first 10 grids
build/grid_tools sample_grids 10    # sample 10 random grids
# (4) sample some minimal puzzles
build/grid_tools sample_puzzles

Other notes: (1) you can use grid_tools to compute the tables yourself, but be warned that it took me around 500 core-hours. (2) Also included, as illustrated above, is a rejection sampler that slowly generates minimal puzzles with uniform probability conditioned on clue count.
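The rejection sampler rests on a standard trick: draw candidates from any proposal whose probability you can compute, then accept with probability inversely proportional to that probability, so the accepted draws come out uniform. A toy sketch of the principle only; `propose` and `proposal_prob` are hypothetical stand-ins, not tdoku's actual code:

```python
import random

def rejection_sample(propose, proposal_prob, bound_m):
    """Accepted draws are uniform over the proposal's support, provided
    bound_m >= 1 / min(proposal_prob) so acceptance probs stay <= 1."""
    while True:
        x = propose()
        # Accepting with prob 1/(M*q(x)) exactly cancels the proposal bias.
        if random.random() < 1.0 / (bound_m * proposal_prob(x)):
            return x
```

The price is the slowness noted above: when the proposal concentrates far from uniform, most candidates are rejected.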