hkociemba wrote: Good progress on generating a valid grid. Generating a 100x100 took 15 min, though I have not tried any optimization yet. For 225x225, though, I do not think it will finish in a reasonable time yet.
So far the algorithm is extremely simple:
1. Fill all rows with numbers from 1..N in random order.
2. Select a random row r and in this row two random cells (r,c1) and (r,c2).
3. Count the number n1 of "wrong" numbers in the two columns c1 and c2 and in the two blocks b1 and b2 that (r,c1) and (r,c2) are in.
4. Swap the content of (r,c1) and (r,c2).
5. Count again. If the number n2 of "wrong" numbers is now greater, swap back.
6. Go to step 2 and repeat until the grid is valid. Validity can be tracked cheaply: compute the initial number I of wrong numbers once, then update it after every kept swap via I <-- I - n1 + n2; the grid is valid when I reaches zero.
Surprisingly, the algorithm does not seem to get stuck in a local minimum (most of the time?), and this greedy approach works. For a standard 9x9, creation time is only about 30 ms on average; for a 49x49 it is 20 s to 30 s.
Hello! I'm new to this forum. This post was very useful to me, Herbert; thank you :) I had previously been using a backtracking algorithm to generate grids, and this stochastic method scales much better to large grid sizes.
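For anyone who wants to try it, here is a minimal C++ sketch of the quoted algorithm for a standard 9x9 grid. The function and variable names (generate, col_conflicts, blk_conflicts) are my own, not Herbert's; I also accept swaps that leave the count unchanged, since only strictly worsening swaps are undone in his description:

```cpp
// Sketch of the quoted greedy-swap generator for a standard 9x9 grid.
#include <algorithm>
#include <array>
#include <numeric>
#include <random>

constexpr int N = 9, B = 3;                      // grid size, block size
using Grid = std::array<std::array<int, N>, N>;

// "Wrong" entries in column c: N minus the number of distinct values.
inline int col_conflicts(const Grid& g, int c) {
    std::array<bool, N + 1> seen{};
    int distinct = 0;
    for (int r = 0; r < N; ++r)
        if (!seen[g[r][c]]) { seen[g[r][c]] = true; ++distinct; }
    return N - distinct;
}

// Same idea for the block containing cell (r, c).
inline int blk_conflicts(const Grid& g, int r, int c) {
    std::array<bool, N + 1> seen{};
    int distinct = 0, r0 = r - r % B, c0 = c - c % B;
    for (int i = r0; i < r0 + B; ++i)
        for (int j = c0; j < c0 + B; ++j)
            if (!seen[g[i][j]]) { seen[g[i][j]] = true; ++distinct; }
    return N - distinct;
}

inline Grid generate(std::mt19937& rng) {
    Grid g;
    // Step 1: each row is a random permutation of 1..N, so rows stay
    // valid under within-row swaps.
    for (auto& row : g) {
        std::iota(row.begin(), row.end(), 1);
        std::shuffle(row.begin(), row.end(), rng);
    }
    // Initial total I of wrong entries over all columns and blocks.
    int total = 0;
    for (int c = 0; c < N; ++c) total += col_conflicts(g, c);
    for (int r = 0; r < N; r += B)
        for (int c = 0; c < N; c += B) total += blk_conflicts(g, r, c);

    std::uniform_int_distribution<int> d(0, N - 1);
    while (total != 0) {
        // Steps 2-3: random row, two random cells; count wrong entries
        // in the affected columns and blocks only.
        int r = d(rng), c1 = d(rng), c2 = d(rng);
        if (c1 == c2) continue;
        auto affected = [&] {
            int n = col_conflicts(g, c1) + col_conflicts(g, c2)
                  + blk_conflicts(g, r, c1);
            if (c1 / B != c2 / B) n += blk_conflicts(g, r, c2);
            return n;
        };
        int n1 = affected();
        std::swap(g[r][c1], g[r][c2]);               // step 4
        int n2 = affected();                         // step 5
        if (n2 > n1) std::swap(g[r][c1], g[r][c2]);  // got worse: swap back
        else total += n2 - n1;                       // step 6: I <- I - n1 + n2
    }
    return g;
}
```

`generate` returns a complete valid grid; the loop only ever recounts the two columns and one or two blocks a swap touches, which is what makes the update cheap.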
I've implemented a tweaked version of this that may be slightly more optimized (though I have not implemented yours to compare benchmarks).
1. Fill all rows with numbers from 1..N in random order.
2. For each horizontal chute,
2.1. For each block in the chute, keep an array counting how many times each value is seen in that block. ("blks_has")
2.2. Sum over the chute's blocks the number of values 1..N which are not found in each block. ("has_nots")
2.3. While has_nots is not zero,
2.3.1. Pick a random row in the chute and two different cells in that row.
2.3.2. If swapping the values in those cells reduces has_nots, commit the swap and update blks_has and has_nots.
3. For each vertical chute,
3.1. For each column in the chute, keep an array counting how many times each value is seen in that column. ("cols_has")
3.2. Sum over the chute's columns the number of values 1..N which are not found in each column. ("has_nots")
3.3. While has_nots is not zero,
3.3.1. Pick a random row and two different cells in that row within the current vertical chute.
3.3.2. If swapping the values in those cells reduces has_nots, commit the swap and update cols_has and has_nots.
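To make the incremental has_nots bookkeeping concrete, here is a sketch of what stage 2 (the blocks stage) might look like for a single horizontal chute. The names blks_has and has_nots mirror my description above, but this is a simplified reconstruction, not the code in my repo, and I added a reshuffle-and-retry safety net that the steps above don't mention:

```cpp
// Sketch of the blocks stage for one horizontal chute of an O*O grid
// with O x O blocks (O = 3 here, i.e. a standard 9x9).
#include <algorithm>
#include <array>
#include <numeric>
#include <random>

constexpr int O = 3, N = O * O;                  // block order, grid size
using Chute = std::array<std::array<int, N>, O>; // the O rows of one chute

// Make every block of the chute contain 1..N, assuming each row is a
// permutation of 1..N (a property which within-row swaps preserve).
inline void solve_chute_blocks(Chute& rows, std::mt19937& rng) {
    std::uniform_int_distribution<int> row_d(0, O - 1), col_d(0, N - 1);
    for (;;) {
        // blks_has[b][v]: occurrences of value v in block b of this chute.
        std::array<std::array<int, N + 1>, O> blks_has{};
        for (int r = 0; r < O; ++r)
            for (int c = 0; c < N; ++c) ++blks_has[c / O][rows[r][c]];
        // has_nots: over all blocks, how many of the values 1..N are missing.
        int has_nots = 0;
        for (int b = 0; b < O; ++b)
            for (int v = 1; v <= N; ++v)
                if (blks_has[b][v] == 0) ++has_nots;

        for (long tries = 0; has_nots != 0 && tries < 500000; ++tries) {
            int r = row_d(rng), c1 = col_d(rng), c2 = col_d(rng);
            int b1 = c1 / O, b2 = c2 / O;
            int v1 = rows[r][c1], v2 = rows[r][c2];
            if (b1 == b2 || v1 == v2) continue;  // swap would change nothing
            // Incremental change to has_nots if v1 and v2 trade blocks.
            int d = 0;
            if (blks_has[b1][v1] == 1) ++d;      // v1 vanishes from b1
            if (blks_has[b2][v1] == 0) --d;      // v1 appears in b2
            if (blks_has[b2][v2] == 1) ++d;      // v2 vanishes from b2
            if (blks_has[b1][v2] == 0) --d;      // v2 appears in b1
            if (d < 0) {                         // commit only improving swaps
                std::swap(rows[r][c1], rows[r][c2]);
                --blks_has[b1][v1]; ++blks_has[b2][v1];
                --blks_has[b2][v2]; ++blks_has[b1][v2];
                has_nots += d;
            }
        }
        if (has_nots == 0) return;
        // Safety net (not in the steps above): if strict descent ever
        // stalls, reshuffle the rows and retry.
        for (auto& row : rows) std::shuffle(row.begin(), row.end(), rng);
    }
}
```

The point of the delta computation is that a candidate swap never requires recounting whole blocks: four array lookups decide whether it helps, and the counters are patched only when the swap is committed.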
I found that the blocks stage took roughly half as many swaps as the columns stage, which makes me think that starting with valid rows (or columns) rather than valid blocks is faster, although I'm not sure why that would be true on a theoretical level.
If this does result in any performance improvement, I think it could be because the working set of variables is smaller at any given time (better cache reuse), and because it's easier to satisfy the block and column validity requirements in a staged manner rather than trying to do both at once. But I should really benchmark before I say words :P
My source code (current commit) for this is here:
https://github.com/david-fong/solvent/blob/bb0c3db4f0252f1e86a73d3ec49932453ebf403f/cpp/src/solvent/gen/stochastic.cpp
Running on an i7 machine, I was able to generate 1000 grids of size 100x100 in 44.92 seconds with three threads.