coloin wrote: How do you keep tabs on all of them - my isomorph sorter runs out of memory at under a million grids...?
I explained the data model earlier in this thread, but let me repeat it with some new details.
My generator takes as input a stream of subgrids with up to 40 givens (the seed). It strips them to 35 givens, sorts the resulting subgrids in memory in minlex form, and then adds givens. Any minimal puzzle of size 36+ is converted to minlex and added to a sorted list in memory.
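The stripping step ({-n}) amounts to enumerating every subset of the seed's givens of the target size. A minimal Python sketch, assuming 81-character puzzle strings with '.' or '0' for empty cells (a common convention, not something stated here); the function names are hypothetical and this is not the actual generator code:

```python
from itertools import combinations
from math import comb


def given_cells(puzzle: str) -> list[int]:
    """Indices (0..80) of the cells holding a clue; '.' or '0' marks an empty cell."""
    return [i for i, ch in enumerate(puzzle) if ch not in ".0"]


def strip_to(puzzle: str, k: int):
    """Yield every sub-puzzle that keeps exactly k of the givens (the {-n} step)."""
    cells = given_cells(puzzle)
    for kept in combinations(cells, k):
        keep = set(kept)
        # Blank out every clue that is not in the chosen subset.
        yield "".join(ch if i in keep else "." for i, ch in enumerate(puzzle))


def subgrid_count(givens: int, k: int = 35) -> int:
    """How many k-given subgrids a single seed with `givens` clues produces."""
    return comb(givens, k)
```

For a 40-given seed this yields comb(40, 35) = 658,008 subgrids, against only 36 for a 36-given seed, which presumably explains why the 37+ batches hold so many fewer seeds. Stripping a 39 down to 34 givens gives comb(39, 34) = 575,757 subgrids per puzzle.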
The input size (batch size) is tuned to make use of the 64 GB of RAM. It is 70,000 puzzles for a 37+ seed, which takes about four and a half hours to complete; for 36s the batch size is 10,000,000 puzzles, which takes about two and a half hours.
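Taking the quoted figures at face value (and, as a simplifying assumption, treating the full 64 GB as 64×10^9 bytes available to a single batch), the per-seed memory budget and throughput work out roughly as below. The roughly 140-fold larger budget per 37+ seed fits the much larger number of 35-given subgrids such a seed produces, e.g. C(40,5) = 658,008 versus C(36,1) = 36.

```latex
\frac{64\times10^{9}\,\text{B}}{10^{7}\ \text{seeds}} = 6400\ \text{B per seed (36s)},
\qquad
\frac{64\times10^{9}\,\text{B}}{7\times10^{4}\ \text{seeds}} \approx 9.1\times10^{5}\ \text{B per seed (37+)}

\frac{10^{7}\ \text{seeds}}{2.5\times3600\ \text{s}} \approx 1111\ \text{seeds/s},
\qquad
\frac{7\times10^{4}\ \text{seeds}}{4.5\times3600\ \text{s}} \approx 4.3\ \text{seeds/s}
```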
After all puzzles are processed, the sorted list is written to a file, with a tag giving the number of givens appended to each puzzle.
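The per-batch collection and the tagged output file can be sketched as follows. This is a minimal illustration, not the author's data structure: `canonicalize` is a hypothetical stand-in for a real minlex routine, minimality is assumed to be checked by the caller, and the `;<givens>` tag format is invented for the example.

```python
def count_givens(puzzle: str) -> int:
    """Number of clues in an 81-character puzzle string ('.' or '0' = empty)."""
    return sum(ch not in ".0" for ch in puzzle)


class BatchCollector:
    """Duplicate-free in-memory store of canonical puzzles for one batch.

    `canonicalize` is a placeholder for a real minlex routine (not shown here);
    callers are assumed to pass only puzzles already checked for minimality.
    """

    def __init__(self, canonicalize, min_givens: int = 36):
        self.canonicalize = canonicalize
        self.min_givens = min_givens
        self.seen: set[str] = set()

    def add(self, puzzle: str) -> None:
        # Keep only puzzles of the target size (36+ by default), in canonical
        # form; the set silently drops isomorphic duplicates.
        if count_givens(puzzle) >= self.min_givens:
            self.seen.add(self.canonicalize(puzzle))

    def lines(self) -> list[str]:
        # A single sort at the end gives the same order as keeping the list
        # sorted on every insertion. ';<givens>' is a hypothetical tag format.
        return [f"{p};{count_givens(p)}" for p in sorted(self.seen)]

    def write(self, path: str) -> int:
        """Write the sorted, tagged puzzles to `path`; return how many were written."""
        rows = self.lines()
        with open(path, "w") as f:
            for row in rows:
                f.write(row + "\n")
        return len(rows)
```

Using a hash set and sorting once is a design choice for the sketch; note that a Python string per puzzle costs far more memory than a packed representation, so a real generator sized to 64 GB would store puzzles much more compactly.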
Then the puzzles from all batches of this generation are merged into a single sorted file.
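Because each batch file is already sorted, the merge can be done as a streaming k-way merge that holds only one line per batch in memory. A minimal Python sketch using the standard `heapq.merge`, assuming one puzzle per line; the function names are hypothetical:

```python
import heapq
from contextlib import ExitStack
from typing import Iterable, Iterator


def merge_batches(batches: Iterable[Iterable[str]]) -> Iterator[str]:
    """Lazily merge already-sorted streams into one sorted, duplicate-free stream."""
    previous = None
    for line in heapq.merge(*batches):
        # The same puzzle can come out of several batches; keep only the first copy.
        if line != previous:
            yield line
            previous = line


def merge_batch_files(paths: list[str], out_path: str) -> int:
    """Merge sorted batch files (one puzzle per line) into a single sorted file."""
    with ExitStack() as stack:
        # One open file per batch; ExitStack closes them all at the end.
        streams = [
            (line.rstrip("\n") for line in stack.enter_context(open(p)))
            for p in paths
        ]
        count = 0
        with open(out_path, "w") as out:
            for line in merge_batches(streams):
                out.write(line + "\n")
                count += 1
    return count
```

With 60 batches this keeps 60 input files open at once, which is well within normal OS limits.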
Then the puzzles of size 37, 38, 39 and 40 are filtered into separate files, and the known ones are removed by a single pass over the known puzzles of the respective size. The file with all puzzles is kept for the later extraction of 36s.
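Since both the candidates and the known puzzles are sorted, removing the known ones is a merge-style set difference: each input is read once, front to back, and memory use stays constant. A minimal Python sketch, assuming a hypothetical `<puzzle>;<givens>` line format; the names are illustrative only:

```python
from typing import Iterable, Iterator


def givens_tag(line: str) -> int:
    """Read the trailing givens tag of a line like '<puzzle>;37' (format assumed)."""
    return int(line.rsplit(";", 1)[1])


def split_by_givens(lines: Iterable[str], sizes: tuple[int, ...] = (37, 38, 39, 40)) -> dict[int, list[str]]:
    """Route each line to the bucket for its size; sorted input gives sorted buckets."""
    buckets: dict[int, list[str]] = {s: [] for s in sizes}
    for line in lines:
        g = givens_tag(line)
        if g in buckets:
            buckets[g].append(line)
    return buckets


def remove_known(candidates: Iterable[str], known: Iterable[str]) -> Iterator[str]:
    """Yield candidates not present in `known`; both must be sorted the same way."""
    known_it = iter(known)
    k = next(known_it, None)
    for c in candidates:
        # Advance the known stream until it reaches or passes the candidate.
        while k is not None and k < c:
            k = next(known_it, None)
        if k != c:
            yield c
```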
The new puzzles are checked for twins, and the minimal twins are added to the new puzzles of the respective size.
Then all new 37+ puzzles are used as a seed for the next generation.
New puzzles are added to the sorted list of known (and processed) puzzles of the respective size.
Once the 37+ seed for the next generation has shrunk to about 40,000 puzzles, the generated 36s are filtered, compared against the known 36s, checked for twins, and added to the seed. Comparing against the known 36s takes several hours. The known 36s are currently kept in 11 sorted, zipped files of 3 to 20 GB each. About 2/3 of the known 36s are "processed" (i.e. {-1+} has been done). One or two files of known 36s are processed at a time, and after processing the files are renamed.
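With the known 36s spread over several sorted files, the comparison can still be a single pass: merge the file streams lazily and walk them alongside the sorted candidates. A minimal Python sketch, assuming the files are gzip-compressed text with one puzzle per line (the post only says "zipped", so the compression format is an assumption) and that candidates use the same line format; the names are hypothetical:

```python
import gzip
import heapq
from contextlib import ExitStack
from typing import Iterable, Iterator


def stripped(lines: Iterable[str]) -> Iterator[str]:
    """Drop trailing newlines so lines compare equal regardless of their source."""
    return (line.rstrip("\n") for line in lines)


def not_in_known(candidates: Iterable[str], known_streams: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield sorted candidates absent from the union of several sorted known streams."""
    known = heapq.merge(*known_streams)
    k = next(known, None)
    for c in candidates:
        while k is not None and k < c:
            k = next(known, None)
        if k != c:
            yield c


def new_puzzles(candidates: Iterable[str], known_paths: list[str]) -> list[str]:
    """Compare sorted candidates against several sorted gzip files of known puzzles."""
    with ExitStack() as stack:
        streams = [stripped(stack.enter_context(gzip.open(p, "rt"))) for p in known_paths]
        return list(not_in_known(candidates, streams))
```

Decompression is done on the fly, so no file ever has to be unpacked to disk; the running time is then dominated by reading the compressed known files once.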
Recently, a typical generation of 36+ puzzles consists of 60 batches, while a 37+ generation starts with 8 batches and gradually decreases to a single batch.
Generation and filtering are automated, but adding the puzzles to the collections of known and processed puzzles is done manually, for safety and to prevent corruption of the collections. Manual work takes several minutes per generation when seeding with 37+ puzzles, and almost one day for preparing and finalizing a generation with a 36-givens seed.
A few times I also stripped the 39s down to 34 givens and included the resulting 36+ puzzles in the process.
coloin wrote: This first-found 36 puzzle is from the MC grid [most canonical]
Interesting. I didn't know that.
coloin wrote: Out of interest, did you find a 37 in this grid? !!!!
After a run over most of the 37s (those copied to my PC), I found no puzzle in the MC grid.