Paquita wrote:Or maybe I already have the rates if these are from your large files scanprb.
I have most of them, but not all due to the ED canonical morphing.
I think it is safer to run the rating again.
For the next updates, my answer is in the following point.
Paquita wrote:I am thinking of a way to split such a large file, for the future maintenance if I am going to do that. Maybe an idea to split by number of clues, that is an invariant given for the minimals.
I have the SQLite database, but that stops at 10,000,000 records.
My choice has been to write the missing code using only gsort and text files. The sequence for an update is something like this:
find the minimal canonical form of the file to process
split the file into the TH threat part and the others
For each subfile:
find the new minimal canonical (start is puzzle plus name)
add the number of clues
add the rating (from scratch for these new puzzles)
add the TH assumed threat, again only for the new puzzles
expand the new puzzles and merge with the file of old expanded ones
clean the expanded file of subsets
extract the new expanded and the old expanded still valid
rate and add the TH assumed threat on the new expanded
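The merge and extract steps above can be sketched with gsort and comm on sorted text files. This is only a toy illustration under my own assumptions, not the actual code: I assume one canonical puzzle per line, and all file names and puzzle strings here are invented.

```shell
#!/bin/sh
# Toy sketch of the split/merge steps, assuming one canonical
# puzzle per line in plain, sorted text files.
set -e

# stand-ins for the real files (names and contents invented)
printf 'puzzleA\npuzzleC\n' > old_expanded.txt    # old expanded, sorted
printf 'puzzleB\npuzzleC\n' | sort > batch.txt    # new batch, sorted

# merge the sorted new batch into the sorted old file (gsort -m)
sort -m old_expanded.txt batch.txt | uniq > merged.txt

# lines only in the batch: the genuinely new puzzles to rate from scratch
comm -13 old_expanded.txt batch.txt > new_only.txt

# lines present in both: old expanded puzzles still valid
comm -12 old_expanded.txt merged.txt > still_valid.txt
```

With sorted inputs, sort -m is a pure merge pass and comm a single linear scan, so none of these operations needs to hold the large file in memory.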
In this sequence, we need small split/merge operations; these are easy things with tailor-made code.
But I agree that analysis of the final file is easier using a database tool, and then a split on any criterion can help.
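Paquita's idea of splitting by number of clues could look like the following one-pass awk sketch. This is a hypothetical illustration under my own assumptions: I assume each puzzle is an 81-character line with '.' marking empty cells, and the file names are invented.

```shell
#!/bin/sh
# Toy sketch: split a puzzle file into one subfile per clue count,
# assuming 81-character lines where '.' marks an empty cell.
set -e

# build two toy 81-character lines: 20 clues and 25 clues (invented data)
line20=$(printf '1%.0s' $(seq 20); printf '.%.0s' $(seq 61))
line25=$(printf '1%.0s' $(seq 25); printf '.%.0s' $(seq 56))
{ echo "$line20"; echo "$line25"; } > puzzles.txt

# clue count = number of non-'.' characters per line;
# gsub(/[^.]/, "&") counts matches without changing the line
awk '{ n = gsub(/[^.]/, "&"); print > ("clues_" n ".txt") }' puzzles.txt
```

Since the clue count is invariant for a minimal puzzle, each subfile can then be maintained and merged independently with the same gsort-based steps.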