LCT-19X
Ok, we are up and running. Our unresolved 20C grid count at the start of this completion exercise is 626,731,783. This translates to roughly 280 core-days.
There are 6400 jobs to process, each job representing a work-unit of up to 100,000 grids to test. The division is done on a band basis, so for each of the band file indices 1 to 255, there is at least one job with fewer than 100,000 grids. (Band files 250-255 are composite: they span multiple ED band indices.)
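For anyone wanting to reproduce the job counts, here is a minimal sketch of the per-band split. It assumes a plain ceiling division of each band's unresolved count by the work-unit size, which is my reading of the description rather than the actual CreateWorkUnits code; the names `jobs_for_band` and `job_sizes` are made up for illustration.

```python
WORK_UNIT = 100_000  # grids per job, as stated in the post


def jobs_for_band(n20: int, unit: int = WORK_UNIT) -> int:
    """Jobs needed to cover n20 unresolved grids (integer ceiling division)."""
    return (n20 + unit - 1) // unit


def job_sizes(n20: int, unit: int = WORK_UNIT) -> list[int]:
    """Sizes of the individual jobs: full units, then one short remainder job."""
    full, rest = divmod(n20, unit)
    return [unit] * full + ([rest] if rest else [])


if __name__ == "__main__":
    n20_b006 = 2_106_353  # band 6 unresolved count from the report
    print(jobs_for_band(n20_b006), job_sizes(n20_b006)[-1])
```

Under that assumption, each band ends with one short job unless its unresolved count happens to be an exact multiple of 100,000.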
Despite the fact that our morphing process eventually resolved ~88% of all the 20C grids that we started with, it is clear that some bands were significantly harder to hit. Here is a small sample of some of the larger bands, from the report produced by CreateWorkUnits:
03:33:35: Processed b006: ng = 96,229,043, n20 = 2,106,353, jobs = 22
03:34:32: Processed b020: ng = 88,782,527, n20 = 1,554,894, jobs = 16
03:34:48: Processed b022: ng = 85,627,560, n20 = 1,476,242, jobs = 15
03:35:03: Processed b024: ng = 85,102,374, n20 = 1,553,046, jobs = 16
03:35:57: Processed b032: ng = 40,697,708, n20 = 7,903,756, jobs = 80
03:37:40: Processed b033: ng = 80,468,664, n20 = 20,258,707, jobs = 203
03:38:27: Processed b034: ng = 79,175,611, n20 = 9,110,207, jobs = 92
03:39:21: Processed b035: ng = 77,979,784, n20 = 10,512,282, jobs = 106
Band 6, the largest of all, was 97% resolved, but band 33 (the largest # of jobs) was only 75% resolved. One suspects that this is not down to sheer chance, but that band 33 probably has fewer 19C puzzles on average, and is highly likely to yield significant numbers of "No 19C" cases.
There is very likely to be a strong correlation between the % of unresolved grids in a band, and the average cost of testing grids in that band.
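The resolved percentages can be checked directly from the report. This is a rough sketch, assuming `ng` is the total number of grids in the band and `n20` the number still unresolved (that reading is mine, but it does reproduce the 97% and 75% figures); `resolved_pct` is just an illustrative helper.

```python
# (ng, n20) per band, copied from the CreateWorkUnits report above
bands = {
    "b006": (96_229_043, 2_106_353),
    "b020": (88_782_527, 1_554_894),
    "b022": (85_627_560, 1_476_242),
    "b024": (85_102_374, 1_553_046),
    "b032": (40_697_708, 7_903_756),
    "b033": (80_468_664, 20_258_707),
    "b034": (79_175_611, 9_110_207),
    "b035": (77_979_784, 10_512_282),
}


def resolved_pct(ng: int, n20: int) -> float:
    """Percentage of a band's grids already resolved (assumes n20 <= ng)."""
    return 100.0 * (ng - n20) / ng


if __name__ == "__main__":
    for name, (ng, n20) in bands.items():
        print(f"{name}: {resolved_pct(ng, n20):5.1f}% resolved")
```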
We have data for the first full day of processing now, and we have tested 35 million grids. We are using hyper-threading, running 24 workers on 16 cores, and the net effect appears to be close to a 20-core system. With the 280 core-day estimate that would translate to roughly 14 days for the whole job.
Some snapshots of progress for the first 24 hours:
08:40:25: Start
13:42:05: Jobs complete = 69, i/p = 16, NGT = 6979931, NF = 9763 (0.140%) 5h
14:41:06: Jobs complete = 88, i/p = 20, NGT = 8815525, NF = 10425 (0.118%) 6h
16:45:55: Jobs complete = 124, i/p = 24, NGT = 12581337, NF = 12388 (0.098%) 8h
17:42:00: Jobs complete = 142, i/p = 24, NGT = 14273465, NF = 13378 (0.094%)
18:41:04: Jobs complete = 162, i/p = 24, NGT = 16072711, NF = 14560 (0.091%)
20:40:38: Jobs complete = 193, i/p = 24, NGT = 19820132, NF = 16294 (0.082%) 12h
22:50:51: Jobs complete = 243, i/p = 24, NGT = 23585231, NF = 17564 (0.074%)
00:52:35: Jobs complete = 267, i/p = 24, NGT = 26091727, NF = 19999 (0.077%) 16h
06:40:04: Jobs complete = 336, i/p = 24, NGT = 33198587, NF = 30312 (0.091%)
08:42:08: Jobs complete = 359, i/p = 24, NGT = 35310171, NF = 37053 (0.105%) 24h
NGT is the total # of grids tested, NF the total "failures" (No 19C).
As you can see, we might have done better had we started with 24 workers, but I wanted to observe Jack's performance, running temperature, etc., just to be sure everything was shipshape (and it was! Jack remained cool and whisper-quiet all the way!)
The % of fails drops steadily for some time, then kicks up again, and this corresponds to the point at which we started on bands 32 and 33. So no surprise there - if you look closely you will see there is a corresponding decrease in overall grid-test rates.
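The NF percentages in the snapshots are cumulative, so the kick-up is easier to see if you look at each interval on its own. A small sketch using only the NGT and NF figures from the snapshots above (`interval_fail_rates` is an illustrative name, not part of any tool mentioned here):

```python
# (timestamp, NGT, NF) copied from the progress snapshots above
snapshots = [
    ("13:42:05", 6_979_931, 9_763),
    ("14:41:06", 8_815_525, 10_425),
    ("16:45:55", 12_581_337, 12_388),
    ("17:42:00", 14_273_465, 13_378),
    ("18:41:04", 16_072_711, 14_560),
    ("20:40:38", 19_820_132, 16_294),
    ("22:50:51", 23_585_231, 17_564),
    ("00:52:35", 26_091_727, 19_999),
    ("06:40:04", 33_198_587, 30_312),
    ("08:42:08", 35_310_171, 37_053),
]


def interval_fail_rates(snaps):
    """Return (from_time, to_time, fail %) for each pair of consecutive snapshots."""
    rates = []
    for (t0, ngt0, nf0), (t1, ngt1, nf1) in zip(snaps, snaps[1:]):
        # New failures divided by new grids tested within this interval only
        rates.append((t0, t1, 100.0 * (nf1 - nf0) / (ngt1 - ngt0)))
    return rates


if __name__ == "__main__":
    for t0, t1, pct in interval_fail_rates(snapshots):
        print(f"{t0} -> {t1}: {pct:.3f}% No 19C")
```

Measured this way, the final two hours run at roughly 0.3% failures, about three times the cumulative 0.105% shown in the last snapshot.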
This translates to varying "total job time" predictions, all in excess of the theoretical 14 days: 16 days (based on the first 12-hour period), 20 days (the second 12 hours), and 18 days for the full 24-hour period.
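For reference, the arithmetic behind those predictions can be sketched as follows. It is a simple linear extrapolation: the total grid count divided by the observed test rate. The 20:40:38 and 08:42:08 snapshots are treated as exactly 12 and 24 hours after the start, which is a small approximation, and `projected_days` is an illustrative helper.

```python
TOTAL_GRIDS = 626_731_783   # unresolved 20C grids at the start
NGT_12H = 19_820_132        # grids tested at the 20:40:38 snapshot (~12h)
NGT_24H = 35_310_171        # grids tested at the 08:42:08 snapshot (~24h)


def projected_days(grids_tested: int, hours: float, total: int = TOTAL_GRIDS) -> float:
    """Total job time in days if the whole job ran at this period's rate."""
    grids_per_day = grids_tested * 24.0 / hours
    return total / grids_per_day


if __name__ == "__main__":
    theoretical = 280 / 20  # core-days divided by effective cores
    first = projected_days(NGT_12H, 12)
    second = projected_days(NGT_24H - NGT_12H, 12)
    full = projected_days(NGT_24H, 24)
    print(f"theoretical: {theoretical:.0f} days")
    print(f"first 12h: {first:.1f}, second 12h: {second:.1f}, full 24h: {full:.1f}")
```

The rounded results match the 16, 20 and 18 day figures quoted above.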