Hi Denis,
denis_berthier wrote:
blue wrote: Those numbers don't figure into the actual estimates for the n(c) numbers, though -- only into the error estimates. That can be countered with a larger sample size, if nothing else.
Now we're getting to the point. How much larger is the whole question, and I don't believe there can be any general answer independent of the random variable under study.
Whence my list of questions about adjusting the various parameters.
Note that having to increase the sample size may also have drastic computational consequences if the random variable of interest is hard to compute.
I agree that there's no general answer. There may be an upper bound, but if so, it's probably much
larger than anything that would be encountered in a practical situation.
For the rest, I should defer to Red Ed.
denis_berthier wrote: If the puzzles are correlated, I don't see how you could compute standard deviations in a way that would allow the parameters to be adjusted in advance.
Here we need to be on the same page about what kinds of standard deviations we're talking about.
The one I mentioned above was the standard deviation of an estimate of a constant times
the actual average of a very long (but fixed) list of numbers, based on a random sample of those numbers.
The formula had a parameter, namely the actual standard deviation of the numbers in the very
long list. In practice we can't compute that, any more than we can compute the average, but we can
estimate it by calculating the standard deviation of the sampled numbers.
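As a rough illustration of that first kind of standard deviation, here is a minimal Python sketch. It assumes the sampled numbers are drawn independently (with replacement) from the long list, and that `K` is the known constant multiplying the average; the function name `estimate_scaled_mean` is hypothetical. The unknown true standard deviation is replaced by the sample standard deviation.

```python
import math

def estimate_scaled_mean(samples, K):
    """Estimate K * (true average of the full list) from a random sample,
    together with its standard error K * s / sqrt(N)."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample standard deviation (n - 1 denominator) stands in for the unknown true one.
    s = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return K * mean, K * s / math.sqrt(n)
```

Since the error falls off like 1/sqrt(N), halving it takes four times as many samples, which is where the computational cost of a larger sample size comes from.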
There is another kind of standard deviation -- the standard deviation in the average number of clues
in a minimal puzzle, for example -- that we can calculate exactly, if we know the actual numbers of
puzzles of each size ... the actual n(c) numbers. We won't be able to calculate the actual numbers
(as always), but instead, we can use the n(c) estimates in the same calculation.
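For the second kind, the calculation from the n(c) numbers is just a weighted mean and variance over clue counts. A minimal sketch, assuming `n_of_c` is a hypothetical dict mapping each clue count c to n(c), either the actual value or an estimate:

```python
import math

def clue_count_stats(n_of_c):
    """Mean and standard deviation of the number of clues in a minimal puzzle,
    treating n_of_c[c] as the number of minimal puzzles with c clues."""
    total = sum(n_of_c.values())
    mean = sum(c * n for c, n in n_of_c.items()) / total
    # Population variance: the n(c) describe the whole population, not a sample.
    var = sum((c - mean) ** 2 * n for c, n in n_of_c.items()) / total
    return mean, math.sqrt(var)
```

For example, `clue_count_stats({20: 1, 21: 2, 22: 1})` gives a mean of 21.0 and a standard deviation of sqrt(0.5), about 0.707.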
When we do that, we might like an error estimate for the standard deviation that we've calculated.
I guess I won't go into the details of how that can be done -- maybe you're already familiar with the basic ideas.
The details will depend on whether the n(c) estimates were calculated from the same set of (G,S)
samples, or from different sets. If it's different sets (not very efficient), then the result will depend only
on the n(c) estimates and their error estimates. If it's the same set, then correlations between the
n(c) errors are also important. In that case, if we want the "best" error estimate, then besides
calculating sample averages for m(G,S,c) and m(G,S,c)*m(G,S,c) for the various c's, we'd also need
sample averages for products like m(G,S,c1)*m(G,S,c2).
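One way to turn those product averages into an error estimate for the same-set case is first-order error propagation (the delta method); that technique is a choice made for this sketch, not necessarily the only or best approach. The sketch assumes, hypothetically, that each n(c) estimate is a per-size constant `K[i]` times the sample average of m(G,S,c), with `sizes`, `K` and the function names all illustrative. It builds the covariance matrix of the n(c) estimates from the sample averages of m(G,S,c1)*m(G,S,c2), then combines it with a numerical gradient of the standard deviation.

```python
import math

def clue_sd(sizes, n):
    """Standard deviation of the clue count implied by n[i] puzzles of size sizes[i]."""
    total = sum(n)
    mean = sum(c * x for c, x in zip(sizes, n)) / total
    return math.sqrt(sum((c - mean) ** 2 * x for c, x in zip(sizes, n)) / total)

def sd_with_error(samples, sizes, K):
    """samples: one tuple per (G,S) sample, holding m(G,S,c) for each c in sizes.
    K: hypothetical scale factors, assuming n(c) is estimated as K[i] * average of m(G,S,c).
    Returns (estimated SD of the clue count, estimated error of that SD)."""
    N, k = len(samples), len(sizes)
    avg = [sum(s[i] for s in samples) / N for i in range(k)]
    # Sample averages of products m(G,S,c1)*m(G,S,c2); the diagonal gives m*m.
    prod = [[sum(s[i] * s[j] for s in samples) / N for j in range(k)] for i in range(k)]
    # Covariance of the n(c) estimates: K1*K2*(<m1*m2> - <m1><m2>) / (N - 1).
    cov = [[K[i] * K[j] * (prod[i][j] - avg[i] * avg[j]) / (N - 1)
            for j in range(k)] for i in range(k)]
    n_hat = [K[i] * avg[i] for i in range(k)]
    sd = clue_sd(sizes, n_hat)
    # Numerical gradient of the SD with respect to each n(c) estimate (central differences).
    grad = []
    for i in range(k):
        h = 1e-6 * max(abs(n_hat[i]), 1.0)
        up, down = n_hat[:], n_hat[:]
        up[i] += h
        down[i] -= h
        grad.append((clue_sd(sizes, up) - clue_sd(sizes, down)) / (2 * h))
    # First-order (delta-method) variance: sum over i, j of grad_i * grad_j * cov_ij.
    var_sd = sum(grad[i] * grad[j] * cov[i][j] for i in range(k) for j in range(k))
    return sd, math.sqrt(max(var_sd, 0.0))
```

Zeroing the off-diagonal entries of `cov` gives the error you'd get by ignoring the correlations between the n(c) errors. The sketch needs at least two samples and breaks down if all puzzles fall in a single size (the SD is then 0 and its gradient is undefined).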
Variances and covariances, or standard deviations and correlation coefficients, can replace the
product averages, and vice versa -- I'm assuming you're familiar with all of that.
Note: here too, correlations don't figure into the actual calculation of the estimated standard
deviation, but only into the calculation of its estimated error.
Regards,
Blue.