Online algorithm for computing average and variance from a subset of data

I used this as a reference for computing the mean and variance online from a variable-length array of data: http://www.johndcook.com/standard_deviation.html .

The data is a set of 16-bit unsigned values, which may contain any number of samples (in practice, the minimum would be about 20 samples and the maximum about 2e32 samples).

As the dataset may be too big to store, I have already implemented this in C using the above-mentioned online algorithm and verified that it computes correctly.

The trouble begins with the following requirement for the application: besides computing the mean and variance for the whole set, I also need to compute a separate result (both mean and variance) for a population consisting of the middle 50% of the values, i.e. disregarding the first 25% and the last 25% of the samples. The number of samples is not known beforehand, so I must compute the additional set online.

I understand that I can both add and subtract a subset by computing it separately and then using something like the operator+ implementation described here: http://www.johndcook.com/skewness_kurtosis.html (minus the skewness and kurtosis specifics, for which I have no use). The subtraction could be derived from this.
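To make the add/subtract idea concrete, here is a minimal sketch in C, assuming each subset is summarized by its count, mean, and sum of squared deviations. The names `stats_t`, `stats_push`, `stats_merge`, `stats_subtract`, and `stats_variance` are made up for illustration and are not taken from the linked articles. The merge uses the usual pairwise-combination formula for two disjoint samples, and the subtraction is simply that formula solved for one of the two parts.

```c
#include <stdint.h>

/* Running summary of a sample (hypothetical layout, for illustration). */
typedef struct {
    uint64_t n;    /* number of observations */
    double   mean; /* running mean */
    double   m2;   /* sum of squared deviations from the mean */
} stats_t;

/* Welford update with one new observation. */
static void stats_push(stats_t *s, double x)
{
    s->n++;
    double delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

/* Summary of the union of two disjoint samples a and b. */
static stats_t stats_merge(stats_t a, stats_t b)
{
    stats_t c = {0, 0.0, 0.0};
    c.n = a.n + b.n;
    if (c.n == 0)
        return c;
    double delta = b.mean - a.mean;
    c.mean = a.mean + delta * (double)b.n / (double)c.n;
    c.m2 = a.m2 + b.m2
         + delta * delta * (double)a.n * (double)b.n / (double)c.n;
    return c;
}

/* Inverse of stats_merge: given c (the whole) and b (a part of c),
 * recover the summary of the remaining observations.
 * Returns an empty summary if b is not strictly smaller than c. */
static stats_t stats_subtract(stats_t c, stats_t b)
{
    stats_t a = {0, 0.0, 0.0};
    if (b.n >= c.n)
        return a;
    a.n = c.n - b.n;
    a.mean = (c.mean * (double)c.n - b.mean * (double)b.n) / (double)a.n;
    double delta = b.mean - a.mean;
    a.m2 = c.m2 - b.m2
         - delta * delta * (double)a.n * (double)b.n / (double)c.n;
    return a;
}

/* Sample variance (n - 1 in the denominator); 0 for fewer than 2 values. */
static double stats_variance(stats_t s)
{
    return s.n > 1 ? s.m2 / (double)(s.n - 1) : 0.0;
}
```

For example, pushing 1..4 into one summary and 5..8 into another, merging them gives a mean of 4.5 and a sample variance of 6, and subtracting the second summary from the merged one gives back the first. Note that the subtraction involves differences of similar-sized quantities, so it may lose precision when the removed part is nearly as large as the whole.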

The problem is: how do I maintain these subsets? Or should I try another technique?

If space is an issue, and you'd be happy to accept an approximation, I'd start with the algorithm from the following paper:

M. Greenwald, S. Khanna, "Space-Efficient Online Computation of Quantile Summaries"

You can use the algorithm to compute running estimates of the 25th and 75th percentiles of the observations seen so far. You can then feed those observations that fall between the two percentiles into the Welford algorithm covered in John D. Cook's article to compute the running mean and variance.
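The filtering step could look roughly like the sketch below. It assumes the current percentile estimates are available as two numbers, `q25` and `q75`, supplied by whatever quantile summary you use; the quantile summary itself is not implemented here, and `stats_t`, `stats_push`, and `push_if_middle` are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Running summary for the Welford update (hypothetical layout). */
typedef struct {
    uint64_t n;    /* number of accepted observations */
    double   mean; /* running mean */
    double   m2;   /* sum of squared deviations from the mean */
} stats_t;

/* Welford update with one new observation. */
static void stats_push(stats_t *s, double x)
{
    s->n++;
    double delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

/* Feed x into the middle-50% accumulator only if it lies between the
 * current percentile estimates. q25 and q75 are assumed to come from an
 * external quantile summary that is not shown here.
 * Returns 1 if x was accepted, 0 if it was discarded. */
static int push_if_middle(stats_t *mid, uint16_t x, double q25, double q75)
{
    if ((double)x < q25 || (double)x > q75)
        return 0;
    stats_push(mid, (double)x);
    return 1;
}
```

Because the percentile estimates move as more data arrives, early samples are accepted or rejected against bounds that may later change, so the resulting mean and variance are an approximation of the true middle-50% statistics.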
