How can I calculate the median and standard deviation of a stream of numbers in Perl?
In our logfiles we store response times for the requests. What's the most efficient way to calculate the median response time, the "75/90/95% of requests were served in less than N time" numbers, etc.? (I guess a variation of my question is: what's the best way to calculate the median and standard deviation of a stream of numbers?)
The best I came up with was just reading all the numbers, sorting them, and then picking out the values at the relevant positions, but that seems really goofy. Isn't there a smarter way?
We use Perl, but solutions for any language might be helpful.
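For reference, the read-everything, sort, and pick baseline described in the question might look roughly like this in Perl. This is a minimal sketch: the `percentile` helper, the nearest-rank definition, and the sample response times are assumptions made for illustration, not something taken from the original logs.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Nearest-rank percentile: the smallest value such that at least
# $p percent of the samples are less than or equal to it.
sub percentile {
    my ($sorted, $p) = @_;               # $sorted: array ref in ascending order
    my $n = @$sorted;
    return undef unless $n;
    my $rank = ceil($p * $n / 100);      # 1-based rank
    $rank = 1 if $rank < 1;
    return $sorted->[$rank - 1];
}

# Hypothetical response times in milliseconds.
my @times  = (120, 85, 300, 95, 110, 2000, 105, 90, 130, 100);
my @sorted = sort { $a <=> $b } @times;  # numeric sort, O(n log n)

printf "p%d = %d ms\n", $_, percentile(\@sorted, $_) for (50, 75, 90, 95);
```

This needs all values in memory and one sort per batch, which is often acceptable for a single logfile but becomes costly for very large or unbounded streams.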
See the article Calculating Percentiles in Memory-bound Applications. It explains how to calculate median and other percentiles efficiently.
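One common way to keep memory bounded is to count samples in fixed-width buckets and read percentiles off the cumulative counts. The sketch below illustrates that general idea only and is not necessarily the method used in the linked article; the bucket width `$BUCKET_MS` and the helper names `record` and `approx_percentile` are hypothetical.

```perl
use strict;
use warnings;

my $BUCKET_MS = 10;      # assumed bucket width in milliseconds
my %count;               # bucket index => number of samples
my $total = 0;

# Record one response time; memory grows with distinct buckets only.
sub record {
    my ($ms) = @_;
    $count{ int($ms / $BUCKET_MS) }++;
    $total++;
}

# Upper bound of the bucket that contains the p-th percentile.
sub approx_percentile {
    my ($p) = @_;
    my $target = $p * $total / 100;
    my $seen   = 0;
    for my $bucket (sort { $a <=> $b } keys %count) {
        $seen += $count{$bucket};
        return ($bucket + 1) * $BUCKET_MS if $seen >= $target;
    }
    return;              # no samples recorded yet
}

# Hypothetical response times in milliseconds.
record($_) for (120, 85, 300, 95, 110, 2000, 105, 90, 130, 100);
printf "p%d <= %d ms\n", $_, approx_percentile($_) for (50, 90, 95);
```

The answer is only accurate to within one bucket width, so the width is a trade-off between precision and the number of counters kept.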
Also, here's an article on calculating standard deviation (variance) as you go: Accurately computing running variance.
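A single-pass running variance can be sketched with the update usually attributed to Welford, which avoids storing the samples and is numerically more stable than accumulating the sum and the sum of squares. The names `add_sample` and `sample_stddev` and the input values are assumptions for illustration.

```perl
use strict;
use warnings;

# Running state: sample count, running mean, and the sum of squared
# deviations from the current mean.
my ($n, $mean, $m2) = (0, 0, 0);

sub add_sample {
    my ($x) = @_;
    $n++;
    my $delta = $x - $mean;
    $mean += $delta / $n;
    $m2   += $delta * ($x - $mean);   # second factor uses the updated mean
}

# Sample standard deviation (divides by n - 1).
sub sample_stddev {
    return $n > 1 ? sqrt($m2 / ($n - 1)) : 0;
}

add_sample($_) for (2, 4, 4, 4, 5, 5, 7, 9);
printf "n=%d mean=%.3f stddev=%.3f\n", $n, $mean, sample_stddev();
```

In a log-processing loop, `add_sample` would be called once per parsed response time; dividing `$m2` by `$n` instead of `$n - 1` gives the population variance.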
You can have a look at quickselect:
http://en.wikipedia.org/wiki/Selection_algorithm
Or at the Wirth algorithm: http://www.mail-archive.com/numpy-discussion@scipy.org/msg20059.html
A benchmark for the median can be found here: http://ndevilla.free.fr/median/median/index.html
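As a rough illustration of quickselect, the sketch below finds the k-th smallest value in expected linear time without fully sorting the list. It uses a simple three-way partition into new arrays rather than the in-place partitioning usually benchmarked, so it trades memory for clarity; the function name and sample data are assumptions.

```perl
use strict;
use warnings;

# Quickselect: return the k-th smallest element (0-based) of a list.
sub quickselect {
    my ($k, @items) = @_;
    die "k out of range" if $k < 0 || $k >= @items;
    while (1) {
        return $items[0] if @items == 1;
        my $pivot = $items[ int(rand(@items)) ];   # random pivot
        my @lt = grep { $_ <  $pivot } @items;
        my @eq = grep { $_ == $pivot } @items;
        my @gt = grep { $_ >  $pivot } @items;
        if ($k < @lt) {
            @items = @lt;                  # answer is among the smaller values
        } elsif ($k < @lt + @eq) {
            return $pivot;                 # answer equals the pivot
        } else {
            $k -= @lt + @eq;               # skip everything <= pivot
            @items = @gt;
        }
    }
}

# Hypothetical response times in milliseconds (odd count).
my @times  = (120, 85, 300, 95, 110, 2000, 105, 90, 130);
my $median = quickselect(int(@times / 2), @times);
print "median = $median ms\n";
```

The same function gives other percentiles by choosing a different `$k`, for example `int(0.9 * @times)` for an approximate 90th percentile.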
Have a look at PDL, the Perl Data Language.
Also see these previous SO questions about mean/std dev:
/I3az/
There are code examples here: http://rosettacode.org/wiki/Standard_Deviation