
How can I calculate the median and standard deviation of a stream of numbers in Perl?

In our logfiles we store response times for the requests. What's the most efficient way to calculate the median response time, the "75/90/95% of requests were served in less than N time" numbers, etc.? (I guess a variation of my question is: what's the best way to calculate the median and standard deviation of a stream of numbers?)

The best I came up with was just reading all the numbers, sorting them, and then picking out the values at the positions I need, but that seems really goofy. Isn't there a smarter way?
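For reference, the sort-and-pick approach described above might look roughly like this in Perl. It is only a sketch: the `percentile` helper, the nearest-rank definition of a percentile, and the sample response times are all assumptions made for illustration, not something taken from the logfiles in question.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Nearest-rank percentile (an assumed definition): the smallest value such
# that at least $p percent of the samples are less than or equal to it.
sub percentile {
    my ($sorted, $p) = @_;              # $sorted: array ref in ascending order
    return unless @$sorted;             # no data, no percentile
    my $rank = ceil($p * @$sorted / 100);   # 1-based rank
    $rank = 1 if $rank < 1;
    return $sorted->[$rank - 1];
}

# Hypothetical response times in milliseconds.
my @times  = (120, 35, 48, 300, 51, 47, 62, 90, 44, 58);
my @sorted = sort { $a <=> $b } @times;     # numeric sort, not string sort

printf "%d%%: %d ms\n", $_, percentile(\@sorted, $_) for (50, 75, 90, 95);
```

The sort costs O(n log n) and needs every value in memory, which is the part that feels goofy for large logs; once sorted, each percentile lookup is a single array access.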

We use Perl, but solutions for any language might be helpful.

See the article "Calculating Percentiles in Memory-bound Applications". It explains how to calculate the median and other percentiles efficiently.

Also, here's an article on calculating standard deviation (variance) as you go: "Accurately computing running variance".
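One well-known way to compute variance as you go is Welford's online update, which keeps only a count, a running mean, and a running sum of squared deviations. The sketch below assumes that is the kind of method meant here; the helper names (`new_stats`, `add_sample`, `stddev`) and the sample values are hypothetical.

```perl
use strict;
use warnings;

# State for Welford's update: sample count, running mean, and M2
# (the running sum of squared deviations from the current mean).
sub new_stats { return { n => 0, mean => 0, m2 => 0 } }

sub add_sample {
    my ($s, $x) = @_;
    $s->{n}++;
    my $delta = $x - $s->{mean};                # distance from the old mean
    $s->{mean} += $delta / $s->{n};
    $s->{m2}   += $delta * ($x - $s->{mean});   # uses the updated mean
    return;
}

# Sample standard deviation (divides by n - 1); undef with fewer than 2 samples.
sub stddev {
    my ($s) = @_;
    return if $s->{n} < 2;
    return sqrt($s->{m2} / ($s->{n} - 1));
}

my $stats = new_stats();
add_sample($stats, $_) for (2, 4, 4, 4, 5, 5, 7, 9);   # made-up values
printf "mean=%.3f stddev=%.3f\n", $stats->{mean}, stddev($stats);
```

Each value is processed once and then discarded, so memory use stays constant no matter how long the log is. Divide by `n` instead of `n - 1` if the population standard deviation is wanted.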

You can have a look at quickselect:

http://en.wikipedia.org/wiki/Selection_algorithm

Or at Wirth's algorithm: http://www.mail-archive.com/numpy-discussion@scipy.org/msg20059.html

A benchmark of median algorithms can be found here: http://ndevilla.free.fr/median/median/index.html
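Quickselect, the first suggestion above, finds the k-th smallest element without fully sorting the data. Here is a minimal Perl sketch, assuming a random pivot with Lomuto partitioning (one common variant, not necessarily the one benchmarked in the links above); the `quickselect` name and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Returns the k-th smallest element (0-based) in expected linear time.
# Works on a copy, so the caller's array is left untouched.
sub quickselect {
    my ($values, $k) = @_;
    my @a = @$values;
    die "k out of range" if $k < 0 || $k > $#a;
    my ($lo, $hi) = (0, $#a);
    while ($lo < $hi) {
        # Pick a random pivot and move it to the end of the current range.
        my $p = $lo + int(rand($hi - $lo + 1));
        @a[$p, $hi] = @a[$hi, $p];
        my $pivot = $a[$hi];
        my $store = $lo;
        # Move everything smaller than the pivot to the front of the range.
        for my $i ($lo .. $hi - 1) {
            if ($a[$i] < $pivot) {
                @a[$i, $store] = @a[$store, $i];
                $store++;
            }
        }
        @a[$store, $hi] = @a[$hi, $store];   # pivot lands in its final position
        return $a[$store] if $k == $store;
        # Continue only in the side that contains position $k.
        if ($k < $store) { $hi = $store - 1 } else { $lo = $store + 1 }
    }
    return $a[$lo];
}

# Hypothetical response times in milliseconds.
my @times = (120, 35, 48, 300, 51, 47, 62, 90, 44);
print "median: ", quickselect(\@times, int(@times / 2)), "\n";
```

For an even number of values this returns the upper of the two middle elements; average it with the element at position `k - 1` if the conventional median is needed. The same function gives any percentile by choosing `k` accordingly, though all values still have to be held in memory.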

There are code examples here: http://rosettacode.org/wiki/Standard_Deviation
