Graphite and statsd, averaging percentile, stddev incorrect

Original post: 2014-05-07 21:52:43 · Tags: node.js, statistics, graphite, statsd

Since statsd calculates statistics for each flush interval (default 10 seconds), it seems incorrect for Graphite to simply average these when looking at a longer time window. For example, statsd sends the 90th percentile for each of 6 flush intervals. If I'm looking at the data in 1-minute buckets, Graphite averages these six values. But the average of six ten-second 90th percentiles is not the 90th percentile of the minute.

This is a problem with the other statistics too: mean, median, stddev. For min/max/count it's easy to set up Graphite's storage-aggregation to aggregate correctly, but for these statistics a plain average isn't correct.
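For reference, the min/max/count case mentioned above might look like the following `storage-aggregation.conf` fragment. This is a sketch that assumes statsd's default timer suffixes (`.lower`, `.upper`, `.count`); the section names and `xFilesFactor` values are illustrative, and the patterns need adjusting to your own metric names.

```ini
# Roll up per-interval minimums with min, not average.
[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

# Roll up per-interval maximums with max.
[max]
pattern = \.upper$
xFilesFactor = 0.1
aggregationMethod = max

# Counts add up across intervals.
[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum
```

Note that these rules are applied when a Whisper file is created, so metrics that already exist keep their old aggregation method until they are changed explicitly.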

How are people handling this?

2 answers

You can't. Extracting the percentiles is inherently a lossy operation that cannot be reversed.
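To see why the information is lost, here is a small sketch with made-up latency data: five quiet flush intervals and one busy interval full of slow requests. The `percentile` helper uses the nearest-rank definition, which is an assumption for illustration and not necessarily the method statsd uses.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Six hypothetical 10-second flush intervals of latencies in ms.
quiet = list(range(10, 20))   # 10 requests, 10..19 ms
busy = [200] * 50             # 50 slow requests in a single interval
intervals = [quiet, quiet, quiet, quiet, quiet, busy]

# What statsd would report per interval, and what averaging them gives.
per_interval_p90 = [percentile(iv, 90) for iv in intervals]
averaged_p90 = sum(per_interval_p90) / len(per_interval_p90)

# The real 90th percentile needs every raw value from the minute.
all_values = [v for iv in intervals for v in iv]
true_p90 = percentile(all_values, 90)

print(per_interval_p90)  # [18, 18, 18, 18, 18, 200]
print(averaged_p90)      # about 48.3
print(true_p90)          # 200
```

The averaged figure treats every interval as equally important, although the busy interval holds half of all requests. Nothing in the six per-interval percentiles lets you recover the true value.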

The arithmetic mean for the minute can be recovered by summing the values across all 6 intervals and dividing by the total count across those intervals. That restores the accurate mean for the entire minute, but it is not exactly straightforward.
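A minimal sketch of that recombination, assuming each flush interval's count and sum are available. If the sum of squares is stored as well (an extra assumption here), the standard deviation can be pooled the same way. The `combine` function and the sample data are hypothetical.

```python
import math
import statistics

def combine(summaries):
    """Pool per-interval (count, sum, sum_of_squares) tuples into one mean and stddev."""
    n = sum(count for count, _, _ in summaries)
    total = sum(s for _, s, _ in summaries)
    total_sq = sum(sq for _, _, sq in summaries)
    mean = total / n
    # Population variance: E[x^2] - (E[x])^2, clamped against rounding error.
    variance = max(total_sq / n - mean ** 2, 0.0)
    return mean, math.sqrt(variance)

# Hypothetical raw values for three flush intervals of different sizes.
raw = [[1, 2, 3], [10, 20], [5]]
summaries = [(len(iv), sum(iv), sum(v * v for v in iv)) for iv in raw]

pooled_mean, pooled_std = combine(summaries)
naive_mean = statistics.fmean(statistics.fmean(iv) for iv in raw)

print(pooled_mean)  # 41 / 6, the true mean of all six values
print(naive_mean)   # 22 / 3, the average of the per-interval means
```

The two means differ because the naive version gives the single-value interval the same weight as the three-value interval. No such trick exists for the median or other percentiles.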

I've been thinking about the issue too.

Let's take the example of an ICMP check where you are measuring packet loss to a service. You are submitting the min, max, avg, and 90th percentile of your check every 10 seconds.

Here are my thoughts:

- This problem doesn't apply to non-sampled values (i.e. if there's only one value per 10 seconds).
- If you're sending some sort of sampled measurement for your time period (i.e. min, max, percentiles), whether through statsd or from the check directly, things get complicated.
  - min and max are easy. You can roll them up directly (as you point out).
  - count is also a special case that is handled, as you note.

But when it comes to percentiles, things get really messy.

I think that being able to roll-up/flush with a computed percentile would greatly alleviate the problem.

I'm not sure this is technically a Graphite problem, but I feel that everyone who uses Graphite to "visualize" percentile data must be running into this, yet I haven't been able to find much information online.

For now, if you want accurate visualization of percentile data for arbitrary time periods with rolled-up periods, you're going to have to use something like Elasticsearch and go right to the source data (in this case, the results of every ping that you used to derive your statistics).
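Going back to the source data could be sketched like this: keep every raw `(timestamp, value)` sample and compute the percentile per window at query time. The function name, the nearest-rank definition, and the sample data are all assumptions for illustration; a real setup would fetch the samples from whatever store holds them.

```python
import math
from collections import defaultdict

def percentile_by_window(samples, window_secs, pct):
    """Group raw (timestamp, value) samples into fixed windows and
    return {window_start: nearest-rank percentile of that window}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % window_secs].append(value)
    result = {}
    for start in sorted(buckets):
        ordered = sorted(buckets[start])
        rank = max(1, math.ceil(pct * len(ordered) / 100))
        result[start] = ordered[rank - 1]
    return result

# Hypothetical data: one ping result per second for two minutes,
# where the measured value happens to equal the timestamp.
samples = [(t, t) for t in range(120)]

print(percentile_by_window(samples, 60, 90))   # {0: 53, 60: 113}
print(percentile_by_window(samples, 120, 90))  # {0: 107}
```

Because the window size is only chosen at query time, the same raw data answers correctly for 10-second, 1-minute, or 1-hour buckets, at the cost of storing every sample.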

solution1
0 ACCEPTED 2014-05-07 23:28:23

solution2
0 2014-09-22 20:58:56