
Grafana gauge for percentage error rate not showing correct calculation

I am trying to visualize the percentage of requests that resulted in an error (minute by minute) with a Grafana gauge, but the gauge does not show the correct value. For example, when I execute 10 requests within a 1-minute interval, where 5 of those requests result in HTTP 200 and 5 result in HTTP 500, I expect the gauge to show a 50% error percentage. However, the value stays at 100% even though I have been sending both successful and unsuccessful requests to the API:

[Image: gauge]

This is the corresponding query:

100 * (sum(sum_over_time(total_requests_gauge{status_code!="200"}[1m]))/ on() group_left() sum(sum_over_time(total_requests_gauge[1m])))

I have configured the Gauge unit to Percent:

[Image: percent]

On the client side this is how I have set up the Prometheus exporter:

MetricsReporter.cs

using System;
using Microsoft.Extensions.Logging;
using Prometheus;

public class MetricReporter
{
    private readonly ILogger<MetricReporter> _logger;
    private readonly Counter _requestCounter;
    private readonly Gauge _requestGauge;
    private readonly Histogram _responseTimeHistogram;

    public MetricReporter(ILogger<MetricReporter> logger)
    {
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));

        // Neither the counter nor the gauge is created with label names.
        _requestCounter = Metrics.CreateCounter("total_requests", "The total number of requests serviced by this API.");
        _requestGauge = Metrics.CreateGauge("total_requests_gauge", "The total number of requests serviced by this API.");

        // Only the histogram carries the status_code, method, and path labels.
        _responseTimeHistogram = Metrics.CreateHistogram("request_duration_seconds",
            "The duration in seconds between the response to a request.", new HistogramConfiguration
            {
                Buckets = Histogram.ExponentialBuckets(0.01, 2, 10),
                LabelNames = new[] { "status_code", "method", "path" }
            });
    }

    public void RegisterRequest()
    {
        _requestCounter.Inc();
        _requestGauge.Inc();
    }

    public void RegisterResponseTime(int statusCode, string method, string path, TimeSpan elapsed)
    {
        _responseTimeHistogram.Labels(statusCode.ToString(), method, path).Observe(elapsed.TotalSeconds);
    }
}

Prometheus is scraping the metrics correctly at http://localhost:9090 as well as the API endpoint at http://localhost:80/metrics.

I also have an endpoint that always returns error responses:

[AllowAnonymous]
[HttpPost("problem")]
public IActionResult Problem([FromBody] RegisterModel model)
{
    // Always returns an HTTP 500 error.
    return Problem();
}

What am I missing?

You are using the wrong Prometheus function for this use case: sum_over_time(). I would use increase() instead, which makes the calculation easier.

increase() calculates how much a counter increased over the specified interval, whereas sum_over_time() adds up all the sample values recorded in the specified interval.
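To make the difference concrete, here is a small sketch that applies both ideas to the same window of samples. The sample values and the 15-second scrape interval are made up for illustration, and SimpleIncrease is a simplification: the real increase() also extrapolates to the window boundaries and handles counter resets.

```csharp
using System;
using System.Linq;

public static class RangeFunctionDemo
{
    // Adds up every sample in the window, which is what sum_over_time() does.
    public static double SumOverTime(double[] samples) => samples.Sum();

    // Simplified stand-in for increase(): last sample minus first sample.
    // The real function also extrapolates and accounts for counter resets.
    public static double SimpleIncrease(double[] samples) => samples[^1] - samples[0];

    public static void Main()
    {
        // Hypothetical samples of an ever-growing request count,
        // scraped every 15 seconds over one minute.
        double[] samples = { 100, 102, 105, 107, 110 };

        Console.WriteLine(SumOverTime(samples));    // 524
        Console.WriteLine(SimpleIncrease(samples)); // 10
    }
}
```

Only the second number says how many requests arrived during the window. Summing the samples of a value that only ever grows mostly reflects how large the total already is.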

Here is the query that I tested and worked fine for me:

sum(increase(http_server_requests_seconds_count{namespace="",pod_name=~"",uri=~"",status="200"}[1m])) / sum(increase(http_server_requests_seconds_count{namespace="",pod_name=~"",uri=~""}[1m])) * 100

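It looks like you are using a custom metric, so change the metric name and filter parameters accordingly. Note that the query above filters on status 200 in the numerator, so it yields the percentage of successful requests; for the error percentage, use a != matcher there instead.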
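One more thing to check for the custom metric in the question: a status_code filter can only tell requests apart if the metric carries that label, and total_requests_gauge is created without any labels. In PromQL a missing label is treated as an empty string, so status_code!="200" matches every series, the numerator equals the denominator, and the result is 100%. Below is a minimal sketch of a labeled counter, assuming the same prometheus-net calls the question already uses (Metrics.CreateCounter, Labels(...)). The class name, the metric name api_requests_total, and the RegisterRequest(int) signature are hypothetical.

```csharp
using Prometheus;

public class StatusMetricReporter
{
    // Hypothetical metric name; the label name matches the filter
    // used in the question's query.
    private readonly Counter _requestsByStatus = Metrics.CreateCounter(
        "api_requests_total",
        "The total number of requests serviced by this API, by status code.",
        new CounterConfiguration { LabelNames = new[] { "status_code" } });

    // Call once per completed request, when the response status is known.
    public void RegisterRequest(int statusCode)
    {
        _requestsByStatus.Labels(statusCode.ToString()).Inc();
    }
}
```

With a counter like this, the error percentage query would look something like:

100 * sum(increase(api_requests_total{status_code!="200"}[1m])) / sum(increase(api_requests_total[1m]))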
