
Spring Boot Actuator - MAX property

I am using the Spring Boot Actuator dependency to get insight into my application, and I view the data with Spring Boot Admin. The client-server configuration is working fine. I need to measure the count, total time, and max for the endpoints that get executed.

uri:/user/asset/getAllAssets
TOTAL_TIME: 831ms
MAX: 0ms 

uri:/user/getEmployee/{employeeId}
TOTAL_TIME: 98ms
MAX: 0ms

Why is MAX (time) 0 while TOTAL_TIME is X ms?

(Image: Spring Boot Admin)

When I request the general form

localhost:8889/actuator/metrics/http.server.requests I get MAX as 3.00..

I have also read the production-ready-features documentation, but I could not find any description of how MAX is calculated or what it represents.

Notes: as the number of requests increases, COUNT and TOTAL_TIME also increase, but MAX sometimes decreases (see Request 1 and Request 2 for details).

Request 1: http.server.requests

 {
        "name": "http.server.requests",
        "description": null,
        "baseUnit": "seconds",
        "measurements": [
            {
                "statistic": "COUNT",
                "value": 597
            },
            {
                "statistic": "TOTAL_TIME",
                "value": 144.9057076
            },
            {
                "statistic": "MAX",
                "value": 3.0002913
            }
        ],
        "availableTags": [
            {
                "tag": "exception",
                "values": [
                    "None"
                ]
            },
            {
                "tag": "method",
                "values": [
                    "GET"
                ]
            },
            {
                "tag": "uri",
                "values": [
                    "/actuator/metrics/{requiredMetricName}",
                    "/**/favicon.ico",
                    "/actuator",
                    "/user/getEmployee/{employeeId}",
                    "/user/asset/getAllAssets",
                    "/actuator/health",
                    "/actuator/info",
                    "/actuator/env/{toMatch}",
                    "/actuator/metrics",
                    "/**"
                ]
            },
            {
                "tag": "outcome",
                "values": [
                    "CLIENT_ERROR",
                    "SUCCESS"
                ]
            },
            {
                "tag": "status",
                "values": [
                    "404",
                    "200"
                ]
            }
        ]
    }

UPDATE

localhost:8889/actuator/metrics/http.server.requests?tag=uri:/user/getEmployee/2

Response 404 (I had executed /user/getEmployee/2 before making the actuator request)


localhost:8889/actuator/metrics/http.server.requests?tag=uri:/user/getEmployee/{employeeId}

Response 400


localhost:8889/actuator/metrics/http.server.requests?tag=uri:/user/asset/getAllAssets

{
    "name": "http.server.requests",
    "description": null,
    "baseUnit": "seconds",
    "measurements": [
        {
            "statistic": "COUNT",
            "value": 1
        },
        {
            "statistic": "TOTAL_TIME",
            "value": 0.8311609
        },
        {
            "statistic": "MAX",
            "value": 0
        }
    ],
    "availableTags": [
        {
            "tag": "exception",
            "values": [
                "None"
            ]
        },
        {
            "tag": "method",
            "values": [
                "GET"
            ]
        },
        {
            "tag": "outcome",
            "values": [
                "SUCCESS"
            ]
        },
        {
            "tag": "status",
            "values": [
                "200"
            ]
        }
    ]
}

Request 2: http.server.requests

localhost:8889/actuator/metrics/http.server.requests

{
    "name": "http.server.requests",
    "description": null,
    "baseUnit": "seconds",
    "measurements": [
        {
            "statistic": "COUNT",
            "value": 3346
        },
        {
            "statistic": "TOTAL_TIME",
            "value": 559.7992767999998
        },
        {
            "statistic": "MAX",
            "value": 2.3612968
        }
    ],

You can see the metrics for an individual endpoint by appending ?tag=uri:{endpoint_tag}, using one of the uri values listed in the response of the root /actuator/metrics/http.server.requests call. The measurement values are:

  • COUNT: The number of calls recorded. (The Micrometer Javadoc for this statistic reads "Rate per second for calls", but the actuator output in the question shows a cumulative count that only increases.)
  • TOTAL_TIME: The sum of the times recorded. Reported in the monitoring system's base unit of time.
  • MAX: The maximum amount recorded. When this represents a time, it is reported in the monitoring system's base unit of time.

As given here, also here.
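As a small illustration of the tag query, here is a hedged sketch that builds the URL with the tag value percent-encoded. The class and method names (MetricsUrl, forUri) are made up for this example. It rests on two assumptions: that the 400 in the question comes from the raw { and } characters being rejected by the server, and that the 404 for /user/getEmployee/2 occurs because the uri tag holds the templated path listed under availableTags rather than the concrete one.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Spring Boot or Micrometer.
final class MetricsUrl {

    /** Builds the http.server.requests URL for one uri tag value, percent-encoding the tag. */
    static String forUri(String baseUrl, String uriTagValue) {
        // Encode the whole "key:value" pair so that ':', '/', '{' and '}' are escaped.
        String tag = URLEncoder.encode("uri:" + uriTagValue, StandardCharsets.UTF_8);
        return baseUrl + "/actuator/metrics/http.server.requests?tag=" + tag;
    }

    public static void main(String[] args) {
        // Use the templated value exactly as it appears in availableTags.
        System.out.println(forUri("http://localhost:8889", "/user/getEmployee/{employeeId}"));
    }
}
```

URLEncoder.encode(String, Charset) requires Java 10 or later; on older versions the overload that takes a charset name can be used instead.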


The discrepancy you are seeing comes from a time window applied to MAX: after some time, the current MAX value for any tagged metric can be reset to 0. Can you make some new calls to /user/asset/getAllAssets and then immediately call /actuator/metrics/http.server.requests to see a non-zero MAX value for that tag?

The idea is to report MAX for each smaller period. When you collect these metrics over time, you get a series of MAX values, one per period, rather than a single value for one long period.

You can see this in action in the Micrometer source code. There is a rotate() method that resets the MAX value to produce the behaviour described above.

It is called on every poll() call, which is triggered periodically for metric gathering.
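To make the rotate()/poll() idea concrete, here is a minimal sketch of a time-windowed maximum backed by a small ring buffer. It is a simplified model written for this answer, not Micrometer's actual class: the name RollingMax, the explicit nowMillis arguments, and the 40-second rotation interval in the demo are all assumptions chosen for illustration.

```java
// Simplified model of a rolling max; NOT Micrometer's implementation.
final class RollingMax {
    private final long[] buckets;          // each bucket holds a max over a different time span
    private final long rotateEveryMillis;  // hypothetical rotation interval
    private int current = 0;               // bucket that poll() reports
    private long lastRotateMillis = 0;

    RollingMax(int bufferLength, long rotateEveryMillis) {
        this.buckets = new long[bufferLength];
        this.rotateEveryMillis = rotateEveryMillis;
    }

    /** Records a sample at a caller-supplied time; every bucket sees it. */
    void record(long sample, long nowMillis) {
        rotate(nowMillis);
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = Math.max(buckets[i], sample);
        }
    }

    /** Returns the max currently visible; 0 once all buckets holding it were cleared. */
    long poll(long nowMillis) {
        rotate(nowMillis);
        return buckets[current];
    }

    /** Clears the current bucket and advances, once per elapsed rotation interval. */
    private void rotate(long nowMillis) {
        while (nowMillis - lastRotateMillis >= rotateEveryMillis) {
            buckets[current] = 0;
            current = (current + 1) % buckets.length;
            lastRotateMillis += rotateEveryMillis;
        }
    }

    public static void main(String[] args) {
        RollingMax max = new RollingMax(3, 40_000);
        max.record(56, 0);        // slow request at t = 0 s
        max.record(17, 50_000);   // faster request at t = 50 s
        System.out.println(max.poll(50_000));   // 56: the slow request is still in the window
        System.out.println(max.poll(120_000));  // 17: the 56 has rotated out, MAX went down
        System.out.println(max.poll(160_000));  // 0: no requests left in the window
    }
}
```

In this model MAX can go down while COUNT and TOTAL_TIME keep growing, and it reaches 0 once no request falls inside the window, which matches the behaviour described in the question.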

  • What does MAX represent?

MAX represents the maximum time taken to execute the endpoint.

Analysis for /user/asset/getAllAssets

COUNT  TOTAL_TIME  MAX
5      115         17
6      122         17  (Execution Time = 122 - 115 = 7)
7      131         17  (Execution Time = 131 - 122 = 9)
8      187         56  (Execution Time = 187 - 131 = 56; from now on MAX is 56)
9      204         56  (Execution Time = 204 - 187 = 17)
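The execution time of each request in this analysis is the difference between consecutive TOTAL_TIME readings. A minimal sketch of that arithmetic follows; the class name TotalTimeDeltas is made up for the example, and the result only equals a single request's time when COUNT grew by exactly 1 between two readings.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the subtraction used in the analysis.
final class TotalTimeDeltas {

    /** Returns the differences between consecutive cumulative TOTAL_TIME readings. */
    static long[] deltas(long[] cumulativeTotals) {
        if (cumulativeTotals.length < 2) {
            return new long[0]; // nothing to subtract
        }
        long[] out = new long[cumulativeTotals.length - 1];
        for (int i = 1; i < cumulativeTotals.length; i++) {
            out[i - 1] = cumulativeTotals[i] - cumulativeTotals[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] totals = {115, 122, 131, 187, 204}; // TOTAL_TIME column for COUNT 5..9
        System.out.println(Arrays.toString(deltas(totals))); // [7, 9, 56, 17]
    }
}
```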

  • Will MAX be 0 if there are few requests (or only 1 request) to a particular endpoint?

No. The number of requests to a particular endpoint does not affect MAX.


  • When will MAX be 0?

There is a timer that sets the value to 0. When the endpoint has not been called for some time, the timer resets MAX to 0. Here the approximate timer value is 2 minutes 30 seconds (150 seconds).


  • How did I determine the timer value?

I took 6 samples (executed the same endpoint 6 times). For each one, I measured the difference between the time the endpoint was called and the time MAX was set back to zero.

DistributionStatisticConfig has .expiry(Duration.ofMinutes(2)).bufferLength(3), which resets some measurements to 0 if no request has been made within the expiry or rotation time.


The MAX property belongs to the enum Statistic, which is used by Measurement (a Measurement gives us COUNT, TOTAL_TIME, MAX).

public static final Statistic MAX

The maximum amount recorded. When this represents a time, it is reported in the monitoring system's base unit of time.


Notes: These are the cases for the metric of a particular endpoint (here /actuator/metrics/http.server.requests?tag=uri:/user/asset/getAllAssets).

For the general metric at actuator/metrics/http.server.requests:

As you can see from Request 1 and Request 2 (in the question), MAX has decreased (from 3.0002913 to 2.3612968). That may be because the MAX for some endpoint was set back to 0 by the timer. In my view, MAX for /http.server.requests behaves the same way as for a particular endpoint (but I am not sure about that and am still investigating).

The MAX metric is a rolling max: it represents the maximum measurement within a rolling time window.

For example if you were to scrape your metrics every minute:

          Total    Count   Max
Minute 1    100        1   100  
Minute 2    500      101    90
Minute 3   4500     1000    10
Minute 4   4500     1000     0

In minute 1 you had 1 request, and a total of 100ms, so the average duration was 100ms, and the slowest (the max) was 100ms

In minute 2 total has increased by 400 (since total is cumulative) and count has increased by 100, so the average is 4ms. However, since the max is 90ms, you know that while most of your requests in that minute were fast, some were still slower.

In minute 3 you had 899 more requests (count) and 4000ms added to the total. (4000/899 = ~4.4ms) So your average measurement was 4.4ms and the max was 10ms.

So the purpose of MAX is to measure the worst outlier, so you know how consistently the code is performing.

Looking at minute 4, the total and count haven't increased because there were no requests. Since there were no requests, then there couldn't be a 'slowest' request for the MAX, and that is why the MAX is 0.
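The per-minute averages in this walkthrough come from dividing the increase in Total by the increase in Count between two scrapes. Here is a short sketch of that calculation using the numbers from the table; the class name IntervalAverage is made up for the example, and reporting 0 for an interval without requests is a choice made here, not something the metrics endpoint does.

```java
// Hypothetical helper showing how interval averages follow from cumulative Total and Count.
final class IntervalAverage {

    /** Average duration per interval, computed from cumulative totals and counts. */
    static double[] averages(long[] total, long[] count) {
        double[] avg = new double[total.length];
        long prevTotal = 0;
        long prevCount = 0;
        for (int i = 0; i < total.length; i++) {
            long deltaTotal = total[i] - prevTotal;
            long deltaCount = count[i] - prevCount;
            // No new requests in the interval: there is nothing to average.
            avg[i] = deltaCount == 0 ? 0.0 : (double) deltaTotal / deltaCount;
            prevTotal = total[i];
            prevCount = count[i];
        }
        return avg;
    }

    public static void main(String[] args) {
        long[] total = {100, 500, 4500, 4500};
        long[] count = {1, 101, 1000, 1000};
        for (double a : averages(total, count)) {
            System.out.printf("%.2f%n", a); // 100.00, 4.00, 4.45, 0.00
        }
    }
}
```

The averages (100, 4, about 4.4, then nothing) say little about outliers, which is why the rolling MAX of 100, 90, 10 and 0 is reported alongside them.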
