What is the AWS CloudWatch Agent disk_used_percent metric measuring? It does not match the usage I see with lsblk or df

I have a t4g.large EC2 instance running Ubuntu 22.04 with a single 30 GB storage volume. I have installed and configured the CloudWatch agent to monitor disk usage.

Right now, the metrics in CloudWatch show that the disk is 56% full.

If I run lsblk -f, I see this (I deleted the UUID column for conciseness):

NAME         FSTYPE   FSVER LABEL           FSAVAIL FSUSE% MOUNTPOINTS  
loop0        squashfs 4.0                         0   100% /snap/core20/1699  
loop1        squashfs 4.0                         0   100% /snap/amazon-ssm-agent/5657  
loop2        squashfs 4.0                                   
loop3        squashfs 4.0                         0   100% /snap/lxd/23545  
loop4        squashfs 4.0                         0   100% /snap/core18/2658  
loop5        squashfs 4.0                         0   100% /snap/core18/2636  
loop6        squashfs 4.0                         0   100% /snap/snapd/17885  
loop7        squashfs 4.0                         0   100% /snap/amazon-ssm-agent/6313  
loop8        squashfs 4.0                         0   100% /snap/core20/1740  
nvme0n1                                                    
├─nvme0n1p1  ext4     1.0   cloudimg-rootfs    2.9G    90% / 
└─nvme0n1p15 vfat     FAT32 UEFI              92.4M     5% /boot/efi

If I run df -h, I see this:

Filesystem       Size  Used Avail Use% Mounted on
/dev/root         29G   27G  2.9G  91% /
tmpfs            3.9G     0  3.9G   0% /dev/shm
tmpfs            1.6G  1.1M  1.6G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p15   98M  5.1M   93M   6% /boot/efi
tmpfs            782M  8.0K  782M   1% /run/user/1000

I don't understand where 56% could be coming from. Even if the CloudWatch agent were summing over all of the mount points, it would come out to ~75%, not 56%.
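
The ~75% estimate can be checked with a short Python sketch. It pools the Size and Used columns from the df -h output above and computes one overall percentage. The helper names (to_mib, pooled_used_percent) are made up for illustration, and df -h rounds its figures, so the result is only approximate.

```python
# Rows transcribed from the df -h output above: (mount point, Size, Used).
DF_ROWS = [
    ("/", "29G", "27G"),
    ("/dev/shm", "3.9G", "0"),
    ("/run", "1.6G", "1.1M"),
    ("/run/lock", "5.0M", "0"),
    ("/boot/efi", "98M", "5.1M"),
    ("/run/user/1000", "782M", "8.0K"),
]

# Multipliers that convert a df -h suffix to MiB.
UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def to_mib(text: str) -> float:
    """Convert a df -h figure such as '29G' or '8.0K' to MiB."""
    if text == "0":
        return 0.0
    return float(text[:-1]) * UNITS[text[-1]]


def pooled_used_percent(rows) -> float:
    """Total used space divided by total size, across every row."""
    total_size = sum(to_mib(size) for _, size, _ in rows)
    total_used = sum(to_mib(used) for _, _, used in rows)
    return 100.0 * total_used / total_size


if __name__ == "__main__":
    print(f"pooled used%: {pooled_used_percent(DF_ROWS):.1f}")
```

The pooled figure lands in the mid-70s, so summing across mount points does not explain 56% either.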

This is my config for the agent:

{
    "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "root"
    },
    "metrics": {
        "aggregation_dimensions": [
            [
                "InstanceId"
            ]
        ],
        "append_dimensions": {
            "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
            "ImageId": "${aws:ImageId}",
            "InstanceId": "${aws:InstanceId}",
            "InstanceType": "${aws:InstanceType}"
        },
        "metrics_collected": {
            "collectd": {
                "metrics_aggregation_interval": 60
            },
            "disk": {
                "measurement": [
                    "used_percent"
                ],
                "metrics_collection_interval": 60,
                "resources": [
                    "*"
                ]
            },
            "mem": {
                "measurement": [
                    "mem_used_percent"
                ],
                "metrics_collection_interval": 60
            },
            "statsd": {
                "metrics_aggregation_interval": 60,
                "metrics_collection_interval": 30,
                "service_address": ":8125"
            }
        }
    }
}

I tried changing "*" to "/" or "/dev/root" in resources and restarted the agent, but it made no difference to the reported value.

Edit: I've now deleted a bunch of files, and lsblk reports 33% disk usage at the "/" mount point, while CloudWatch says 52%.

I figured it out. The culprit is this part of the config:

"aggregation_dimensions": [
    [
        "InstanceId"
    ]
],

This means that the agent sends an "aggregate" value, keyed only by InstanceId, to CloudWatch, and that aggregate is what I was looking at by accident. I got to it by navigating through the Metrics in the CloudWatch GUI like this: "CWAgent" - "InstanceId" - "disk_used_percent". That metric holds a set of data points for each point in time: the results for all the different paths that the agent is reporting on. From there you can select "average", "max", "min", etc. to summarize this data. I had selected "average".
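
To make the effect concrete, here is a minimal Python sketch of what a statistic over the InstanceId-only aggregate does with one time step. The per-path values come from the df -h and lsblk output above, but which paths the agent actually reports is an assumption here, so this is not expected to reproduce the 56% exactly. The aggregate function is a hypothetical helper, not part of any AWS library.

```python
from statistics import mean

# One disk_used_percent sample per path for a single time step.
# ASSUMPTION: this particular set of paths is chosen for illustration only;
# the real set depends on what the agent reports on the instance.
samples = {
    "/": 91.0,
    "/boot/efi": 6.0,
    "/dev/shm": 0.0,
    "/run": 1.0,
    "/snap/core20/1699": 100.0,
    "/snap/lxd/23545": 100.0,
}


def aggregate(per_path: dict) -> dict:
    """Summarize all per-path samples the way a console statistic would."""
    values = list(per_path.values())
    return {
        "Average": mean(values),
        "Maximum": max(values),
        "Minimum": min(values),
    }


if __name__ == "__main__":
    stats = aggregate(samples)
    print("root only:", samples["/"])
    print("aggregate:", stats)
```

The average mixes read-only snap mounts, which are always full, with nearly empty tmpfs mounts, so it says little about how full the root volume is.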

What I should have done was navigate through "CWAgent" - "ImageId, InstanceId, InstanceType, device, fstype, path" - "disk_used_percent" for path /. Then I would be looking at only the value for that path, there would be only one sample per time step, and it would match what I see in the terminal.
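
The same per-path metric can also be fetched programmatically. The sketch below assumes boto3 is installed and credentials are configured, and it uses the CloudWatch get_metric_statistics call. The function names and every dimension value passed in are placeholders. The device value in particular is a guess at how the agent labels the root partition, so copy the real values from the console.

```python
from datetime import datetime, timedelta, timezone


def root_disk_query(instance_id, image_id, instance_type, device,
                    fstype="ext4", path="/", minutes=60):
    """Build get_metric_statistics parameters for a single path.

    The dimension set must match exactly what the agent published,
    otherwise CloudWatch returns no data points.
    """
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "CWAgent",
        "MetricName": "disk_used_percent",
        "Dimensions": [
            {"Name": "ImageId", "Value": image_id},
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "InstanceType", "Value": instance_type},
            {"Name": "device", "Value": device},
            {"Name": "fstype", "Value": fstype},
            {"Name": "path", "Value": path},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,
        "Statistics": ["Average"],
    }


def fetch(params):
    """Call CloudWatch and return the data points ordered by time."""
    import boto3  # third-party; imported here so the builder works without it

    client = boto3.client("cloudwatch")
    response = client.get_metric_statistics(**params)
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])


if __name__ == "__main__":
    # Placeholder identifiers: replace with the values shown in the console.
    query = root_disk_query(
        instance_id="i-0123456789abcdef0",
        image_id="ami-0123456789abcdef0",
        instance_type="t4g.large",
        device="nvme0n1p1",
    )
    for point in fetch(query):
        print(point["Timestamp"], point["Average"])
```

With only one sample per period under this dimension set, the choice of statistic no longer changes the result.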

Note: If you really want to dive deep, you can check out the collectd config at /etc/collectd/collectd.conf, which has a config for "". This should point you to the path where collectd is storing the df information that the CloudWatch agent is reading.
