
The node was low on resource: [DiskPressure], but df -h shows only 47% usage

I have a node in my K8S cluster that I use for monitoring tools.

Pods running here: Grafana, PGAdmin, Prometheus, and kube-state-metrics.

My problem is that I have a lot of evicted pods.

The evicted pods are: kube-state-metrics, grafana-core, and pgadmin.

The pods evicted with the reason "The node was low on resource: [DiskPressure]." are kube-state-metrics (90% of evicted pods) and pgadmin (20% of evicted pods).
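For reference, the evicted pods and their eviction messages can be listed with kubectl (the "monitoring" namespace is just an example; use whichever namespace the pods run in):

# List pods that ended up in the Failed phase (evicted pods land here)
kubectl get pods -n monitoring --field-selector=status.phase=Failed

# Show the eviction reason and message of a single pod
kubectl get pod <pod-name> -n monitoring -o jsonpath='{.status.reason}: {.status.message}{"\n"}'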

When I check disk usage from inside any of the pods, there is free space:

bash-5.0$ df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                   7.4G      3.3G      3.7G  47% /
tmpfs                    64.0M         0     64.0M   0% /dev
tmpfs                   484.2M         0    484.2M   0% /sys/fs/cgroup
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /dev/termination-log
shm                      64.0M         0     64.0M   0% /dev/shm
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/resolv.conf
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/hostname
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/hosts
/dev/nvme2n1            975.9M      8.8M    951.1M   1% /var/lib/grafana
/dev/nvme0n1p2            7.4G      3.3G      3.7G  47% /etc/grafana/provisioning/datasources
tmpfs                   484.2M     12.0K    484.2M   0% /run/secrets/kubernetes.io/serviceaccount
tmpfs                   484.2M         0    484.2M   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                    64.0M         0     64.0M   0% /proc/sched_debug
tmpfs                   484.2M         0    484.2M   0% /sys/firmware

Only one or two pods show another message:

The node was low on resource: ephemeral-storage. Container addon-resizer was using 48Ki, which exceeds its request of 0. Container kube-state-metrics was using 44Ki, which exceeds its request of 0.

The node was low on resource: ephemeral-storage. Container pgadmin was using 3432Ki, which exceeds its request of 0.
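(For context: "exceeds its request of 0" means these containers do not declare an ephemeral-storage request, so they are among the first candidates for eviction once the node reports DiskPressure. A request can be declared in the container spec; the values below are only an illustration, not taken from my manifests:)

resources:
  requests:
    ephemeral-storage: "100Mi"   # illustrative value
  limits:
    ephemeral-storage: "500Mi"   # illustrative value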

I also have kubelet saying:

(combined from similar events): failed to garbage collect required amount of images. Wanted to free 753073356 bytes, but freed 0 bytes
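(That message means the kubelet tried to reclaim space by deleting unused container images, but everything in the image cache was still in use. The cache can also be inspected manually on the node; this is a sketch assuming Docker is the container runtime:)

# Show how much space images, containers and volumes occupy on the node
docker system df

# Remove images not referenced by any container (frees space only if unused images exist)
docker image prune -a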

These pods are running on an AWS t3.micro instance.

It does not appear to be affecting my services in production.

Why is this happening, and how should I fix it?

EDIT: Here is the result when I run df -h on my node:

admin@ip-172-20-41-112:~$ df -h 
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           789M  3.0M  786M   1% /run
/dev/nvme0n1p2  7.5G  6.3G  804M  89% /
tmpfs           3.9G     0  3.9G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup

I can see that /dev/nvme0n1p2 is at 89% usage, but how can I see its contents? When I run ncdu on /, I only see 3 GB of data...

Apparently you're about to run out of available disk space on your node. However, keep in mind that according to the documentation, the DiskPressure condition denotes:

Available disk space and inodes on either the node's root filesystem or image filesystem has satisfied an eviction threshold

Try running df -h on your worker node, not in a Pod. What is the disk usage percentage there? Additionally, you may check the kubelet logs for more details:

journalctl -xeu kubelet.service
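You can also confirm the condition from the API side; note that the kubelet's default hard eviction threshold for the root filesystem is nodefs.available<10%, so eviction can start well before df reports 100% usage (the node name below is a placeholder):

# Check whether the node currently reports DiskPressure
kubectl describe node <node-name> | grep -A 8 "Conditions:"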

Also take a look at this article and this comment.

Let me know if it helps.

Here you can find an answer that explains the same topic very well.

Update:

This line clearly shows that the default threshold is close to being exceeded:

/dev/nvme0n1p2  7.5G  6.3G  804M  89% /

Switch to the root user (su -) and run:

du -hd1 /

to see what directories take up most of the disk space.
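If the plain du output is hard to read, a variant like the one below stays on the root filesystem and sorts by size; the paths in the second command are the usual suspects on a Kubernetes worker, so they may not all exist on your node:

# Stay on one filesystem (-x), one level deep (-d1), sorted by size
du -xhd1 / 2>/dev/null | sort -h

# Common space consumers on a kubelet node (paths may differ per distribution)
du -xsh /var/lib/docker /var/lib/kubelet /var/log 2>/dev/null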
