
Why does a long-running Docker instance fill up my disk space?

When I launch a fresh Ubuntu machine (EC2) and download a single Docker image which I run for a long time, the disk fills up after a couple of weeks. How do I prevent this from happening?

Everything I find online talks about running docker system prune, but my issue is not lots of stray Docker images or volumes sitting around. This EC2 instance downloads a single image and launches it exactly once, then keeps it running forever (this is a CI runner).

Here are some clues:

  • Both the host machine and the docker image are Ubuntu 20.04
  • My EC2 instance has a 10 GB volume
  • When I docker pull the image, it's only 2.5 GB (an Ubuntu minimal image)
  • The boot script launches docker with this command:

docker run -it -d --rm --shm-size=2gb --env --user root --name running-docker-ci ghcr.io/secret/docker-ci:latest start

Here is the diagnosis I've done:

$ df
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/root       10098432 10082048         0 100% /
devtmpfs         8192212        0   8192212   0% /dev
tmpfs            8198028        0   8198028   0% /dev/shm
tmpfs            1639608   164876   1474732  11% /run
tmpfs               5120        0      5120   0% /run/lock
tmpfs            8198028        0   8198028   0% /sys/fs/cgroup
/dev/loop0         34176    34176         0 100% /snap/amazon-ssm-agent/3552
/dev/loop1         56832    56832         0 100% /snap/core18/1988
/dev/loop4         33152    33152         0 100% /snap/snapd/11588
/dev/loop5         56832    56832         0 100% /snap/core18/1997
/dev/loop6         72192    72192         0 100% /snap/lxd/19647
/dev/loop7         69248    69248         0 100% /snap/lxd/20326
/dev/loop2         32896    32896         0 100% /snap/snapd/11841
tmpfs            1639604        0   1639604   0% /run/user/1000

And repeatedly running du led me to this being my biggest directory:

/var/lib/docker$ sudo du -s * | sort -nr | head -50
13842100    overlay2
14888   image
128 containers
72  buildkit
56  network
28  volumes
20  plugins
20  builder
4   trust
4   tmp
4   swarm
4   runtimes
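As a first check (my own suggestion, not from the original post): Docker can break the overlay2 usage down per container and per image, which narrows down whether it is the long-running container's writable layer that is growing rather than the image itself:

```shell
# Per-container disk usage: SIZE is data written to the container's
# writable (upper) layer; the "virtual" figure in parentheses also
# includes the shared read-only image layers.
docker ps --size

# Usage broken down by images, containers, local volumes, and build
# cache; -v lists each item individually.
docker system df -v
```

If `docker ps --size` shows the CI container's writable layer at several gigabytes, the growth is inside the container, not in stray images or volumes.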

Any help? I'm stumped.


Update with more details:

larsks suggested that maybe the growth is inside the container. It doesn't appear to be: I don't have anything running that generates files. Oddly, I noticed that df shows 8 GB used by the overlay filesystem:

$ df
Filesystem     1K-blocks    Used Available Use% Mounted on
overlay          8065444 8049060         0 100% /
tmpfs              65536       0     65536   0% /dev
tmpfs            8198028       0   8198028   0% /sys/fs/cgroup
shm              2097152      16   2097136   1% /dev/shm
/dev/root        8065444 8049060         0 100% /etc/hosts
tmpfs            8198028       0   8198028   0% /proc/acpi
tmpfs            8198028       0   8198028   0% /proc/scsi
tmpfs            8198028       0   8198028   0% /sys/firmware

But when I run du on the directory tree, it does not add up to anywhere near 8 GB. I ran this from the root of the filesystem inside the running container:

$ sudo du -s * | sort -nr | head -50

3945724 home
1094712 usr
254652  opt
151984  var
3080    etc
252     run
192     tmp
24      root
16      dev
4       srv
4       mnt
4       media
4       boot
0       sys
0       sbin
0       proc
0       libx32
0       lib64
0       lib32
0       lib
0       bin

It appears that part of how OverlayFS works is that delete operations don't always free up space in your filesystem. From the docs:

  • Deleting files and directories:

When a file is deleted within a container, a whiteout file is created in the container (upperdir). The version of the file in the image layer (lowerdir) is not deleted (because the lowerdir is read-only). However, the whiteout file prevents it from being available to the container.

When a directory is deleted within a container, an opaque directory is created within the container (upperdir). This works in the same way as a whiteout file and effectively prevents the directory from being accessed, even though it still exists in the image (lowerdir).

Without knowing your CI procedures it's hard to say precisely, but the point remains that if you think you're removing files, it's likely that the filesystem is retaining some or all of their contents.
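To make the whiteout mechanism concrete, here is a minimal hand-built overlay mount (my own sketch, not from the post; it needs root on a Linux host with overlayfs support, and all directory names are arbitrary):

```shell
#!/bin/sh
# Sketch: deleting a lowerdir file through an overlay mount creates a
# whiteout in upperdir instead of freeing the lower copy.
base=$(mktemp -d)
mkdir "$base/lower" "$base/upper" "$base/work" "$base/merged"
echo hello > "$base/lower/file.txt"

mount -t overlay overlay \
  -o "lowerdir=$base/lower,upperdir=$base/upper,workdir=$base/work" \
  "$base/merged"

rm "$base/merged/file.txt"

ls "$base/lower"      # file.txt still exists: lowerdir is read-only
ls -l "$base/upper"   # file.txt is now a 0/0 character device: the whiteout

umount "$base/merged"
```

On the Docker host these whiteouts appear the same way, as 0/0 character devices under the layer's diff directory in /var/lib/docker/overlay2, which is why files "deleted" inside the container still occupy space in overlay2.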

Just as an aside, since you mentioned you're on AWS, you might consider a serverless CI deployment so that your containers start from a clean slate on every run.
