
Kubernetes & Garbage collector: trying to remove used images

One node of my k8s cluster has the GC trying to remove images that are in use by a container.

This behaviour seems strange to me.

Here are the logs:

kubelet: I1218 12:44:19.925831   11177 image_gc_manager.go:334] [imageGCManager]: Removing image "sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2" to free 746888 bytes
kubelet: E1218 12:44:19.928742   11177 remote_image.go:130] RemoveImage "sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2" from image service failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 99e59f495ffa (cannot be forced) - image is being used by running container 6f236a385a8e
kubelet: E1218 12:44:19.928793   11177 kuberuntime_image.go:126] Remove image "sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2" failed: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 99e59f495ffa (cannot be forced) - image is being used by running container 6f236a385a8e
kubelet: W1218 12:44:19.928821   11177 eviction_manager.go:435] eviction manager: unexpected error when attempting to reduce nodefs pressure: wanted to free 9223372036854775807 bytes, but freed 0 bytes space with errors in image deletion: rpc error: code = Unknown desc = Error response from daemon: conflict: unable to delete 99e59f495ffa (cannot be forced) - image is being used by running container 6f236a385a8e

Any suggestions? Could manually removing docker images and stopped containers on a node cause such a problem?

Thank you in advance.

What you've encountered is not the regular Kubernetes garbage collection that deletes orphaned API resource objects, but the kubelet's image garbage collection.

Whenever a node experiences disk pressure, the Kubelet daemon will desperately try to reclaim disk space by deleting (supposedly) unused images. Reading the source code shows that the Kubelet sorts the candidate images by the time they were last used for creating a Pod; if all images are in use, the Kubelet will try to delete them anyway and fail (which is probably what happened to you).

You can use the Kubelet's --minimum-image-ttl-duration flag to specify a minimum age that an image must have before the Kubelet will try to remove it (although this will not prevent the Kubelet from trying to remove used images altogether). Alternatively, see if you can provision your nodes with more disk space for images (or build smaller images).
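For illustration, a minimal KubeletConfiguration sketch (assuming the kubelet is started with --config pointing at a file like this; the same settings also exist as the flags --minimum-image-ttl-duration, --image-gc-high-threshold and --image-gc-low-threshold):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# only images unused for at least this long are eligible for garbage collection
imageMinimumGCAge: "5m"
# start image GC when image disk usage exceeds this percentage...
imageGCHighThresholdPercent: 85
# ...and stop once usage drops below this percentage
imageGCLowThresholdPercent: 80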

As I understand it, Kubernetes has a garbage collector whose purpose is to remove unnecessary objects so that resources are freed up.

If an object does not belong to any owner, it is orphaned. There is a pattern in Kubernetes for this, known as ownership.

For instance, if you apply a Deployment object, it will create a ReplicaSet object, and the ReplicaSet in turn will create Pod objects.

So the ownership flow is:

Deployment <== ReplicaSet <== Pod

Now if you delete the Deployment object, the ReplicaSet no longer has an owner, so the garbage collector will remove the ReplicaSet; the Pods then no longer have an owner either, so the GC will remove the Pods as well.

There is a field called ownerReferences which describes the relationship between these objects (Deployment, ReplicaSet, Pod, etc.).
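As an illustration, a Pod created by a ReplicaSet carries an entry like this in its metadata (the names and uid below are hypothetical):

metadata:
  name: my-app-5d9c7b8f6d-abcde
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet
    name: my-app-5d9c7b8f6d
    uid: 12345678-1234-1234-1234-123456789abc
    controller: true
    blockOwnerDeletion: true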

There are 3 ways to delete objects in Kubernetes (a kubectl sketch follows the list below).

  • Foreground: if you delete a Deployment, the Pods are deleted first, then the ReplicaSet, and only after that is the Deployment itself removed.
  • Background: if you delete a Deployment, the Deployment is deleted first, and the GC then removes the ReplicaSets and Pods.
  • Orphan: if you delete a Deployment, the ReplicaSet and its Pods become orphaned; with this policy the GC leaves these orphaned objects in place rather than removing them.
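For illustration, the propagation policy can be selected with kubectl's --cascade flag (in newer kubectl versions; older versions only accept --cascade=true/false). The deployment name my-app is hypothetical:

# Foreground: dependents (ReplicaSets, Pods) are removed before the Deployment itself
kubectl delete deployment my-app --cascade=foreground

# Background (the default): the Deployment is removed first, the GC then cleans up dependents
kubectl delete deployment my-app --cascade=background

# Orphan: the Deployment is removed, its ReplicaSets and Pods are left running
kubectl delete deployment my-app --cascade=orphan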

Solutions to your issue

It seems to me that your pod (containers) is orphaned; therefore, the GC is making sure that it is removed from the cluster.

If you want to check the ownerReferences status:

kubectl get pod $PODNAME -o yaml

In the metadata section you will find the relevant ownership information.
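To print only the owner information, a jsonpath query can be used as a shortcut (a sketch, with $PODNAME as above):

kubectl get pod $PODNAME -o jsonpath='{.metadata.ownerReferences[*].kind} {.metadata.ownerReferences[*].name}{"\n"}'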

I have attached references for further research.

garbage-collection

garbage-collection-k8s
