
k8s pods do not recover when volume attach times out

Sometimes I have a batch of jobs to launch, and each of them mounts a PVC. Because our resources are limited, some pods fail to mount their volume within one minute.

Unable to mount volumes for pod "package-job-120348968617328640-5gv7s_vname(b059856a-ecfa-11ea-a226-fa163e205547)": timeout expired waiting for volumes to attach or mount for pod "vname"/"package-job-120348968617328640-5gv7s". list of unmounted volumes=[tmp]. list of unattached volumes=[log tmp].

It does keep retrying, but it never succeeds (the event age reads something like 44s (x11 over 23m)). If I delete the pod, however, the job creates a new pod, and that one completes.

So why is this happening? Shouldn't the pod retry the mount automatically, without manual intervention? And if this cannot be avoided, is there a workaround that automatically deletes pods that have been in the init phase for more than 2 minutes?

Conclusion

It turned out that the attach script provided by my cloud provider was getting stuck on some of the nodes (caused by a network problem). So if others run into this problem, checking the storage plugin that attaches the disks may be a good idea.

So why is this happening? Shouldn't the pod retry the mount automatically, without manual intervention? And if this cannot be avoided, is there a workaround that automatically deletes pods that have been in the init phase for more than 2 minutes?

There can be multiple reasons for this. Are there any events on the pod when you run kubectl describe pod <podname>? And are you reusing a PVC that another pod used before?

I guess that you are using a regional cluster spanning multiple datacenters (availability zones), and that your PVC is located in one AZ while your pod is scheduled to run in a different one. In that situation, the pod will never be able to mount the volume, since the volume is located in another AZ.
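One way to check the AZ theory is to compare the zone a PersistentVolume is pinned to with the zone label of the node the pod landed on. The following is a minimal sketch, not a definitive tool: it assumes the zone is exposed through the well-known `topology.kubernetes.io/zone` label (or the older `failure-domain.beta.kubernetes.io/zone`), either as a PV label or in the PV's required node affinity. The function names and the `kubectl_json` helper are hypothetical, and storage plugins that use their own topology keys will not be detected.

```python
import json
import subprocess

# Well-known zone label keys; older clusters use the beta key.
ZONE_KEYS = (
    "topology.kubernetes.io/zone",
    "failure-domain.beta.kubernetes.io/zone",
)


def pv_zones(pv):
    """Return the set of zones a PersistentVolume is restricted to (empty if none declared)."""
    zones = set()
    # The zone may appear as a plain label on the PV ...
    labels = pv.get("metadata", {}).get("labels") or {}
    for key in ZONE_KEYS:
        if key in labels:
            zones.add(labels[key])
    # ... or as a required node-affinity term in the PV spec.
    required = (pv.get("spec", {}).get("nodeAffinity") or {}).get("required") or {}
    for term in required.get("nodeSelectorTerms", []):
        for expr in term.get("matchExpressions", []):
            if expr.get("key") in ZONE_KEYS and expr.get("operator") == "In":
                zones.update(expr.get("values", []))
    return zones


def node_zone(node):
    """Return the node's zone label, or None if the node has no zone label."""
    labels = node.get("metadata", {}).get("labels") or {}
    for key in ZONE_KEYS:
        if key in labels:
            return labels[key]
    return None


def zone_mismatch(pv, node):
    """True only when both sides declare a zone and the node's zone is not allowed by the PV."""
    zones = pv_zones(pv)
    zone = node_zone(node)
    return bool(zones) and zone is not None and zone not in zones


def kubectl_json(*args):
    """Hypothetical helper: run kubectl and parse its JSON output (needs kubectl on PATH)."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

With cluster access, something like `zone_mismatch(kubectl_json("get", "pv", "<pv-name>"), kubectl_json("get", "node", "<node-name>"))` would report whether the two disagree. A result of False does not rule out other attach problems, such as the stuck attach script described in the conclusion.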

I had the same problem, even though the volume was attached to the same node where the pod was running.

I SSHed into the node and restarted kubelet, which fixed the issue.
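The question also asks for a way to automatically delete pods that stay in the init phase for more than 2 minutes. A rough sketch of that workaround follows, under several assumptions: a pod waiting on its volumes is still in the `Pending` phase, age is measured from `creationTimestamp` (so time spent waiting to be scheduled counts too), and only pods owned by a Job are touched so that a replacement gets created. The names `stuck_pods` and `delete_stuck` are hypothetical, and the script relies on `kubectl` being on the PATH.

```python
import json
import subprocess
from datetime import datetime, timedelta, timezone


def stuck_pods(pod_list, max_age=timedelta(minutes=2), now=None):
    """Return (namespace, name) for Job-owned Pending pods older than max_age."""
    now = now or datetime.now(timezone.utc)
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        # Pods still attaching/mounting volumes or running init containers are Pending.
        if pod.get("status", {}).get("phase") != "Pending":
            continue
        # Only touch pods a Job will recreate; bare pods would simply be lost.
        owners = meta.get("ownerReferences") or []
        if not any(owner.get("kind") == "Job" for owner in owners):
            continue
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created > max_age:
            stuck.append((meta["namespace"], meta["name"]))
    return stuck


def delete_stuck(namespace):
    """List pods in a namespace with kubectl and delete the ones considered stuck."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    for ns, name in stuck_pods(json.loads(result.stdout)):
        subprocess.run(["kubectl", "delete", "pod", name, "-n", ns], check=True)
```

Calling `delete_stuck("vname")` periodically (for example from a cron job) would approximate the manual deletion described in the question. Note that `Pending` also covers pods that simply cannot be scheduled yet, so on a resource-constrained cluster a longer threshold may be safer, and this only masks the underlying attach problem rather than fixing it.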
