
Debugging nfs volume "Unable to attach or mount volumes for pod"

I've set up an NFS server that serves a ReadWriteMany (RWX) PV according to the example at https://github.com/kubernetes/examples/tree/master/staging/volumes/nfs

This setup works fine for me in lots of production environments, but in one specific GKE cluster the mount stopped working after the pods restarted.

In the kubelet logs I see the following repeated many times:

Unable to attach or mount volumes for pod "api-bf5869665-zpj4c_default(521b43c8-319f-425f-aaa7-e05c08282e8e)": unmounted volumes=[shared-mount], unattached volumes=[geekadm.net deployment-role-token-6tg9p shared-mount]: timed out waiting for the condition; skipping pod

Error syncing pod 521b43c8-319f-425f-aaa7-e05c08282e8e ("api-bf5869665-zpj4c_default(521b43c8-319f-425f-aaa7-e05c08282e8e)"), skipping: unmounted volumes=[shared-mount], unattached volumes=[geekadm.net deployment-role-token-6tg9p shared-mount]: timed out waiting for the condition

Manually mounting the NFS export on any of the nodes works just fine: mount -t nfs <service ip>:/ /tmp/mnt

How can I debug the issue further? Are there any other logs I could look at besides the kubelet's?

If the pod gets kicked off the node because the mount is too slow, you may see messages like the one below in the logs.

The kubelet even reports this issue in its logs.
Sample kubelet log:
Setting volume ownership for /var/lib/kubelet/pods/c9987636-acbe-4653-8b8d-aa80fe423597/volumes/kubernetes.io~gce-pd/pvc-fbae0402-b8c7-4bc8-b375-1060487d730d and fsGroup set. If the volume has a lot of files then setting volume ownership could be slow, see https://github.com/kubernetes/kubernetes/issues/69699
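To check whether a node's kubelet is emitting this message, and to see what the API server recorded for the pod, something like the following may help. It is only a sketch: the helper names kubelet_volume_logs and pod_events are made up for illustration, and it assumes the kubelet runs as a systemd unit called "kubelet" on the node, which may not hold for every node image.

```shell
# Hypothetical helper: show recent kubelet messages about slow ownership
# changes or mount timeouts. Run on the node itself.
# Assumption: kubelet is the systemd unit "kubelet".
kubelet_volume_logs() {
  local since="${1:-1 hour ago}"
  journalctl -u kubelet --since "$since" --no-pager \
    | grep -E 'Setting volume ownership|Unable to attach or mount volumes'
}

# Hypothetical helper: list the events recorded for one pod, oldest first.
# Run from any machine with kubectl access to the cluster.
pod_events() {
  local pod="$1" ns="${2:-default}"
  kubectl get events --namespace "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}
```

For example, kubelet_volume_logs "2 hours ago" on the node, and pod_events api-bf5869665-zpj4c default from a workstation.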

Cause:
The pod.spec.securityContext.fsGroup setting causes kubelet to run chown and chmod on all the files in the volumes mounted for the given pod. This can be very time-consuming for big volumes with many files.

From the Kubernetes documentation: By default, Kubernetes recursively changes ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod's securityContext when that volume is mounted.

Solution:
You can deal with it in the following ways.

  1. Reduce the number of files in the volume.
  2. Stop using the fsGroup setting.
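Before pruning files, it can be useful to measure how many entries kubelet would have to walk. A minimal sketch, assuming the volume is already mounted somewhere you can read it; count_volume_entries is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: count the files and directories under a mount point
# (the mount point itself is included in the count).
count_volume_entries() {
  # -xdev keeps find on the volume's own filesystem.
  find "$1" -xdev | wc -l
}
```

Running count_volume_entries /tmp/mnt after a manual mount gives a rough idea of how much work a recursive chown and chmod would be.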

Did you specify an NFS version when mounting from the command line? I had the same issue on AKS, but inspired by https://stackoverflow.com/a/71789693/1382108 I checked the NFS versions and noticed my PV had vers=3. When I tried mounting from the command line with mount -t nfs -o vers=3, the command just hung; with vers=4.1 it worked immediately. I changed the version in my PV and the next Pod worked just fine.
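The version check described above can be scripted roughly as follows. This is a sketch under assumptions: pv_mount_options and try_nfs_versions are hypothetical helper names, the list of versions and the 15-second limit are arbitrary choices, and timeout may not be able to interrupt a mount that is stuck inside the kernel.

```shell
# Hypothetical helper: print the mount options stored on a PV.
pv_mount_options() {
  kubectl get pv "$1" -o jsonpath='{.spec.mountOptions}'
}

# Hypothetical helper: try several NFS protocol versions against one server
# and report which ones mount within the time limit. Run as root on a node.
try_nfs_versions() {
  local server="$1" mnt="${2:-/tmp/mnt}" v
  mkdir -p "$mnt"
  for v in 3 4.0 4.1 4.2; do
    # timeout(1) turns a hanging mount into a visible failure (best effort).
    if timeout 15 mount -t nfs -o "vers=$v" "$server:/" "$mnt"; then
      echo "vers=$v ok"
      umount "$mnt"
    else
      echo "vers=$v failed"
    fi
  done
}
```

For example, try_nfs_versions <service ip> on a node, then compare the result with the output of pv_mount_options for the PV in question.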
