
kubernetes pods stuck at ContainerCreating

I have a Raspberry Pi cluster (one master, three nodes).

My base image is Raspbian Stretch Lite.

I already have a basic Kubernetes setup in which the master can see all of its nodes (kubectl get nodes) and they are all running. I used the Weave network plugin for the network communication.

When everything was set up, I tried to run an nginx pod on my cluster (first with some replicas, but now just one pod), as follows: kubectl run my-nginx --image=nginx

But somehow the pod gets stuck in the status ContainerCreating. When I run docker images I can't see the nginx image being pulled, and an nginx image is normally not that large, so it should have been pulled by now (15 minutes). kubectl describe pods gives the error that the pod sandbox failed to create and that Kubernetes will re-create it.
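When the describe output gets long, filtering the Events section down to the Warning lines makes the failure easier to spot. A minimal sketch (the extract_warnings helper and the file name describe.txt are just illustrations, not kubectl features):

```shell
# Assumes the pod description was saved first, e.g.:
#   kubectl describe pod my-nginx-9d5677d94-g44l6 > describe.txt
# ("describe.txt" is only an example name.)

extract_warnings() {
  # Keep only event lines whose Type column is "Warning".
  grep -E '^[[:space:]]*Warning' "$1"
}
```

Running extract_warnings describe.txt against the output below would print just the FailedCreatePodSandBox line.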

I searched everywhere for this issue and tried the solutions on Stack Overflow (rebooting to restart the cluster, reading the describe pods output, trying a new network plugin, Flannel), but I can't see what the actual problem is. I did the exact same thing in VirtualBox (just Ubuntu, not ARM) and everything worked.

First I thought it was a permission issue because I run everything as a normal user, but in the VM I did the same thing and nothing changed. Then I checked kubectl get pods --all-namespaces to verify that the pods for the Weave network and kube-dns are running, and nothing is wrong there either.

Is this a firewall issue on the Raspberry Pi? Is the Weave network plugin incompatible with ARM devices (even though the Kubernetes website says it is compatible)? I'm guessing there is an API network problem, and that's why I can't get my pod running on a node.

[EDIT] Log files

kubectl describe pod <podName>

Name:           my-nginx-9d5677d94-g44l6
Namespace:      default
Node:           kubenode1/10.1.88.22
Start Time:     Tue, 06 Mar 2018 08:24:13 +0000
Labels:         pod-template-hash=581233850
                run=my-nginx
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicaSet/my-nginx-9d5677d94
Containers:
  my-nginx:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-phdv5 (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  default-token-phdv5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-phdv5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age   From                Message
  ----     ------                  ----  ----                -------
  Normal   Scheduled               5m    default-scheduler   Successfully assigned my-nginx-9d5677d94-g44l6 to kubenode1
  Normal   SuccessfulMountVolume   5m    kubelet, kubenode1  MountVolume.SetUp succeeded for volume "default-token-phdv5"
  Warning  FailedCreatePodSandBox  1m    kubelet, kubenode1  Failed create pod sandbox.
  Normal   SandboxChanged          1m    kubelet, kubenode1  Pod sandbox changed, it will be killed and re-created.

kubectl logs <podName>

Error from server (BadRequest): container "my-nginx" in pod "my-nginx-9d5677d94-g44l6" is waiting to start: ContainerCreating

journalctl -u kubelet gives this error:

Mar 12 13:42:45 kubeMaster kubelet[16379]: W0312 13:42:45.824314   16379 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 12 13:42:45 kubeMaster kubelet[16379]: E0312 13:42:45.824816   16379 kubelet.go:2104] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

The problem seems to be with my network plugin. In my /etc/systemd/system/kubelet.service.d/10-kubeadm.conf the flags for the network plugin are present: Environment="KUBELET_NETWORK_ARGS=--cni-bin-dir=/etc/cni/net.d --network-plugin=cni"
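The "No networks found in /etc/cni/net.d" warning means the kubelet sees no CNI config files in that directory. A rough local re-check of that condition (the cni_ready helper is a hypothetical sketch, not part of the kubelet):

```shell
# Succeeds only if the given directory (default: /etc/cni/net.d)
# contains at least one CNI network config file, which is roughly
# what the kubelet's "No networks found" warning is testing for.
cni_ready() {
  dir="${1:-/etc/cni/net.d}"
  # A .conf, .conflist or .json file counts as a network config.
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    [ -e "$f" ] && return 0
  done
  return 1
}
```

If cni_ready fails on a node, the network plugin (Weave here) never wrote its config, so pod sandboxes cannot get a network.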

Thank you all for responding to my question. I have solved my problem now. For anyone who comes to this question in the future, the solution was as follows.

I cloned my Raspberry Pi image because I wanted a basicConfig.img for when I need to add a new node to my cluster or when one goes down.

Weave (the network plugin I used) got confused because on every node and on the master the OS had the same machine-id. When I deleted the machine-id and created a new one (and rebooted the nodes), my error was fixed. The commands to do this were:

sudo rm /etc/machine-id
sudo rm /var/lib/dbus/machine-id
sudo dbus-uuidgen --ensure=/etc/machine-id
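Since the root cause was duplicate machine-ids, a quick sanity check is to compare the ids across all machines before digging deeper. The helper below is only a sketch; collect the ids however fits your setup (for example with ssh <node> cat /etc/machine-id):

```shell
# Print any machine-id that appears more than once; empty output
# means every node has a unique id. Pass one id per argument.
find_duplicate_ids() {
  printf '%s\n' "$@" | sort | uniq -d
}
```

Any line this prints names an id shared by two or more nodes, which is exactly the situation that confused Weave here.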

Once again my patience was tested, because my Kubernetes setup was normal and my Raspberry Pi OS was normal. I found this with the help of someone in the Kubernetes community, which again shows how important and great our IT community is. To the people of the future who come to this question: I hope this solution fixes your error and reduces the amount of time you spend searching for a stupidly small thing.

You can see if it's network related by finding the node that is trying to pull the image:

kubectl describe pod <name> -n <namespace>

SSH to the node and run docker pull nginx on it. If it has issues pulling the image manually, then the problem is probably network related.
