
Rancher: kube-system pods stuck on ContainerCreating

I'm trying to spin up a cluster with a single node (a VM), but some of the kube-system pods are stuck in ContainerCreating:

> kubectl get pods,svc -owide --all-namespaces
NAMESPACE       NAME                                          READY   STATUS              RESTARTS   AGE     IP            NODE            NOMINATED NODE   READINESS GATES
cattle-system   pod/cattle-cluster-agent-7db88c6b68-bz5dp     0/1     ContainerCreating   0          7m13s   <none>        hdn-dev-app66   <none>           <none>
cattle-system   pod/cattle-node-agent-ccntw                   1/1     Running             0          7m13s   10.105.1.76   hdn-dev-app66   <none>           <none>
cattle-system   pod/kube-api-auth-9kdpw                       1/1     Running             0          7m13s   10.105.1.76   hdn-dev-app66   <none>           <none>
ingress-nginx   pod/default-http-backend-598b7d7dbd-rwvhm     0/1     ContainerCreating   0          7m29s   <none>        hdn-dev-app66   <none>           <none>
ingress-nginx   pod/nginx-ingress-controller-62vhq            1/1     Running             0          7m29s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/coredns-849545576b-w87zr                  0/1     ContainerCreating   0          7m39s   <none>        hdn-dev-app66   <none>           <none>
kube-system     pod/coredns-autoscaler-5dcd676cbd-pj54d       0/1     ContainerCreating   0          7m38s   <none>        hdn-dev-app66   <none>           <none>
kube-system     pod/kube-flannel-d9m6q                        2/2     Running             0          7m43s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/metrics-server-697746ff48-q7cpx           0/1     ContainerCreating   0          7m33s   <none>        hdn-dev-app66   <none>           <none>
kube-system     pod/rke-coredns-addon-deploy-job-npjll        0/1     Completed           0          7m40s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/rke-ingress-controller-deploy-job-b9rs4   0/1     Completed           0          7m30s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/rke-metrics-addon-deploy-job-5rpbj        0/1     Completed           0          7m35s   10.105.1.76   hdn-dev-app66   <none>           <none>
kube-system     pod/rke-network-plugin-deploy-job-lvk2q       0/1     Completed           0          7m50s   10.105.1.76   hdn-dev-app66   <none>           <none>

NAMESPACE       NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
default         service/kubernetes             ClusterIP   10.43.0.1      <none>        443/TCP                  8m19s   <none>
ingress-nginx   service/default-http-backend   ClusterIP   10.43.144.25   <none>        80/TCP                   7m29s   app=default-http-backend
kube-system     service/kube-dns               ClusterIP   10.43.0.10     <none>        53/UDP,53/TCP,9153/TCP   7m39s   k8s-app=kube-dns
kube-system     service/metrics-server         ClusterIP   10.43.251.47   <none>        443/TCP                  7m34s   k8s-app=metrics-server

When I describe the failing pods, I get:

Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "345460c8f6399a0cf20956d8ea24d52f5a684ae47c3e8ec247f83d66d56b2baa" network for pod "cattle-cluster-agent-7db88c6b68-bz5dp": networkPlugin cni failed to set up pod "cattle-cluster-agent-7db88c6b68-bz5dp_cattle-system" network: error getting ClusterInformation: connection is unauthorized: clusterinformations.crd.projectcalico.org "default" is forbidden: User "system:node" cannot get resource "clusterinformations" in API group "crd.projectcalico.org" at the cluster scope, failed to clean up sandbox container "345460c8f6399a0cf20956d8ea24d52f5a684ae47c3e8ec247f83d66d56b2baa" network for pod "cattle-cluster-agent-7db88c6b68-bz5dp": networkPlugin cni failed to teardown pod "cattle-cluster-agent-7db88c6b68-bz5dp_cattle-system" network: error getting ClusterInformation: connection is unauthorized: clusterinformations.crd.projectcalico.org "default" is forbidden: User "system:node" cannot get resource "clusterinformations" in API group "crd.projectcalico.org" at the cluster scope]

I tried re-registering the node once more, but no luck. Any thoughts?

The error says the connection is unauthorized, so you need to grant RBAC permissions for it to work.

Try adding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:calico-node
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: calico-node
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
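
To check whether a binding like this actually took effect, something along these lines may help. It is a rough sketch and not part of the original answer: the function names and the KUBECTL variable are made up for illustration, and it assumes kubectl is pointed at the affected cluster. Keep in mind that the binding grants nothing unless a ClusterRole named calico-node exists.

```shell
#!/bin/sh
# Assumption: KUBECTL is a hypothetical override; it defaults to the kubectl on PATH.
KUBECTL="${KUBECTL:-kubectl}"

# Ask the API server whether the identity named in the error message may read
# Calico's ClusterInformation resource. kubectl prints "yes" or "no".
can_node_read_clusterinfo() {
    "$KUBECTL" auth can-i get clusterinformations.crd.projectcalico.org \
        --as=system:node --as-group=system:nodes
}

# Succeeds only if the ClusterRole referenced by the binding exists.
calico_node_role_exists() {
    "$KUBECTL" get clusterrole calico-node >/dev/null 2>&1
}
```

After applying the manifest with kubectl apply -f, calling can_node_read_clusterinfo should print yes if the permission was granted. If calico_node_role_exists fails, the binding points at a role that is not there.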

Fixed the problem by following the article at https://rancher.com/docs/rancher/v2.x/en/cluster-admin/cleaning-cluster-nodes/ on how to reclaim broken nodes.
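
One plausible reading of why cleaning the node helped: the error comes from the Calico CNI plugin, while the pod list shows flannel (kube-flannel-d9m6q) as the network plugin, which suggests Calico configuration left over on the node from an earlier install. Below is a minimal sketch for checking that before wiping anything. The function name is hypothetical, and it assumes the conventional CNI config directory /etc/cni/net.d.

```shell
#!/bin/sh
# List CNI config files under a directory that mention Calico.
# Assumption: the directory defaults to /etc/cni/net.d; adjust it if your
# node keeps CNI configuration somewhere else.
find_calico_cni_configs() {
    dir="${1:-/etc/cni/net.d}"
    # -r: recurse, -i: case-insensitive, -l: print matching file names only
    grep -ril 'calico' "$dir" 2>/dev/null
}

# Run on the affected node; the fallback message means nothing matched.
find_calico_cni_configs /etc/cni/net.d || echo "no Calico references found"
```

Any file it lists on a flannel cluster is a candidate leftover. The Rancher article linked above covers removing such directories as part of a full node cleanup.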
