
Kubernetes coredns pods stuck in Pending status. Cannot start the dashboard

I am building a Kubernetes cluster following this tutorial, and I have trouble accessing the Kubernetes dashboard. I already created another question about it that you can see here, but while digging into my cluster, I think that the problem might be somewhere else, and that's why I created a new question.

I start my master by running the following commands:

> kubeadm reset 
> kubeadm init --apiserver-advertise-address=[MASTER_IP] > file.txt
> tail -2 file.txt > join.sh # I keep this file for later

> kubectl apply -f https://git.io/weave-kube/

> kubectl -n kube-system get pod
NAME                                READY   STATUS  RESTARTS    AGE
coredns-fb8b8dccf-kb2zq             0/1     Pending 0           2m46s
coredns-fb8b8dccf-nnc5n             0/1     Pending 0           2m46s
etcd-kubemaster                     1/1     Running 0           93s
kube-apiserver-kubemaster           1/1     Running 0           93s
kube-controller-manager-kubemaster  1/1     Running 0           113s
kube-proxy-lxhvs                    1/1     Running 0           2m46s
kube-scheduler-kubemaster           1/1     Running 0           93s

Here we can see that I have two coredns pods stuck in the Pending state forever, and when I run the command:

> kubectl -n kube-system describe pod coredns-fb8b8dccf-kb2zq

I can see in the Events part the following Warning:

FailedScheduling: 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.

Since it is a Warning and not an Error, and since, as a Kubernetes newbie, taints do not mean much to me, I tried to connect a node to the master (using the previously saved command):

> cat join.sh
kubeadm join [MASTER_IP]:6443 --token [TOKEN] \
    --discovery-token-ca-cert-hash sha256:[ANOTHER_TOKEN]

> ssh [USER]@[WORKER_IP] 'bash' < join.sh

This node has joined the cluster.

On the master, I check that the node is connected:

> kubectl get nodes 
NAME        STATUS      ROLES   AGE     VERSION
kubemaster  NotReady    master  13m     v1.14.1
kubeslave1  NotReady    <none>  31s     v1.14.1

And I check my pods:

> kubectl -n kube-system get pod
NAME                                READY   STATUS              RESTARTS    AGE
coredns-fb8b8dccf-kb2zq             0/1     Pending             0           14m
coredns-fb8b8dccf-nnc5n             0/1     Pending             0           14m
etcd-kubemaster                     1/1     Running             0           13m
kube-apiserver-kubemaster           1/1     Running             0           13m
kube-controller-manager-kubemaster  1/1     Running             0           13m
kube-proxy-lxhvs                    1/1     Running             0           14m
kube-proxy-xllx4                    0/1     ContainerCreating   0           2m16s
kube-scheduler-kubemaster           1/1     Running             0           13m

We can see that another kube-proxy pod has been created and is stuck in ContainerCreating status.

And when I do a describe again:

kubectl -n kube-system describe pod kube-proxy-xllx4

I can see in the Events part multiple identical Warnings:

Failed create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.1": Get https://k8s.gcr.io/v1/_ping: dial tcp: lookup k8s.gcr.io on [::1]:53: read udp [::1]:43133->[::1]:53: read: connection refused
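The "lookup k8s.gcr.io on [::1]:53" part of this event suggests that DNS resolution is broken on the worker itself: with no nameserver configured, the resolver falls back to localhost. A minimal check on the worker (a sketch, assuming Docker is the container runtime):

> cat /etc/resolv.conf               # no nameserver entries would explain the [::1]:53 fallback
> nslookup k8s.gcr.io                # "connection refused" here matches the event above
> docker pull k8s.gcr.io/pause:3.1   # retry the failing sandbox image pull by hand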

Here are my repositories:

docker image ls
REPOSITORY                          TAG     
k8s.gcr.io/kube-proxy               v1.14.1 
k8s.gcr.io/kube-apiserver           v1.14.1 
k8s.gcr.io/kube-controller-manager  v1.14.1 
k8s.gcr.io/kube-scheduler           v1.14.1 
k8s.gcr.io/coredns                  1.3.1   
k8s.gcr.io/etcd                     3.3.10  
k8s.gcr.io/pause                    3.1 
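Note that this listing is from the master; the failing pull happens on the worker. The same check can be run there (a sketch, reusing the SSH access from the join step):

> ssh [USER]@[WORKER_IP] 'docker image ls | grep pause'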

And so, for the dashboard part, I tried to start it with the command:

> kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/aio/deploy/recommended/kubernetes-dashboard.yaml

But the dashboard pod is stuck in the Pending state.

kubectl -n kube-system get pod
NAME                                    READY   STATUS              RESTARTS    AGE
coredns-fb8b8dccf-kb2zq                 0/1     Pending             0           40m
coredns-fb8b8dccf-nnc5n                 0/1     Pending             0           40m
etcd-kubemaster                         1/1     Running             0           38m
kube-apiserver-kubemaster               1/1     Running             0           38m
kube-controller-manager-kubemaster      1/1     Running             0           39m
kube-proxy-lxhvs                        1/1     Running             0           40m
kube-proxy-xllx4                        0/1     ContainerCreating   0           27m
kube-scheduler-kubemaster               1/1     Running             0           38m
kubernetes-dashboard-5f7b999d65-qn8qn   0/1     Pending             0           8s

So, even though my problem originally was that I cannot access my dashboard, I guess that the real problem is deeper than that.

I know that I just put a lot of information here, but I am a k8s beginner and I am completely lost on this.

There is an issue I experienced with coredns pods stuck in Pending when setting up my own cluster, which I resolved by adding a pod network.

It looks like the nodes are tainted as not-ready because there is no network addon installed. Installing the addon removes the taints, and the pods are able to schedule. In my case, adding flannel fixed the issue.

EDIT: There is a note about this in the official k8s documentation - Creating a cluster with kubeadm:

The network must be deployed before any applications. Also, CoreDNS will not start up before a network is installed. kubeadm only supports Container Network Interface (CNI) based networks (and does not support kubenet).
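As a minimal sketch of the fix (using the flannel manifest pinned to a specific commit; any CNI addon should work), apply the network addon on the master and watch the coredns pods leave Pending:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
kubectl -n kube-system get pods -w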

Actually, it is the opposite of a deep or serious issue; this is a trivial one. Whenever you see a pod stuck in the Pending state, it means the scheduler is having a hard time scheduling it, mostly because there are not enough resources on the node.
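Besides describing the pod, a quick way to see what the scheduler is complaining about is to list recent events sorted by time (a sketch; kube-system is the namespace used in the question):

kubectl -n kube-system get events --sort-by=.metadata.creationTimestamp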

In your case, it is a taint that the node has, and your pod doesn't have the matching toleration. What you have to do is describe the node and get the taint:

kubectl describe node | grep -i taints

Note: you might have more than one taint. So you might want to do kubectl describe no NODE, since with grep you will only see one taint.
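For illustration, on a master node without a network addon the output could look like this (the node name is taken from the question; the exact taints are an assumption):

kubectl describe no kubemaster
...
Taints:             node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
...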

Once you get the taint, it will be something like hello=world:NoSchedule, which means key=value:effect. You will have to add a tolerations section to your Deployment. This is an example Deployment so you can see what it should look like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        ports:
        - containerPort: 80
          name: http
      tolerations:
      - effect: NoExecute       # other possible effects: NoSchedule, PreferNoSchedule
        key: node
        operator: Equal
        value: not-ready
        tolerationSeconds: 3600 # only valid with the NoExecute effect

As you can see, there is the tolerations section in the yaml. So, if I had a node with the node=not-ready:NoExecute taint, no pod would be able to be scheduled on that node unless it had this toleration.

You can also remove the taint if you don't need it. To remove a taint, describe the node, get the key of the taint, and do:

kubectl taint node NODE key-
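For example, assuming the master carries the default node-role.kubernetes.io/master:NoSchedule taint, removing it would look like this (node name taken from the question):

kubectl taint node kubemaster node-role.kubernetes.io/master-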

Hope it makes sense. Just add this section to your deployment, and it will work.

Set up the flannel network tool.

Run the following commands:

$ sysctl net.bridge.bridge-nf-call-iptables=1
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
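Once the network addon is up, the nodes should report Ready and the coredns pods should move to Running; a quick check:

$ kubectl get nodes
$ kubectl -n kube-system get pods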
