Kubernetes' container creation with flannel gets stuck in "ContainerCreating" state

Context

I installed Docker on my Ubuntu 18.04 LTS (Server) following this instruction, and later installed Kubernetes via kubeadm. After initializing ( kubeadm init --pod-network-cidr=10.10.10.10/24 ) and joining a second node (a two-node cluster for a start), I cannot get coredns or the later applied Web UI (Dashboard) to actually reach status Running.

As pod network I tried both Flannel ( kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml ) and Weave Net. Nothing changed. The pods still show status ContainerCreating, even after hours of waiting.

Question

Why doesn't the container creation work as expected, and what might be the root cause? And most importantly: How do I solve this?

Edit

Summing up my answer below, here are the reasons why:

  • Docker used the cgroupfs cgroup driver instead of systemd
  • I did not configure iptables correctly
  • I used a wrong kubeadm init, since flannel's standard YAML requires --pod-network-cidr to be 10.244.0.0/16

Since answering this question took me a lot of time, I wanted to share what got me out of this. There might be more code than necessary, but I want it all in one place in case I or someone else has to redo all the steps.



First it all started with Docker...

I figured out that it presumably all started with the way I installed Docker. Following the linked online instructions, I ran sudo apt-get install docker.io to install Docker and added my user to the docker group with sudo usermod -aG docker $USER . As far as I can tell, this left Docker on its default cgroupfs cgroup driver.

Well, taking a look at the official instructions from Kubernetes, this was a mistake: the systemd cgroup driver is the recommended way to go!

So I completely purged everything I ever did with Docker by following these great instructions from Mayur Bhandare:

sudo apt-get purge -y docker-engine docker docker.io docker-ce
sudo apt-get autoremove -y --purge docker-engine docker docker.io docker-ce
sudo rm -rf /var/lib/docker /etc/docker
# -f: do not fail if the AppArmor profile is already gone
sudo rm -f /etc/apparmor.d/docker
sudo groupdel docker
sudo rm -rf /var/run/docker.sock

# Reboot to be sure

Afterwards I reinstalled Docker the official way (keep in mind that this might change in the future):

# Install Docker CE (run the following as root, e.g. in a sudo -i shell)
## Set up the repository:
### Install packages to allow apt to use a repository over HTTPS
apt-get update && apt-get install -y \
  apt-transport-https ca-certificates curl software-properties-common gnupg2

### Add Docker’s official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

### Add Docker apt repository.
add-apt-repository \
  "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) \
  stable"

## Install Docker CE.
apt-get update && apt-get install -y \
  containerd.io=1.2.10-3 \
  docker-ce=5:19.03.4~3-0~ubuntu-$(lsb_release -cs) \
  docker-ce-cli=5:19.03.4~3-0~ubuntu-$(lsb_release -cs)

# Setup daemon.
cat > /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

mkdir -p /etc/systemd/system/docker.service.d

# Restart docker.
systemctl daemon-reload
systemctl restart docker

Note that this explicitly sets systemd as the cgroup driver!
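To double-check that the daemon really picked up the setting, the driver can be read back from docker info. This is only a minimal sketch: the helper name check_cgroup_driver is made up for this example, and it assumes the Docker daemon is running and the current user is allowed to talk to it.

```sh
#!/bin/sh
# Hypothetical helper (not part of Docker): succeeds only if the
# cgroup driver passed as $1 is "systemd".
check_cgroup_driver() {
  driver="$1"
  if [ "$driver" = "systemd" ]; then
    echo "OK: Docker uses the systemd cgroup driver"
  else
    echo "WARNING: Docker uses '$driver', expected 'systemd'" >&2
    return 1
  fi
}

# Only query Docker if the CLI is installed on this machine.
if command -v docker >/dev/null 2>&1; then
  check_cgroup_driver "$(docker info --format '{{.CgroupDriver}}' 2>/dev/null)" || true
fi
```

If the warning shows up after editing /etc/docker/daemon.json, the daemon has most likely not been restarted yet.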



... and then it went on with Flannel...

Above I wrote that my sudo kubeadm init was done with --pod-network-cidr=10.10.10.10/24 , since that was the IP of my master. (The pod network CIDR is the address range handed out to pods, not the address of the master.) Well, as pointed out here, not using the officially recommended --pod-network-cidr=10.244.0.0/16 results in errors, for example when using kubectl proxy or during container creation, if you use the provided kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml . This is because 10.244.0.0/16 is hard-coded in the .yaml and hence mandatory. Alternatively, you can change it in the .yaml to match your own CIDR.

To get rid of the wrong configuration I did a full reset. This can be achieved with sudo kubeadm reset and by deleting the config with sudo rm -r ~/.kube/config . Anyhow, since I had messed things up so badly, I went further: I uninstalled and reinstalled kubeadm and made sure iptables was set up correctly this time (which I had also forgotten to do before...).

Here is a nice link on how to fully uninstall all kubeadm parts.
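To see which network a flannel manifest expects, or to change it, something like the following could be used. It is a rough sketch under assumptions: the manifest has been downloaded to a local file first, the helper names flannel_network and set_flannel_network are invented for this example, and the manifest stores the range as a "Network" entry in its net-conf.json section.

```sh
#!/bin/sh
# Hypothetical helper: print the "Network" value found in a local
# copy of kube-flannel.yml ($1 = path to the manifest).
flannel_network() {
  sed -n 's/.*"Network"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: rewrite the "Network" value in place
# ($1 = path to the manifest, $2 = new CIDR). A backup is kept as $1.bak.
# The sed delimiter is | because CIDRs contain a slash.
set_flannel_network() {
  sed -i.bak 's|"Network"[[:space:]]*:[[:space:]]*"[^"]*"|"Network": "'"$2"'"|' "$1"
}

# Example usage (the file name is an assumption):
#   flannel_network kube-flannel.yml
#   set_flannel_network kube-flannel.yml 10.10.0.0/16
```

Whatever value ends up in the manifest has to be the same range that was passed to kubeadm init via --pod-network-cidr.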

sudo kubeadm reset
# quote the pattern so the shell does not expand it against local files
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni 'kube*'
sudo apt-get autoremove
sudo rm -rf ~/.kube

For the sake of completeness, here is the reinstall as well:

# ensure legacy binaries are installed
sudo apt-get install -y iptables arptables ebtables

# switch to legacy versions
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy

# Install Kubernetes with kubeadm
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

#reboot
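Whether the switch to the legacy backend took effect can be read from the version string of iptables. This is a small sketch with an invented helper name, iptables_backend. It relies on iptables 1.8 and later printing the backend in parentheses; as far as I know, older releases print no such suffix, in which case the helper reports "unknown".

```sh
#!/bin/sh
# Hypothetical helper: classify the output of `iptables --version`
# ($1 = the version string).
iptables_backend() {
  case "$1" in
    *"(legacy)"*)    echo "legacy" ;;
    *"(nf_tables)"*) echo "nf_tables" ;;
    *)               echo "unknown" ;;
  esac
}

# Only run the check if iptables is installed on this machine.
if command -v iptables >/dev/null 2>&1; then
  echo "iptables backend: $(iptables_backend "$(iptables --version 2>/dev/null)")"
fi
```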



... and finally it worked!

After the clean reinstallation I did the following:

# Initialize with correct cidr
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml

And then be astounded by the result:

kubectl get pods --all-namespaces
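On a cluster with many pods it can help to filter that listing down to the pods that are not up yet. A minimal sketch, assuming the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...); the helper name not_ready_pods is made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: read `kubectl get pods --all-namespaces` output on
# stdin, skip the header line, and print every pod whose STATUS column is
# neither Running nor Completed.
not_ready_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}

# Only query the cluster if kubectl is installed on this machine.
if command -v kubectl >/dev/null 2>&1; then
  kubectl get pods --all-namespaces 2>/dev/null | not_ready_pods
fi
```

For any pod that is listed, kubectl describe pod with the pod name and its namespace shows the events that explain why it is stuck.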


On a side note: This also resolved the "/run/flannel/subnet.env: no such file or directory" error that I encountered before these steps when describing the stuck coredns pod.

So I had the same issue as stated above. For me, this was the perfect solution, but other pods were still stuck in either Pending or ContainerCreating. In addition to the fix above, my flannel had hit an error that went unnoticed, so I needed to re-apply the flannel manifest:
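That file is written by flannel on each node once the node has received its subnet, so its presence is a quick health check. This is a sketch with an invented helper name, check_flannel_subnet_env, and it assumes the file contains a FLANNEL_NETWORK= line, as it does in the flannel versions I know.

```sh
#!/bin/sh
# Hypothetical helper: report whether flannel has written its subnet file
# ($1 = path, defaults to /run/flannel/subnet.env).
check_flannel_subnet_env() {
  env_file="${1:-/run/flannel/subnet.env}"
  if [ -r "$env_file" ]; then
    # Print the pod network that flannel uses on this node.
    grep '^FLANNEL_NETWORK=' "$env_file"
  else
    echo "missing: $env_file (flannel has not configured this node yet)" >&2
    return 1
  fi
}

# Example usage on a cluster node:
#   check_flannel_subnet_env
```

If the printed network differs from the value given to --pod-network-cidr, the mismatch described above is the likely cause.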

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
