Why doesn't kube-proxy route traffic to another worker node?
I've deployed several different services and always get the same error.
The service is reachable on the node port from the machine where the pod is running. On the two other nodes I get timeouts.
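For reference, the kind of test that shows the problem (node IPs and the node port are taken from the outputs further down; which node runs the pod is an assumption here, kuben1 in this sketch):

curl --connect-timeout 5 http://192.168.178.77:30002   # node where the pod runs: answers
curl --connect-timeout 5 http://192.168.178.78:30002   # other node: times out
curl --connect-timeout 5 http://192.168.178.79:30002   # other node: times out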
kube-proxy is running on all worker nodes, and I can see in its log files that the service port was added and the node port was opened. In this case I've deployed the stars demo from Calico.
Kube-proxy log output:
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.229458 659 service.go:309] Adding new service port "management-ui/management-ui:" at 10.32.0.133:9001/TCP
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.257483 659 proxier.go:1427] Opened local port "nodePort for management-ui/management-ui:" (:30002/tcp)
kube-proxy is listening on port 30002:
root@kuben1:/tmp# netstat -lanp | grep 30002
tcp6 0 0 :::30002 :::* LISTEN 659/kube-proxy
There are also some iptables rules defined:
root@kuben1:/tmp# iptables -L -t nat | grep management-ui
KUBE-MARK-MASQ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-MARK-MASQ tcp -- !10.200.0.0/16 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
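To dig further, the per-service target chain from the rules above can be listed directly (the chain name is generated per service and is taken from the output above). The KUBE-SVC chain should jump to a KUBE-SEP-... endpoint chain that DNATs to the pod IP:

root@kuben1:/tmp# iptables -t nat -L KUBE-SVC-MIYW5L3VT4JVLCIZ
root@kuben1:/tmp# iptables -t nat -L KUBE-NODEPORTS | grep management-ui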
The interesting part is that I can reach the service IP from any worker node.
root@kubem1:/tmp# kubectl get svc -n management-ui
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
management-ui NodePort 10.32.0.133 <none> 9001:30002/TCP 52m
The service IP/port can be accessed from any worker node if I do a "curl http://10.32.0.133:9001".
I don't understand why kube-proxy does not "route" this properly...
Does anyone have a hint where I can find the error?
Here are some cluster specs:
This is a hand-built cluster inspired by Kelsey Hightower's "Kubernetes the Hard Way" guide.
Component status on the master nodes looks okay:
root@kubem1:/tmp# kubectl get componentstatus
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
The worker nodes look okay if I trust kubectl:
root@kubem1:/tmp# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kuben1 Ready <none> 39d v1.13.0 192.168.178.77 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben2 Ready <none> 39d v1.13.0 192.168.178.78 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben3 Ready <none> 39d v1.13.0 192.168.178.79 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
As asked by P Ekambaram:
root@kubem1:/tmp# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-bgjdg 1/1 Running 5 40d
calico-node-nwkqw 1/1 Running 5 40d
calico-node-vrwn4 1/1 Running 5 40d
coredns-69cbb76ff8-fpssw 1/1 Running 5 40d
coredns-69cbb76ff8-tm6r8 1/1 Running 5 40d
kubernetes-dashboard-57df4db6b-2xrmb 1/1 Running 5 40d
I've found a solution for my "problem".
This behavior was caused by a change in Docker v1.13.x, and the issue was fixed in Kubernetes 1.8.
The easy solution was to change the FORWARD rules via iptables.
Run the following command on all worker nodes: "iptables -A FORWARD -j ACCEPT"
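Whether this is the cause can be checked beforehand: with Docker >= 1.13 the default policy of the FORWARD chain is typically DROP, which is what silently eats the cross-node traffic:

root@kuben1:/tmp# iptables -L FORWARD | head -n 1
Chain FORWARD (policy DROP)

Note that a rule appended this way does not survive a reboot unless it is persisted (e.g. with the iptables-persistent package on Ubuntu).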
To fix it the right way, I had to tell kube-proxy the CIDR for the pods. In theory that could be solved in two ways (sketched below):
- add the CIDR as a command-line argument to kube-proxy (--cluster-cidr)
- add it to the kube-proxy configuration file (clusterCIDR)
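A sketch of both options; the CIDR value 10.200.0.0/16 comes from the KUBE-MARK-MASQ rule above, and the names are kube-proxy's documented --cluster-cidr flag and clusterCIDR config field:

# option 1: flag on the kube-proxy command line (e.g. in its systemd unit)
kube-proxy --cluster-cidr=10.200.0.0/16 <existing flags>

# option 2: field in the kube-proxy configuration file (KubeProxyConfiguration)
clusterCIDR: "10.200.0.0/16"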
In my case the command-line argument didn't have any effect.
Once I added the line to my kube-proxy config file and restarted kube-proxy on all worker nodes, everything worked.
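After the restart it can be verified that kube-proxy picked up the CIDR: it maintains a KUBE-FORWARD chain in the filter table, and with clusterCIDR set it should contain ACCEPT rules matching 10.200.0.0/16:

root@kuben1:/tmp# iptables -L KUBE-FORWARD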
Here is the GitHub pull request for this "FORWARD" issue: link