Why doesn't kube-proxy route traffic to another worker node?
I've deployed several different services and always get the same error.
The service is reachable on the node port from the machine where the pod is running. On the two other nodes I get timeouts.
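For reference, the kind of test that shows the problem (node IPs and the node port are taken from the outputs further down; which node runs the pod is an assumption here, kuben1 in this sketch):

curl --connect-timeout 5 http://192.168.178.77:30002   # node where the pod runs: answers
curl --connect-timeout 5 http://192.168.178.78:30002   # other node: times out
curl --connect-timeout 5 http://192.168.178.79:30002   # other node: times out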
kube-proxy is running on all worker nodes, and I can see in its log files that the service port was added and the node port was opened. In this case I've deployed the stars demo from Calico.
Kube-proxy log output:
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.229458 659 service.go:309] Adding new service port "management-ui/management-ui:" at 10.32.0.133:9001/TCP
Mar 11 10:25:10 kuben1 kube-proxy[659]: I0311 10:25:10.257483 659 proxier.go:1427] Opened local port "nodePort for management-ui/management-ui:" (:30002/tcp)
kube-proxy is listening on port 30002:
root@kuben1:/tmp# netstat -lanp | grep 30002
tcp6 0 0 :::30002 :::* LISTEN 659/kube-proxy
There are also some iptables rules defined:
root@kuben1:/tmp# iptables -L -t nat | grep management-ui
KUBE-MARK-MASQ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere anywhere /* management-ui/management-ui: */ tcp dpt:30002
KUBE-MARK-MASQ tcp -- !10.200.0.0/16 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
KUBE-SVC-MIYW5L3VT4JVLCIZ tcp -- anywhere 10.32.0.133 /* management-ui/management-ui: cluster IP */ tcp dpt:9001
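To dig further, the per-service target chain from the rules above can be listed directly (the chain name is generated per service and is taken from the output above). The KUBE-SVC chain should jump to a KUBE-SEP-... endpoint chain that DNATs to the pod IP:

root@kuben1:/tmp# iptables -t nat -L KUBE-SVC-MIYW5L3VT4JVLCIZ
root@kuben1:/tmp# iptables -t nat -L KUBE-NODEPORTS | grep management-ui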
The interesting part is that I can reach the service IP from any worker node.
root@kubem1:/tmp# kubectl get svc -n management-ui
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
management-ui NodePort 10.32.0.133 <none> 9001:30002/TCP 52m
The service IP/port can be accessed from any worker node if I do a "curl http://10.32.0.133:9001".
I don't understand why kube-proxy does not "route" this properly...
Does anyone have a hint where I can find the error?
Here are some cluster specs:
This is a hand-built cluster inspired by Kelsey Hightower's "Kubernetes the Hard Way" guide.
Component status on the master nodes looks okay:
root@kubem1:/tmp# kubectl get componentstatus
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
The worker nodes look okay if I trust kubectl:
root@kubem1:/tmp# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kuben1 Ready <none> 39d v1.13.0 192.168.178.77 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben2 Ready <none> 39d v1.13.0 192.168.178.78 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
kuben3 Ready <none> 39d v1.13.0 192.168.178.79 <none> Ubuntu 18.04.2 LTS 4.15.0-46-generic docker://18.9.3
As asked by P Ekambaram:
root@kubem1:/tmp# kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
calico-node-bgjdg 1/1 Running 5 40d
calico-node-nwkqw 1/1 Running 5 40d
calico-node-vrwn4 1/1 Running 5 40d
coredns-69cbb76ff8-fpssw 1/1 Running 5 40d
coredns-69cbb76ff8-tm6r8 1/1 Running 5 40d
kubernetes-dashboard-57df4db6b-2xrmb 1/1 Running 5 40d
I've found a solution for my "problem".
This behavior was caused by a change in Docker v1.13.x, and the issue was fixed in Kubernetes 1.8.
The easy solution was to change the FORWARD rules via iptables.
Run the following command on all worker nodes: "iptables -A FORWARD -j ACCEPT"
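Whether this is the cause can be checked beforehand: with Docker >= 1.13 the default policy of the FORWARD chain is typically DROP, which is what silently eats the cross-node traffic:

root@kuben1:/tmp# iptables -L FORWARD | head -n 1
Chain FORWARD (policy DROP)

Note that a rule appended this way does not survive a reboot unless it is persisted (e.g. with the iptables-persistent package on Ubuntu).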
To fix it the right way, I had to tell kube-proxy the CIDR for the pods. In theory that could be solved in two ways (sketched below):
- add the CIDR as a command-line argument to kube-proxy (--cluster-cidr)
- add it to the kube-proxy configuration file (clusterCIDR)
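A sketch of both options; the CIDR value 10.200.0.0/16 comes from the KUBE-MARK-MASQ rule above, and the names are kube-proxy's documented --cluster-cidr flag and clusterCIDR config field:

# option 1: flag on the kube-proxy command line (e.g. in its systemd unit)
kube-proxy --cluster-cidr=10.200.0.0/16 <existing flags>

# option 2: field in the kube-proxy configuration file (KubeProxyConfiguration)
clusterCIDR: "10.200.0.0/16"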
In my case the command-line argument didn't have any effect.
Once I added the line to my kube-proxy config file and restarted kube-proxy on all worker nodes, everything worked.
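After the restart it can be verified that kube-proxy picked up the CIDR: it maintains a KUBE-FORWARD chain in the filter table, and with clusterCIDR set it should contain ACCEPT rules matching 10.200.0.0/16:

root@kuben1:/tmp# iptables -L KUBE-FORWARD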
Here is the GitHub pull request for this "FORWARD" issue: link