Kubernetes pod cannot access service which is running on another node
I'm trying to set up a k8s cluster. I've already deployed an ingress controller and a cert-manager. However, I'm now trying to deploy a first small service (Spring Cloud Config Server) and noticed that my pods cannot access services that are running on other nodes.
The pod tries to resolve a publicly available DNS name and fails due to a timeout while reaching the coredns service.
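For reference, a quick way to reproduce the failure from a specific node is to start a throwaway pod pinned to that node and run a lookup against the cluster DNS (the busybox image and the node name here are only examples):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 --overrides='{"spec":{"nodeSelector":{"kubernetes.io/hostname":"node-1"}}}' -- nslookup kubernetes.default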
My cluster looks like this:
Nodes:
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-master Ready master 6d17h v1.17.2 10.0.0.10 <none> CentOS Linux 7 (Core) 5.5.0-1.el7.elrepo.x86_64 docker://19.3.5
node-1 Ready <none> 6d17h v1.17.2 10.0.0.11 <none> CentOS Linux 7 (Core) 5.5.0-1.el7.elrepo.x86_64 docker://19.3.5
node-2 Ready <none> 6d17h v1.17.2 10.0.0.12 <none> CentOS Linux 7 (Core) 5.5.0-1.el7.elrepo.x86_64 docker://19.3.5
Pods:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cert-manager cert-manager-c6cb4cbdf-kcdhx 1/1 Running 1 23h 10.244.2.22 node-2 <none> <none>
cert-manager cert-manager-cainjector-76f7596c4-5f2h8 1/1 Running 3 23h 10.244.1.21 node-1 <none> <none>
cert-manager cert-manager-webhook-8575f88c85-b7vcx 1/1 Running 1 23h 10.244.2.23 node-2 <none> <none>
ingress-nginx ingress-nginx-5kghx 1/1 Running 1 6d16h 10.244.1.23 node-1 <none> <none>
ingress-nginx ingress-nginx-kvh5b 1/1 Running 1 6d16h 10.244.0.6 k8s-master <none> <none>
ingress-nginx ingress-nginx-rrq4r 1/1 Running 1 6d16h 10.244.2.21 node-2 <none> <none>
project1 config-server-7897679d5d-q2hmr 0/1 CrashLoopBackOff 1 103m 10.244.1.22 node-1 <none> <none>
project1 config-server-7897679d5d-vvn6s 1/1 Running 1 21h 10.244.2.24 node-2 <none> <none>
kube-system coredns-6955765f44-7ttww 1/1 Running 2 6d17h 10.244.2.20 node-2 <none> <none>
kube-system coredns-6955765f44-b57kq 1/1 Running 2 6d17h 10.244.2.19 node-2 <none> <none>
kube-system etcd-k8s-master 1/1 Running 5 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-apiserver-k8s-master 1/1 Running 5 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-controller-manager-k8s-master 1/1 Running 8 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-flannel-ds-amd64-f2lw8 1/1 Running 11 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-flannel-ds-amd64-kt6ts 1/1 Running 11 6d17h 10.0.0.11 node-1 <none> <none>
kube-system kube-flannel-ds-amd64-pb8r9 1/1 Running 12 6d17h 10.0.0.12 node-2 <none> <none>
kube-system kube-proxy-b64jt 1/1 Running 5 6d17h 10.0.0.12 node-2 <none> <none>
kube-system kube-proxy-bltzm 1/1 Running 5 6d17h 10.0.0.10 k8s-master <none> <none>
kube-system kube-proxy-fl9xb 1/1 Running 5 6d17h 10.0.0.11 node-1 <none> <none>
kube-system kube-scheduler-k8s-master 1/1 Running 7 6d17h 10.0.0.10 k8s-master <none> <none>
Services:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
cert-manager cert-manager ClusterIP 10.102.188.88 <none> 9402/TCP 23h app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
cert-manager cert-manager-webhook ClusterIP 10.96.98.94 <none> 443/TCP 23h app.kubernetes.io/instance=cert-manager,app.kubernetes.io/managed-by=Helm,app.kubernetes.io/name=webhook,app=webhook
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d17h <none>
ingress-nginx ingress-nginx NodePort 10.101.135.13 <none> 80:31080/TCP,443:31443/TCP 6d16h app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/part-of=ingress-nginx
project1 config-server ClusterIP 10.99.94.55 <none> 80/TCP 24h app=config-server,release=config-server
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 6d17h k8s-app=kube-dns
I've noticed that my newly deployed service has no access to the coredns service on node-1. My coredns service has two pods, neither of which is running on node-1. If I understand it correctly, it should be possible to access the coredns pods via the service IP (10.96.0.10) on every node, whether or not a coredns pod runs on it.
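One way to narrow this down is to run the lookup from a pod on node-1 both through the service VIP and directly against one of the coredns pod IPs from the listing above; if the pod IP answers but 10.96.0.10 times out, the problem is in the service (kube-proxy) layer rather than in pod-to-pod routing:
nslookup kubernetes.default 10.96.0.10    # via the kube-dns service IP
nslookup kubernetes.default 10.244.2.20   # directly against a coredns pod on node-2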
I've already noticed that the routing tables on the nodes look like this:
default via 172.31.1.1 dev eth0
10.0.0.0/16 via 10.0.0.1 dev eth1 proto static
10.0.0.1 dev eth1 scope link
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.31.1.1 dev eth0 scope link
So as you can see, there is no route to the 10.96.0.0/16 network.
I've already checked the ports and the net.bridge.bridge-nf-call-iptables and net.bridge.bridge-nf-call-ip6tables sysctl values. All flannel ports are reachable and should be able to receive traffic over the 10.0.0.0/24 network.
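For completeness, these are the kinds of checks meant here (assuming eth1 is the node-to-node interface from the route table above; 8472/udp is flannel's default VXLAN port):
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables   # both should be 1
tcpdump -ni eth1 udp port 8472   # watch for VXLAN traffic from the other nodes while a pod retries DNS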
Here is the output of iptables -L on node-1:
Chain INPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL all -- anywhere anywhere
ACCEPT tcp -- anywhere anywhere tcp dpt:22
ACCEPT icmp -- anywhere anywhere
ACCEPT udp -- anywhere anywhere udp spt:ntp
ACCEPT tcp -- 10.0.0.0/24 anywhere
ACCEPT udp -- 10.0.0.0/24 anywhere
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
LOG all -- anywhere anywhere limit: avg 15/min burst 5 LOG level debug prefix "Dropped by firewall: "
DROP all -- anywhere anywhere
Chain FORWARD (policy DROP)
target prot opt source destination
KUBE-FORWARD all -- anywhere anywhere /* kubernetes forwarding rules */
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
DOCKER-USER all -- anywhere anywhere
DOCKER-ISOLATION-STAGE-1 all -- anywhere anywhere
ACCEPT all -- anywhere anywhere ctstate RELATED,ESTABLISHED
DOCKER all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- anywhere anywhere
ACCEPT all -- 10.244.0.0/16 anywhere
ACCEPT all -- anywhere 10.244.0.0/16
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL all -- anywhere anywhere
ACCEPT udp -- anywhere anywhere udp dpt:ntp
Chain DOCKER (1 references)
target prot opt source destination
Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target prot opt source destination
DOCKER-ISOLATION-STAGE-2 all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target prot opt source destination
DROP all -- anywhere anywhere
RETURN all -- anywhere anywhere
Chain DOCKER-USER (1 references)
target prot opt source destination
RETURN all -- anywhere anywhere
Chain KUBE-EXTERNAL-SERVICES (1 references)
target prot opt source destination
Chain KUBE-FIREWALL (2 references)
target prot opt source destination
DROP all -- anywhere anywhere /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
Chain KUBE-FORWARD (1 references)
target prot opt source destination
DROP all -- anywhere anywhere ctstate INVALID
ACCEPT all -- anywhere anywhere /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT all -- 10.244.0.0/16 anywhere /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT all -- anywhere 10.244.0.0/16 /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED
Chain KUBE-KUBELET-CANARY (0 references)
target prot opt source destination
Chain KUBE-SERVICES (3 references)
target prot opt source destination
REJECT tcp -- anywhere 10.99.94.55 /* project1/config-server:http has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable
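(As a side note, the last REJECT rule only means that kube-proxy saw no ready endpoints for the config-server service when the rule was generated; the current state can be checked with kubectl get endpoints config-server -n project1.)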
The cluster is deployed via Ansible.
I'm sure I'm doing something wrong. However, I couldn't see it. Can somebody help me here?
Thanks
I experienced the same issue on Kubernetes with the Calico network stack under Debian Buster.
After checking a lot of configs and parameters, I ended up getting it to work by changing the policy for the forward rule to ACCEPT. This made it clear that the issue is somewhere around the firewall. Due to security considerations I changed it back.
Running iptables -L gave me the following unveiling warning:
# Warning: iptables-legacy tables present, use iptables-legacy to see them
The output given by the list command does not contain any Calico rules. Running iptables-legacy -L showed me the Calico rules, so it seems obvious now why it didn't work: Calico apparently uses the legacy interface.
The issue is the change in Debian to iptables-nft in the alternatives, which you can check via:
ls -l /etc/alternatives | grep iptables
Doing the following:
update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives --set ebtables /usr/sbin/ebtables-legacy
Now it all works fine! Thanks to Long at the Kubernetes Slack channel for pointing out the route to solving it.
I've followed the suggestion from Dawid Kruk and tried it with kubespray. Now it works as intended. If I'm able to figure out what my mistake was, I'll post it here for future reference.
Edit: Solution
My firewall rules were too restrictive. Flannel creates new interfaces, and since my rules were not restricted to my main interface, nearly every packet from flannel was dropped. If I had looked at journalctl more attentively, I would have found the issue earlier.
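For anyone hitting the same issue, a minimal sketch of what scoping the rules to the main interface can look like, assuming eth0 is the external interface and flannel.1/cni0 are the overlay interfaces flannel creates (interface names may differ on your nodes):
iptables -I INPUT -i flannel.1 -j ACCEPT   # accept overlay traffic before any catch-all DROP
iptables -I INPUT -i cni0 -j ACCEPT
# or, alternatively, bind the catch-all drop to the external interface only:
# iptables -A INPUT -i eth0 -j DROP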
I am not sure what the exact issue is here, but I would like to clarify a few things to make them more clear.
Cluster IPs are virtual IPs. They are not routed via routing tables. Instead, for each cluster IP, kube-proxy adds NAT table entries on its respective node. To check those entries, execute the command sudo iptables -t nat -L -n -v.
Now, the coredns pods are exposed via a service cluster IP. Hence, whenever a packet arrives at a node with the cluster IP as its destination address, the destination address is changed to a pod IP address, which is routable from all the nodes (thanks to flannel). This change of destination address is done via a DNAT target entry in iptables, which looks like the one below.
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-SVC-ERIFXISQEP7F7OF4 tcp -- anywhere 10.96.0.10 /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain
Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
target prot opt source destination
KUBE-SEP-IT2ZTR26TO4XFPTO all -- anywhere anywhere statistic mode random probability 0.50000000000
KUBE-SEP-ZXMNUKOKXUTL2MK2 all -- anywhere anywhere
Chain KUBE-SEP-IT2ZTR26TO4XFPTO (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.0.2 anywhere
DNAT tcp -- anywhere anywhere tcp to:10.244.0.2:53
Chain KUBE-SEP-ZXMNUKOKXUTL2MK2 (1 references)
target prot opt source destination
KUBE-MARK-MASQ all -- 10.244.0.3 anywhere
DNAT tcp -- anywhere anywhere tcp to:10.244.0.3:53
Hence, if you can re-simulate the issue, try checking the NAT table entries to see if everything is proper.
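For example, on the affected node something along these lines can confirm whether the DNAT entries for the kube-dns cluster IP exist and are actually being hit (the -v output shows packet counters):
sudo iptables -t nat -L KUBE-SERVICES -n -v | grep 10.96.0.10
sudo conntrack -L -d 10.96.0.10   # needs the conntrack tool; shows whether lookups are being translated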