Kubernetes pod 无法访问在另一个节点上运行的服务

Question

我正在尝试设置一个 k8s 集群。 我已经部署了一个入口控制器和一个证书管理器。 但是，目前我正在尝试部署第一个小服务（Spring Cloud Config Server）并注意到我的 pod 无法访问在其他节点上运行的服务。

pod 尝试解析一个公共可用的 dns 名称，但由于到达 coredns-service 时超时而导致此尝试失败。

我的集群看起来像这样：

节点：

NAME         STATUS   ROLES    AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION              CONTAINER-RUNTIME
k8s-master   Ready    master   6d17h   v1.17.2   10.0.0.10     <none>        CentOS Linux 7 (Core)   5.5.0-1.el7.elrepo.x86_64   docker://19.3.5
node-1       Ready    <none>   6d17h   v1.17.2   10.0.0.11     <none>        CentOS Linux 7 (Core)   5.5.0-1.el7.elrepo.x86_64   docker://19.3.5
node-2       Ready    <none>   6d17h   v1.17.2   10.0.0.12     <none>        CentOS Linux 7 (Core)   5.5.0-1.el7.elrepo.x86_64   docker://19.3.5

豆荚：

NAMESPACE       NAME                                      READY   STATUS             RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
cert-manager    cert-manager-c6cb4cbdf-kcdhx              1/1     Running            1          23h     10.244.2.22   node-2       <none>           <none>
cert-manager    cert-manager-cainjector-76f7596c4-5f2h8   1/1     Running            3          23h     10.244.1.21   node-1       <none>           <none>
cert-manager    cert-manager-webhook-8575f88c85-b7vcx     1/1     Running            1          23h     10.244.2.23   node-2       <none>           <none>
ingress-nginx   ingress-nginx-5kghx                       1/1     Running            1          6d16h   10.244.1.23   node-1       <none>           <none>
ingress-nginx   ingress-nginx-kvh5b                       1/1     Running            1          6d16h   10.244.0.6    k8s-master   <none>           <none>
ingress-nginx   ingress-nginx-rrq4r                       1/1     Running            1          6d16h   10.244.2.21   node-2       <none>           <none>
project1        config-server-7897679d5d-q2hmr            0/1     CrashLoopBackOff   1          103m    10.244.1.22   node-1       <none>           <none>
project1        config-server-7897679d5d-vvn6s            1/1     Running            1          21h     10.244.2.24   node-2       <none>           <none>
kube-system     coredns-6955765f44-7ttww                  1/1     Running            2          6d17h   10.244.2.20   node-2       <none>           <none>
kube-system     coredns-6955765f44-b57kq                  1/1     Running            2          6d17h   10.244.2.19   node-2       <none>           <none>
kube-system     etcd-k8s-master                           1/1     Running            5          6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-apiserver-k8s-master                 1/1     Running            5          6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-controller-manager-k8s-master        1/1     Running            8          6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-flannel-ds-amd64-f2lw8               1/1     Running            11         6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-flannel-ds-amd64-kt6ts               1/1     Running            11         6d17h   10.0.0.11     node-1       <none>           <none>
kube-system     kube-flannel-ds-amd64-pb8r9               1/1     Running            12         6d17h   10.0.0.12     node-2       <none>           <none>
kube-system     kube-proxy-b64jt                          1/1     Running            5          6d17h   10.0.0.12     node-2       <none>           <none>
kube-system     kube-proxy-bltzm                          1/1     Running            5          6d17h   10.0.0.10     k8s-master   <none>           <none>
kube-system     kube-proxy-fl9xb                          1/1     Running            5          6d17h   10.0.0.11     node-1       <none>           <none>
kube-system     kube-scheduler-k8s-master                 1/1     Running            7          6d17h   10.0.0.10     k8s-master   <none>           <none>

服务：

NAMESPACE       NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE     SELECTOR
cert-manager    cert-manager                         ClusterIP   10.102.188.88    <none>        9402/TCP                     23h     app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
cert-manager    cert-manager-webhook                 ClusterIP   10.96.98.94      <none>        443/TCP                      23h     app.kubernetes.io/instance=cert-manager,app.kubernetes.io/managed-by=Helm,app.kubernetes.io/name=webhook,app=webhook
default         kubernetes                           ClusterIP   10.96.0.1        <none>        443/TCP                      6d17h   <none>
ingress-nginx   ingress-nginx                        NodePort    10.101.135.13    <none>        80:31080/TCP,443:31443/TCP   6d16h   app.kubernetes.io/name=ingress-nginx,app.kubernetes.io/part-of=ingress-nginx
project1        config-server                        ClusterIP   10.99.94.55      <none>        80/TCP                       24h     app=config-server,release=config-server
kube-system     kube-dns                             ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP       6d17h   k8s-app=kube-dns

我注意到我新部署的服务无法访问 node-1 上的 coredns 服务。 我的 coredns 服务有两个 pod，其中没有人在 node-1 上运行。 如果我理解正确，应该可以通过每个节点上的服务 ip (10.96.0.10) 访问 coredns pod，无论它是否在其上运行。

我已经注意到节点上的路由表如下所示：

default via 172.31.1.1 dev eth0 
10.0.0.0/16 via 10.0.0.1 dev eth1 proto static 
10.0.0.1 dev eth1 scope link 
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink 
10.244.1.0/24 dev cni0 proto kernel scope link src 10.244.1.1 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown 
172.31.1.1 dev eth0 scope link

因此，如您所见，没有通往 10.96.0.0/16 网络的路由。

我已经检查了端口和net.bridge.bridge-nf-call-iptables和net.bridge.bridge-nf-call-ip6tables sysctl 值。 所有法兰绒端口都可以访问，并且应该能够通过 10.0.0.0/24 网络接收流量。

这是节点 1 上iptables -L的输出：

Chain INPUT (policy ACCEPT)
target                  prot opt source               destination         
KUBE-SERVICES           all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL           all  --  anywhere             anywhere            
ACCEPT                  tcp  --  anywhere             anywhere             tcp dpt:22
ACCEPT                  icmp --  anywhere             anywhere            
ACCEPT                  udp  --  anywhere             anywhere             udp spt:ntp
ACCEPT                  tcp  --  10.0.0.0/24          anywhere            
ACCEPT                  udp  --  10.0.0.0/24          anywhere            
ACCEPT                  all  --  anywhere             anywhere             state RELATED,ESTABLISHED
LOG                     all  --  anywhere             anywhere             limit: avg 15/min burst 5 LOG level debug prefix "Dropped by firewall: "
DROP                    all  --  anywhere             anywhere            

Chain FORWARD (policy DROP)
target                    prot opt source               destination         
KUBE-FORWARD              all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES             all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
DOCKER-USER               all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
ACCEPT                    all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER                    all  --  anywhere             anywhere            
ACCEPT                    all  --  anywhere             anywhere            
ACCEPT                    all  --  anywhere             anywhere            
ACCEPT                    all  --  10.244.0.0/16        anywhere            
ACCEPT                    all  --  anywhere             10.244.0.0/16       

Chain OUTPUT (policy ACCEPT)
target         prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            
ACCEPT         udp  --  anywhere             anywhere             udp dpt:ntp

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target                    prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
RETURN                    all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere            

Chain KUBE-EXTERNAL-SERVICES (1 references)
target     prot opt source               destination         

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  10.244.0.0/16        anywhere             /* kubernetes forwarding conntrack pod source rule */ ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             10.244.0.0/16        /* kubernetes forwarding conntrack pod destination rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination         

Chain KUBE-SERVICES (3 references)
target     prot opt source               destination         
REJECT     tcp  --  anywhere             10.99.94.55          /* project1/config-server:http has no endpoints */ tcp dpt:http reject-with icmp-port-unreachable

集群是通过 ansible 部署的。

我确定我做错了什么。 然而我看不出来。 有人可以帮我吗？

谢谢

Answer 1

我在 Kubernetes 和 Debian Buster 下的 Calico 网络堆栈上遇到了同样的问题。

检查了很多配置和参数，我最终通过将转发规则的策略更改为ACCEPT使其工作。 这清楚地表明问题出在防火墙附近。 出于安全考虑，我把它改回来了。

运行iptables -L给了我以下揭幕警告： # Warning: iptables-legacy tables present, use iptables-legacy to see them

list 命令给出的输出不包含任何 Calico 规则。 运行iptables-legacy -L向我展示了 Calico 规则，所以现在它为什么不起作用似乎很明显。 所以 Calico 似乎使用了遗留接口。

问题是 Debian 中的iptables-nft在替代方案中的更改，您可以通过以下方式检查：

ls -l /etc/alternatives | grep iptables

执行以下操作：

update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives --set arptables /usr/sbin/arptables-legacy
update-alternatives --set ebtables /usr/sbin/ebtables-legacy

现在一切正常！ 感谢 Kubernetes Slack 频道的Long指出解决问题的途径。

Answer 2

我遵循了 Dawid Kruk 的建议，并使用 kubespray 进行了尝试。 现在它按预期工作。 如果我能弄清楚我的错误是什么，我会在这里发布以备将来使用。

编辑：解决方案

我的防火墙规则太严格了。 Flannel 创建了一个新的接口，因为我的规则不限于我的主接口，几乎所有来自 flannel 的包都被删除了。 如果我更仔细地查看 journalctl，我会更早发现问题。

Answer 3

我不确定这里的确切问题是什么。 但我想澄清一些事情以使事情更清楚。

集群 IP 是虚拟 IP。 它们不通过路由表路由。 相反，对于每个集群 IP，kube-proxy 在其各自的节点上添加 NAT 表条目。 要检查这些条目，请执行命令sudo iptables -t nat -L -n -v 。

现在，核心 dns pod 通过服务集群 IP 公开。 因此，每当数据包到达目标地址为集群 IP 的节点时，其目标地址将更改为可从所有节点路由的 pod IP 地址（感谢 flannel）。 目标地址的这种更改是通过 iptables 中的 DNAT 目标条目完成的，如下所示。

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination   
KUBE-SVC-ERIFXISQEP7F7OF4  tcp  --  anywhere             10.96.0.10           /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:domain

Chain KUBE-SVC-ERIFXISQEP7F7OF4 (1 references)
target     prot opt source               destination         
KUBE-SEP-IT2ZTR26TO4XFPTO  all  --  anywhere             anywhere             statistic mode random probability 0.50000000000
KUBE-SEP-ZXMNUKOKXUTL2MK2  all  --  anywhere             anywhere           

Chain KUBE-SEP-IT2ZTR26TO4XFPTO (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  all  --  10.244.0.2           anywhere            
DNAT       tcp  --  anywhere             anywhere             tcp to:10.244.0.2:53

Chain KUBE-SEP-ZXMNUKOKXUTL2MK2 (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  all  --  10.244.0.3           anywhere            
DNAT       tcp  --  anywhere             anywhere             tcp to:10.244.0.3:53

因此，如果您可以重新模拟该问题，请尝试检查 nat 表条目以查看是否一切正常。

Kubernetes pod 无法访问在另一个节点上运行的服务

问题描述

3 个解决方案

解决方案1
2 2020-05-15 13:29:01

解决方案2
1 2020-02-04 18:36:13

解决方案3
1 2020-02-08 17:56:35

Kubernetes pod 无法访问在另一个节点上运行的服务

问题描述

3 个解决方案

解决方案1 2 2020-05-15 13:29:01

解决方案2 1 2020-02-04 18:36:13

解决方案3 1 2020-02-08 17:56:35

解决方案1
2 2020-05-15 13:29:01

解决方案2
1 2020-02-04 18:36:13

解决方案3
1 2020-02-08 17:56:35