
How to fix Kubernetes Ingress Controller cutting off nodes from cluster

I'm having some trouble installing an Ingress Controller in my on-prem cluster (created with Kubespray, running MetalLB to provide LoadBalancer services).

I tried nginx, traefik, and kong, but all gave the same result.

I'm installing the nginx Helm chart using the following values.yaml:

controller:
  # run the controller as a DaemonSet (one pod per selected node) instead of a Deployment
  kind: DaemonSet
  # schedule the controller pods on the master nodes only
  nodeSelector:
    node-role.kubernetes.io/master: ""
  image:
    # pin the nginx-ingress-controller image version
    tag: 0.23.0
rbac:
  # create the ServiceAccount and RBAC rules the controller needs
  create: true

With the command:

helm install --name nginx stable/nginx-ingress --values values.yaml --namespace ingress-nginx

When I deploy the Ingress Controller in the cluster, a service is created (e.g. nginx-ingress-controller for nginx). This service is of type LoadBalancer and gets an external IP.
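To see this happen, one can watch the service right after the chart is installed (the namespace matches the helm command above):

kubectl get svc -n ingress-nginx --watch
# the controller service shows TYPE LoadBalancer; its EXTERNAL-IP goes from <pending> to an address from the MetalLB pool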

When this external IP is assigned, the node linked to this external IP is lost (status NotReady). However, when I check this node, it's still running; it's just cut off from the other nodes and can't even ping them (No route found). When I remove the service (but not the rest of the nginx Helm chart), everything works again and the Ingress works. I also tried installing nginx/traefik/kong without a LoadBalancer, using NodePorts or external IPs on the service, but I got the same result.
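To illustrate the symptom, this is roughly what I checked on the affected node (the interface name eth0 is just an example; substitute the node's actual interface):

kubectl get nodes                # the affected node shows NotReady
ip addr show eth0                # on the node: check whether its own IP clashes with the LoadBalancer IP
ip route                         # on the node: the expected routes to the other nodes are missing
ping <other-node-ip>             # fails from the affected node (No route found)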

Does anyone recognize this behaviour? Why does the ingress still work, even when I remove the nginx-ingress-controller service?

After a long search, we finally found a working solution for this problem.

As mentioned by @A_Suh, the pool of IPs that MetalLB uses should contain IPs that are not currently in use by any node in the cluster. By adding a new IP range that's also configured in the DHCP server, MetalLB can use ARP to link one of these IPs to one of the nodes.
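For reference, in MetalLB releases from that period (before 0.13), the Layer 2 address pool was configured through a ConfigMap; a minimal sketch, assuming MetalLB runs in the metallb-system namespace and using the reserved range from the example below:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      # range reserved in the DHCP server so that no node or client is handed one of these addresses
      - 10.4.5.200/31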

For example, in my 5-node cluster (kube11-15): when MetalLB gets the range 10.4.5.200/31 and allocates 10.4.5.200 for my nginx-ingress-controller, 10.4.5.200 is linked to kube12. ARP requests for 10.4.5.200 are answered with kube12's address, and traffic is routed to this node.
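To check which node is currently answering for the LoadBalancer IP, one can do the following (a sketch; the interface name and the label selector are assumptions that depend on the environment and on how MetalLB was installed):

# from another machine on the same L2 segment: see which MAC answers for the IP
arping -I eth0 10.4.5.200

# the MetalLB speaker logs show which node announces the IP
kubectl -n metallb-system logs -l component=speaker | grep 10.4.5.200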
