
How does kube-proxy handle persistent connections to a Service between pods?

I've seen scenarios where requests from one workload, sent to a ClusterIP service for another workload with no affinities set, only get routed to a subset of the associated pods. The Endpoints object for this service does show all of the pod IPs.

I did a little experiment to figure out what is happening.

Experiment

I set up minikube to have a "router" workload with 3 replicas sending requests to a "backend" workload, also with 3 pods. The router just sends a request to the service name, like http://backend.

I sent 100 requests to the router service via http://$MINIKUBE_IP:$NODE_PORT, since it's exposed as a NodePort service. Then I observed which backend pods actually handled requests. I repeated this test multiple times.

In most cases, only 2 backend pods handled any requests, with the occasional case where all 3 did. I didn't see any runs where all requests went to a single pod in these experiments, though I have seen that happen previously while running other tests in AKS.

This led me to the theory that each router is keeping a persistent connection to the backend pod it connects to. Given there are 3 routers and 3 backends, there's an 11% chance all 3 routers "stick" to a single backend, a 67% chance that between them the 3 routers stick to 2 of the backends, and a 22% chance that each router sticks to a different backend pod (1-to-1).
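Spelling out that arithmetic (assuming each router independently sticks to one of the 3 backends uniformly at random): there are 3³ = 27 equally likely assignments, of which 3 put all routers on the same backend (3/27 ≈ 11%), 3! = 6 form a 1-to-1 mapping (6/27 ≈ 22%), and the remaining 18 land on exactly 2 backends (18/27 ≈ 67%).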

Here's one possible combination of router-to-backend connections (out of 27 possible): (image: three routers connected to 2 of the backends)

Disabling HTTP Keep-Alive

If I use a Transport that disables HTTP keep-alives in the router's HTTP client, then the requests I make to the router are uniformly distributed between the different backends on every test run, as desired.

client := http.Client{
    Transport: &http.Transport{
        // Close the TCP connection after each request instead of
        // keeping it open (keep-alive) for reuse by later requests.
        DisableKeepAlives: true,
    },
}
resp, err := client.Get("http://backend")
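The trade-off is that every request now opens a brand-new TCP connection (and pays a fresh handshake), which is exactly what forces a new destination Pod to be picked each time.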

So the theory seems accurate. But here's my question:

  • How does the router using HTTP keep-alive / persistent connections actually result in a single connection between one router pod and one backend pod?
    • There is a kube-proxy in the middle, so I'd expect any persistent connections to be between the router pod and kube-proxy as well as between kube-proxy and the backend pods.
    • Also, when the router does a DNS lookup, it's going to find the ClusterIP of the backend service every time, so how can it "stick" to a Pod if it doesn't know the Pod IP?

Using Kubernetes 1.17.7.

This excellent article covers your question in detail.
Kubernetes Services indeed do not load balance long-lived TCP connections.

Under the hood, Services (in most cases) use iptables to distribute connections between pods. But iptables wasn't designed as a balancer; it's a firewall. It isn't capable of doing high-level load balancing.
As a weak substitute, iptables can create (or not create) a connection to a certain target with some probability, and thus can be used as an L3/L4 balancer. This mechanism is what kube-proxy employs to somewhat imitate load balancing.

Does iptables use round-robin?

No, iptables is primarily used for firewalls, and it is not designed to do load balancing.
However, you could craft a smart set of rules that could make iptables behave like a load balancer.
And this is precisely what happens in Kubernetes.

If you have three Pods, kube-proxy writes the following rules (a small simulation of this logic follows the list):

  • select Pod 1 as the destination with a likelihood of 33%. Otherwise, move to the next rule
  • select Pod 2 as the destination with a probability of 50%. Otherwise, move to the following rule
  • select Pod 3 as the destination (no probability)
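To see why that cascade comes out uniform, here is a small Go sketch of my own (purely illustrative; kube-proxy does nothing like this in Go, it writes iptables rules): Pod 1 gets 1/3, Pod 2 gets 1/2 of the remaining 2/3, and Pod 3 gets whatever is left, i.e. one third each. Note that each evaluation happens only when a new connection is established, which is the crux of the keep-alive issue below.

package main

import (
    "fmt"
    "math/rand"
)

// pick simulates the cascading rules for a Service with three endpoints:
// pod 0 with probability 1/3, otherwise pod 1 with probability 1/2 of the
// remainder, otherwise pod 2. Each call corresponds to one new connection.
func pick(r *rand.Rand) int {
    if r.Float64() < 1.0/3.0 {
        return 0
    }
    if r.Float64() < 0.5 {
        return 1
    }
    return 2
}

func main() {
    r := rand.New(rand.NewSource(1))
    var counts [3]int
    for i := 0; i < 100000; i++ {
        counts[pick(r)]++
    }
    // Each pod ends up with roughly a third of the connections.
    fmt.Println(counts)
}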

What happens when you use keep-alive with a Kubernetes Service?

Let's imagine that the front-end and the backend support keep-alive.
You have a single instance of the front-end and three replicas for the backend.
The front-end makes the first request to the backend and opens the TCP connection.
The request reaches the Service, and one of the Pods is selected as the destination.
The backend Pod replies and the front-end receives the response.
But instead of closing the TCP connection, it is kept open for subsequent HTTP requests.
What happens when the front-end issues more requests?
They are sent to the same Pod.

Isn't iptables supposed to distribute the traffic?
It is.
There is a single TCP connection open, and the iptables rules were invoked only the first time, when the connection was established.
One of the three Pods was selected as the destination.
Since all subsequent requests are channelled through the same TCP connection, iptables isn't invoked anymore.
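If you want to observe this from the client side, here is a rough Go sketch (my own illustration, not from the article) that uses net/http/httptrace to log whether each request reused an already-open TCP connection. With the default Transport (keep-alives enabled), everything after the first request should report reused=true, and the remote address stays the Service's ClusterIP because the DNAT to the chosen Pod happens inside the kernel.

package main

import (
    "fmt"
    "io"
    "net/http"
    "net/http/httptrace"
)

func main() {
    // Log, for every request, whether the client reused an existing
    // (keep-alive) TCP connection and which address it is connected to.
    trace := &httptrace.ClientTrace{
        GotConn: func(info httptrace.GotConnInfo) {
            fmt.Printf("reused=%v remote=%v\n", info.Reused, info.Conn.RemoteAddr())
        },
    }

    for i := 0; i < 5; i++ {
        // "http://backend" is the in-cluster Service name from the question.
        req, err := http.NewRequest("GET", "http://backend", nil)
        if err != nil {
            panic(err)
        }
        req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            panic(err)
        }
        // The body must be fully read and closed for the connection
        // to be returned to the pool and reused.
        io.Copy(io.Discard, resp.Body)
        resp.Body.Close()
    }
}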

Also, it's not quite correct to say that kube-proxy is in the middle.
It isn't: kube-proxy by itself doesn't handle any traffic.
All it does is create iptables rules.
It's iptables that actually matches packets, distributes connections, performs DNAT, and so on.
That also answers the DNS part of your question: the client only ever dials the ClusterIP; the kernel DNATs the connection to the chosen Pod IP and its connection tracking remembers that mapping for the lifetime of the TCP connection, which is why the connection "sticks" to one Pod.

Similar question here.
