
Connection refused between kube-proxy and nginx backend

We are regularly seeing connection refused errors on a bespoke NGINX reverse proxy installed in AWS EKS (see the Kubernetes template below).

Initially, we thought it was an issue with the load balancer. However, upon further investigation, there seems to be an issue between kube-proxy and the nginx Pod.

When I run repeated wget IP:PORT requests against the node's internal IP and the NodePort that serves the application, we see 400 Bad Request several times and eventually a failed: Connection refused.

Whereas when I run the same request directly against the Pod IP and port, I never get this connection refused.
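For reference, the intermittent failure can be reproduced with a simple loop against the NodePort (a sketch; the IP and port are the ones from the wget output below, adjust for your node):

```shell
#!/bin/sh
# Hit the NodePort repeatedly and tally outcomes.
# Note: wget exits non-zero for both a 400 response and a refused
# connection; the exit code distinguishes them (8 = server issued an
# error response, 4 = network failure such as connection refused).
for i in $(seq 1 20); do
  if wget -q -O /dev/null "http://10.1.95.3:30102/"; then
    echo "attempt $i: ok"
  else
    echo "attempt $i: failed (exit $?)"
  fi
done
```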

Example wget output

Fail:

wget ip.ap-southeast-2.compute.internal:30102
--2020-06-26 01:15:31--  http://ip.ap-southeast-2.compute.internal:30102/
Resolving ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)... 10.1.95.3
Connecting to ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)|10.1.95.3|:30102... failed: Connection refused.

Success:

wget ip.ap-southeast-2.compute.internal:30102
--2020-06-26 01:15:31--  http://ip.ap-southeast-2.compute.internal:30102/
Resolving ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)... 10.1.95.3
Connecting to ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)|10.1.95.3|:30102... connected.
HTTP request sent, awaiting response... 400 Bad Request
2020-06-26 01:15:31 ERROR 400: Bad Request.

In the logs of the NGINX service, we don't see the refused connections at all, whereas we do see the BAD REQUEST ones.

I have read about several issues regarding kube-proxy and I am interested in other insights to improve this situation.

e.g.: https://github.com/kubernetes/kubernetes/issues/38456

Any help much appreciated.

Kubernetes Template

##
# Main nginx deployment. Requires updated tag potentially for
# docker image
##
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-lua-ssl-deployment
  labels:
    service: https-custom-domains
spec:
  selector:
    matchLabels:
      app: nginx-lua-ssl
  replicas: 5
  template:
    metadata:
      labels:
        app: nginx-lua-ssl
        service: https-custom-domains
    spec:
      containers:
      - name: nginx-lua-ssl
        image: "0000000000.dkr.ecr.ap-southeast-2.amazonaws.com/lua-resty-auto-ssl:v0.NN"
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        - containerPort: 8443
        - containerPort: 8999
        envFrom:
         - configMapRef:
            name: https-custom-domain-conf

##
# Load balancer which manages traffic into the nginx instance
# In aws, this uses an ELB (elastic load balancer) construct
##
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  name: nginx-lua-load-balancer
  labels:
    service: https-custom-domains
spec:
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8443
  externalTrafficPolicy: Local
  selector:
    app: nginx-lua-ssl
  type: LoadBalancer

It's a tricky one because the problem could be at any layer of your stack.

A couple of pointers:

  • Check the logs of the kube-proxy running on the node in question.

     $ kubectl -n kube-system logs <kube-proxy-pod>

    or ssh to the node and run

    $ docker logs <kube-proxy-container>

    You can also try changing the verbosity of the kube-proxy logs in the kube-proxy DaemonSet:

     containers:
     - command:
       - /bin/sh
       - -c
       - kube-proxy --v=9 --config=/var/lib/kube-proxy-config/config --hostname-override=${NODE_NAME}
       env:
       - name: NODE_NAME
         valueFrom:
           fieldRef:
             apiVersion: v1
             fieldPath: spec.nodeName
       image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.15.10
       imagePullPolicy: IfNotPresent
       name: kube-proxy
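    To apply a change like this, something along these lines should work (on EKS, kube-proxy runs as a DaemonSet in the kube-system namespace):

```shell
# Edit the kube-proxy DaemonSet (bump --v=9), then watch the rollout.
kubectl -n kube-system edit daemonset kube-proxy
kubectl -n kube-system rollout status daemonset kube-proxy

# Then find and tail the kube-proxy pod on the affected node:
kubectl -n kube-system get pods -o wide | grep kube-proxy
kubectl -n kube-system logs -f <kube-proxy-pod-on-that-node>
```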
  • Does your kube-proxy have enough resources on the node where it's running? You can also try changing the kube-proxy DaemonSet to give it more resources (CPU, memory):

     containers:
     - command:
       - /bin/sh
       - -c
       - kube-proxy --v=2 --config=/var/lib/kube-proxy-config/config --hostname-override=${NODE_NAME}
       env:
       - name: NODE_NAME
         valueFrom:
           fieldRef:
             apiVersion: v1
             fieldPath: spec.nodeName
       image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.15.10
       imagePullPolicy: IfNotPresent
       name: kube-proxy
       resources:
         requests:
           cpu: 300m   # <== this instead of 100m
  • You can try enabling iptables logging on the node to check whether packets are getting dropped for some reason.
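    As a sketch, one way to do this on the node (assuming the NodePort 30102 from the question; requires root):

```shell
# Log TCP packets arriving for the NodePort before any NAT happens,
# then watch the kernel log for the tagged entries.
iptables -t raw -I PREROUTING -p tcp --dport 30102 -j LOG --log-prefix "np-30102: "
dmesg -wT | grep "np-30102"

# Also worth inspecting: the rules kube-proxy installed for this port.
iptables-save | grep 30102
```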

In the end, this issue was caused by an incorrectly configured Service selector, which made the load balancer route traffic to an unrelated Pod:

selector:
  matchLabels:
    app: redis-cli

There were 5 nginx Pods correctly receiving traffic, and one utility Pod incorrectly receiving traffic and responding by refusing the connection, as you would expect.
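A quick way to catch this kind of selector mix-up is to compare what a Service's selector matches against its actual endpoints (the Service and label names below are the ones from the template above):

```shell
# Which pods actually back the Service right now?
kubectl get endpoints nginx-lua-load-balancer -o wide

# Which pods carry the label the Service selects on?
kubectl get pods -l app=nginx-lua-ssl -o wide

# Scan all Services' selectors for accidental overlaps:
kubectl get svc -o custom-columns=NAME:.metadata.name,SELECTOR:.spec.selector
```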

Thanks for the responses.
