
Connection refused between kube-proxy and nginx backend

We are regularly seeing connection refused errors on a bespoke NGINX reverse proxy installed in AWS EKS (see below for the Kubernetes template).

Initially, we thought it was an issue with the load balancer. However, upon further investigation, there seems to be an issue between the kube-proxy and the nginx Pod.

When I run repeated wget IP:PORT requests against the node's internal IP and the NodePort that serves the proxy, I see 400 Bad Request responses several times and eventually a failed: Connection refused.

Whereas when I run the same request directly against the Pod IP and port, I never get this connection refused.

Example wget output

Fail:

wget ip.ap-southeast-2.compute.internal:30102
--2020-06-26 01:15:31--  http://ip.ap-southeast-2.compute.internal:30102/
Resolving ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)... 10.1.95.3
Connecting to ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)|10.1.95.3|:30102... failed: Connection refused.

Success:

wget ip.ap-southeast-2.compute.internal:30102
--2020-06-26 01:15:31--  http://ip.ap-southeast-2.compute.internal:30102/
Resolving ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)... 10.1.95.3
Connecting to ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)|10.1.95.3|:30102... connected.
HTTP request sent, awaiting response... 400 Bad Request
2020-06-26 01:15:31 ERROR 400: Bad Request.
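
The direct-to-Pod check can be reproduced with something along these lines (a sketch; the label is taken from the Deployment below and 8080 is the container's plain-HTTP port):

$ kubectl get pods -l app=nginx-lua-ssl -o wide   # list the nginx Pod IPs
$ wget <pod-ip>:8080                              # request one Pod directly on its container port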

In the NGINX logs we see no trace of the refused requests, whereas we do see the 400 Bad Request ones.

I have read about several kube-proxy issues and would appreciate any other insights into this situation.

eg: https://github.com/kubernetes/kubernetes/issues/38456

Any help much appreciated.

Kubernetes Template

##
# Main nginx deployment. The docker image tag may need to be
# updated.
##
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-lua-ssl-deployment
  labels:
    service: https-custom-domains
spec:
  selector:
    matchLabels:
      app: nginx-lua-ssl
  replicas: 5
  template:
    metadata:
      labels:
        app: nginx-lua-ssl
        service: https-custom-domains
    spec:
      containers:
      - name: nginx-lua-ssl
        image: "0000000000.dkr.ecr.ap-southeast-2.amazonaws.com/lua-resty-auto-ssl:v0.NN"
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        - containerPort: 8443
        - containerPort: 8999
        envFrom:
         - configMapRef:
            name: https-custom-domain-conf

##
# Load balancer which manages traffic into the nginx instance
# In aws, this uses an ELB (elastic load balancer) construct
##
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  name: nginx-lua-load-balancer
  labels:
    service: https-custom-domains
spec:
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8443
  externalTrafficPolicy: Local
  selector:
    app: nginx-lua-ssl
  type: LoadBalancer
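
For reference, the Service above does not pin a nodePort, so the NodePort used in the tests (30102) is whatever Kubernetes auto-assigned; it can be confirmed with:

$ kubectl get service nginx-lua-load-balancer
# the PORT(S) column shows the auto-assigned NodePorts, e.g. 80:30102/TCP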

It's a tricky one because it could be at any layer of your stack.

A couple of pointers:

  • Check the logs of the kube-proxy running on the node in question.

     $ kubectl logs <kube-proxy-pod>

    or ssh to the box and

    $ docker logs <kube-proxy-container>

    You can also try to change the verbosity of the kube-proxy logs in the kube-proxy DaemonSet:

     containers:
     - command:
       - /bin/sh
       - -c
       - kube-proxy --v=9 --config=/var/lib/kube-proxy-config/config --hostname-override=${NODE_NAME}
       env:
       - name: NODE_NAME
         valueFrom:
           fieldRef:
             apiVersion: v1
             fieldPath: spec.nodeName
       image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.15.10
       imagePullPolicy: IfNotPresent
       name: kube-proxy
  • Does kube-proxy have enough resources on the node it's running on? You can try changing the kube-proxy DaemonSet to give it more resources (CPU, memory):

     containers:
     - command:
       - /bin/sh
       - -c
       - kube-proxy --v=2 --config=/var/lib/kube-proxy-config/config --hostname-override=${NODE_NAME}
       env:
       - name: NODE_NAME
         valueFrom:
           fieldRef:
             apiVersion: v1
             fieldPath: spec.nodeName
       image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/kube-proxy:v1.15.10
       imagePullPolicy: IfNotPresent
       name: kube-proxy
       resources:
         requests:
           cpu: 300m    # <== instead of the default 100m
  • You can try enabling iptables logging on the node to check whether packets are being dropped or rejected for some reason; a minimal sketch follows below.
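
For the last point, something like the following could be run on the node (a sketch; the rule placement is an assumption, and 30102 is the NodePort from your tests):

    # log TCP packets arriving for the NodePort before they hit the kube-proxy iptables chains
    $ sudo iptables -t raw -I PREROUTING -p tcp --dport 30102 -j LOG --log-prefix "nodeport-30102: "

    # follow the kernel log and look for packets that never reach a backend
    $ sudo dmesg -wT | grep nodeport-30102

    # remove the rule again when done
    $ sudo iptables -t raw -D PREROUTING -p tcp --dport 30102 -j LOG --log-prefix "nodeport-30102: "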

In the end, this issue was caused by a Pod that was incorrectly configured such that the load balancer was routing traffic to it:

selector:
  matchLabels:
    app: redis-cli

There were 5 nginx Pods correctly receiving traffic and one utility Pod incorrectly receiving traffic, which refused the connection, as you would expect.
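
A quick way to catch this kind of label/selector mismatch is to compare the Service's endpoints with the Pods you expect behind it, for example:

$ kubectl get endpoints nginx-lua-load-balancer    # addresses the Service actually routes to
$ kubectl get pods -l app=nginx-lua-ssl -o wide    # Pods that should be behind the Service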

Thanks for the responses.
