502 / 503 / 404 HTTP errors: GKE ingress-nginx serving traffic to the wrong services, from other namespaces

I have this kind of routing in each namespace:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
  annotations:
    janitor/expires: ${EXPIRY_DATE}
    nginx.ingress.kubernetes.io/ssl-redirect: "false" # Set to true once SSL is set up.
spec:
  ingressClassName: nginx
  rules:
    - host: api.${KUBE_DEPLOY_HOST}
      http:
        paths:
        - pathType: Prefix
          path: /
          backend:
            service:
              name: api-js
              port:
                number: 111

This is served by ingress-nginx (not nginx-ingress) 1.2.1 (same issue with 1.5.1) on Kubernetes 1.22 (or 1.23), running as one deployment with two replicas in the ingress-nginx namespace.

When I check my logs, I see that I sometimes get 502 / 503 / 404 HTTP error responses from the ingress-nginx controller. I think this happens especially when I deploy new ingress rules in new namespaces (during and after the ingress-nginx reload event).

When I look into the detailed log, I see:

IP - - [time] "GET API_ROUTE HTTP/1.1" 503 592 "master.frontend.url" UA 449 0.000 [development-branch-api] [] - - - - ID

This makes me think the request fails because the ingress-nginx controller is serving the master frontend a development API response, sometimes when the new API service is not even ready.
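To make that reading of the log line easier to check, here is a minimal Python sketch that splits one access-log line into named fields. It assumes the controller still uses the default ingress-nginx log format, where the first bracketed field after the request time is the chosen upstream (usually named namespace-service-port) and the quoted field after the byte count is the Referer. The function name `parse_access_log` is made up for this example, and the pattern does not handle retried requests that list several upstream addresses.

```python
import re

# Field order follows the default ingress-nginx access-log format
# (assumption: the log-format-upstream setting was not customised).
LOG_PATTERN = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time>[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes>\d+) '
    r'"(?P<referer>[^"]*)" (?P<user_agent>.*?) '
    r'(?P<request_length>\d+) (?P<request_time>[\d.]+) '
    r'\[(?P<upstream_name>[^\]]*)\] \[(?P<alt_upstream_name>[^\]]*)\] '
    r'(?P<upstream_addr>\S+) (?P<upstream_length>\S+) '
    r'(?P<upstream_time>\S+) (?P<upstream_status>\S+) (?P<req_id>\S+)$'
)


def parse_access_log(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


# The redacted line from the question, used as sample input.
SAMPLE = ('IP - - [time] "GET API_ROUTE HTTP/1.1" 503 592 "master.frontend.url" '
          'UA 449 0.000 [development-branch-api] [] - - - - ID')

entry = parse_access_log(SAMPLE)
print(entry["referer"], "->", entry["upstream_name"], entry["status"])
```

Here the Referer belongs to the master frontend while the upstream belongs to the development branch, and the upstream address is "-", meaning no backend address was logged for this request.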

When I check the ingress from GKE's view it looks like it is serving 3 pods, corresponding to 3 namespaces that should not overlap / mix requests, instead of the one api pod in the namespace corresponding to the ingress:

[Image not included]

So the error is visible here: the ingress in each of the 3 namespaces serves 3 pods instead of one, which means it is all mixed up, right?

I am sure there is one pod per deployment in my namespaces:

[Image not included]

So if I understand correctly, it seems that the situation is ingress A, ingress B and ingress C, all three of them, serve api A AND api B AND api C instead of serving just the one api pod from their namespace (A, B, C).

What I don't understand is how the ingress can match pods from other namespaces when I am not using ExternalName; that is the opposite of what an ingress does by default.

I believe the issue is at the ingress level and not at the service level, because when I look into each service, I see that it serves only the one pod in its own namespace and not 3.
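One way to double-check the service-level claim is to read each Service's Endpoints object and list the pods behind it. The sketch below is only an illustration: `pods_behind` and `fetch_endpoints` are hypothetical helper names, the namespace in the usage comment is a placeholder, and `fetch_endpoints` assumes `kubectl` is installed and pointed at the cluster.

```python
import json
import subprocess


def pods_behind(endpoints):
    """Return (namespace, pod name, ip) for every ready address in an Endpoints object."""
    pods = []
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            ref = address.get("targetRef") or {}
            pods.append((ref.get("namespace"), ref.get("name"), address.get("ip")))
    return pods


def fetch_endpoints(service, namespace):
    """Read the Endpoints object of a Service through kubectl (needs cluster access)."""
    result = subprocess.run(
        ["kubectl", "get", "endpoints", service, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Usage against a live cluster (namespace name is a placeholder):
#   print(pods_behind(fetch_endpoints("api-js", "namespace-a")))
```

If every namespace reports exactly one pod, and that pod lives in the same namespace as the Service, the Services themselves are not mixing backends.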

The controller is the default ingress-nginx installation edited to use 2 replicas instead of one.

Example service and deployment (issue happens for all of them):

apiVersion: v1
kind: Service
metadata:
  name: api-js
  labels:
    component: api-js
    role: api-js
  annotations:
    janitor/expires: ${EXPIRY_DATE}
spec:
  type: ClusterIP
  selector:
    role: perfmaker-api-js
  ports:
    - name: httpapi
      port: 111
      targetPort: 111
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-js
  annotations:
    janitor/expires: ${EXPIRY_DATE}
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: api-js
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
      labels:
        app: api-js
        role: api-js
    spec:
      containers:
        - name: api-js
          image: registry/api

When I change the api name / selectors on one branch, it "untangles" the situation and each branch / namespace's ingress only serves the pod it should serve.

But the errors do not happen all the time. They happen during and after a 'reload' event on the ingress controller, which is fired when ingress resources are added, removed or updated. In my case that is when a new branch in the CI/CD creates a new namespace with its deployment and ingress, or when a finished pipeline triggers a namespace deletion.

Alas, I must admit I just discovered that the error does not originate from the Kubernetes / ingress-nginx part of the setup but from the testing system, which causes a collision between services at deploy time because of bad separation in the CI/CD job. Sorry for taking your time!

So in fact the log line from ingress-nginx that stunned me:

IP - - [time] "GET API_ROUTE HTTP/1.1" 503 592 "master.frontend.url" UA 449 0.000 [development-branch-api] [] - - - - ID

shows that a service I deploy is overwritten by another environment's deployment with different variables, which makes it start sending requests to another namespace. The ingress routing is correct.
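A rough way to spot this kind of collision in the logs is to compare the environment of the Referer with the environment of the upstream. The Python sketch below assumes a purely hypothetical naming convention in which both the frontend host and the upstream name start with the environment name; the helper names and the environment list are made up for the example and would need adapting to a real setup.

```python
from typing import Optional, Sequence


def environment_of(name: str, environments: Sequence[str]) -> Optional[str]:
    """Return the longest known environment name that prefixes `name`, if any."""
    matches = [
        env for env in environments
        if name == env or name.startswith((env + "-", env + "."))
    ]
    return max(matches, key=len) if matches else None


def crosses_environments(referer: str, upstream: str,
                         environments: Sequence[str]) -> bool:
    """True when the Referer and the upstream belong to different known environments."""
    source = environment_of(referer, environments)
    target = environment_of(upstream, environments)
    return source is not None and target is not None and source != target


# Hypothetical environment names, taken from the redacted log line.
ENVIRONMENTS = ["master", "development-branch"]

print(crosses_environments("master.frontend.url", "development-branch-api", ENVIRONMENTS))
```

A hit does not mean the ingress is misrouting. As described above, it can simply mean the frontend was deployed with another environment's variables and is calling that environment's API host.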
