
kubernetes ingress-controller CrashLoopBackOff Error

I've set up a Kubernetes (1.17.11) cluster on Azure (AKS), and I've installed the nginx-ingress-controller via

helm install nginx-ingress --namespace z1 stable/nginx-ingress --set controller.publishService.enabled=true

The setup seems to be OK and it works, but every now and then it fails. When I check the running pods (kubectl get pod -n z1), I see a number of restarts for the ingress-controller pod.
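
For reference, this is roughly how the restarts can be investigated; the pod name below is the one from the kubectl describe output further down, and kubectl logs --previous pulls the logs of the last crashed container instance:

# List pods and their restart counts in the z1 namespace
kubectl get pod -n z1

# Inspect the events of the failing pod
kubectl describe pod -n z1 nginx-ingress-controller-58467bccf7-jhzlx

# Fetch the logs of the previous (crashed) container instance
kubectl logs -n z1 nginx-ingress-controller-58467bccf7-jhzlx --previous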

I thought maybe there was heavy load and it would be better to increase the replicas, so I ran helm upgrade --namespace z1 stable/ingress --set controller.replicasCount=3, but still only one of the three pods seems to be in use, and one of them sometimes fails due to CrashLoopBackOff (not constantly).
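
A minimal sketch of the usual form of that upgrade, assuming the release name nginx-ingress from the install command above (note that helm upgrade takes the release name followed by the chart, the chart is stable/nginx-ingress, and the stable chart's value is controller.replicaCount, not replicasCount):

# Hypothetical corrected form: release name first, then the chart;
# --reuse-values keeps the values set at install time
helm upgrade nginx-ingress stable/nginx-ingress --namespace z1 \
  --reuse-values --set controller.replicaCount=3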

One thing worth mentioning: the installed nginx-ingress version is 0.34.1, but 0.41.2 is also available. Do you think the upgrade will help, and how can I upgrade the installed version to the new one? (AFAIK helm upgrade won't replace the chart with a newer version, but I may be wrong.)

Any idea?

kubectl describe pod result:

Name:         nginx-ingress-controller-58467bccf7-jhzlx
Namespace:    z1
Priority:     0
Node:         aks-agentpool-41415378-vmss000000/10.240.0.4
Start Time:   Thu, 19 Nov 2020 09:01:30 +0100
Labels:       app=nginx-ingress
              app.kubernetes.io/component=controller
              component=controller
              pod-template-hash=58467bccf7
              release=nginx-ingress
Annotations:  <none>
Status:       Running
IP:           10.244.1.18
IPs:
  IP:           10.244.1.18
Controlled By:  ReplicaSet/nginx-ingress-controller-58467bccf7
Containers:
  nginx-ingress-controller:
    Container ID:  docker://719655d41c1c8cdb8c9e88c21adad7643a44d17acbb11075a1a60beb7553e2cf
    Image:         us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1
    Image ID:      docker-pullable://us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller@sha256:0e072dddd1f7f8fc8909a2ca6f65e76c5f0d2fcfb8be47935ae3457e8bbceb20
    Ports:         80/TCP, 443/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      /nginx-ingress-controller
      --default-backend-service=z1/nginx-ingress-default-backend
      --election-id=ingress-controller-leader
      --ingress-class=nginx
      --configmap=z1/nginx-ingress-controller
    State:          Running
      Started:      Thu, 19 Nov 2020 09:54:07 +0100
    Last State:     Terminated
      Reason:       Error
      Exit Code:    143
      Started:      Thu, 19 Nov 2020 09:50:41 +0100
      Finished:     Thu, 19 Nov 2020 09:51:12 +0100
    Ready:          True
    Restart Count:  8
    Liveness:       http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:10254/healthz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAME:       nginx-ingress-controller-58467bccf7-jhzlx (v1:metadata.name)
      POD_NAMESPACE:  z1 (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-token-7rmtk (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  nginx-ingress-token-7rmtk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nginx-ingress-token-7rmtk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From                                        Message
  ----     ------     ----                  ----                                        -------
  Normal   Scheduled  <unknown>             default-scheduler                           Successfully assigned z1/nginx-ingress-controller-58467bccf7-jhzlx to aks-agentpool-41415378-vmss000000
  Normal   Killing    58m                   kubelet, aks-agentpool-41415378-vmss000000  Container nginx-ingress-controller failed liveness probe, will be restarted
  Warning  Unhealthy  57m (x4 over 58m)     kubelet, aks-agentpool-41415378-vmss000000  Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy  57m                   kubelet, aks-agentpool-41415378-vmss000000  Readiness probe failed: Get http://10.244.1.18:10254/healthz: read tcp 10.244.1.1:54126->10.244.1.18:10254: read: connection reset by peer
  Normal   Pulled     57m (x2 over 59m)     kubelet, aks-agentpool-41415378-vmss000000  Container image "us.gcr.io/k8s-artifacts-prod/ingress-nginx/controller:v0.34.1" already present on machine
  Normal   Created    57m (x2 over 59m)     kubelet, aks-agentpool-41415378-vmss000000  Created container nginx-ingress-controller
  Normal   Started    57m (x2 over 59m)     kubelet, aks-agentpool-41415378-vmss000000  Started container nginx-ingress-controller
  Warning  Unhealthy  57m                   kubelet, aks-agentpool-41415378-vmss000000  Liveness probe failed: Get http://10.244.1.18:10254/healthz: dial tcp 10.244.1.18:10254: connect: connection refused
  Warning  Unhealthy  56m                   kubelet, aks-agentpool-41415378-vmss000000  Liveness probe failed: HTTP probe failed with statuscode: 500
  Warning  Unhealthy  23m (x10 over 58m)    kubelet, aks-agentpool-41415378-vmss000000  Liveness probe failed: Get http://10.244.1.18:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  14m (x6 over 57m)     kubelet, aks-agentpool-41415378-vmss000000  Readiness probe failed: Get http://10.244.1.18:10254/healthz: dial tcp 10.244.1.18:10254: connect: connection refused
  Warning  BackOff    9m28s (x12 over 12m)  kubelet, aks-agentpool-41415378-vmss000000  Back-off restarting failed container
  Warning  Unhealthy  3m51s (x24 over 58m)  kubelet, aks-agentpool-41415378-vmss000000  Readiness probe failed: Get http://10.244.1.18:10254/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Some logs from the controller:

  NGINX Ingress controller
  Release:       v0.34.1
  Build:         v20200715-ingress-nginx-2.11.0-8-gda5fa45e2
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.1

-------------------------------------------------------------------------------

I1119 08:54:07.267185       6 main.go:275] Running in Kubernetes cluster version v1.17 (v1.17.11) - git (clean) commit 3a3612132641768edd7f7e73d07772225817f630 - platform linux/amd64
I1119 08:54:07.276120       6 main.go:87] Validated z1/nginx-ingress-default-backend as the default backend.
I1119 08:54:07.430459       6 main.go:105] SSL fake certificate created /etc/ingress-controller/ssl/default-fake-certificate.pem
W1119 08:54:07.497816       6 store.go:659] Unexpected error reading configuration configmap: configmaps "nginx-ingress-controller" not found
I1119 08:54:07.617458       6 nginx.go:263] Starting NGINX Ingress controller
I1119 08:54:08.748938       6 backend_ssl.go:66] Adding Secret "z1/z1-tls-secret" to the local store
I1119 08:54:08.801385       6 event.go:278] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"z2", Name:"zalenium", UID:"8d395a18-811b-4852-8dd5-3cdd682e2e6e", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"13667218", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress z2/zalenium
I1119 08:54:08.801908       6 backend_ssl.go:66] Adding Secret "z2/z2-tls-secret" to the local store
I1119 08:54:08.802837       6 event.go:278] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"z1", Name:"zalenium", UID:"244ae6f5-897e-432e-8ec3-fd142f0255dc", APIVersion:"networking.k8s.io/v1beta1", ResourceVersion:"13667219", FieldPath:""}): type: 'Normal' reason: 'CREATE' Ingress z1/zalenium
I1119 08:54:08.839946       6 nginx.go:307] Starting NGINX process
I1119 08:54:08.840375       6 leaderelection.go:242] attempting to acquire leader lease  z1/ingress-controller-leader-nginx...
I1119 08:54:08.845041       6 controller.go:141] Configuration changes detected, backend reload required.
I1119 08:54:08.919965       6 status.go:86] new leader elected: nginx-ingress-controller-58467bccf7-5thwb
I1119 08:54:09.084800       6 controller.go:157] Backend successfully reloaded.
I1119 08:54:09.096999       6 controller.go:166] Initial sync, sleeping for 1 second.

As the OP confirmed in the comment section, I am posting the solution to this issue.

Yes, I tried it and replaced the deprecated version with the latest one; it completely solved the nginx issue.

In this setup the OP used the Helm chart from the stable repository. On the GitHub page dedicated to stable/nginx-ingress there is a notice that this specific chart is DEPRECATED. It was updated 12 days ago, so this is a recent change.

This chart is deprecated as we have moved to the upstream repo ingress-nginx. The chart source can be found here: https://github.com/kubernetes/ingress-nginx/tree/master/charts/ingress-nginx

The NGINX Ingress Controller deployment guide already uses the new repository for the Helm installation option.

To list the currently configured repositories, use the command $ helm repo list:

$ helm repo list
NAME            URL
stable          https://kubernetes-charts.storage.googleapis.com
ingress-nginx   https://kubernetes.github.io/ingress-nginx

If you don't have the new ingress-nginx repository yet, you have to:

  • Add the new repository:
    • $ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
  • Update the repositories:
    • $ helm repo update
  • Deploy the NGINX Ingress Controller (a migration sketch for an existing release follows below):
    • $ helm install my-release ingress-nginx/ingress-nginx
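
Since a release from the deprecated chart is already installed here, a minimal migration sketch, assuming the release name nginx-ingress and namespace z1 from the question (expect a short outage while the controller is replaced):

# Remove the release installed from the deprecated stable chart
helm uninstall nginx-ingress --namespace z1

# Reinstall from the new upstream repository, keeping the value
# used in the original install command
helm install nginx-ingress ingress-nginx/ingress-nginx --namespace z1 \
  --set controller.publishService.enabled=true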

Disclaimer!

The above commands are specific to Helm v3.
