Kubernetes - nginx-ingress is crashing after file upload via PHP

I'm running a Kubernetes cluster on Google Cloud Platform via Google Kubernetes Engine (GKE). The cluster version is 1.13.11-gke.14. The PHP application pod contains two containers: Nginx as a reverse proxy and php-fpm (7.2).

In Google Cloud, a TCP load balancer is used, and internal routing is then handled by the Nginx Ingress controller.

The problem: when I upload a bigger file (17 MB), the ingress controller crashes with this error:

W 2019-12-01T14:26:06.341588Z Dynamic reconfiguration failed: Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory 
E 2019-12-01T14:26:06.341658Z Unexpected failure reconfiguring NGINX: 
W 2019-12-01T14:26:06.345575Z requeuing initial-sync, err Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory 
I 2019-12-01T14:26:06.354869Z Configuration changes detected, backend reload required. 
E 2019-12-01T14:26:06.393528796Z Post http+unix://nginx-status/configuration/backends: dial unix /tmp/nginx-status-server.sock: connect: no such file or directory

E 2019-12-01T14:26:08.077580Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused 
I 2019-12-01T14:26:12.314526990Z - [] - - [01/Dec/2019:14:26:12 +0000] "GET / HTTP/2.0" 200 541 "-" "GoogleStackdriverMonitoring-UptimeChecks(https://cloud.google.com/monitoring)" 99 1.787 [bap-staging-bap-staging-80] [] 553 1.788 200 5ac9d438e5ca31618386b35f67e2033b

E 2019-12-01T14:26:12.455236Z healthcheck error: Get http+unix://nginx-status/healthz: dial unix /tmp/nginx-status-server.sock: connect: connection refused 
I 2019-12-01T14:26:13.156963Z Exiting with 0 

Here is the YAML configuration of the Nginx ingress controller deployment. It is the default configuration generated by GitLab, which created the cluster on its own.

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: "2019-11-24T17:35:04Z"
  generation: 3
  labels:
    app: nginx-ingress
    chart: nginx-ingress-1.22.1
    component: controller
    heritage: Tiller
    release: ingress
  name: ingress-nginx-ingress-controller
  namespace: gitlab-managed-apps
  resourceVersion: "2638973"
  selfLink: /apis/apps/v1/namespaces/gitlab-managed-apps/deployments/ingress-nginx-ingress-controller
  uid: bfb695c2-0ee0-11ea-a36a-42010a84009f
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx-ingress
      release: ingress
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: nginx-ingress
        component: controller
        release: ingress
    spec:
      containers:
      - args:
        - /nginx-ingress-controller
        - --default-backend-service=gitlab-managed-apps/ingress-nginx-ingress-default-backend
        - --election-id=ingress-controller-leader
        - --ingress-class=nginx
        - --configmap=gitlab-managed-apps/ingress-nginx-ingress-controller
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.25.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        name: nginx-ingress-controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 3
        resources: {}
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          runAsUser: 33
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/nginx/modsecurity/modsecurity.conf
          name: modsecurity-template-volume
          subPath: modsecurity.conf
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
      - args:
        - /bin/sh
        - -c
        - tail -f /var/log/modsec/audit.log
        image: busybox
        imagePullPolicy: Always
        name: modsecurity-log
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/log/modsec
          name: modsecurity-log-volume
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: ingress-nginx-ingress
      serviceAccountName: ingress-nginx-ingress
      terminationGracePeriodSeconds: 60
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: modsecurity.conf
            path: modsecurity.conf
          name: ingress-nginx-ingress-controller
        name: modsecurity-template-volume
      - emptyDir: {}
        name: modsecurity-log-volume

I have no idea what else to try. I'm running the cluster on 3 nodes (2x 1 vCPU / 1.5 GB RAM and 1x preemptible 2 vCPU / 1.8 GB RAM), all of them on SSD drives.

Any time I upload the image, disk I/O spikes dramatically.

(Screenshots: Disk IOPS and Disk I/O graphs.)

Thanks for your help.

Found the solution. The nginx-ingress pod contained ModSecurity too. Every request was analyzed by ModSecurity, and bigger uploaded files caused these "crashes". It wasn't really a crash at all: ModSecurity consumed so much CPU and I/O that health-check responses became slower for all the other pods as well. The solution is to configure ModSecurity correctly or to disable it.
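A minimal sketch of the "disable" option, assuming the controller still reads the ConfigMap passed via `--configmap=gitlab-managed-apps/ingress-nginx-ingress-controller` in the deployment above. `enable-modsecurity` and `enable-owasp-modsecurity-crs` are ingress-nginx ConfigMap keys; verify them against the documentation for your controller version (0.25.1 here). Since GitLab manages this release, it may overwrite manual changes when it reinstalls or upgrades the chart.

```yaml
# Fragment to merge into the existing controller ConfigMap (e.g. with
# `kubectl edit configmap ingress-nginx-ingress-controller -n gitlab-managed-apps`).
# Keep the ConfigMap's other keys: the deployment above mounts the
# `modsecurity.conf` key from this same ConfigMap via subPath.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-ingress-controller
  namespace: gitlab-managed-apps
data:
  # Stop loading ModSecurity for all Ingresses handled by this controller.
  enable-modsecurity: "false"
  # Also stop loading the OWASP Core Rule Set, which only runs with ModSecurity.
  enable-owasp-modsecurity-crs: "false"
```

The controller watches its ConfigMap and reloads NGINX when it changes, so a pod restart should not normally be needed. If you want to keep ModSecurity, the alternative is to tune the rules in the mounted `modsecurity.conf` so that large request bodies are not inspected in full.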


