
502 bad gateway errors when using ALB and aws-load-balancer-controller

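We have a service running on our EKS cluster. It is an API that receives thousands of requests per day. Occasionally we get a 502 error when making a request. If I had to guess, out of 100 requests, 10 to 20 might come back as 502.

We are using the AWS Load Balancer Controller: https://github.com/kubernetes-sigs/aws-load-balancer-controller

Sample response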

    status: 502,
    statusText: 'Bad Gateway',
    headers: {
      server: 'awselb/2.0',
      date: 'Wed, 06 Oct 2021 10:24:19 GMT',
      'content-type': 'text/html',
      'content-length': '122',
      connection: 'close'
    },

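Troubleshooting

  1. The service is not crashing, and it never receives the requests that come back as 502 (we can tell from the correlation ID that the client sends to the service with each request).
  2. When we port-forward to bypass the ALB and send requests directly to the service, we do not encounter this issue.

From the above we have determined that these 502s are not coming from our application/service.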
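The correlation ID check in step 1 can be sketched roughly as follows. This is only an illustration: the function name, the `req-...` IDs, and the shape of the two inputs are hypothetical, and how the IDs are pulled out of the client and service logs is assumed to happen elsewhere.

```python
from typing import Dict, Iterable, List


def requests_never_received(
    client_results: Dict[str, int], service_seen_ids: Iterable[str]
) -> List[str]:
    """Return correlation IDs that got a 502 but never show up in the service logs."""
    seen = set(service_seen_ids)
    return sorted(
        cid
        for cid, status in client_results.items()
        if status == 502 and cid not in seen
    )


# Hypothetical data: correlation ID -> HTTP status observed by the client.
client_results = {"req-001": 200, "req-002": 502, "req-003": 200, "req-004": 502}
# Hypothetical IDs extracted from the service's own request log.
service_seen_ids = ["req-001", "req-003"]

# IDs listed here failed before the application ever saw the request.
print(requests_never_received(client_results, service_seen_ids))
```

If every 502 ends up in this list, the failures are happening somewhere between the client and the application, which matches the conclusion above.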


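After further research, we noticed that other people have run into similar issues.

Environment

  • AWS Load Balancer controller version: v2.1.3
  • Kubernetes version: 1.19
  • Using EKS (yes/no), if so version?: Yes/v1.19.13-eks-8df270

Please see the configuration details below: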

Service and deployment configuration

    apiVersion: v1
    kind: Service
    metadata:
      name: entity-extractor-api-staging
      labels:
        app: entity-extractor-api-staging
      namespace: staging
    spec:
      type: NodePort
      ports:
        - port: 80
          protocol: TCP
          targetPort: 8080
      selector:
        app: entity-extractor-api-staging
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: entity-extractor-api-staging
      labels:
        app: entity-extractor-api-staging
      namespace: staging
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: entity-extractor-api-staging
      template:
        metadata:
          labels:
            app: entity-extractor-api-staging
            log-label: 'true'
        spec:
          containers:
            - name: entity-extractor-api-staging
              image: <redacted>:$TAG
              imagePullPolicy: Always
              env: <redacted>
              ports:
                - containerPort: 80
              resources: {}
          nodeSelector:
            geeiq/node-type: worker

Ingress

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: staging-ingress
      namespace: staging
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/group.name: "<redacted>"
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80,"HTTPS": 443}]'
        alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-2:<redacted>:certificate/0250a551-8971-468d-a483-cad28f890463,arn:aws:acm:us-east-2:<redacted>:certificate/b32e9708-7aeb-495b-87b1-8532a2592eeb
        alb.ingress.kubernetes.io/tags: Environment=prod,Team=dev
        alb.ingress.kubernetes.io/healthcheck-path: /health
        alb.ingress.kubernetes.io/healthcheck-interval-seconds: '300'
        # alb.ingress.kubernetes.io/load-balancer-attributes: access_logs.s3.enabled=true,access_logs.s3.bucket=dev-ingress-logs-geeiq,access_logs.s3.prefix=dev-ingress
    spec:
      rules:
        ....
        - host: entity-extractor.staging.<redacted>
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: entity-extractor-api-staging
                    port:
                      number: 80

Sample ALB log

type | time | elb | client_ip | client_port | target_ip | target_port | request_processing_time | target_processing_time | response_processing_time | elb_status_code | target_status_code | received_bytes | sent_bytes | request_verb | request_url | request_proto | user_agent | ssl_cipher | ssl_protocol | target_group_arn | trace_id | domain_name | chosen_cert_arn | matched_rule_priority | request_creation_time | actions_executed | redirect_url | lambda_error_reason | target_port_list | target_status_code_list | classification | classification_reason
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
https | 2021-10-06T14:36:19.995743Z | app/k8s-geeiq-78db7a121a/27d8ce64549c8574 | 148.252.239.114 | 52152 | 10.0.2.240 | 31037 | 0 | 0.001 | -1 | 502 | - | 481 | 272 | POST | https://entity-extractor.staging.<redacted>.com:443/ | HTTP/1.1 | axios/0.22.0 | ECDHE-RSA-AES128-GCM-SHA256 | TLSv1.2 | arn:aws:elasticloadbalancing:us-east-2:700849607999:targetgroup/k8s-staging-entityex-1eaa7dc5fd/cfa1eeb14fd42a4c | Root=1-615db463-1042ab9118cc64b70f84b5a2 | entity-extractor.staging.<redacted>.com | arn:aws:acm:us-east-2:<redacted>:certificate/b32e9708-7aeb-495b-87b1-8532a2592eeb | 17 | 2021-10-06T14:36:19.901000Z | forward | - | - | 10.0.2.240:31037 | - | - | -
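One way to read an entry like this is to compare `elb_status_code` with `target_status_code`. A `-` in `target_status_code` means the target sent no response, so the 502 was produced by the load balancer itself. The sketch below applies that check to a handful of columns copied from the entry above. The function names are made up for illustration, and it only parses the pipe-delimited table form shown in this post (raw ALB access logs are space-delimited and would need a different parser).

```python
from typing import Dict

# Subset of the columns and values from the log entry above.
HEADER = (
    "type | time | target_ip | target_port | target_processing_time | "
    "response_processing_time | elb_status_code | target_status_code"
)
ROW = "https | 2021-10-06T14:36:19.995743Z | 10.0.2.240 | 31037 | 0.001 | -1 | 502 | -"


def parse_entry(header: str, row: str) -> Dict[str, str]:
    """Zip a pipe-delimited header line and data line into a field -> value dict."""
    names = [cell.strip() for cell in header.split("|")]
    values = [cell.strip() for cell in row.split("|")]
    if len(names) != len(values):
        raise ValueError("header and row have a different number of columns")
    return dict(zip(names, values))


def classify_502(entry: Dict[str, str]) -> str:
    """Say whether a 502 came from the target or from the load balancer."""
    if entry["elb_status_code"] != "502":
        return "not a 502"
    if entry["target_status_code"] == "-":
        # No status was recorded for the target: it never sent a response.
        return "generated by the load balancer (target sent no response)"
    return "returned by the target"


entry = parse_entry(HEADER, ROW)
print(classify_502(entry))
```

For the entry above this reports a load-balancer-generated 502, which is consistent with the observation that the application never saw the request.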

Please let me know if you need any additional information.

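Check that your service is listening on 0.0.0.0 for IPv4 or :: for IPv6, rather than on 127.0.0.1 or localhost. I got this error on my IPv6-only AWS EKS cluster when I forgot to change my service's listening interface from localhost to :: (or 0.0.0.0 if you are using IPv4).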
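As a rough illustration of the difference, here is a minimal sketch using only the Python standard library. The handler, the helper names, and the choice of Python are assumptions made for the example; the question does not say what the service is written in. Port 8080 and the `/health` path are taken from the configuration in the question.

```python
import ipaddress
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def reachable_from_other_hosts(bind_host: str) -> bool:
    """Return False when bind_host is a loopback address (or the name 'localhost')."""
    if bind_host == "localhost":
        return False
    # Expects an IP literal; 0.0.0.0 and :: are wildcard addresses, not loopback.
    return not ipaddress.ip_address(bind_host).is_loopback


class HealthHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for the real API: answers 200 on /health, 404 elsewhere."""

    def do_GET(self) -> None:
        status = 200 if self.path == "/health" else 404
        body = b"ok" if status == 200 else b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Create an IPv4 server; binding to 127.0.0.1 would accept only in-pod traffic."""
    if not reachable_from_other_hosts(host):
        raise ValueError(f"{host} is a loopback address; the load balancer cannot reach it")
    return ThreadingHTTPServer((host, port), HealthHandler)
```

Calling `make_server().serve_forever()` would start the listener on all IPv4 interfaces. An IPv6-only setup would need a server class whose `address_family` is `socket.AF_INET6`, bound to `::`.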

