GCE Load Balancer health check fails (connection refused)

Question

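My (GCE) Load Balancer health check fails with a connection refused error, which ends up marking my GCE Ingress as UNHEALTHY. I would like to know how to fix this.

I am running a GKE Autopilot cluster. I have torn down and recreated the setup several times, always with the same result.

My Deployment is configured with a pod template made up of several containers (not all of which expose a port).

Side note: for simplicity I have left out some configuration, such as ConfigMaps.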

apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-app-deployment
spec:
  selector:
    matchLabels:
      app: some-app
  replicas: 2
  template:
    metadata:
      labels:
        app: some-app
    spec:
      restartPolicy: Always
      containers:
        - name: web-server
          image: {{some-app-image}}
          command: ['app', 'web-server']
          ports:
            - name: web-server
              containerPort: 5000
              protocol: TCP
        - name: admin-server
          image: {{some-app-image}}
          command: ['app', 'admin-server']
          ports:
            - name: admin-server
              containerPort: 5001
              protocol: TCP
        - name: worker
          image: {{some-app-image}}
          command: ['app', 'worker']
        - name: cron
          image: {{some-app-image}}
          command: ['app', 'cron']
        - name: helper
          image: {{some-app-image}}
          command: [ '/bin/bash', '-c', '--' ]
          args: [ 'while true; do sleep 30; done;' ]

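Below is the BackendConfig CRD that is supposed to define the health check. I chose /favicon.ico as the path because the Load Balancer health check requires exactly a 200 OK response, and the web server's base path / answers with a 302 redirect, which would fail the health check. Using kubectl port-forward I confirmed that /favicon.ico does return 200 OK. By the way, just to rule this out, I also tried other paths, without success.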

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: http-hc-config
spec:
  healthCheck:
    type: HTTP
    port: 5000
    requestPath: "/favicon.ico"
    checkIntervalSec: 20
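To make the "exactly 200 OK" requirement concrete, here is a minimal Python sketch of a probe that, like the health check described above, does not follow redirects. The names probe_is_healthy, DemoHandler and start_demo_server are hypothetical, and the demo server only imitates the behaviour I described (302 on /, 200 on /favicon.ico); it is not my real application.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def probe_is_healthy(host: str, port: int, path: str, timeout: float = 5.0) -> bool:
    """Send one GET and report healthy only for an exact 200 response.

    http.client never follows redirects, so a 302 counts as unhealthy.
    A refused or timed-out connection also counts as unhealthy.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except OSError:
        return False
    finally:
        conn.close()


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in web server: 200 for /favicon.ico, 302 for everything else."""

    def do_GET(self):
        if self.path == "/favicon.ico":
            self.send_response(200)
        else:
            self.send_response(302)
            self.send_header("Location", "/login")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        # Keep the demo quiet.
        pass


def start_demo_server() -> HTTPServer:
    """Start the stand-in server on an ephemeral local port."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_demo_server()
    port = server.server_address[1]
    for path in ("/", "/favicon.ico"):
        print(path, "healthy:", probe_is_healthy("127.0.0.1", port, path))
    server.shutdown()
    server.server_close()
```

Pointed at a kubectl port-forward of the pod, the same probe_is_healthy call can be used to check a candidate requestPath before putting it into the BackendConfig.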

In addition, a custom header is added for the admin endpoint.

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: x-header-config
spec:
  customRequestHeaders:
    headers:
    - "X-Client-Region:{client_region}"

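Below is the Service definition. Following the documentation, it references the BackendConfigs through an annotation. The documentation does not say specifically how to reference the health check, so I mapped it to the relevant port.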

apiVersion: v1
kind: Service
metadata:
  name: some-app-service
  annotations:
    cloud.google.com/backend-config: '{
      "default":"http-hc-config",
      "ports":{"4001":"x-header-config"}}'
spec:
  type: NodePort
  selector:
    app: some-app
  ports:
    - name: web
      targetPort: 5000
      protocol: TCP
      port: 4000
    - name: admin
      targetPort: 5001
      protocol: TCP
      port: 4001

I have confirmed that my service is up, because I can kubectl port-forward to a pod and access the web server content.

The last piece of the setup is this Ingress object.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: gce
spec:
  rules:
    - host: admin.example.com
      http:
        paths:
        - path: /*
          pathType: ImplementationSpecific
          backend:
            service:
              name: some-app-service
              port:
                number: 4001
    - http:
        paths:
        - path: /*
          pathType: ImplementationSpecific
          backend:
            service:
              name: some-app-service
              port:
                number: 4000

Edit #1 and #2: Running gcloud compute health-checks describe {{HEALTH_DEF_ID}} gives me the following output for the health check on port 4001, which appears to be out of sync with the BackendConfig CRD I defined:

checkIntervalSec: 15
creationTimestamp: '2022-02-07T11:30:50.059-08:00'
description: Default kubernetes L7 Loadbalancing health check for NEG.
healthyThreshold: 1
httpHealthCheck:
  portSpecification: USE_SERVING_PORT
  proxyHeader: NONE
  requestPath: /
id: {{REDACTED}}
kind: compute#healthCheck
logConfig:
  enable: true
name: k8s1-4622dadc-{{REDACTED}}
selfLink: https://www.googleapis.com/compute/v1/projects/{{REDACTED}}
timeoutSec: 15
type: HTTP
unhealthyThreshold: 2

And here is the one for port 4000, which, surprisingly, contains the correct path and port configuration:

checkIntervalSec: 20
creationTimestamp: '2022-02-07T12:12:59.248-08:00'
description: Default kubernetes L7 Loadbalancing health check for NEG.
healthyThreshold: 1
httpHealthCheck:
  port: 5000
  portSpecification: USE_FIXED_PORT
  proxyHeader: NONE
  requestPath: /favicon.ico
id: {{REDACTED}}
kind: compute#healthCheck
name: k8s1-4622dadc-{{REDACTED}}
selfLink: https://www.googleapis.com/compute/v1/projects/{{REDACTED}}
timeoutSec: 15
type: HTTP
unhealthyThreshold: 2

Either my setup is wrong, or the BackendConfig is not applied equally to all the services in the Ingress rules section.
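The comparison I did by eye can be sketched in Python. This is only an illustration: health_check_mismatches is a hypothetical helper, it looks at just the four fields set in my BackendConfig, and the two dictionaries are hand-copied from the gcloud outputs above (the same structure is what gcloud prints with --format=json).

```python
from typing import Any, Dict, Tuple

# healthCheck section of the http-hc-config BackendConfig.
EXPECTED_SPEC: Dict[str, Any] = {
    "type": "HTTP",
    "port": 5000,
    "requestPath": "/favicon.ico",
    "checkIntervalSec": 20,
}

# Relevant fields of the two health checks shown above.
CHECK_PORT_4001: Dict[str, Any] = {
    "type": "HTTP",
    "checkIntervalSec": 15,
    "httpHealthCheck": {"portSpecification": "USE_SERVING_PORT", "requestPath": "/"},
}
CHECK_PORT_4000: Dict[str, Any] = {
    "type": "HTTP",
    "checkIntervalSec": 20,
    "httpHealthCheck": {
        "port": 5000,
        "portSpecification": "USE_FIXED_PORT",
        "requestPath": "/favicon.ico",
    },
}


def health_check_mismatches(
    spec: Dict[str, Any], described: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (expected, actual)} for every spec field the live check does not reflect."""
    http = described.get("httpHealthCheck", {})
    actual = {
        "type": described.get("type"),
        "checkIntervalSec": described.get("checkIntervalSec"),
        "port": http.get("port"),
        "requestPath": http.get("requestPath"),
    }
    return {
        field: (expected, actual.get(field))
        for field, expected in spec.items()
        if actual.get(field) != expected
    }


if __name__ == "__main__":
    print("port 4000:", health_check_mismatches(EXPECTED_SPEC, CHECK_PORT_4000))
    print("port 4001:", health_check_mismatches(EXPECTED_SPEC, CHECK_PORT_4001))
```

For port 4000 nothing differs, while for port 4001 the port, the request path and the interval are all still at their defaults, which is the discrepancy I am asking about.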

Edit #3:

The health check log entry for port 4000 shows:

healthCheckProbeResult: {
detailedHealthState: "TIMEOUT"
healthCheckProtocol: "HTTP"
healthState: "UNHEALTHY"
ipAddress: "10.40.129.203"
previousDetailedHealthState: "UNKNOWN"
previousHealthState: "UNHEALTHY"
probeCompletionTimestamp: "2022-02-07T20:06:10.955412154Z"
probeRequest: "/favicon.ico"
probeResultText: "HTTP response: , Error: Connection refused"
probeSourceIp: "35.191.12.114"
responseLatency: "0.000569s"
targetIp: "10.40.129.203"
targetPort: 5000
}

Edit #4: I had to adjust the BackendConfig annotation, because my actual case is more involved than my earlier definition showed.

Answer 1

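The problem was the way I applied the BackendConfig annotation on the Service. I had assumed that the "default" config would apply to both ports and that the extra header config would be applied on top of it for port 4001.

That is not the case.

If a port's BackendConfig deviates from the default, you have to define a complete BackendConfig for each such port. BackendConfigs are not merged, and you cannot attach more than one config to a port.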

---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: config-4000
spec:
  healthCheck:
    type: HTTP
    port: 5000
    requestPath: "/favicon.ico"
    checkIntervalSec: 20
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: config-4001
spec:
  healthCheck:
    type: HTTP
    port: 5000
    requestPath: "/favicon.ico"
    checkIntervalSec: 20
  customRequestHeaders:
    headers:
      - "X-Client-Region:{client_region}"
---
apiVersion: v1
kind: Service
metadata:
  name: some-app-service
  annotations:
    cloud.google.com/backend-config: '{"ports":{
       "4000":"config-4000",
       "4001":"config-4001"
    }}'
spec:
  type: NodePort
  selector:
    app: some-app
  ports:
    - name: web
      targetPort: 5000
      protocol: TCP
      port: 4000
    - name: admin
      targetPort: 5001
      protocol: TCP
      port: 4001
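The lookup behaviour described above (a per-port entry replaces the default instead of being merged with it) can be sketched in Python. This is a model of what I observed, not the GKE controller's actual code; resolve_backend_config and the two constants are hypothetical names.

```python
import json
from typing import Optional

# Annotation from the question: a default plus one per-port entry.
QUESTION_ANNOTATION = '{"default": "http-hc-config", "ports": {"4001": "x-header-config"}}'

# Annotation from this answer: one complete BackendConfig per port.
FIXED_ANNOTATION = '{"ports": {"4000": "config-4000", "4001": "config-4001"}}'


def resolve_backend_config(annotation: str, service_port: str) -> Optional[str]:
    """Return the single BackendConfig name that applies to a Service port.

    A matching "ports" entry wins outright; "default" is only a fallback.
    Nothing is merged, so at most one BackendConfig applies per port.
    """
    config = json.loads(annotation)
    ports = config.get("ports", {})
    if service_port in ports:
        return ports[service_port]
    return config.get("default")


if __name__ == "__main__":
    for port in ("4000", "4001"):
        print(port, "->", resolve_backend_config(QUESTION_ANNOTATION, port))
```

With the annotation from the question, port 4001 resolves to x-header-config only, which carries no healthCheck section, so that port kept the default health check on /.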

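Edit #1:

A second problem (the one that caused the connection error) was in my project configuration:

The server was configured to bind to 127.0.0.1 instead of 0.0.0.0. A server bound to 127.0.0.1 only accepts connections arriving on the pod's loopback interface, so the health check probes, which target the pod IP, were refused. This is most likely also why kubectl port-forward kept working and hid the problem.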
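The effect of the bind address can be reproduced locally with a small Python sketch. The helper can_connect is a hypothetical name, and using 127.0.0.2 as a stand-in for the pod IP is an assumption that only holds on Linux, where the whole 127.0.0.0/8 range is local.

```python
import socket


def can_connect(bind_host: str, target_host: str, timeout: float = 1.0) -> bool:
    """Listen on bind_host (ephemeral port), then try a TCP connect to target_host on that port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((bind_host, 0))
        server.listen(1)
        port = server.getsockname()[1]
        try:
            # The connection completes in the listen backlog; no accept() needed.
            with socket.create_connection((target_host, port), timeout=timeout):
                return True
        except OSError:
            # Covers "connection refused" as well as timeouts.
            return False


if __name__ == "__main__":
    for bind_host in ("127.0.0.1", "0.0.0.0"):
        for target_host in ("127.0.0.1", "127.0.0.2"):
            reachable = can_connect(bind_host, target_host)
            print(f"bound to {bind_host:9} connect to {target_host}: {reachable}")
```

A listener bound to 127.0.0.1 is reachable only through that exact address, while one bound to 0.0.0.0 accepts connections on every local address, which is what the health check needs when it probes the pod IP.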

Answer 2

Can you run kubectl describe ing ingress?

I think you should see an error about an invalid wildcard. You should drop the *.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: gce
spec:
  rules:
    - host: admin.example.com
      http:
        paths:
        - path: /
          pathType: ImplementationSpecific
          backend:
            service:
              name: some-app-service
              port:
                number: 4001
    - http:
        paths:
        - path: /
          pathType: ImplementationSpecific
          backend:
            service:
              name: some-app-service
              port:
                number: 4000
