
GCE Load Balancer Health Check Fails (Connection Refused)

My (GCE) Load Balancer health checks are failing with a connection refused error, ultimately marking my GCE Ingress as UNHEALTHY. Now I'm wondering how to fix this issue.

For my setup I'm using a GKE Autopilot cluster. I have torn down and restarted the setup several times, always with the same result.

Suppose I have a Deployment configured with a pod template consisting of several containers (not all of which expose ports).

Side note: For simplicity I have skipped some configuration, such as ConfigMaps.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-app-deployment
spec:
  selector:
    matchLabels:
      app: some-app
  replicas: 2
  template:
    metadata:
      labels:
        app: some-app
    spec:
      restartPolicy: Always
      containers:
        - name: web-server
          image: {{some-app-image}}
          command: ['app', 'web-server']
          ports:
            - name: web-server
              containerPort: 5000
              protocol: TCP
        - name: admin-server
          image: {{some-app-image}}
          command: ['app', 'admin-server']
          ports:
            - name: admin-server
              containerPort: 5001
              protocol: TCP
        - name: worker
          image: {{some-app-image}}
          command: ['app', 'worker']
        - name: cron
          image: {{some-app-image}}
          command: ['app', 'cron']
        - name: helper
          image: {{some-app-image}}
          command: [ '/bin/bash', '-c', '--' ]
          args: [ 'while true; do sleep 30; done;' ]

The following is the BackendConfig CRD, which is supposed to define the health check. I chose /favicon.ico as the request path because the Load Balancer health check requires an exact 200 OK response, while the web server's base path / emits a 302 redirect and would therefore fail the health check. With kubectl port-forward I confirmed that /favicon.ico does in fact return 200 OK. By the way, just to rule this out as the problem, I also tried other paths without success.

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: http-hc-config
spec:
  healthCheck:
    type: HTTP
    port: 5000
    requestPath: "/favicon.ico"
    checkIntervalSec: 20

Additionally, a custom header is added for the admin endpoint.

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: x-header-config
spec:
  customRequestHeaders:
    headers:
    - "X-Client-Region:{client_region}"

The following is the Service definition. It references the BackendConfigs via annotations as per the documentation. The documentation is not very specific about how to reference the health check, so I mapped it to the relevant port.

apiVersion: v1
kind: Service
metadata:
  name: some-app-service
  annotations:
    cloud.google.com/backend-config: '{
      "default":"http-hc-config",
      "ports":{"4001":"x-header-config"}}'
spec:
  type: NodePort
  selector:
    app: some-app
  ports:
    - name: web
      targetPort: 5000
      protocol: TCP
      port: 4000
    - name: admin
      targetPort: 5001
      protocol: TCP
      port: 4001

I confirmed that my service is running, as I could successfully kubectl port-forward into the pods and access the web server's content.
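For reference, this is roughly how I verified it (the pod name is a placeholder):

kubectl port-forward pod/some-app-deployment-xxxxx 5000:5000
curl -sS -o /dev/null -w "%{http_code}\n" http://localhost:5000/favicon.ico   # prints 200
curl -sS -o /dev/null -w "%{http_code}\n" http://localhost:5000/              # prints 302 (redirect)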

Now the final piece of the setup is this Ingress object.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: gce
spec:
  rules:
    - host: admin.example.com
      http:
        paths:
        - path: /*
          pathType: ImplementationSpecific
          backend:
            service:
              name: some-app-service
              port:
                number: 4001
    - http:
        paths:
        - path: /*
          pathType: ImplementationSpecific
          backend:
            service:
              name: some-app-service
              port:
                number: 4000

EDIT #1 & #2: Running gcloud compute health-checks describe {{HEALTH_DEF_ID}} I get the following output for the health check on port 4001, which seems to be out of sync with the defined BackendConfig CRD:

checkIntervalSec: 15
creationTimestamp: '2022-02-07T11:30:50.059-08:00'
description: Default kubernetes L7 Loadbalancing health check for NEG.
healthyThreshold: 1
httpHealthCheck:
  portSpecification: USE_SERVING_PORT
  proxyHeader: NONE
  requestPath: /
id: {{REDACTED}}
kind: compute#healthCheck
logConfig:
  enable: true
name: k8s1-4622dadc-{{REDACTED}}
selfLink: https://www.googleapis.com/compute/v1/projects/{{REDACTED}}
timeoutSec: 15
type: HTTP
unhealthyThreshold: 2

And the following for port 4000, which surprisingly does contain the right path and port configuration:

checkIntervalSec: 20
creationTimestamp: '2022-02-07T12:12:59.248-08:00'
description: Default kubernetes L7 Loadbalancing health check for NEG.
healthyThreshold: 1
httpHealthCheck:
  port: 5000
  portSpecification: USE_FIXED_PORT
  proxyHeader: NONE
  requestPath: /favicon.ico
id: {{REDACTED}}
kind: compute#healthCheck
name: k8s1-4622dadc-{{REDACTED}}
selfLink: https://www.googleapis.com/compute/v1/projects/{{REDACTED}}
timeoutSec: 15
type: HTTP
unhealthyThreshold: 2

Either there's something wrong with my setup, or the BackendConfig is not applied equally to all services in the rules section of the Ingress.
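For reference, this is roughly how I looked up the two generated health checks to compare them (the filter expression is only an assumption based on the generated name prefix):

gcloud compute health-checks list --filter="name~k8s1-"
kubectl describe ingress ingress   # the GCE Ingress annotations also list the backends and their health state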

EDIT #3:

The health check log entry for port 4000 shows:

healthCheckProbeResult: {
detailedHealthState: "TIMEOUT"
healthCheckProtocol: "HTTP"
healthState: "UNHEALTHY"
ipAddress: "10.40.129.203"
previousDetailedHealthState: "UNKNOWN"
previousHealthState: "UNHEALTHY"
probeCompletionTimestamp: "2022-02-07T20:06:10.955412154Z"
probeRequest: "/favicon.ico"
probeResultText: "HTTP response: , Error: Connection refused"
probeSourceIp: "35.191.12.114"
responseLatency: "0.000569s"
targetIp: "10.40.129.203"
targetPort: 5000
}

EDIT #4: I needed to adjust the BackendConfig annotation, as my case was in fact more involved than my previous definitions suggested.

The problem was in the way I applied the BackendConfig annotation to the Service. My assumption was that the "default" config would be applied to both ports and that the extra header config would additionally be applied to port 4001.

But this is not the case.

If a BackendConfig deviates from the default, you have to define one for each port where it deviates. BackendConfigs will not get merged, and you cannot add multiple configs per port.

---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: config-4000
spec:
  healthCheck:
    type: HTTP
    port: 5000
    requestPath: "/favicon.ico"
    checkIntervalSec: 20
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: config-4001
spec:
  healthCheck:
    type: HTTP
    port: 5000
    requestPath: "/favicon.ico"
    checkIntervalSec: 20
  customRequestHeaders:
    headers:
    - "X-Client-Region:{client_region}"
---
apiVersion: v1
kind: Service
metadata:
  name: some-app-service
  annotations:
    cloud.google.com/backend-config: '{"ports":{
       "4000":"config-4000",
       "4001":"config-4001"
    }}'
spec:
  type: NodePort
  selector:
    app: some-app
  ports:
    - name: web
      targetPort: 5000
      protocol: TCP
      port: 4000
    - name: admin
      targetPort: 5001
      protocol: TCP
      port: 4001
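After re-applying, both generated health checks should pick up the custom path and port. A rough way to confirm this (the health check IDs are placeholders, as above):

gcloud compute health-checks describe {{HEALTH_DEF_ID_4000}} --format="yaml(httpHealthCheck)"
gcloud compute health-checks describe {{HEALTH_DEF_ID_4001}} --format="yaml(httpHealthCheck)"
# both should now show requestPath: /favicon.ico and port: 5000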

EDIT #1:

An additional issue (resulting in the connection refused error) was in my project configuration:

  • The server was configured to bind to 127.0.0.1 instead of 0.0.0.0 (see the sketch below)
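For illustration only (the actual command depends on the server framework; gunicorn and app:app are just example placeholders here):

gunicorn --bind 127.0.0.1:5000 app:app   # loopback only: probes from outside the pod get connection refused
gunicorn --bind 0.0.0.0:5000 app:app     # listens on the pod IP, so the LB health check probe can connect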

Can you do a kubectl describe ing ingress?

I think you should see an error about an invalid wildcard. You should leave the * out.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: gce
spec:
  rules:
    - host: admin.example.com
      http:
        paths:
        - path: /
          pathType: ImplementationSpecific
          backend:
            service:
              name: some-app-service
              port:
                number: 4001
    - http:
        paths:
        - path: /
          pathType: ImplementationSpecific
          backend:
            service:
              name: some-app-service
              port:
                number: 4000
