
Pod got CrashLoopBackOff in Kubernetes because of GCP service account

After deploying with Helm charts, I got a CrashLoopBackOff error:

    NAME                                   READY   STATUS             RESTARTS   AGE
    myproject-myproject-54ff57477d-h5fng   0/1     CrashLoopBackOff   10         24m

Then I described the pod to see its events, and I saw something like the below:

 Liveness probe failed: Get http://10.16.26.26:8080/status: 
 dial tcp 10.16.26.26:8080: connect: connection refused

Readiness probe failed: Get http://10.16.26.26:8080/status: 
dial tcp 10.16.26.26:8080: connect: connection refused

Lastly, I saw an invalid-grant error for access to my GCP cloud proxy in the logs, as below:

    time="2020-01-15T15:30:46Z" level=fatal msg=application_main error="Post https://www.googleapis.com/{....blabla.....}: oauth2: cannot fetch token: 400 Bad Request\nResponse: {\n \"error\": \"invalid_grant\",\n \"error_description\": \"Not a valid email or user ID.\"\n}"

However, I checked my service account in IAM and it has access to the cloud proxy. Furthermore, I tested with the same credentials locally, and the endpoint used by the readiness probe worked successfully.

Does anyone have any suggestions about my problem?

You can disable the liveness probe to stop the CrashLoopBackOff, exec into the container, and test from there. Ideally you should not keep the same config for the liveness and readiness probes. It is not advisable for a liveness probe to depend on anything external; it should just check whether the pod is alive or not.

Referring to the problem with granting access on GCP: fix this by using the email address (the string that ends with ...@developer.gserviceaccount.com) instead of the client ID as the client_id parameter value. The naming chosen by Google is confusing.
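For reference, both identifiers appear in the service-account key file that GCP generates, and it is the client_email value, not the numeric client_id, that should be used where an account identity is expected. A minimal sketch of such a key file (all values here are placeholders):

```json
{
  "type": "service_account",
  "project_id": "my-project",
  "client_id": "123456789012345678901",
  "client_email": "my-sa@my-project.iam.gserviceaccount.com",
  "private_key_id": "abc123",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
}
```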

More information and troubleshooting can be found here: google-oautgh-grant.

Referring to the problem with probes:

Check whether the URL is healthy. Your probes may be too sensitive; your application may take a while to start or respond.

Readiness and liveness probes can be used in parallel for the same container. Using both ensures that traffic does not reach a container that is not ready for it, and that containers are restarted when they fail.

A liveness probe checks whether your application is in a healthy state in your already-running pod.

A readiness probe checks whether your pod is ready to receive traffic. Thus, if there is no /path endpoint, it will never appear as Running.

e.g.:

          livenessProbe:
            httpGet:
              path: /your-path
              port: 5000
            failureThreshold: 1
            periodSeconds: 2
            initialDelaySeconds: 2
          ports:
            - name: http
              containerPort: 5000

If the endpoint /your-path does not exist, the pod will never appear as Running.

Make sure that you properly set up the liveness and readiness probes.

For an HTTP probe, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the pod's IP address, unless the address is overridden by the optional host field in httpGet. If the scheme field is set to HTTPS, the kubelet sends an HTTPS request, skipping certificate verification.

In most scenarios, you do not want to set the host field. Here's one scenario where you would set it: suppose the container listens on 127.0.0.1 and the pod's hostNetwork field is true. Then host, under httpGet, should be set to 127.0.0.1. Make sure you did that. If your pod relies on virtual hosts, which is probably the more common case, you should not use host, but rather set the Host header in httpHeaders.
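As a sketch of that last point (the path, port, and hostname here are hypothetical), the Host header for a virtual-host setup would be set like this:

```yaml
livenessProbe:
  httpGet:
    path: /status
    port: 8080
    httpHeaders:
      - name: Host          # sent as an HTTP header, not the httpGet "host" field
        value: myapp.example.com
```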

For a TCP probe, the kubelet makes the probe connection at the node, not in the pod, which means that you cannot use a service name in the host parameter, since the kubelet is unable to resolve it.
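What the kubelet does for a tcpSocket probe can be approximated in a few lines: it simply tries to open a TCP connection, and success means the port accepted it, nothing more. A minimal sketch (host and port are whatever your probe targets):

```python
import socket

def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Approximate a kubelet tcpSocket probe: try to open a TCP
    connection; the probe passes iff the connection is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Note that nothing is sent or read on the connection, which is why a TCP probe can pass while the application behind the port is unhealthy.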

The most important thing you need to configure when using liveness probes is the initialDelaySeconds setting.

Make sure that the port the probe targets is actually open on the container.

A liveness probe failure causes the pod to restart. You need to make sure the probe doesn't start until the app is ready; otherwise, the app will constantly restart and never be ready!

I recommend using the p99 startup time for initialDelaySeconds.
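One way to derive that value is the nearest-rank p99 over measured startup times, rounded up to whole seconds; the sample data below is made up for illustration:

```python
import math

def p99_delay(startup_seconds):
    """Nearest-rank 99th percentile of observed startup times,
    rounded up to whole seconds for use as initialDelaySeconds."""
    ordered = sorted(startup_seconds)
    rank = math.ceil(0.99 * len(ordered))   # nearest-rank method, 1-based
    return math.ceil(ordered[rank - 1])

# Hypothetical startup times (seconds) measured over ten deploys:
samples = [3.2, 4.1, 2.9, 5.0, 3.7, 4.4, 3.1, 6.8, 3.9, 4.2]
initial_delay_seconds = p99_delay(samples)   # -> 7
```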

Take a look here: probes-kubernetes, most-common-fails-kubernetes-deployments.

