GKE metadata server unavailable when the Horizontal Pod Autoscaler starts working

Question

Running Pods with Workload Identity causes a Google credentials error when autoscaling starts.

My application is configured with Workload Identity to use Google Pub/Sub, and a HorizontalPodAutoscaler is set to scale the Pods up to 5 replicas.

The problem arises when the autoscaler creates new replicas of the Pod: GKE's metadata server does not respond for a few seconds, and after 5 to 10 seconds the errors stop.

Here is the error log from a Pod created by the autoscaler:

WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 1 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 2 of 3. Reason: timed out    
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 3 of 3. Reason: timed out
WARNING:google.auth._default:Authentication failed using Compute Engine authentication due to unavailable metadata server
Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

What exactly is the problem here?

According to the Workload Identity docs:

"The GKE metadata server takes a few seconds to start to run on a newly created Pod"

I think the problem is related to this, but is there a solution for this kind of situation?

Thanks

Answer 1

There is no specific solution other than to ensure your application can cope with this. Kubernetes uses DaemonSets to launch per-node apps like the metadata intercept server, but as the docs clearly tell you, that takes a few seconds (noticing the new node, scheduling the pod, pulling the image, starting the container).

You can use an initContainer to prevent your application from starting until some script returns; that script can simply keep hitting a GCP API until it works. But that's probably more work than just making your code retry when those errors happen.
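As a rough illustration of the retry idea, here is a minimal Python sketch using only the standard library. The names `retry_with_backoff` and `fetch_metadata_token`, the attempt count, and the delays are all assumptions made for this example, not part of any library. The probe URL and the `Metadata-Flavor: Google` header follow the usual GCP metadata server convention.

```python
import time
import urllib.request

# Token endpoint of the metadata server (standard GCP metadata address).
METADATA_TOKEN_URL = (
    "http://169.254.169.254/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def retry_with_backoff(fn, exceptions=(Exception,), attempts=6,
                       base_delay=1.0, max_delay=8.0, sleep=time.sleep):
    """Call fn() until it succeeds or attempts run out (hypothetical helper).

    The delay doubles after each failure, capped at max_delay. With the
    defaults the total wait is at most 1 + 2 + 4 + 8 + 8 = 23 seconds,
    which covers the 5 to 10 second gap described in the question.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * 2, max_delay)


def fetch_metadata_token(url=METADATA_TOKEN_URL, timeout=2.0):
    """Ask the metadata server for a token (hypothetical probe).

    Raises OSError (URLError, HTTPError and socket timeouts are all
    subclasses) while the server is not reachable yet.
    """
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Inside the application, the same helper could wrap the credential lookup from the log above, for example `retry_with_backoff(google.auth.default, exceptions=(google.auth.exceptions.DefaultCredentialsError,))`, assuming the google-auth package is installed. For the initContainer variant, a script that just runs `retry_with_backoff(fetch_metadata_token, exceptions=(OSError,))` would exit successfully once the metadata server answers and fail if it never does.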

Solution 1
2 · Accepted · 2020-12-30 05:37:33
