What causes "Cloud Run error: Internal system error, system will retry later"? Troubleshooting suggestions?

Question

I'm attempting to deploy a Cloud Run Service as part of tests for my open source project. This is done via our automated CI/CD system and has worked successfully hundreds of times previously.

The Cloud Run Service gets created, but the first revision never gets deployed. When I look at the newly created Service in the GCP Console, it shows "Cloud Run error: Internal system error, system will retry later." as the main status message for the Service.

The command line that is failing is:

gcloud --configuration=adapt-cloud-gcloud-testing --quiet run deploy cloud-run-gen-name-a179e65d6fdfc19abc57e15df563d8cb --platform=managed --format=json --no-allow-unauthenticated --memory=128M --cpu=1 --image=gcr.io/adapt-ci/http-echo --region=us-central1 --port=5678 --set-env-vars=ADAPT_TEST_DEPLOY_ID=MockDeploy-aymb --args="-text,Adapt Test"

The output from that command (note: the dots after Creating Revision just keep going):

Deploying container to Cloud Run service [cloud-run-gen-name-a179e65d6fdfc19abc57e15df563d8cb] in project [adapt-ci] region [us-central1]
Deploying new service...
Creating Revision....................................................................................................................

The YAML tab in the Console also shows the same message for each of the three status conditions (see below).

To troubleshoot, I have also tried:

Using the GCP Console to manually create the most basic Cloud Run Service with the example container from the getting started docs, while logged in as the project and organization owner. I see the same failure. I have created Services manually this way previously, with this account and project, with no issues.
Using the GCP Console to create the same example Service as above in a different project, but with the same user and in the same org. This works successfully, so the issue is specific to the project.
Trying two different US regions, with the same results.
Looking for any exceeded quotas, since this is typically automated. On the Cloud Run quotas page and the overall quotas page, I don't see any exceeded quotas now or historically. However, this is an area I'm not very familiar with, so I may have missed something.
Retrying dozens of times over the course of two days.
Checking the GCP status page, which shows no outages.

What additional troubleshooting steps should I take to investigate and fix this issue?

Partial info from the YAML tab in the GCP Console for the failing Service:

status:
  observedGeneration: 1
  conditions:
  - type: Ready
    status: Unknown
    message: 'Cloud Run error: Internal system error, system will retry later.'
    lastTransitionTime: '2020-10-08T21:07:20.844314Z'
  - type: ConfigurationsReady
    status: Unknown
    message: 'Cloud Run error: Internal system error, system will retry later.'
    lastTransitionTime: '2020-10-08T21:07:20.755212Z'
  - type: RoutesReady
    status: Unknown
    message: 'Cloud Run error: Internal system error, system will retry later.'
    lastTransitionTime: '2020-10-08T21:07:20.844314Z'
  latestCreatedRevisionName: cloud-run-gen-name-3bab80f75cfd57cf87ad89d9d2c18ba3-00001-fus
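The same status conditions can probably be read from the CLI, which may be more convenient in a CI job than the Console. The following is a minimal sketch; the function name `show_conditions` and its arguments are hypothetical and not part of the original setup.

```shell
# Sketch only: show_conditions is a hypothetical helper.
# Arguments: service name, region, project ID.
show_conditions() {
  local service="$1" region="$2" project="$3"
  # Print just the status.conditions block of the Service as YAML.
  gcloud run services describe "$service" \
    --platform=managed \
    --region="$region" \
    --project="$project" \
    --format='yaml(status.conditions)'
}

# Example call (service name elided here on purpose):
#   show_conditions <service-name> us-central1 adapt-ci
```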

Answer 1

After quite a bit of trial and error, I got everything working again.

The first thing I did that made some progress was to disable the Cloud Run Admin API and re-enable it. After that change, I was able to create a service using the example container from the Console, logged in as the project owner. I was also able to create a service using the example container from the CLI, logged in as the CI service account. However, the original command from my question still behaved exactly as before. I have no idea how the project got into this state, such that even the project owner couldn't use Cloud Run.
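The answer does not say whether the API was toggled from the Console or the CLI. A rough CLI equivalent might look like the sketch below, assuming `run.googleapis.com` is the service name of the Cloud Run Admin API; the function name `reenable_cloud_run_api` is hypothetical. Disabling an API may affect anything in the project that depends on it, so this is best treated as a deliberate, manual step.

```shell
# Sketch only: reenable_cloud_run_api is a hypothetical helper.
# Argument: project ID.
reenable_cloud_run_api() {
  local project="$1"
  # Disable the Cloud Run Admin API, then enable it again.
  # The enable step runs only if the disable step succeeded.
  gcloud services disable run.googleapis.com --project="$project" &&
    gcloud services enable run.googleapis.com --project="$project"
}

# Example call:
#   reenable_cloud_run_api adapt-ci
```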

The second thing I did was to re-push the container image I was trying to use (gcr.io/adapt-ci/http-echo) to GCR. I pushed the exact same image as was there previously. This finally allowed the CI system to successfully create the Service.
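A re-push along these lines could be sketched as follows. It assumes a known-good copy of the image is already tagged locally under the same name, which the answer does not state; the function name `repush_image` is hypothetical.

```shell
# Sketch only: repush_image is a hypothetical helper.
# Argument: full image name, e.g. gcr.io/adapt-ci/http-echo
repush_image() {
  local image="$1"
  # Let Docker authenticate to Container Registry through gcloud credentials.
  gcloud auth configure-docker --quiet &&
    # Push the local copy; layers missing from the registry get uploaded again.
    docker push "$image"
}

# Example call:
#   repush_image gcr.io/adapt-ci/http-echo
```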

As part of my earlier troubleshooting, I had looked at Google Container Registry for this project and had confirmed that the needed image was still present. However, we had somewhat recently enabled a lifecycle policy on the Cloud Storage bucket to delete items older than a certain amount of time. So my best guess is that the policy deleted some, but not all, of the files associated with the gcr.io/adapt-ci/http-echo image, and that this resulted in the internal error instead of an error saying that the container image couldn't be found.
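To check whether a similar lifecycle policy is in play, one option is to inspect the bucket behind the registry. The sketch below assumes the default bucket naming for the gcr.io host, `artifacts.PROJECT-ID.appspot.com`; regional hosts and some project types use different bucket names. The function name `inspect_gcr_storage` is hypothetical.

```shell
# Sketch only: inspect_gcr_storage is a hypothetical helper.
# Arguments: project ID, full image name.
inspect_gcr_storage() {
  local project="$1" image="$2"
  # Show the lifecycle configuration of the bucket that backs gcr.io.
  gsutil lifecycle get "gs://artifacts.${project}.appspot.com"
  # List the tags and digests the registry still reports for the image.
  gcloud container images list-tags "$image"
}

# Example call:
#   inspect_gcr_storage adapt-ci gcr.io/adapt-ci/http-echo
```

Note that the image can still be listed even if some of its layer objects were deleted from the bucket, which matches the behavior described above.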
