I'm attempting to deploy a Cloud Run Service as part of tests for my open source project. This is done via our automated CI/CD system and has worked successfully hundreds of times previously.
The Cloud Run Service gets created but the first revision never gets deployed. When I look at the newly created Service in the GCP Console, it shows "Cloud Run error: Internal system error, system will retry later." as the main status message for the Service.
The command line that is failing is:
gcloud --configuration=adapt-cloud-gcloud-testing --quiet run deploy cloud-run-gen-name-a179e65d6fdfc19abc57e15df563d8cb --platform=managed --format=json --no-allow-unauthenticated --memory=128M --cpu=1 --image=gcr.io/adapt-ci/http-echo --region=us-central1 --port=5678 --set-env-vars=ADAPT_TEST_DEPLOY_ID=MockDeploy-aymb --args="-text,Adapt Test"
The output from that command (note: the dots after Creating Revision just keep going):
Deploying container to Cloud Run service [cloud-run-gen-name-a179e65d6fdfc19abc57e15df563d8cb] in project [adapt-ci] region [us-central1]
Deploying new service...
Creating Revision....................................................................................................................
The YAML tab in the Console also shows the same message for each of the three status conditions (see below).
To troubleshoot, I have also tried:
What are additional troubleshooting steps I should take to investigate & fix this issue?
Partial info from the YAML tab in the GCP Console for the failing Service:
status:
  observedGeneration: 1
  conditions:
  - type: Ready
    status: Unknown
    message: 'Cloud Run error: Internal system error, system will retry later.'
    lastTransitionTime: '2020-10-08T21:07:20.844314Z'
  - type: ConfigurationsReady
    status: Unknown
    message: 'Cloud Run error: Internal system error, system will retry later.'
    lastTransitionTime: '2020-10-08T21:07:20.755212Z'
  - type: RoutesReady
    status: Unknown
    message: 'Cloud Run error: Internal system error, system will retry later.'
    lastTransitionTime: '2020-10-08T21:07:20.844314Z'
  latestCreatedRevisionName: cloud-run-gen-name-3bab80f75cfd57cf87ad89d9d2c18ba3-00001-fus
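The same status conditions can also be read from the CLI instead of the Console. This is a minimal sketch, assuming the active gcloud configuration already points at the right project; the function name `show_service_conditions` and its arguments are made up for illustration.

```shell
# Sketch: print only the status conditions of a Cloud Run service.
# show_service_conditions is a hypothetical helper name; the service
# and region are passed in by the caller.
show_service_conditions() {
  service="$1"
  region="$2"
  gcloud run services describe "$service" \
    --platform=managed \
    --region="$region" \
    --format='yaml(status.conditions)'
}
```

For the failing Service in the question, the call would be `show_service_conditions cloud-run-gen-name-a179e65d6fdfc19abc57e15df563d8cb us-central1`.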
After quite a bit of trial and error, I got everything working again.
The first thing I did that made some progress was to disable the Cloud Run Admin API and re-enable it. After that change, I was able to create a service using the example container from the Console, logged in as the project owner. I was also able to create a service using the example container from the CLI, logged in as the CI service account. However, the original command from my question still behaved exactly as before. I have no idea how the project got into this state, such that even the project owner couldn't use Cloud Run.
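The disable and re-enable step could be done from the CLI roughly as follows. This is a sketch, not the exact commands used above: the function name `reset_cloud_run_api` is hypothetical, and it assumes the caller has permission to manage services on the project. `run.googleapis.com` is the service name of the Cloud Run Admin API.

```shell
# Sketch: turn the Cloud Run Admin API off and back on for one project.
# reset_cloud_run_api is a hypothetical helper name; the project ID is
# supplied by the caller.
reset_cloud_run_api() {
  project="$1"
  # Only re-enable if the disable step succeeded.
  gcloud services disable run.googleapis.com --project="$project" &&
    gcloud services enable run.googleapis.com --project="$project"
}
```

For the project in the question this would be `reset_cloud_run_api adapt-ci`. What disabling the API does to services that are already deployed is not covered here, so it is probably safest to try this on a test project first.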
The second thing I did was to re-push the container image I was trying to use (gcr.io/adapt-ci/http-echo) to GCR. I pushed the exact same image as was there previously. This finally allowed the CI system to successfully create the Service.
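A re-push along those lines might look like the sketch below. It assumes a local copy of the image already exists under the same tag and that Docker should authenticate to GCR through gcloud; the function name `repush_image` is made up for illustration.

```shell
# Sketch: push a locally available image back to Container Registry.
# repush_image is a hypothetical helper name. It assumes the image is
# already present locally under the tag passed as the first argument.
repush_image() {
  image="$1"
  # Register gcloud as a Docker credential helper for gcr.io hosts.
  gcloud auth configure-docker --quiet
  # Upload the image; layers missing from the registry are sent again.
  docker push "$image"
}
```

With the image from the question, the call would be `repush_image gcr.io/adapt-ci/http-echo`.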
As part of my earlier troubleshooting, I had looked at Google Container Registry for this project and had confirmed that the needed image was still present. However, we had somewhat recently enabled a lifecycle policy on the Cloud Storage bucket to delete items older than a certain amount of time. So my best guess is that the policy deleted some, but not all, of the files associated with the gcr.io/adapt-ci/http-echo image, and that this resulted in the internal error instead of an error saying that the container image couldn't be found.
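One way to check for this situation is to look at the bucket's lifecycle configuration and then try a full pull of the image. The sketch below is untested against the project above and rests on a few assumptions: images on the `gcr.io` host are stored in the bucket `artifacts.PROJECT-ID.appspot.com`, the caller can read that bucket, and the pull runs on a machine with no cached layers (a local cache could hide missing files). The function name `check_gcr_image` is hypothetical.

```shell
# Sketch: inspect the GCR storage bucket's lifecycle rules, then verify
# that every layer of an image can still be downloaded.
# check_gcr_image is a hypothetical helper name.
check_gcr_image() {
  project="$1"
  image="$2"
  # Print the lifecycle configuration of the bucket behind gcr.io.
  gsutil lifecycle get "gs://artifacts.${project}.appspot.com"
  # A pull has to fetch every layer, so it should fail if any is gone.
  docker pull "$image"
}
```

For the setup in the question, that would be `check_gcr_image adapt-ci gcr.io/adapt-ci/http-echo`.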