
Why does gcloud ml-engine submit command give "requested CPUs exceed quota"?

I am running a tensorflow object detection job on GCP with the following command:

gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
    --job-dir=gs://${YOUR_GCS_BUCKET}/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
    --module-name object_detection.model_tpu_main \
    --runtime-version 1.9 \
    --scale-tier BASIC_TPU \
    --region us-central1 \
    -- \
    --model_dir=gs://${YOUR_GCS_BUCKET}/train \
    --tpu_zone us-central1 \
    --pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/pinches_pipeline.config

Got the following error:

ERROR: (gcloud.ml-engine.jobs.submit.training) RESOURCE_EXHAUSTED: Quota failure for project seal-pinches. The requested 54.0 CPUs exceeds the allowed maximum of 20.0. To read more about Cloud ML Engine quota, see https://cloud.google.com/ml-engine/quotas
- '@type': type.googleapis.com/google.rpc.QuotaFailure
  violations:
  - description: The requested 54.0 CPUs exceeds the allowed maximum of 20.0.

My question is: how is the requested CPU count getting set to 54? I am not setting this anywhere explicitly.

Thanks in advance.

This option in your command sets the size and type of your ML instance:

--scale-tier BASIC_TPU

The BASIC_TPU tier costs $6.8474 per hour. I am not sure of the exact formula, but a Cloud TPU translates into N CPUs for equivalent billing and quota purposes. You also need to add the cost of the Cloud ML Engine machine type: standard is $0.2774 per hour.

Google's description:

Quota is defined in terms of Cloud TPU cores. A single Cloud TPU device comprises 4 TPU chips and 8 cores: 2 cores per TPU chip. A Cloud TPU v2 Pod (alpha) consists of 64 TPU devices containing 256 TPU chips (512 cores). The number of cores also specifies the quota for a particular Cloud TPU. For example, a quota of 8 enables the use of 8 cores. A quota of 16 enables use of up to 16 cores, and so forth.

Your CPU quota is 20. You will need to increase your quota or choose a different scale tier, such as BASIC or BASIC_GPU, which does not use TPUs. Also double check that your billing setup uses a credit/debit card with sufficient credit available.
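For example, a resubmission on the non-TPU BASIC_GPU tier might look like the sketch below. Note this is an illustration, not a tested command: it assumes `object_detection.model_main` is the appropriate non-TPU entry point for this version of the Object Detection API (the TPU-specific `model_tpu_main` and the `--tpu_zone` flag are dropped), and that the other paths match your original job.

```shell
# Sketch: same job resubmitted on BASIC_GPU, which does not request a
# Cloud TPU and so should stay within a default 20-CPU quota.
# object_detection.model_main is assumed as the non-TPU entry point.
gcloud ml-engine jobs submit training `whoami`_object_detection_`date +%s` \
    --job-dir=gs://${YOUR_GCS_BUCKET}/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
    --module-name object_detection.model_main \
    --runtime-version 1.9 \
    --scale-tier BASIC_GPU \
    --region us-central1 \
    -- \
    --model_dir=gs://${YOUR_GCS_BUCKET}/train \
    --pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/pinches_pipeline.config
```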
