GCP deploy instance fails from ansible script
I've been deploying clusters in GCP via Ansible scripts for more than a year now, but all of a sudden one of my scripts keeps giving me this error:
libcloud.common.google.GoogleBaseError: u\"The zone 'projects/[project]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later.
The obvious reason would be that I don't have enough resources, but not much has changed and my quotas look good.
The Ansible script itself doesn't ask for a lot. I'm creating 3 instances of n1-standard-4 with 100GB SSD. See the snippet of the script below:
tasks:
  # One 100 GB SSD boot disk per node
  - name: create boot disks
    gce_pd:
      disk_type: pd-ssd
      image: "debian-9-stretch-v20171025"
      name: "{{ item.node }}-disk"
      size_gb: 100
      state: present
      zone: "europe-west1-d"
      service_account_email: "{{ service_account_email }}"
      credentials_file: "{{ credentials_file }}"
      project_id: "{{ project_id }}"
    with_items: "{{ nodes }}"
    async: 3600
    poll: 2

  - name: create instances
    gce:
      instance_names: "{{ item.node }}"
      zone: "europe-west1-d"
      machine_type: "n1-standard-4"
      # Node 0 is a regular instance; every other node is preemptible
      preemptible: "{{ false if item.num == '0' else true }}"
      disk_auto_delete: true
      disks:
        - name: "{{ item.node }}-disk"
          mode: READ_WRITE
      state: present
      service_account_email: "{{ service_account_email }}"
      service_account_permissions: "compute-rw"
      credentials_file: "{{ credentials_file }}"
      project_id: "{{ project_id }}"
      tags: "elasticsearch"
    register: gce_raw_results
    with_items: "{{ nodes }}"
    async: 3600
    poll: 2
Update 1:
    preemptible: "{{ false if item.num == '0' else true }}"
If I turn off preemptible (set it to false), the playbook runs without a hitch. The "workaround" seems to be simply not using preemptible instances, but this worked for a year without failing once. The full error is:
TASK [Gathering Facts] **********************************************************
ok: [localhost]

TASK [create boot disks] ********************************************************
changed: [localhost] => (item={u'node': u'elasticsearch-link-0', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'0', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
changed: [localhost] => (item={u'node': u'elasticsearch-link-1', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'1', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
ok: [localhost] => (item={u'node': u'elasticsearch-link-2', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'2', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})

TASK [create instances] *********************************************************
changed: [localhost] => (item={u'node': u'elasticsearch-link-0', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'0', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
changed: [localhost] => (item={u'node': u'elasticsearch-link-1', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'1', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'})
failed: [localhost] (item={u'node': u'elasticsearch-link-2', u'ip_field': u'private_ip', u'zone': u'europe-west1-d', u'cluster_name': u'elasticsearch-link', u'num': u'2', u'machine_type': u'n1-standard-4', u'project_id': u'[projectid]'}) => {"ansible_job_id": "371957735383.2688", "changed": false, "cmd": "/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/gce.py", "data": "", "failed": 1, "finished": 1, "item": {"cluster_name": "elasticsearch-link", "ip_field": "private_ip", "machine_type": "n1-standard-4", "node": "elasticsearch-link-2", "num": "2", "project_id": "[projectid]", "zone": "europe-west1-d"}, "msg": "Traceback (most recent call last):\n File \"/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/async_wrapper.py\", line 158, in _run_module\n (filtered_outdata, json_warnings) = _filter_non_json_lines(outdata)\n File \"/tmp/.ansible-airflow/ansible-tmp-1522742180.0-71790706749341/async_wrapper.py\", line 99, in _filter_non_json_lines\n raise ValueError('No start of json char found')\nValueError: No start of json char found\n", "stderr": "Traceback (most recent call last):\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 750, in <module>\n main()\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 712, in main\n module, gce, inames, number)\n File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 524, in create_instances\n instance, lc_machine_type, lc_image(), **gce_args\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 3874, in create_node\n self.connection.async_request(request, method='POST', data=node_data)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 784, in async_request\n response = request(**kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 121, in request\n response = super(GCEConnection, self).request(*args, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 806, in request\n *args, **kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 641, in request\n response = responseCls(**kwargs)\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 163, in __init__\n self.object = self.parse_body()\n File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 268, in parse_body\n raise GoogleBaseError(message, self.status, code)\nlibcloud.common.google.GoogleBaseError: u\"The zone 'projects/[projectid]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later.\"\n", "stderr_lines": ["Traceback (most recent call last):", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 750, in <module>", " main()", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 712, in main", " module, gce, inames, number)", " File \"/tmp/ansible_OnIK1e/ansible_module_gce.py\", line 524, in create_instances", " instance, lc_machine_type, lc_image(), **gce_args", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 3874, in create_node", " self.connection.async_request(request, method='POST', data=node_data)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 784, in async_request", " response = request(**kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/compute/drivers/gce.py\", line 121, in request", " response = super(GCEConnection, self).request(*args, **kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 806, in request", " *args, **kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 641, in request", " response = responseCls(**kwargs)", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/base.py\", line 163, in __init__", " self.object = self.parse_body()", " File \"/usr/local/lib/python2.7/dist-packages/libcloud/common/google.py\", line 268, in parse_body", " raise GoogleBaseError(message, self.status, code)", "libcloud.common.google.GoogleBaseError: u\"The zone 'projects/[projectid]/zones/europe-west1-d' does not have enough resources available to fulfill the request. Try a different zone, or try again later.\""]}
    to retry, use: --limit @/usr/local/airflow/ansible/playbooks/elasticsearch-link-cluster-create.retry
The error message does not indicate a quota problem. It points to a shortage of resources in the zone itself, so I would advise you to try a different zone.
Quoting from the documentation:
Even if you have a regional quota, it is possible that a resource might not be available in a specific zone. For example, you might have quota in region us-central1 to create VM instances, but might not be able to create VM instances in the zone us-central1-a if the zone is depleted. In such cases, try creating the same resource in another zone, such as us-central1-f.
Therefore, when writing the script you should take this possibility into account, even though it is not very common.
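One way to account for a depleted zone is to try a list of candidate zones in order. The following is only a minimal Python sketch of that idea, not the gce module's or libcloud's own behaviour. The names `create_with_zone_fallback`, `ZoneExhaustedError`, and the `create_instance` callable are hypothetical, and matching on the error text is an assumption based on the message shown in the question.

```python
class ZoneExhaustedError(Exception):
    """Hypothetical error raised when every candidate zone is out of capacity."""


# Assumption: a stockout is recognised by this fragment of the error message.
STOCKOUT_MARKER = "does not have enough resources available"


def create_with_zone_fallback(create_instance, name, zones):
    """Try each zone in order and return (zone, result) for the first success.

    create_instance is a caller-supplied callable taking (name, zone);
    it stands in for whatever actually creates the VM.
    """
    exhausted = []
    for zone in zones:
        try:
            return zone, create_instance(name, zone)
        except Exception as exc:
            # Only a stockout moves on to the next zone; other errors propagate.
            if STOCKOUT_MARKER not in str(exc):
                raise
            exhausted.append(zone)
    raise ZoneExhaustedError("no capacity in any of: %s" % ", ".join(exhausted))
```

Persistent disks are zonal, so in a setup like the one in the question the boot disk would have to be created in the same zone that ends up hosting the instance.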
This issue is even more pronounced for preemptible instances, since:
Preemptible instances are finite Compute Engine resources, so they might not always be available. [...] these instances if it requires access to those resources for other tasks. Preemptible instances are excess Compute Engine capacity so their availability varies with usage.
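Because preemptible capacity varies over time, the "try again later" part of the error message can also be automated. The sketch below retries with an increasing delay. The helper `retry_on_stockout`, its default values, and the message-matching rule are all assumptions made for illustration and are not part of Ansible or libcloud.

```python
import time

# Assumption: a stockout is recognised by this fragment of the error message.
STOCKOUT_MARKER = "does not have enough resources available"


def retry_on_stockout(action, attempts=5, base_delay=30.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each stockout.

    action is a caller-supplied callable (hypothetical here) that creates
    the instance and raises on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception as exc:
            last_try = attempt == attempts - 1
            # Give up on the last attempt or on any error that is not a stockout.
            if last_try or STOCKOUT_MARKER not in str(exc):
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults above, the waits between attempts are 30, 60, 120, and 240 seconds. Passing a different `sleep` callable makes the helper easy to exercise without actually waiting.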
UPDATE
To double-check this, you can keep the preemptible flag and change the zone. If the script then runs, you know the script itself works properly and the failure is a stockout that happens during the evening (which should be the case, since it works during the day).
As promised, I created the feature request on your behalf, and you can follow its progress on the public tracker. I advise you to star it so that you receive updates by email.