Airflow KubernetesExecutor scheduler kube watch process dies
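I have a K8S cluster on AWS and am trying to deploy an Airflow webserver + scheduler with the KubernetesExecutor in it. Unfortunately, every time I trigger a DAG from the webserver, the scheduler raises the following error once read_timeout (the amount of time specified in airflow.cfg) has elapsed: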
[2019-11-27 11:25:26,607] {kubernetes_executor.py:440} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
[2019-11-27 11:25:26,617] {kubernetes_executor.py:344} INFO - Event: and now my watch begins starting at resource_version: 0
[2019-11-27 11:26:26,700] {kubernetes_executor.py:335} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 294, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
self._raise_ssl_error(self._ssl, result)
File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 360, in _error_catcher
yield
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 666, in read_chunked
self._update_chunk_length()
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 598, in _update_chunk_length
line = self._fp.fp.readline()
File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 307, in recv_into
raise timeout('The read operation timed out')
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 333, in run
self.worker_uuid, self.kube_config)
File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 357, in _run
**kwargs):
File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 144, in stream
for line in iter_resp_lines(resp):
File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 694, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 365, in _error_catcher
raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='100.64.0.1', port=443): Read timed out.
Process KubernetesJobWatcher-16:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 294, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
self._raise_ssl_error(self._ssl, result)
File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 360, in _error_catcher
yield
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 666, in read_chunked
self._update_chunk_length()
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 598, in _update_chunk_length
line = self._fp.fp.readline()
File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 307, in recv_into
raise timeout('The read operation timed out')
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 333, in run
self.worker_uuid, self.kube_config)
File "/usr/local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 357, in _run
**kwargs):
File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 144, in stream
for line in iter_resp_lines(resp):
File "/usr/local/lib/python3.7/site-packages/kubernetes/watch/watch.py", line 48, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 694, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 365, in _error_catcher
raise ReadTimeoutError(self._pool, None, 'Read timed out.')
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='100.64.0.1', port=443): Read timed out.
[2019-11-27 11:26:26,898] {kubernetes_executor.py:440} ERROR - Error while health checking kube watcher process. Process died for unknown reasons
[2019-11-27 11:26:26,968] {kubernetes_executor.py:344} INFO - Event: and now my watch begins starting at resource_version: 0
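As a quick check of the claim that the failure follows the configured read timeout, the sketch below computes how long the watcher lived, using two timestamps copied from the log above. The helper name `seconds_between` is made up for illustration; only the timestamps come from the log.

```python
from datetime import datetime

# Timestamps copied from the scheduler log above.
WATCH_STARTED = "2019-11-27 11:25:26,617"  # "and now my watch begins ..."
WATCH_FAILED = "2019-11-27 11:26:26,700"   # "Unknown error in KubernetesJobWatcher. Failing"

# Airflow log timestamps use a comma before the milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two log timestamps (hypothetical helper)."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


elapsed = seconds_between(WATCH_STARTED, WATCH_FAILED)
print(f"watcher lived for {elapsed:.3f} s")
```

The gap is almost exactly one minute, which is consistent with a 60-second read timeout on an idle watch connection. It does not by itself show which airflow.cfg option sets that value.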
PostgreSQL is installed via helm charts.
kubectl version:
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.3", GitCommit:"b3cbbae08ec52a7fc73d334838e18d17e8512749", GitTreeState:"clean", BuildDate:"2019-11-14T04:24:29Z", GoVersion:"go1.12.13", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.8", GitCommit:"4e209c9383fa00631d124c8adcc011d617339b3c", GitTreeState:"clean", BuildDate:"2019-02-28T18:40:05Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
100.64.0.1 is the kubernetes service (cluster IP).
Any suggestions?
As I wrote in my comment on an issue, this problem does not interfere with the running of the Pods. It is, however, still there.
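One way to picture why a dying watcher need not affect the Pods is the toy model below. It is not Airflow's code: `ReadTimeout`, `watch`, `run_with_restarts` and the example stream are all hypothetical, and the model simply resumes after the idle gap, whereas the log shows the real watcher starting again at resource_version 0. It only illustrates the pattern visible in the log, where a timed-out watch is followed by a fresh one.

```python
IDLE = object()  # marks a quiet period longer than the read timeout


class ReadTimeout(Exception):
    """Stand-in for the ReadTimeoutError in the traceback (hypothetical class)."""

    def __init__(self, position):
        super().__init__("Read timed out.")
        self.position = position


def watch(stream, start):
    """Yield events from `start`; raise ReadTimeout when the stream goes idle."""
    for position in range(start, len(stream)):
        if stream[position] is IDLE:
            raise ReadTimeout(position)
        yield stream[position]


def run_with_restarts(stream):
    """Restart the watch after each timeout, like the executor's health check."""
    handled, restarts, start = [], 0, 0
    while start < len(stream):
        try:
            for event in watch(stream, start):
                handled.append(event)
            break  # stream exhausted without a timeout
        except ReadTimeout as exc:
            restarts += 1                # "Process died for unknown reasons"
            start = exc.position + 1     # "and now my watch begins ..."
    return handled, restarts


# Made-up event stream: two idle gaps around three Pod state changes.
EXAMPLE_STREAM = ["pod-a Pending", "pod-a Running", IDLE, "pod-a Succeeded", IDLE]
handled, restarts = run_with_restarts(EXAMPLE_STREAM)
print(handled, restarts)
```

In this model every Pod event is still handled even though the watch is restarted twice; the restarts only add noise to the log.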