繁体   English   中英

Python 数据流作业提交失败

[英]Python dataflow job fails to submit

我们在 GCP 上有一个 kubernetes cron 作业,它提交相同 Python 数据流作业的多个副本,每个副本都在自己的容器中。 每当我们需要作业的新副本时,我们只需将其添加到 cron 作业 yaml 的 spec->jobTemplate->spec->template->spec->containers 部分,并根据需要调整数据流作业参数。 这通常可以正常工作,但我们尝试添加的最新副本不起作用。 现有副本都仍在按预期工作。 该作业似乎在提交到 GCP 时失败,并且错误消息不是很有帮助:

Traceback (most recent call last):
  File "/app/job.py", line 117, in <module>
    newness.pipeline.run_dataflow(sys.argv)
  File "/app/newness/pipeline.py", line 480, in run_dataflow
    result = pipe.run()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 403, in run
    self.to_runner_api(), self.runner, self._options).run(False)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/pipeline.py", line 416, in run
    return self.runner.run_pipeline(self)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 389, in run_pipeline
    self.dataflow_client.create_job(self.job), self)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 504, in create_job
    return self.submit_job_description(job)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 551, in submit_job_description
    response = self._client.projects_locations_jobs.Create(request)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py", line 578, in Create
    config, request, global_params=global_params)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
apitools.base.py.exceptions.HttpError: <exception str() failed>

该作业根本不会出现在数据流控制台中。

容器日志的前几行如下所示:

2019-10-13T03:57:47.725542287Z Successfully downloaded apache-beam

2019-10-13T03:58:17.125601280Z INFO:root:Staging SDK sources from PyPI to gs://gcs-bucket-name/staging/newness-boosting-c2898.1570936519.827087/dataflow_python_sdk.tar

2019-10-13T03:58:17.324843623Z INFO:root:Starting GCS upload to gs://gcs-bucket-name/staging/newness-boosting-c2898.1570936519.827087/dataflow_python_sdk.tar...

2019-10-13T03:58:24.825657227Z INFO:root:Completed GCS upload to gs://gcs-bucket-name/staging/newness-boosting-c2898.1570936519.827087/dataflow_python_sdk.tar

2019-10-13T03:58:25.225646529Z INFO:root:Downloading binary distribtution of the SDK from PyPi

2019-10-13T03:58:25.225716554Z INFO:root:Executing command: ['/usr/bin/python', '-m', 'pip', 'download', '--dest', '/tmp/tmpk5TfMS', 'apache-beam==2.8.0', '--no-deps', '--only-binary', ':all:', '--python-version', '27', '--implementation', 'cp', '--abi', 'cp27mu', '--platform', 'manylinux1_x86_64']

2019-10-13T03:59:33.926186906Z Collecting apache-beam==2.8.0

2019-10-13T03:59:52.125678183Z   Using cached https://files.pythonhosted.org/packages/0f/63/ea5453ba656d060936acf41d2ec057f23aafd69649e2129ac66fdda67d48/apache_beam-2.8.0-cp27-cp27mu-manylinux1_x86_64.whl

2019-10-13T04:00:11.525435891Z   Saved /tmp/tmpk5TfMS/apache_beam-2.8.0-cp27-cp27mu-manylinux1_x86_64.whl

2019-10-13T04:00:12.025054706Z Successfully downloaded apache-beam

2019-10-13T04:00:26.726190542Z INFO:root:Staging binary distribution of the SDK from PyPI to gs://gcs-bucket-name/staging/newness-boosting-c2898.1570936519.827087/apache_beam-2.8.0-cp27-cp27mu-manylinux1_x86_64.whl

2019-10-13T04:00:26.825618945Z INFO:root:Starting GCS upload to gs://gcs-bucket-name/staging/newness-boosting-c2898.1570936519.827087/apache_beam-2.8.0-cp27-cp27mu-manylinux1_x86_64.whl...

2019-10-13T04:00:33.725522899Z INFO:root:Completed GCS upload to gs://gcs-bucket-name/staging/newness-boosting-c2898.1570936519.827087/apache_beam-2.8.0-cp27-cp27mu-manylinux1_x86_64.whl

2019-10-13T04:06:14.525017097Z Traceback (most recent call last):
...

为什么这个作业提交失败? 我们可以查看其他日志以了解此故障的原因吗?

(我们的大多数数据流作业都是用 Java 编写的,其中错误消息通常更有帮助。)

更新:使用 apache-beam 2.16 在本地(Windows)运行作业有同样的问题,但更多的日志记录细节:

...
INFO:root:Starting GCS upload to gs://gcs-bucket-name/staging/newness-boosting-c2898.1571606418.971000/apache_beam-2.16.0-cp27-cp27mu-manylinux1_x86_64.whl...
INFO:root:Completed GCS upload to gs://gcs-bucket-name/staging/newness-boosting-c2898.1571606418.971000/apache_beam-2.16.0-cp27-cp27mu-manylinux1_x86_64.whl in 3 seconds.
WARNING:root:Discarding unparseable args: ['job.py', '--days_history=30']
WARNING:root:Discarding unparseable args: ['job.py', '--days_history=30']
WARNING:root:Retry with exponential backoff: waiting for 2.64795143823 seconds before retrying submit_job_description because we caught exception: error: [Errno 10053] An established connection was aborted by the software in your host machine
 Traceback for above exception (most recent call last):
  File "C:\Python27\lib\site-packages\apache_beam\utils\retry.py", line 206, in wrapper
    return fun(*args, **kwargs)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\apiclient.py", line 593, in submit_job_description
    response = self._client.projects_locations_jobs.Create(request)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\clients\dataflow\dataflow_v1b3_client.py", line 657, in Create
    config, request, global_params=global_params)
  File "C:\Python27\lib\site-packages\apitools\base\py\base_api.py", line 729, in _RunMethod
    http, http_request, **opts)
  File "C:\Python27\lib\site-packages\apitools\base\py\http_wrapper.py", line 346, in MakeRequest
    check_response_func=check_response_func)
  File "C:\Python27\lib\site-packages\apitools\base\py\http_wrapper.py", line 396, in _MakeRequestNoRetry
    redirections=redirections, connection_type=connection_type)
  File "C:\Python27\lib\site-packages\oauth2client\transport.py", line 169, in new_request
    redirections, connection_type)
  File "C:\Python27\lib\site-packages\oauth2client\transport.py", line 169, in new_request
    redirections, connection_type)
  File "C:\Python27\lib\site-packages\httplib2\__init__.py", line 1694, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "C:\Python27\lib\site-packages\httplib2\__init__.py", line 1434, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "C:\Python27\lib\site-packages\httplib2\__init__.py", line 1390, in _conn_request
    response = conn.getresponse()
  File "C:\Python27\lib\httplib.py", line 1121, in getresponse
    response.begin()
  File "C:\Python27\lib\httplib.py", line 438, in begin
    version, status, reason = self._read_status()
  File "C:\Python27\lib\httplib.py", line 394, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "C:\Python27\lib\socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)
  File "C:\Python27\lib\ssl.py", line 754, in recv
    return self.read(buflen)
  File "C:\Python27\lib\ssl.py", line 641, in read
    v = self._sslobj.read(len)

... retries 4 times total ...

Traceback (most recent call last):
  File "job.py", line 117, in <module>
    newness.pipeline.run_dataflow(sys.argv)
  File "C:\Users\LeeW\Desktop\newness\newness\pipeline.py", line 480, in run_dataflow
    result = pipe.run()
  File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 407, in run
    self._options).run(False)
  File "C:\Python27\lib\site-packages\apache_beam\pipeline.py", line 420, in run
    return self.runner.run_pipeline(self, self._options)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\dataflow_runner.py", line 485, in run_pipeline
    self.dataflow_client.create_job(self.job), self)
  File "C:\Python27\lib\site-packages\apache_beam\utils\retry.py", line 206, in wrapper
    return fun(*args, **kwargs)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\apiclient.py", line 546, in create_job
    return self.submit_job_description(job)
  File "C:\Python27\lib\site-packages\apache_beam\utils\retry.py", line 219, in wrapper
    raise_with_traceback(exn, exn_traceback)
  File "C:\Python27\lib\site-packages\apache_beam\utils\retry.py", line 206, in wrapper
    return fun(*args, **kwargs)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\apiclient.py", line 593, in submit_job_description
    response = self._client.projects_locations_jobs.Create(request)
  File "C:\Python27\lib\site-packages\apache_beam\runners\dataflow\internal\clients\dataflow\dataflow_v1b3_client.py", line 657, in Create
    config, request, global_params=global_params)
  File "C:\Python27\lib\site-packages\apitools\base\py\base_api.py", line 729, in _RunMethod
    http, http_request, **opts)
  File "C:\Python27\lib\site-packages\apitools\base\py\http_wrapper.py", line 346, in MakeRequest
    check_response_func=check_response_func)
  File "C:\Python27\lib\site-packages\apitools\base\py\http_wrapper.py", line 396, in _MakeRequestNoRetry
    redirections=redirections, connection_type=connection_type)
  File "C:\Python27\lib\site-packages\oauth2client\transport.py", line 169, in new_request
    redirections, connection_type)
  File "C:\Python27\lib\site-packages\oauth2client\transport.py", line 169, in new_request
    redirections, connection_type)
  File "C:\Python27\lib\site-packages\httplib2\__init__.py", line 1694, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "C:\Python27\lib\site-packages\httplib2\__init__.py", line 1434, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "C:\Python27\lib\site-packages\httplib2\__init__.py", line 1390, in _conn_request
    response = conn.getresponse()
  File "C:\Python27\lib\httplib.py", line 1121, in getresponse
    response.begin()
  File "C:\Python27\lib\httplib.py", line 438, in begin
    version, status, reason = self._read_status()
  File "C:\Python27\lib\httplib.py", line 394, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "C:\Python27\lib\socket.py", line 480, in readline
    data = self._sock.recv(self._rbufsize)
  File "C:\Python27\lib\ssl.py", line 754, in recv
    return self.read(buflen)
  File "C:\Python27\lib\ssl.py", line 641, in read
    v = self._sslobj.read(len)
socket.error: [Errno 10053] An established connection was aborted by the software in your host machine

您使用的是哪个版本的 Beam Python SDK?

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM