简体   繁体   English

数据流管道“与服务失去联系”

[英]Dataflow pipeline "lost contact with the service"

I'm running into trouble with an Apache Beam pipline on Google Cloud Dataflow.我在使用 Google Cloud Dataflow 上的 Apache Beam 管道时遇到了麻烦。

The pipeline is simple: Reading json from GCS, extracting text from some nested fields, writing back to GCS.管道很简单:从 GCS 读取 json,从一些嵌套字段中提取文本,写回 GCS。

It works fine when testing with a smaller subset of input files but when I run it on the full data set, I get the following error (after running fine through around 260M items).当使用较小的输入文件子集进行测试时,它工作正常,但是当我在完整数据集上运行它时,出现以下错误(在正常运行大约 260M 项之后)。

Somehow the "worker eventually lost contact with the service"不知何故,“工人最终失去了与服务的联系”

  (8662a188e74dae87): Workflow failed. Causes: (95e9c3f710c71bc2): S04:ReadFromTextWithFilename/Read+FlatMap(extract_text_from_raw)+RemoveLineBreaks+FormatText+WriteText/Write/WriteImpl/WriteBundles/Do+WriteText/Write/WriteImpl/Pair+WriteText/Write/WriteImpl/WindowInto(WindowIntoFn)+WriteText/Write/WriteImpl/GroupByKey/Reify+WriteText/Write/WriteImpl/GroupByKey/Write failed., (da6389e4b594e34b): A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service. The work item was attempted on: 
  extract-tags-150110997000-07261602-0a01-harness-jzcn,
  extract-tags-150110997000-07261602-0a01-harness-828c,
  extract-tags-150110997000-07261602-0a01-harness-3w45,
  extract-tags-150110997000-07261602-0a01-harness-zn6v

The Stacktrace shows a Failed to update work status / Progress reporting thread got error error: Stacktrace 显示Failed to update work status / Progress reporting thread got error错误:

Exception in worker loop: Traceback (most recent call last): File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 776, in run deferred_exception_details=deferred_exception_details) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 629, in do_work exception_details=exception_details) File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 168, in wrapper return fun(*args, **kwargs) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 490, in report_completion_status exception_details=exception_details) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 298, in report_status work_executor=self._work_executor) File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 333, in report_status self._client.projects_locations_jobs_workItems.ReportStatus(request)) File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py", line 467, in ReportStatus config, request, global_params=global_params) File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 723, in _RunMethod return self.ProcessHttpResponse(method_config, http_response, request) File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 729, in ProcessHttpResponse self.__ProcessHttpResponse(method_config, http_response, request)) File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 600, in __ProcessHttpResponse http_response.request_url, method_config, request) HttpError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/qollaboration-live/locations/us-central1/jobs/2017-07-26_16_02_36-1885237888618334364/workItems:reportStatus?alt=json>: response: <{'status': '400', 'content-length': '360', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 26 Jul 2017 23:54:12 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{ "error": { "code": 400, "message": "(7f8a0ec09d20c3a3): Failed to publish the result of the work update. Causes: (7f8a0ec09d20cd48): Failed to update work status. Causes: (afa1cd74b2e65619): Failed to update work status., (afa1cd74b2e65caa): Work \"6306998912537661254\" not leased (or the lease was lost).", "status": "INVALID_ARGUMENT" } } >

And Finally:最后:

HttpError: HttpError accessing <https://dataflow.googleapis.com/v1b3/projects/[projectid-redacted]/locations/us-central1/jobs/2017-07-26_18_28_43-10867107563808864085/workItems:reportStatus?alt=json>: response: <{'status': '400', 'content-length': '358', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Thu, 27 Jul 2017 02:00:10 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{ "error": { "code": 400, "message": "(5845363977e915c1): Failed to publish the result of the work update. Causes: (5845363977e913a8): Failed to update work status. Causes: (44379dfdb8c2b47): Failed to update work status., (44379dfdb8c2e88): Work \"9100669328839864782\" not leased (or the lease was lost).", "status": "INVALID_ARGUMENT" } } >
at __ProcessHttpResponse (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:600)
at ProcessHttpResponse (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:729)
at _RunMethod (/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py:723)
at ReportStatus (/usr/local/lib/python2.7/dist-packages/apache_beam/runners/dataflow/internal/clients/dataflow/dataflow_v1b3_client.py:467)
at report_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py:333)
at report_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:298)
at report_completion_status (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:490)
at wrapper (/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py:168)
at do_work (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:629)
at run (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:776)

This looks like an error to the data flow internals to me.对我来说,这看起来像是数据流内部的错误。 Can anyone confirm?任何人都可以确认吗? Are there any workarounds?有什么解决方法吗?

The HttpError typically appears after the workflow has failed and is part of the failure/teardown process. HttpError 通常在工作流失败出现,并且是失败/拆卸过程的一部分。

It looks like there were others error reported in your pipeline, such as the following.您的管道中似乎报告了其他错误,例如以下错误。 Note that if the same elements fail 4 times it will be marked failing.请注意,如果相同的元素失败 4 次,它将被标记为失败。

Try looking the Stack Traces section in the UI to identify the other errors and their stack traces.尝试查看 UI 中的Stack Traces部分以识别其他错误及其堆栈跟踪。 Since this only occurs on the larger dataset, consider the possibility of their being malformed elements that only exist in the larger dataset.由于这只发生在较大的数据集上,请考虑它们是仅存在于较大数据集中的畸形元素的可能性。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM