Dataflow job failed after more than 6 hours with "The worker lost contact with the service"?
Google Cloud Dataflow (Python SDK): Workflow failed | Each time the worker process eventually lost contact with the service
I built a workflow that pulls data from Google Cloud Storage, transforms it in a ParDo, and dumps the output to BigQuery.
import json
import logging

import apache_beam as beam


class ParseValidateRecordDoFn(beam.DoFn):
    def process(self, context):
        # All transformations come here
        try:
            data = json.loads(context)
            yield beam.pvalue.TaggedOutput('PASS', data)
        except ValueError:
            logging.error("Failed to parse record: %s", context)
            yield beam.pvalue.TaggedOutput('ERROR', context)


job_name = JOB_NAME
project = PROJECT_NAME
staging_location = STAGING_LOCATION
temp_location = TEMP_LOCATION

p = beam.Pipeline(argv=[
    '--job_name', job_name,
    '--project', project,
    '--staging_location', staging_location,
    '--temp_location', temp_location,
    '--no_save_main_session',
    '--runner', 'DataflowRunner',
    '--num_workers', '25',
    '--requirements_file', 'requirements.txt'])

text = p | "Reading Source" >> beam.io.ReadFromText('SOURCE LOCATION')

output_validate = text | beam.ParDo(ParseValidateRecordDoFn()).with_outputs('PASS', 'ERROR', main='main')

(output_validate.PASS
 | "Writing to BQ" >> beam.io.Write(beam.io.BigQuerySink(
     'Table_name',
     create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
     write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
     validate=True)))

(output_validate.ERROR
 | "Writing UNPARSED File" >> beam.io.WriteToText('ERROR_LOCATION'))

logging.getLogger().setLevel(logging.INFO)
p.run().wait_until_finish()
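The parse/validate branching inside the DoFn can be sanity-checked outside Beam with plain Python (a minimal sketch of the same logic; `parse_record` is a hypothetical helper, not part of the pipeline above):

```python
import json

def parse_record(line):
    """Mirror of the DoFn logic: tag each line as PASS or ERROR."""
    try:
        return ('PASS', json.loads(line))
    except ValueError:
        return ('ERROR', line)

print(parse_record('{"id": 1}'))  # ('PASS', {'id': 1})
print(parse_record('not-json'))   # ('ERROR', 'not-json')
```

This is the same control flow the DoFn uses, minus the `TaggedOutput` wrappers, so malformed inputs can be checked quickly before deploying to Dataflow.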
Starting this week, the code began raising an error.
Error stack trace:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 778, in run
    deferred_exception_details=deferred_exception_details)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 630, in do_work
    exception_details=exception_details)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 168, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 491, in report_completion_status
    exception_details=exception_details)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 299, in report_status
    work_executor=self._work_executor)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 342, in report_status
    append_counter(work_item_status, counter)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/workerapiclient.py", line 38, in append_counter
    if isinstance(counter.name, counters.CounterName):
AttributeError: 'module' object has no attribute 'CounterName'
Things I have tried:
None of the above led to success; all of them raised the same error: A work item was attempted 4 times without success. Each time the worker eventually lost contact with the service.
Thanks in advance :)
The same code works fine on the local machine with DirectRunner. When we remove the references to the pandas parts of the code, it executes without any problems.
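Since the job succeeds once the pandas reference is removed, the AttributeError deep inside dataflow_worker suggests a dependency-version conflict on the workers (an assumption based on the symptoms, not a confirmed diagnosis). One common mitigation is to pin in requirements.txt the exact versions that work in the local DirectRunner environment, so the Dataflow workers install the same set:

```
# requirements.txt -- pin the exact versions that work locally
# (the version numbers below are placeholders, not verified fixes)
pandas==0.19.2
```

Pinning prevents `pip` on the workers from resolving to a newer release that may pull in transitive dependencies incompatible with the worker's preinstalled Dataflow packages.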