
Apache Beam Dataflow runner throwing error for Write to Data-store/Enforce throttling during ramp-up

I recently updated a pipeline on GCP Dataflow from version 2.27 to version 2.34. Pipelines using the WriteToDatastore connector now fail with the following error:

Error message from worker: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1233, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1369, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn.py", line 83, in process
    max_ops_budget = self._calc_max_ops_budget(self._first_instant, instant)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn.py", line 74, in _calc_max_ops_budget
    max_ops_budget = int(self._BASE_BUDGET / self._num_workers * (1.5**growth))
OverflowError: (34, 'Numerical result out of range')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 651, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 213, in execute
    op.start()
  File "dataflow_worker/shuffle_operations.py", line 63, in ...
  File "dataflow_worker/shuffle_operations.py", line 261, in ...
  File "apache_beam/runners/worker/operations.py", line 714, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 1235, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 1316, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1233, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 571, in apache_beam.runners.common.SimpleInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1369, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn.py", line 83, in process
    max_ops_budget = self._calc_max_ops_budget(self._first_instant, instant)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/rampup_throttling_fn.py", line 74, in _calc_max_ops_budget
    max_ops_budget = int(self._BASE_BUDGET / self._num_workers * (1.5**growth))
RuntimeError: OverflowError: (34, 'Numerical result out of range') [while running 'Write to Data-store/Enforce throttling during ramp-up']
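The failing expression can be reproduced without Beam at all: in CPython, exponentiating a float raises OverflowError with errno 34 (ERANGE) as soon as the result would exceed the largest IEEE-754 double (about 1.8e308, i.e. 1.5**growth with growth above roughly 1750). A minimal, standalone reproduction of the same failure mode:

```python
def budget(base, num_workers, growth):
    # Same shape as the line from the traceback:
    # int(self._BASE_BUDGET / self._num_workers * (1.5 ** growth))
    return int(base / num_workers * (1.5 ** growth))

print(budget(500, 1, 10))   # small growth is fine -> 28832

try:
    budget(500, 1, 5000)    # 1.5 ** 5000 exceeds the largest double
except OverflowError as e:
    print("overflow:", e)   # OverflowError: (34, 'Numerical result out of range')
```

This shows the error is not about your data: it only depends on the `growth` value the throttler computes from elapsed time and worker count.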

Until this update, these jobs had been running fine.

I checked the apache-beam Python SDK change added in version 2.32 that introduced ramp-up throttling for the DatastoreIO connector: [BEAM-12272] Python - Backport FirestoreIO connector's ramp-up throttling to DatastoreIO connector - ASF JIRA.

This introduced two new parameters for the connector, throttle_rampup and hint_num_workers, as described in the apache_beam.io.gcp.datastore.v1new.datastoreio module documentation.

I have not changed any of the parameter values. I need help understanding what these parameters mean, in particular hint_num_workers, and why the default values fail.

However, with throttle_rampup=False the job runs fine. If I want to follow best practice and keep throttle_rampup=True, how can I make the job run successfully?

Thanks in advance.

This is a known issue in rampup_throttling_fn.py, caused by the data type of the max_ops_budget variable, which leads to an overflow. You can find the issue report on the Beam GitHub; a fix has already been merged into master. Updating to a newer release should therefore resolve the problem (or downgrade to an older one if such a release is not yet available).
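To illustrate the nature of the fix: once 1.5**growth would exceed a double, the budget is effectively unlimited anyway, so the computation can simply be clamped instead of being allowed to overflow. The sketch below shows the idea, assuming an illustrative base budget of 500; it is not the exact upstream patch, just the shape of an overflow-safe variant:

```python
import sys

_BASE_BUDGET = 500  # assumption: illustrative constant, not necessarily Beam's value

def calc_max_ops_budget(num_workers: int, growth: float) -> int:
    # Overflow-safe variant of the formula from the traceback.
    # When 1.5 ** growth no longer fits in a double, ramp-up is long
    # over, so an effectively unlimited budget is the right answer.
    try:
        return int(_BASE_BUDGET / num_workers * (1.5 ** growth))
    except OverflowError:
        return sys.maxsize  # clamp instead of crashing

print(calc_max_ops_budget(1, 0))     # 500 at the start of ramp-up
print(calc_max_ops_budget(1, 5000))  # clamped, no OverflowError
```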

As for what the parameters mean, there is not much beyond the documentation's descriptions:

throttle_rampup – whether to enforce gradual ramp-up.

That is, you have a number of workers that may scale up with load, and that scaling can be gradual or abrupt.

hint_num_workers – a hint for the expected number of workers, used to estimate an appropriate limit during ramp-up throttling.

That is, the expected final number of workers; from this the function can work out how many operations to allow per time period, making the ramp-up gradual rather than abrupt.
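To make the interplay concrete, here is a toy model of the ramp-up schedule. The constants (a 500-op base budget and a 5-minute growth interval) are assumptions chosen for illustration; the real values and timing logic live in rampup_throttling_fn.py. The point it demonstrates: the cluster-wide budget is split across the expected worker count, so a larger hint_num_workers means each worker starts with a smaller write budget, and the 1.5x-per-interval growth is what makes the ramp gradual:

```python
def ramp_budget(minutes_elapsed: float, hint_num_workers: int,
                base_budget: int = 500, interval_min: float = 5.0) -> int:
    # Assumed toy model: the budget starts at `base_budget` ops,
    # multiplies by 1.5 every interval, and is split evenly across
    # the expected number of workers.
    growth = max(0.0, minutes_elapsed / interval_min)
    return int(base_budget / hint_num_workers * (1.5 ** growth))

# More expected workers -> smaller initial per-worker budget.
for workers in (1, 10, 100):
    print(workers, [ramp_budget(m, workers) for m in (0, 5, 15, 30)])
```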

