Dataflow job retains old error state after updating
When I submitted my Dataflow job with the DataflowRunner (a streaming job with a Pub/Sub source), I made a mistake in the execution parameter that defines the BQ table name (say the wrong table name pointed at project-A), and the job raised some errors. I then updated the job with the --update flag and the correct table name, but the job threw the same errors again, still telling me that I was using project-A as the BQ table name.
In short, this is what I was doing:
python main.py \
--job_name=dataflow-job1 \
--runner=DataflowRunner \
--staging_location=gs://project-B-bucket/staging \
--temp_location=gs://project-B-bucket/temp \
--dataset=project-A:table-A
{
  "error": {
    "code": 403,
    "message": "Access Denied: Dataset project-A:table-A: User does not have bigquery.datasets.get permission for dataset project-A:table-A.",
    "errors": [
      {
        "message": "Access Denied: Dataset project-A:table-A: User does not have bigquery.datasets.get permission for dataset project-A:table-A.",
        "domain": "global",
        "reason": "accessDenied"
      }
    ],
    "status": "PERMISSION_DENIED"
  }
}
python main.py \
--job_name=dataflow-job1 \
--runner=DataflowRunner \
--staging_location=gs://project-B-bucket/staging \
--temp_location=gs://project-B-bucket/temp \
--dataset=project-B:table-B \
--update
Why does the old state seem to persist? I thought that if Dataflow detected an error in the job, it would not process the pipeline, the Pub/Sub messages would not be acknowledged, and the pipeline would restart.
Update 2020-12-08: this is how I pass the arguments:
class MyOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_argument('--dataset')
        ...

class WriteToBigQuery(beam.PTransform):
    def __init__(self, name):
        self.name = name

    def expand(self, pcoll):
        return (pcoll
                | 'WriteBQ' >> beam.io.WriteToBigQuery(
                    '{0}.my_table'.format(self.name),
                    create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))

def run(argv=None, save_main_session=True):
    pipeline_options = PipelineOptions(flags=argv)
    pipeline_options.view_as(StandardOptions).streaming = True
    my_args = pipeline_options.view_as(MyOptions)
    ...

    with beam.Pipeline(options=pipeline_options) as p:
        ...
        # I wrapped the BQ write component inside a PTransform class
        output | 'WriteBQ' >> WriteToBigQuery(my_args.dataset)
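Note that `'{0}.my_table'.format(self.name)` runs while the pipeline graph is being constructed, so the dataset string is baked into the submitted job at that moment. A minimal plain-Python sketch (no Beam involved; the class name is reused only for illustration) of why a later flag change does not reach the already-built transform:

```python
# Minimal sketch: the table spec is derived from a value captured once,
# at construction time, which mirrors how the dataset name ends up
# serialized into the submitted job graph.
class WriteToBigQuery:
    def __init__(self, name):
        self.name = name  # captured when the pipeline is constructed

    def table_spec(self):
        return '{0}.my_table'.format(self.name)

# First submission with the wrong dataset:
transform = WriteToBigQuery('project-A:table-A')

# Changing the flag afterwards does not touch the object that was
# already baked into the submitted graph:
print(transform.table_spec())  # project-A:table-A.my_table
```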
You cannot change pipeline options when updating a streaming Dataflow job; you can only update the pipeline's transforms.
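Since the option cannot be changed in place, one workaround is to drain or cancel the old job and resubmit it as a new job, so the corrected --dataset is baked into a brand-new graph. A sketch with a hypothetical helper (`fresh_job_name` is not part of Beam or Dataflow) for generating a distinct job name per resubmission:

```python
import time

def fresh_job_name(base='dataflow-job1'):
    # Dataflow job names must be unique among a project's active jobs;
    # a timestamp suffix keeps each corrected resubmission distinct.
    return '{0}-{1}'.format(base, int(time.time()))

# Pass the result as --job_name when rerunning main.py with the
# corrected --dataset=project-B:table-B (instead of using --update).
print(fresh_job_name())
```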