
If statement for steps in an Apache Beam Dataflow pipeline (Python)

I was wondering whether it is possible to use an if statement in a Beam pipeline to apply a different transform depending on the scenario. For example:

1) Make one of the input arguments a backfill/regular flag, and then based on that argument decide whether the pipeline starts with

(p
            | fileio.MatchFiles(known_args.input_bucket)
            | fileio.ReadMatches()
            | beam.Map(lambda file: (file.metadata.path, json.loads(file.read_utf8()))))

or

p | beam.io.ReadFromText(known_args.input_file_name)

2) If the file name contains a certain country name (e.g. USA), call TransformUSA(beam.DoFn); otherwise call TransformAllCountries(beam.DoFn).

Sorry if this isn't a great question; I haven't seen this covered anywhere else, and I am trying to make my code modular instead of maintaining separate pipelines.

It is completely possible to have an if statement in your pipeline, but remember that the condition must be known at pipeline construction time (it cannot depend on the data flowing through the pipeline). So, for instance:

import apache_beam as beam
from apache_beam.io import fileio

# known_args is assumed to have been parsed beforehand (e.g. with argparse).
with beam.Pipeline(...) as p:
  if known_args.backfill:
    # Backfill: read every file matched in the input bucket.
    # FlatMap so each line becomes its own element, matching ReadFromText below.
    input_pcoll = (p
                   | fileio.MatchFiles(known_args.input_bucket)
                   | fileio.ReadMatches()
                   | beam.FlatMap(lambda file: file.read_utf8().split('\n')))
  else:
    # Regular run: read the single input file line by line.
    input_pcoll = (p
                   | beam.io.ReadFromText(known_args.input_file_name))
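
As a side note, here is a minimal sketch of how the backfill flag could be parsed before building the pipeline. The --backfill, --input_bucket, and --input_file_name argument names are assumptions taken from the snippets above, not anything prescribed by Beam:

import argparse

from apache_beam.options.pipeline_options import PipelineOptions

parser = argparse.ArgumentParser()
parser.add_argument('--backfill', action='store_true',
                    help='Backfill run: read every file matched in the bucket.')
parser.add_argument('--input_bucket', help='File pattern to match for backfill runs.')
parser.add_argument('--input_file_name', help='Single input file for regular runs.')
known_args, pipeline_args = parser.parse_known_args()

# Any remaining arguments can be forwarded to the runner.
pipeline_options = PipelineOptions(pipeline_args)

With this, beam.Pipeline(options=pipeline_options) takes the place of beam.Pipeline(...) in the snippet above, and known_args.backfill is a plain bool.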

And then, for your TransformUSA, you would do something like:

if 'USA' in known_args.input_file_name:
  next_pcoll = input_pcoll | beam.ParDo(TransformUSA())
else:
  next_pcoll = input_pcoll | beam.ParDo(TransformAllCountries())
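
And in case it helps, the two DoFns named in the question could be skeletons like the following; the processing logic is a placeholder, since the question does not show what they actually do:

import apache_beam as beam

class TransformUSA(beam.DoFn):
    # Placeholder: USA-specific handling of each element.
    def process(self, element):
        yield element  # replace with the real USA-specific logic

class TransformAllCountries(beam.DoFn):
    # Placeholder: generic handling for every other country.
    def process(self, element):
        yield element  # replace with the real country-agnostic logic

Because the branch is an ordinary Python if evaluated at construction time, only one of the two ParDos ends up in the pipeline graph; if you ever need to route individual elements at runtime instead, beam.Partition or tagged outputs are the usual tools.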

Does that make sense?
