If statement for steps in an Apache Beam Dataflow pipeline (Python)
I was wondering whether it is possible to use an if statement in a Beam pipeline to apply a different transform depending on the scenario. For example:
1) Make one of the input arguments backfill/regular, and based on that argument decide whether to start with
(p
 | fileio.MatchFiles(known_args.input_bucket)
 | fileio.ReadMatches()
 | beam.Map(lambda file: (file.metadata.path, json.loads(file.read_utf8()))))
or
p | beam.io.ReadFromText(known_args.input_file_name)
2) If the file name contains a certain country name (e.g. USA), call TransformUSA(beam.DoFn); otherwise call TransformAllCountries(beam.DoFn).
Sorry if this isn't a great question; I haven't seen this anywhere else, and I'm trying to make my code modular instead of maintaining separate pipelines.
It is entirely possible to use an if statement in your pipeline, but remember that whatever the condition tests must be known at pipeline construction time. The if runs once, as ordinary Python, while the pipeline graph is being built; it is not evaluated per element. So, for instance:
with beam.Pipeline(...) as p:
    if known_args.backfill:
        input_pcoll = (p
                       | fileio.MatchFiles(known_args.input_bucket)
                       | fileio.ReadMatches()
                       # FlatMap emits one element per line, matching ReadFromText
                       | beam.FlatMap(lambda file: file.read_utf8().split('\n')))
    else:
        input_pcoll = (p
                       | beam.io.ReadFromText(known_args.input_file_name))
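For the branch above to work, known_args.backfill has to exist before the pipeline is built. A minimal sketch of how the arguments might be parsed follows; the flag names (--backfill, --input_bucket, --input_file_name) are assumptions inferred from the attribute names used in the snippets, not something the question confirms.

```python
import argparse


def parse_args(argv=None):
    """Split command-line flags into our own options and the rest.

    The flag names below are assumptions based on the attributes used
    in the question (known_args.backfill, known_args.input_bucket,
    known_args.input_file_name).
    """
    parser = argparse.ArgumentParser()
    # Present on the command line -> True, absent -> False.
    parser.add_argument("--backfill", action="store_true",
                        help="Read every file matching --input_bucket.")
    parser.add_argument("--input_bucket", default=None,
                        help="File pattern to match in backfill mode.")
    parser.add_argument("--input_file_name", default=None,
                        help="Single text file to read in regular mode.")
    # Flags we do not recognise (e.g. --runner) are returned untouched so
    # they can be forwarded to Beam's PipelineOptions.
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args


if __name__ == "__main__":
    known_args, pipeline_args = parse_args()
    print(known_args.backfill, pipeline_args)
```

Because the flag is a plain boolean resolved before any Beam code runs, `if known_args.backfill:` is an ordinary construction-time decision.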
And then, for your TransformUSA, you would do something like:
if 'USA' in known_args.input_file_name:
    next_pcoll = input_pcoll | beam.ParDo(TransformUSA())
else:
    next_pcoll = input_pcoll | beam.ParDo(TransformAllCountries())
Does that make sense?