Apache Beam in Python: how to reuse exactly the same transform on another PCollection
Several of my PCollections (which come from different sources) have to be decoded in the same way.
hits = (msgs
        | 'Parse' >> beam.Map(parse)
        | 'Decode' >> beam.Map(decode_hit))
Then:
dummy_hits = (dummy_msgs
              | 'Parse' >> beam.Map(parse)
              | 'Decode' >> beam.Map(decode_hit))
It would be really nice if I could reuse the transforms through the names I gave them earlier. I naively tried this:
dummy_hits = (dummy_msgs | 'Parse'
| 'Decode')
But my pipeline won't build (TypeError: Expected a PTransform object, got Parse).
I thought it would be possible, as the documentation for the pipeline module states: "If same transform instance needs to be applied then the right shift operator should be used to designate new names (e.g. input | "label" >> my_tranform)".
What is the right way to do this? Is it possible at all?
Names have to be unique, but since your sequence of steps is the same, you may want to create a composite transform, as described here:
https://beam.apache.org/get-started/wordcount-example/#creating-composite-transforms
So do this:
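import apache_beam as beam

class ParseDecode(beam.PTransform):
    # Composite transform: bundles the Parse and Decode steps.
    # parse and decode_hit are the functions from the question.
    def expand(self, pcoll):
        return (pcoll
                | 'Parse' >> beam.Map(parse)
                | 'Decode' >> beam.Map(decode_hit))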
So that you can do this:
hits = msgs | 'Parse msgs' >> ParseDecode()
and then this:
dummy_hits = dummy_msgs | 'Parse dummy msgs' >> ParseDecode()