Apache Beam in Python: how to reuse exactly the same transform on another PCollection

Question

Several of my PCollections (which come from different sources) have to be decoded in the same way:

hits = (msgs | 'Parse' >> beam.Map(parse)
    | 'Decode' >> beam.Map(decode_hit))

Then:

dummy_hits = (dummy_msgs | 'Parse' >> beam.Map(parse)
    | 'Decode' >> beam.Map(decode_hit))

It would be really nice if I could reuse these transforms by referring to the names I gave them earlier. I naively tried this:

dummy_hits = (dummy_msgs | 'Parse'
    | 'Decode')

But my pipeline won't build (TypeError: Expected a PTransform object, got Parse).

I thought it would be possible, since the documentation for the pipeline module states: "If same transform instance needs to be applied then the right shift operator should be used to designate new names (eg input | "label" >> my_tranform )"

What's the way to do this? Is this even possible?

Answer 1

Labels have to be unique within a pipeline, but since your sequence of steps is the same, you may want to create a composite transform like this:

https://beam.apache.org/get-started/wordcount-example/#creating-composite-transforms

So do this:

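import apache_beam as beam


class ParseDecode(beam.PTransform):
  """Parses and decodes messages; reusable on any PCollection."""

  def expand(self, pcoll):
    # Inner labels are nested under the outer label, so 'Parse' and
    # 'Decode' stay unique each time ParseDecode is applied.
    return (pcoll
            | 'Parse' >> beam.Map(parse)
            | 'Decode' >> beam.Map(decode_hit))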

So that you can do this:

hits = (msgs | 'Parse msgs' >> ParseDecode())

and then this:

dummy_hits = (dummy_msgs | 'Parse dummy msgs' >> ParseDecode())

Answer 1 (score 5, accepted), 2018-10-27 00:19:38