
Apache Beam in Python: How to reuse exactly the same transform on another PCollection

Several of my PCollections (that come from different sources) have to be decoded in the same way.

hits = (msgs | 'Parse' >> beam.Map(parse)
    | 'Decode' >> beam.Map(decode_hit))

Then:

dummy_hits = (dummy_msgs | 'Parse' >> beam.Map(parse)
    | 'Decode' >> beam.Map(decode_hit))

It would be really nice if I could reuse the transforms through the names I gave them earlier. I naively tried this:

dummy_hits = (dummy_msgs | 'Parse'
    | 'Decode')

But my pipeline won't build (TypeError: Expected a PTransform object, got Parse).

I thought it would be possible, since the documentation for the pipeline module states: "If same transform instance needs to be applied then the right shift operator should be used to designate new names (e.g. input | "label" >> my_tranform)"

What is the right way to do this? Is it possible at all?

Names have to be unique, but since your sequence of steps is the same, you may want to create a composite transform as described here:

https://beam.apache.org/get-started/wordcount-example/#creating-composite-transforms

So do this:

class ParseDecode(beam.PTransform):

  def expand(self, pcoll):
    return (pcoll
            | 'Parse' >> beam.Map(parse)
            | 'Decode' >> beam.Map(decode_hit))

So that you can do this:

hits = msgs | 'Parse msgs' >> ParseDecode()

and then this:

dummy_hits = dummy_msgs | 'Parse dummy msgs' >> ParseDecode()

