
What to return from an Apache Beam PCollection to write to BigQuery

I am reading the Beam documentation and some Stack Overflow questions and answers to understand how to write a Pub/Sub message to BigQuery. So far I have a working example that receives protobuf messages and decodes them. The code looks like this:

(p
 | 'ReadData' >> apache_beam.io.ReadFromPubSub(topic=known_args.input_topic, with_attributes=True)
 | 'ParsePubsubMessage' >> apache_beam.Map(parse_pubsubmessage)
 )

Eventually, I want to write the decoded Pub/Sub message to BigQuery. All attributes (and the decoded byte data) will map one-to-one to columns.

What confuses me is what parse_pubsubmessage should return. Right now it returns a custom class that holds all the fields, i.e.:

class DecodedPubsubMessage:
    def __init__(self, attr, event):
        self.attribute_one = attr['attribute_one']
        self.attribute_two = attr['attribute_two']

        self.order_id = event.order.order_id
        self.sku = event.item.item_id
        self.triggered_at = event.timestamp
        self.status = event.order.status

Is this the correct approach for this Dataflow pipeline? My plan was to use the returned value to write to BigQuery, but the example I found relies on advanced Python features and I cannot work out how to adapt it. Here is a reference example that I was looking at. From this example, I am not sure how to apply the lambda map to my returned object so that it can be written to BigQuery.
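To make the question concrete: the Python SDK's WriteToBigQuery sink accepts dictionaries keyed by column name, so one possible bridge is a small function that flattens the custom object into such a dictionary. The sketch below is only an illustration under assumptions. The helper name to_bigquery_row is hypothetical, the protobuf event is replaced by a SimpleNamespace stand-in with made-up values, and the column names are assumed to match the attribute names on the class.

```python
from types import SimpleNamespace


class DecodedPubsubMessage:
    """Same shape as the class in the question."""

    def __init__(self, attr, event):
        self.attribute_one = attr['attribute_one']
        self.attribute_two = attr['attribute_two']

        self.order_id = event.order.order_id
        self.sku = event.item.item_id
        self.triggered_at = event.timestamp
        self.status = event.order.status


def to_bigquery_row(decoded):
    """Hypothetical helper: flatten the object into a dict keyed by column name.

    vars() returns the instance's attribute dictionary; dict() copies it so
    the row can be modified without touching the original object.
    """
    return dict(vars(decoded))


# Stand-ins for the Pub/Sub attributes and the decoded protobuf event.
# The values are invented purely for illustration.
attr = {'attribute_one': 'a', 'attribute_two': 'b'}
event = SimpleNamespace(
    order=SimpleNamespace(order_id='order-1', status='PAID'),
    item=SimpleNamespace(item_id='sku-1'),
    timestamp='2020-01-01T00:00:00Z',
)

row = to_bigquery_row(DecodedPubsubMessage(attr, event))
```

In the pipeline this function could follow the parsing step, for example as an extra step 'ToRow' >> apache_beam.Map(to_bigquery_row), or parse_pubsubmessage could return the dictionary directly. Values have to be types the sink can serialize, so a protobuf timestamp would likely need converting to a string or number first.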

Your class should inherit from DoFn and override the "process" method, rather than doing the transformation in __init__.

After the transformation, use "return [obj]" or "yield obj" to emit the elements of the desired output PCollection.

