I am reading the Beam documentation and some Stack Overflow questions/answers to understand how to write a Pub/Sub message to BigQuery. As of now, I have a working example that receives protobuf messages and decodes them. The code looks like this:
(p
| 'ReadData' >> apache_beam.io.ReadFromPubSub(topic=known_args.input_topic, with_attributes=True)
| 'ParsePubsubMessage' >> apache_beam.Map(parse_pubsubmessage)
)
Eventually, what I want to do is write the decoded Pub/Sub message to BigQuery. All attributes (and the decoded byte data) will have a one-to-one column mapping.
What is confusing me is what parse_pubsubmessage should return. As of now, it returns a custom class that holds all the fields, i.e.,
class DecodedPubsubMessage:
    def __init__(self, attr, event):
        self.attribute_one = attr['attribute_one']
        self.attribute_two = attr['attribute_two']
        self.order_id = event.order.order_id
        self.sku = event.item.item_id
        self.triggered_at = event.timestamp
        self.status = event.order.status
Is this the correct approach for this Dataflow pipeline? My thinking was that I would use this returned value to write to BigQuery, but I can't work out how. Here is a reference example I was looking at; from it, I am not sure how I would do the lambda map on my returned object to write to BigQuery.
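For context, this is the kind of write step I have in mind: a plain function that maps each decoded object to a dict keyed by column name, since that is what WriteToBigQuery expects per row. The table and schema names here are hypothetical, and the function is only a sketch of what I imagine the "lambda map" would look like:

```python
# Sketch only: map the custom DecodedPubsubMessage object (from the
# question above) to the per-row dict that WriteToBigQuery expects.
def to_bq_row(msg):
    # One key per BigQuery column, matching the one-to-one mapping.
    return {
        'attribute_one': msg.attribute_one,
        'attribute_two': msg.attribute_two,
        'order_id': msg.order_id,
        'sku': msg.sku,
        'triggered_at': msg.triggered_at,
        'status': msg.status,
    }

# In the pipeline this would slot in after ParsePubsubMessage
# (table name and schema below are made up for illustration):
#   | 'ToRow' >> apache_beam.Map(to_bq_row)
#   | 'WriteToBQ' >> apache_beam.io.WriteToBigQuery(
#         'my_project:my_dataset.my_table',
#         schema='attribute_one:STRING,attribute_two:STRING,'
#                'order_id:STRING,sku:STRING,'
#                'triggered_at:TIMESTAMP,status:STRING')
```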
Your class must inherit from DoFn and override the process method, rather than doing the transformation in __init__. After the transformation, use return [obj] or yield obj to emit elements of the desired output PCollection.