简体   繁体   中英

Read Pubsub message to read from Bigquery

I would like to read a BigQuery table based on the content of a pubsub message

subscription = 'my_subscription'

tablename = (p
              | 'receive msg from PubSub' >> beam.io.ReadFromPubSub(subscription=subscription)
              | "convert msg to dict" >> beam.Map(lambda x: json.loads(x))
              | "extract tablename" >> beam.Map(lambda x: x['tablename'])
             )
(p
  | 'Read from BQ Table' >> beam.io.ReadFromBigQuery(table=tablename)
  | #more transformations...     )

Deploying this to a template_metadata file yields the following error message:

"TypeError: expected string or buffer"

The readFromPubSub IO returns a PubSubMessage object.

If you want to retrieve the data inside the PubSubMessage object, you have to change your code as follow:

subscription = 'my_subscription'

tablename = (p
              | 'receive msg from PubSub' >> beam.io.ReadFromPubSub(subscription=subscription)
              | "convert msg to dict" >> beam.Map(lambda msg: json.loads(msg.data.decode()))
              | "extract tablename" >> beam.Map(lambda x: x['tablename'])
             )
(p
  | 'Read from BQ Table' >> beam.io.ReadFromBigQuery(table=tablename)
  | #more transformations...     )

I changed the transformation from PubSubMessage to Dict : json.loads(msg.data.decode())

The msg.data.decode allows to retrieve the data as str , then the json.loads will convert it correctly to Dict .

Your last transformation should normally work well and you could retrieve the tablename from the Dict .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM