
GCP: Where to schedule a Pub/Sub subscriber that writes to BigQuery

I need to write data from Pub/Sub to BigQuery in Python. I have tested some asynchronous subscriber code and it works fine, but it needs to run continuously, and I am not 100% sure where to schedule it. I have been using Cloud Composer (Airflow), but it doesn't look like an ideal fit, and Dataflow appears to be the option recommended by GCP. Is that correct?

Or is there a way to run this reliably from Cloud Composer? I think I can run it once, but I want to make sure it runs again in case it fails for some reason.

The two best ways to accomplish this would be with either Cloud Functions or Cloud Dataflow. For Cloud Functions, you would set up a trigger on the Pub/Sub topic and write to BigQuery in your function code. It would look similar to the tutorial on streaming from Cloud Storage into BigQuery, except that the input would be Pub/Sub messages. For Dataflow, you could use one of the Google-provided, open-source templates to write Pub/Sub messages to BigQuery.
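As a rough illustration of the Cloud Functions approach, here is a minimal sketch of a background function with a Pub/Sub trigger, assuming the messages carry JSON payloads whose keys match the BigQuery table schema. The table ID and function name are hypothetical placeholders.

```python
import base64
import json

from google.cloud import bigquery

# Hypothetical table ID; replace with your own project, dataset, and table.
TABLE_ID = "my-project.my_dataset.my_table"

# Created once per function instance and reused across invocations.
client = bigquery.Client()


def pubsub_to_bigquery(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    Decodes the base64-encoded payload and streams the resulting JSON
    row into BigQuery via the streaming insert API.
    """
    payload = base64.b64decode(event["data"]).decode("utf-8")
    row = json.loads(payload)

    errors = client.insert_rows_json(TABLE_ID, [row])
    if errors:
        # Raising makes the invocation fail, so Pub/Sub can redeliver
        # the message if retries are enabled on the function.
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

If you deploy the function with retries enabled, failed inserts are redelivered automatically, which covers the "runs again if it fails" concern from the question.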

Cloud Dataflow is probably better suited if your throughput is high (thousands of messages per second) and consistent. If your throughput is low or infrequent, Cloud Functions is likely the better fit. Either solution runs continuously and writes messages to BigQuery as they arrive.
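If you would rather write the Dataflow pipeline yourself instead of using the provided template, a minimal streaming pipeline in the Apache Beam Python SDK could look like the sketch below. The subscription and table names are hypothetical, and runner settings (for example --runner=DataflowRunner, --project, --region) would be supplied as command-line flags.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical resource names; substitute your own.
SUBSCRIPTION = "projects/my-project/subscriptions/my-subscription"
TABLE = "my-project:my_dataset.my_table"


def run():
    # streaming=True makes this an unbounded, continuously running pipeline;
    # remaining options (runner, project, region) are read from sys.argv.
    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                TABLE,
                # Append to an existing table rather than creating one.
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```

Running this on Dataflow gives you a managed, autoscaling job that restarts failed workers for you, which is why it is usually recommended over scheduling a long-lived subscriber from Cloud Composer.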
