
Dataflow send PubSub message after BigQuery write completion

I have a Dataflow job that transforms data and writes it out to BigQuery (a batch job). After the write operation completes, I want to send a message to PubSub that will trigger further processing of the data in BigQuery. I have seen a few older questions/answers that hint at this being possible, but only for streaming jobs.

I'm wondering if this is supported in any way for batch write jobs now? Unfortunately I can't use Apache Airflow to orchestrate all this, so sending a PubSub message seemed like the easiest way.

The design of Beam makes it impossible to do what you want inside the pipeline itself. Indeed, you write a PCollection to BigQuery, and by definition a PCollection is a bounded or unbounded collection. How can you trigger something after an unbounded collection? When do you know that you have reached the end?

So, you have different ways to achieve this. In your code, you can wait for the pipeline to complete and then publish a PubSub message.
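Here is a minimal sketch of that approach in Python, assuming a simple batch pipeline; the project ID, topic name, bucket path, table name, and transform are all placeholders to adapt:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.runners.runner import PipelineState
from google.cloud import pubsub_v1

PROJECT = "my-project"     # placeholder project ID
TOPIC_ID = "bq-load-done"  # placeholder PubSub topic

def run():
    options = PipelineOptions()  # set runner, region, temp_location, etc.
    pipeline = beam.Pipeline(options=options)
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.csv")
        | "Transform" >> beam.Map(lambda line: {"raw": line})  # your real transform here
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            table=f"{PROJECT}:my_dataset.my_table",
            schema="raw:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )

    # For a batch job, wait_until_finish() blocks until the job is done,
    # so anything after it runs only once the BigQuery write has completed.
    state = pipeline.run().wait_until_finish()

    if state == PipelineState.DONE:
        publisher = pubsub_v1.PublisherClient()
        topic_path = publisher.topic_path(PROJECT, TOPIC_ID)
        # .result() blocks until the message is accepted by PubSub.
        publisher.publish(topic_path, b"BigQuery load complete").result()

if __name__ == "__main__":
    run()
```

Note that this ties the publish step to the process that launched the job: if that process dies before the job finishes, the message is never sent. That coupling is what the log-based approach below avoids.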

Personally, I prefer to base this on the logs: when the Dataflow job finishes, I take the end-of-job log entry and sink it into PubSub. That decouples the pipeline code from the next step.
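A sketch of that setup with the google-cloud-logging client, creating a log sink that routes matching entries to a PubSub topic. The sink name and topic are placeholders, and the filter text is an assumption: inspect the final log entries of one of your finished jobs and adjust it to a message that reliably marks completion.

```python
from google.cloud import logging

PROJECT = "my-project"     # placeholder project ID
TOPIC_ID = "bq-load-done"  # placeholder PubSub topic

# Assumed filter: matches Dataflow job logs whose payload mentions the
# terminal DONE state. Verify against your own job's logs before relying on it.
LOG_FILTER = (
    'resource.type="dataflow_step" '
    'AND textPayload:"JOB_STATE_DONE"'
)

client = logging.Client(project=PROJECT)
sink = client.sink(
    "dataflow-job-done",  # placeholder sink name
    filter_=LOG_FILTER,
    destination=f"pubsub.googleapis.com/projects/{PROJECT}/topics/{TOPIC_ID}",
)
sink.create(unique_writer_identity=True)

# The sink's service account still needs publish rights on the topic;
# grant it the Pub/Sub Publisher role after creation.
print(f"Created sink; grant publish rights to {sink.writer_identity}")
```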

You can also have a look at Workflows. It's not really mature yet, but it is very promising for a simple workflow like yours.
