
GCP Dataflow Pub/Sub to Text Files on Cloud Storage

I'm referring to the Google-provided Dataflow template Pub/Sub to Text Files on Cloud Storage.

The messages, once read by Dataflow, don't seem to get acknowledged. How do we ensure that a message, once consumed by Dataflow, is acknowledged and is not available to any other subscriber?

To reproduce and test it, create 2 jobs from the same template and you will see that both jobs process the same messages.

Firstly, the messages are correctly acknowledged.

To demonstrate this, and why your reproduction is misleading, let's focus on Pub/Sub behavior:

  • One or several publishers publish messages to a topic.
  • One or several subscriptions can be created on a topic.
  • All the messages published to a topic are copied to each subscription (see the sketch after this list).
  • A subscription can have one or several subscribers.
  • Each subscriber receives a subset of the messages in the subscription.
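You can verify this fan-out behavior directly with the Pub/Sub client library. Below is a minimal sketch, assuming the google-cloud-pubsub Python package and hypothetical project, topic, and subscription names: one message is published to a topic that has two subscriptions, and each subscription receives and acknowledges its own copy independently.

```python
# Minimal sketch: every subscription on a topic gets its own copy of each message.
# Project/topic/subscription names below are placeholders, not from the template.
from google.cloud import pubsub_v1

project_id = "my-project"   # hypothetical project
topic_id = "demo-topic"     # hypothetical topic

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project_id, topic_id)
publisher.create_topic(request={"name": topic_path})

# Two independent subscriptions on the same topic.
sub_paths = [subscriber.subscription_path(project_id, f"demo-sub-{i}") for i in (1, 2)]
for sub_path in sub_paths:
    subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})

# Publish one message: each subscription receives its own copy.
publisher.publish(topic_path, b"hello").result()

for sub_path in sub_paths:
    # In practice the pull may need a short retry before the message is delivered.
    response = subscriber.pull(request={"subscription": sub_path, "max_messages": 1})
    for received in response.received_messages:
        print(sub_path, received.message.data)  # both subscriptions print b"hello"
        # Acknowledging in one subscription has no effect on the other.
        subscriber.acknowledge(
            request={"subscription": sub_path, "ack_ids": [received.ack_id]}
        )
```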

Go back to your template: you specify only a topic, not a subscription. While your Dataflow job is running, go to the subscriptions page and you will see that a new subscription has been created.

-> When you start a Pub/Sub to Text Files template, a subscription is automatically created on the provided topic.

Therefore, if you create 2 jobs, you will have 2 subscriptions, and all the messages published to the topic are copied to each subscription. That's why you see the same messages twice.
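You can confirm this by listing the subscriptions attached to the topic while both jobs are running. A small sketch, assuming the google-cloud-pubsub Python client and placeholder names (the names of the auto-created subscriptions are chosen by the template, not by you):

```python
# Sketch: list the subscriptions attached to the topic while the Dataflow jobs run.
# With two jobs started from the template you should see two auto-created subscriptions.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "demo-topic")  # hypothetical names

for subscription in publisher.list_topic_subscriptions(request={"topic": topic_path}):
    print(subscription)  # expect one subscription per running job
```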

Now, keep your job running and go to its subscription. There you can see the number of messages in the queue and the unacked messages. You should see 0 in the unacked message graph, which confirms that the messages are acknowledged.
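If you prefer to check this programmatically rather than in the console graph, the same value is exposed as the num_undelivered_messages Pub/Sub metric. A hedged sketch using the google-cloud-monitoring Python client (the project name is a placeholder):

```python
# Sketch: read the backlog (num_undelivered_messages) for each subscription
# over the last 10 minutes; it should stay at 0 while the Dataflow job keeps up.
import time
from google.cloud import monitoring_v3

project_id = "my-project"  # hypothetical project
client = monitoring_v3.MetricServiceClient()

now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now)}, "start_time": {"seconds": int(now - 600)}}
)

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": 'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    # One time series per subscription; the latest point is the current backlog.
    print(series.resource.labels["subscription_id"], series.points[0].value.int64_value)
```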

