
Google PubSub and duplicated messages from the TOPIC

How can I prevent duplicate messages in Google Cloud Pub/Sub?

Say I have code that handles the messages it is subscribed to.

Say I have two nodes running the same service with this code.

Once one node has received a message but not yet acknowledged it, the other node may receive the same message. This is where we end up with two duplicated messages.

void messageReceiver(PubsubMessage pubsubMessage, AckReplyConsumer ackReply) {
    submitHandler.handle(toMessage(pubsubMessage))
            .doOnSuccess((response) -> {
                log.info("Acknowledging the successfully processed message id: {}, response {}", pubsubMessage.getMessageId(), response);
                ackReply.ack();  // <---- acknowledged
            })
            .doOnError((e) -> {
                log.error("Not acknowledging due to an exception", e);
                ackReply.nack();
            })
            .doOnTerminate(span::finish)  // 'span' comes from tracing code not shown here
            .subscribe();
}

What is the solution for this? Is it normal behaviour?

Google Cloud Pub/Sub uses "at-least-once" delivery. From the docs:

Typically, Cloud Pub/Sub delivers each message once and in the order in which it was published. However, messages may sometimes be delivered out of order or more than once. In general, accommodating more-than-once delivery requires your subscriber to be idempotent when processing messages.

This means it guarantees each message will be delivered one or more times, so you can potentially receive the same message multiple times if you don't pipe it through something that deduplicates it first. There isn't a setting you can define to guarantee exactly-once delivery. The docs do mention that you can get the behavior you want using Cloud Dataflow's PubsubIO, but that solution appears to be deprecated:

You can achieve exactly-once processing of Cloud Pub/Sub message streams using Cloud Dataflow PubsubIO. PubsubIO de-duplicates messages on custom message identifiers or those assigned by Cloud Pub/Sub.

Saying all of this, I've never actually seen Google Cloud Pub/Sub send a message twice. Are you sure that's really the problem you're having, or is the message being redelivered because you are not acknowledging it within the acknowledgement deadline (which defaults to 10 seconds)? If you don't acknowledge a message in time, it will be redelivered. From the docs (emphasis mine):

A subscription is created for a single topic. It has several properties that can be set at creation time or updated later, including:

  • An acknowledgment deadline: If your code doesn't acknowledge the message before the deadline, the message is sent again. The default is 10 seconds. The maximum custom deadline you can specify is 600 seconds (10 minutes).

If that's the situation, just acknowledge your messages within the deadline and you won't see these duplicates as often.
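One way to stay within the deadline is to bound your own processing time and nack early when it overruns, so redelivery happens promptly instead of after a silent expiry. Below is a minimal, hedged sketch: `AckHandler` is a hypothetical stand-in for the client's `AckReplyConsumer`, and the names `DeadlineGuard`/`handle` are illustrative only (the real Java client can also extend deadlines for you via `Subscriber.Builder.setMaxAckExtensionPeriod`):

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: run the handler with a timeout shorter than the subscription's
// ack deadline, and nack early if the work doesn't finish in time.
public class DeadlineGuard {
    /** Hypothetical stand-in for com.google.cloud.pubsub.v1.AckReplyConsumer. */
    public interface AckHandler { void ack(); void nack(); }

    // Daemon threads so a forgotten pool doesn't block JVM shutdown.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    /** Returns "acked" if the work finished within the timeout, "nacked" otherwise. */
    public String handle(Runnable work, AckHandler reply, long timeoutMillis) {
        Future<?> f = pool.submit(work);
        try {
            f.get(timeoutMillis, TimeUnit.MILLISECONDS);
            reply.ack();
            return "acked";
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            f.cancel(true); // best effort: interrupt the overrunning work
            reply.nack();   // redeliver now rather than after deadline expiry
            return "nacked";
        }
    }
}
```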

You can use Redis from Memorystore to deduplicate messages. Your publisher should add a trace ID to the message body just before publishing it to Pub/Sub. On the other side, the client (subscriber) should check whether the trace ID is already in the cache, and if so, skip the message. If there is no such entry, process the message and add the trace ID to the cache with a 7-8 day expiry time (Pub/Sub's maximum retention is 7 days). In this simple way you can ensure each message is processed only once.
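The check-then-process pattern above can be sketched as follows. This is a minimal illustration, not the answerer's actual code: a `ConcurrentHashMap` stands in for Redis (in production you would use an atomic SETNX-style write with a TTL), and the class and method names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of trace-ID deduplication. The map stands in for Redis;
// putIfAbsent plays the role of an atomic SETNX with a TTL.
public class TraceIdDeduplicator {
    private final Map<String, Long> seen = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public TraceIdDeduplicator(long ttlMillis) {
        this.ttlMillis = ttlMillis; // in production: ~7 days, matching Pub/Sub retention
    }

    /** Returns true if this trace ID has not been processed yet (caller should process). */
    public boolean markIfFirst(String traceId) {
        long now = System.currentTimeMillis();
        Long previous = seen.putIfAbsent(traceId, now); // atomic check-and-set
        if (previous == null) {
            return true; // first sighting: process the message
        }
        if (now - previous > ttlMillis) {
            seen.put(traceId, now); // entry expired: treat as new
            return true;
        }
        return false; // duplicate: skip (but still ack) the message
    }
}
```

On a duplicate you should still ack the message, otherwise Pub/Sub will keep redelivering it.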

All messages in a given topic have a unique messageId field:

ID of this message, assigned by the server when the message is published. Guaranteed to be unique within the topic. This value may be read by a subscriber that receives a PubsubMessage via a subscriptions.pull call or a push delivery. It must not be populated by the publisher in a topics.publish call.

You can use it to deduplicate incoming messages, with no need to assign IDs manually.

It is a bit harder in distributed systems (e.g. multiple consumer instances for a given subscription). You would need a global synchronization mechanism; the simplest is to set up a database (e.g. Redis) and use it to keep processed message IDs.
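For a single consumer instance, deduplicating on the server-assigned messageId can be as simple as a bounded LRU set. This is a hedged sketch with hypothetical names; in a distributed deployment the same check would go through a shared store such as Redis instead:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: per-instance dedup on Pub/Sub's server-assigned messageId,
// using a bounded LRU map so memory stays constant.
public class MessageIdCache {
    private final Map<String, Boolean> ids;

    public MessageIdCache(int maxEntries) {
        // access-order LinkedHashMap evicting the least recently seen ID
        this.ids = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns true the first time a messageId is seen, false on duplicates. */
    public synchronized boolean firstTime(String messageId) {
        return ids.put(messageId, Boolean.TRUE) == null;
    }
}
```

Size the cache so it covers at least the redelivery window; an evicted ID will look "new" again if the same message arrives later.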

You should take a look at Replaying and discarding messages, which describes how to configure message retention.

There are two relevant subscription properties:

  • retain_acked_messages - whether to keep acknowledged messages,
  • message_retention_duration - how long to keep messages.

If you do not plan to rewind your subscription to a past point in time, e.g. if you do not plan to reprocess messages and have no bugs forcing you to reset your subscription, you can set retain_acked_messages=false and message_retention_duration='3600s'. This keeps only the last hour of messages.
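Assuming you manage subscriptions with the gcloud CLI, a configuration along these lines should set both properties (subscription name is a placeholder):

```shell
# Keep only the last hour of messages and drop acknowledged ones.
gcloud pubsub subscriptions update my-subscription \
    --message-retention-duration=1h \
    --no-retain-acked-messages
```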

Bear in mind that a Pub/Sub message also has a publish_time, so you don't need to add one to your message's data; it can be used together with message_id. Both of these are set by the Pub/Sub server when it receives the message.
