Google Cloud Pubsub Data lost

I'm experiencing a problem with GCP Pub/Sub where a small percentage of data is lost when publishing thousands of messages within a couple of seconds.

I'm logging both the message_id from Pub/Sub and a session_id unique to each message, on the publishing end as well as the receiving end. The result I'm seeing is that some messages on the receiving end have the same session_id but different message_ids. Also, some messages were missing.

For example, in one test I sent 5,000 messages to Pub/Sub, and exactly 5,000 messages were received, but 8 of the original messages were missing from them. The log entries for a lost message look like this:

MISSING sessionId:sessionId: 731 (missing in log from pull request, but present in log from Flask API)

messageId FOUND: messageId:108562396466545

API: 200 **** sessionId: 731, messageId:108562396466545 ******(Log from Flask API)

Pubsub: sessionId: 730, messageId:108562396466545(Log from pull request)

And the duplicates look like:

======= Duplicates FOUND on sessionId: 730=======

sessionId: 730, messageId:108562396466545

sessionId: 730, messageId:108561339282318

(both are logs from pull request)

All missing data and duplicates look like this.

From the above example, it appears that some messages have taken the message_id of another message, and that some have been sent twice with two different message_ids.
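The cross-check between the two logs can be sketched roughly as follows. This is only an illustration, assuming each log has already been reduced to (session_id, message_id) pairs; the function name compare_logs is hypothetical, and the published entry for sessionId 730 is an assumed value, since the API log line for it is not shown above.

```python
from collections import defaultdict


def compare_logs(published, received):
    """Compare publisher-side and subscriber-side (session_id, message_id) pairs.

    Returns the session IDs that never arrived and the session IDs that
    arrived under more than one message ID.
    """
    published_sessions = {session_id for session_id, _ in published}

    # Collect every message ID seen on the receiving end for each session ID.
    ids_by_session = defaultdict(set)
    for session_id, message_id in received:
        ids_by_session[session_id].add(message_id)

    missing = sorted(published_sessions - set(ids_by_session))
    duplicates = {
        session_id: sorted(message_ids)
        for session_id, message_ids in ids_by_session.items()
        if len(message_ids) > 1
    }
    return missing, duplicates


# Pairs taken from the log excerpts above; the first published pair is assumed.
published = [("730", "108561339282318"), ("731", "108562396466545")]
received = [("730", "108561339282318"), ("730", "108562396466545")]

missing, duplicates = compare_logs(published, received)
print(missing)
print(duplicates)
```

With this sample, session 731 is reported as missing and session 730 as a duplicate carrying both message IDs, which matches the pattern in the logs.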

I wonder if anyone could help me figure out what is going on? Thanks in advance.

Code

I have an API sending messages to Pub/Sub, which looks like this:

from flask import Flask, request, jsonify, render_template
from flask_cors import CORS, cross_origin
import simplejson as json
from google.cloud import pubsub
from functools import wraps
import re
import json


app = Flask(__name__)
ps = pubsub.Client()

...

@app.route('/publish', methods=['POST'])
@cross_origin()
@json_validator
def publish_test_topic():
    pubsub_topic = 'test_topic'
    data = request.data

    topic = ps.topic(pubsub_topic)

    event = json.loads(data)

    messageId = topic.publish(data)
    return '200 **** sessionId: ' + str(event["sessionId"]) + ", messageId:" + messageId + " ******"

And this is the code I used to read from Pub/Sub:

from google.cloud import pubsub
import re
import json

ps = pubsub.Client()
topic = ps.topic('test-xiu')
sub = topic.subscription('TEST-xiu')

max_messages = 1
stop = False

messages = []

class Message(object):
    """docstring for Message."""
    def __init__(self, sessionId, messageId):
        super(Message, self).__init__()
        self.sessionId = sessionId
        self.messageId = messageId


def pull_all():
    while stop == False:

        m = sub.pull(max_messages = max_messages, return_immediately = False)

        for data in m:
            ack_id = data[0]
            message = data[1]
            messageId = message.message_id
            data = message.data
            event = json.loads(data)
            sessionId = str(event["sessionId"])
            messages.append(Message(sessionId = sessionId, messageId = messageId))

            print '200 **** sessionId: ' + sessionId + ", messageId:" + messageId + " ******"

            sub.acknowledge(ack_ids = [ack_id])

pull_all()

For generating the session_id, sending the request, and logging the response from the API:

// generate trackable sessionId
var sessionId = 0

var increment_session_id = function () {
  sessionId++;
  return sessionId;
}

var generate_data = function () {
  var data = {};
  // data.sessionId = faker.random.uuid();
  data.sessionId = increment_session_id();
  data.user = get_rand(userList);
  data.device = get_rand(deviceList);
  data.visitTime = new Date;
  data.location = get_rand(locationList);
  data.content = get_rand(contentList);

  return data;
}

var sendData = function (url, payload) {
  var request = $.ajax({
    url: url,
    contentType: 'application/json',
    method: 'POST',
    data: JSON.stringify(payload),
    error: function (xhr, status, errorThrown) {
      console.log(xhr, status, errorThrown);
      $('.result').prepend("<pre id='json'>" + JSON.stringify(xhr, null, 2) + "</pre>")
      $('.result').prepend("<div>errorThrown: " + errorThrown + "</div>")
      $('.result').prepend("<div>======FAIL=======</div><div>status: " + status + "</div>")
    }
  }).done(function (xhr) {
    console.log(xhr);
    $('.result').prepend("<div>======SUCCESS=======</div><pre id='json'>" + JSON.stringify(payload, null, 2) + "</pre>")
  })
}

$(submit_button).click(function () {
  var request_num = get_request_num();
  var request_url = get_url();
  for (var i = 0; i < request_num; i++) {
    var data = generate_data();
    var loadData = changeVerb(data, 'load');
    sendData(request_url, loadData);
  }
}) 

UPDATE

I made a change to the API, and the issue seems to have gone away. Instead of using one pubsub.Client() for all requests, I now initialize a client for every incoming request. The new API looks like:

from flask import Flask, request, jsonify, render_template
from flask_cors import CORS, cross_origin
import simplejson as json
from google.cloud import pubsub
from functools import wraps
import re
import json


app = Flask(__name__)

...

@app.route('/publish', methods=['POST'])
@cross_origin()
@json_validator
def publish_test_topic():

    ps = pubsub.Client()


    pubsub_topic = 'test_topic'
    data = request.data

    topic = ps.topic(pubsub_topic)

    event = json.loads(data)

    messageId = topic.publish(data)
    return '200 **** sessionId: ' + str(event["sessionId"]) + ", messageId:" + messageId + " ******"

Google Cloud Pub/Sub message IDs are unique. It should not be possible for "some messages [to have] taken the message_id of another message." The fact that message ID 108562396466545 was seemingly received means that Pub/Sub did deliver the message to the subscriber and that it was not lost.

I recommend you check how your session_ids are generated to ensure that they are indeed unique and that there is exactly one per message. Searching for the sessionId in your JSON via a regular expression search seems a little strange. You would be better off parsing this JSON into an actual object and accessing fields that way.
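To illustrate why parsing is safer, here is a small sketch. The payload and the regular expression are hypothetical (the pattern actually used is not shown); the point is only that a textual search can pick up a sessionId that appears inside another field, while json.loads reads the real top-level field.

```python
import json
import re

# Hypothetical payload: the "content" field happens to mention another session.
payload = '{"content": "previous sessionId: 730", "sessionId": 731}'

# Regex search returns the first textual match, which here lies inside "content".
match = re.search(r'sessionId"?:\s*(\d+)', payload)
regex_id = match.group(1)

# Parsing the JSON reads the actual top-level field.
event = json.loads(payload)
parsed_id = event["sessionId"]

print(regex_id)   # taken from the text inside "content"
print(parsed_id)  # taken from the real "sessionId" field
```

The regex extracts 730 from the content string, while the parsed object yields 731.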

In general, duplicate messages in Cloud Pub/Sub are always possible; the system guarantees at-least-once delivery. Duplicates can be delivered with the same message ID if the duplication happens on the subscribe side (e.g., the ack is not processed in time) or with a different message ID (e.g., if the publish of the message is retried after an error such as a deadline exceeded).
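Because of at-least-once delivery, a subscriber usually has to tolerate both kinds of duplicates. A minimal sketch, assuming the application-level session_id can serve as an idempotency key; the Deduplicator class is hypothetical, keeps its state in memory only, and does not call the Pub/Sub library:

```python
class Deduplicator:
    """Remembers which messages were already processed (in memory only)."""

    def __init__(self):
        self._seen_message_ids = set()
        self._seen_session_ids = set()

    def should_process(self, message_id, session_id):
        # Same message ID again: a subscribe-side redelivery.
        if message_id in self._seen_message_ids:
            return False
        self._seen_message_ids.add(message_id)

        # New message ID but known session ID: a publish-side retry.
        if session_id in self._seen_session_ids:
            return False
        self._seen_session_ids.add(session_id)
        return True


dedup = Deduplicator()
deliveries = [("m1", "s1"), ("m1", "s1"), ("m2", "s1"), ("m3", "s2")]
decisions = [dedup.should_process(mid, sid) for mid, sid in deliveries]
print(decisions)
```

Only the first delivery of each session is processed here. A real subscriber would still acknowledge the skipped duplicates, and would need a bounded or persistent store instead of unbounded in-memory sets.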

You shouldn't need to create a new client for every publish operation. I'm betting that the reason this "fixed the problem" is that it mitigated a race that exists on the publisher client side. I'm also not convinced that the log line you've shown on the publisher side:

API: 200 **** sessionId: 731, messageId:108562396466545 ******

corresponds to a successful publish of sessionId 731 by publish_test_topic(). Under what conditions is that log line printed? The code that has been presented so far does not show this.

I talked with someone from Google, and it seems to be an issue with the Python client:

The consensus on our side is that there is a thread-safety problem in the current python client. The client library is being rewritten almost from scratch as we speak, so I don't want to pursue any fixes in the current version. We expect the new version to become available by end of June.

Running the current code with thread_safe: false in app.yaml or, better yet, just instantiating the client in every call should be the workaround, which is the solution you found.
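A possible middle ground between one shared client and one client per call is one client per thread. The sketch below only illustrates that pattern with threading.local; StubClient is a hypothetical stand-in for pubsub.Client(), and whether per-thread clients actually avoid the race in the old library is an assumption, since only per-call instantiation was confirmed as a workaround.

```python
import threading


class StubClient:
    """Hypothetical stand-in for pubsub.Client(); the real client is not used."""

    def __init__(self):
        self.owner = threading.current_thread().name


_local = threading.local()


def get_client(factory=StubClient):
    """Return the calling thread's own client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = factory()
        _local.client = client
    return client


def worker(results, index):
    # Two lookups in the same thread should return the same object.
    results[index] = (get_client(), get_client())


results = [None, None]
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

same_within_thread = all(first is second for first, second in results)
distinct_across_threads = results[0][0] is not results[1][0]
print(same_within_thread, distinct_across_threads)
```

Each thread reuses its own client, and no client object is shared between threads.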

For the detailed solution, please see the Update in the question.
