Get the size of a single message in Google Cloud PubSub

Question

I have a setup where I publish messages to the Google Cloud PubSub service.

I want to get the size of each individual message that I publish to PubSub. I identified the following approaches (note: I am using the Python clients for publishing and subscribing, following the implementation in their documentation line by line):

1. View the message count in the Google Cloud Console using the "Monitoring" feature.
2. Create a pull subscription client and read message.size in the callback function for the messages pulled from the requested topic.
3. Estimate the size of the messages before publishing by converting them to JSON as per the PubSub message schema and using sys.getsizeof().

For a sample message like the following, which I published using a Python publisher client:

{
  "data": "Test_message",
  "attributes": {
    "dummyField1": "dummyFieldValue1",
    "dummyField2": "dummyFieldValue2"
  }
}

I get a size of 101 as the message.size output from the following callback function in the subscriber client:

def callback(message):
    print(f"Received {message.data}.")
    if message.attributes:
        print("Attributes:")
        for key in message.attributes:
            value = message.attributes.get(key)
            print(f"{key}: {value}")
    print(message.size)
    message.ack()

The size displayed in Cloud Console Monitoring, however, is around 79 B.

So these are my questions:

1. Why are the sizes different for the same message?
2. Is the output of message.size in bytes?
3. How do I view the size of a message before publishing using the Python client?
4. How do I view the size of a single message in the Cloud Console, rather than the aggregated measure of size over a given timeframe that I found in the Monitoring section?

Answer 1

To contribute further to the community, I am summarising our discussion as an answer.

message.size is an attribute of a message in the subscriber client. According to the documentation, its definition is:

Returns the size of the underlying message, in bytes

Because it exists only on received messages, you cannot use it before publishing.
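As a rough illustration of measuring the payload locally before publishing, the sketch below compares len() on the encoded data with sys.getsizeof(), which the question mentions. The payload is taken from the sample message in the question; the variable names are hypothetical. sys.getsizeof() reports the in-memory size of the Python object, including interpreter overhead, so it is not the number of bytes sent.

```python
import sys

# Payload taken from the sample message in the question, encoded the way
# a publisher would pass it as the message data (a bytes object).
data = "Test_message".encode("utf-8")

# len() on a bytes object is the number of bytes in the data field.
payload_bytes = len(data)

# sys.getsizeof() measures the Python object in memory (object header plus
# contents), so it is larger than the payload and varies by interpreter.
in_memory_bytes = sys.getsizeof(data)

print(f"payload: {payload_bytes} B, sys.getsizeof: {in_memory_bytes} B")
```

This only covers the data field; attributes and server-assigned fields are not included.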

message_size, on the other hand, is a metric in Google Cloud Metrics, and it is the one used by Cloud Monitoring.

Finally, the last topic we discussed was that your aim is to monitor your quota expenditure so that you can stay in the free tier. For this, the best option is to use Cloud Monitoring and set up alerts based on metrics such as pubsub.googleapis.com/topic/byte_cost. You can find more about this under: Quota utilisation, Alert event based, Alert Policies.

Answer 2

Regarding your third question about viewing the message size before publishing: the billable message size is the sum of the message data, the attributes (key plus value), 20 bytes for the timestamp, and some bytes for the message_id. See the Cloud Pub/Sub Pricing guide. Note that a minimum of 1000 bytes is billed per request regardless of message size, so if your messages may be smaller than 1000 bytes, it is important to have good batch settings. The message_id is assigned server-side and is not guaranteed to be a certain size, but the publish call returns it through a future, so you can look at examples. This should let you get a fairly accurate estimate of message cost within the publisher client. Note that you can also use the monitoring client library to read Cloud Monitoring metrics from within the Python client.
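The sum described above can be sketched in a few lines of Python. This is only an estimate under stated assumptions: the function names are hypothetical, the message_id length of 16 bytes is a guess (it is assigned server-side and has no guaranteed size), and the 1000-byte minimum is applied to a whole batch as a simplified reading of the pricing guide.

```python
from typing import Iterable, Mapping, Optional

TIMESTAMP_BYTES = 20           # figure quoted from the pricing guide above
ASSUMED_MESSAGE_ID_BYTES = 16  # assumption: the real ID length is set server-side
MINIMUM_BILLABLE_BYTES = 1000  # minimum quoted from the pricing guide above


def estimate_message_bytes(
    data: bytes,
    attributes: Optional[Mapping[str, str]] = None,
    message_id_bytes: int = ASSUMED_MESSAGE_ID_BYTES,
) -> int:
    """Estimate the billable size: data + attribute keys and values + timestamp + ID."""
    attrs = attributes or {}
    attr_bytes = sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in attrs.items()
    )
    return len(data) + attr_bytes + TIMESTAMP_BYTES + message_id_bytes


def billable_request_bytes(message_sizes: Iterable[int]) -> int:
    """Apply the minimum to the total of one batch (simplified assumption)."""
    return max(MINIMUM_BILLABLE_BYTES, sum(message_sizes))


# The sample message from the question.
size = estimate_message_bytes(
    b"Test_message",
    {"dummyField1": "dummyFieldValue1", "dummyField2": "dummyFieldValue2"},
)
print(f"estimated size: {size} B")
print(f"billed alone: {billable_request_bytes([size])} B")
print(f"billed in a batch of 20: {billable_request_bytes([size] * 20)} B")
```

Once a real message_id has been observed from the result of a publish future, its length can be passed as message_id_bytes in place of the assumed default.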

Regarding your fourth question, there is no way to extract single data points from a distribution metric (unless you published only one message during the time period in the query, in which case the mean would tell you the size of that one message).
