Get the size of a single message in Google Cloud PubSub

Question

I have a setup where I publish messages to the Google Cloud PubSub service.

I want to get the size of each individual message that I publish to PubSub. To do this, I identified the following approaches (note: I am using the Python clients for publishing and subscribing, following the implementation presented in their documentation line by line):

View the message size from the Google Cloud Console using the 'Monitoring' feature.
Create a pull subscription client and read message.size in the callback function for the messages pulled from the requested topic.
Estimate the size of each message before publishing by converting it to JSON according to the PubSub message schema and calling sys.getsizeof().
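A quick local check suggests why the third approach is unreliable: sys.getsizeof() reports the in-memory size of a Python object, including interpreter overhead, rather than the number of bytes in its content. A JSON rendering also adds braces, quotes and key names that are not part of the message's data or attributes. This is a minimal sketch using the sample message from this question; the variable names are illustrative only.

```python
import json
import sys

# The sample message from the question, as a plain Python dict.
message = {
    "data": "Test_message",
    "attributes": {
        "dummyField1": "dummyFieldValue1",
        "dummyField2": "dummyFieldValue2",
    },
}

json_bytes = json.dumps(message).encode("utf-8")

# Length of the JSON text in bytes (includes braces, quotes and key names).
json_length = len(json_bytes)
# In-memory size of the Python bytes object: content plus CPython object overhead.
object_size = sys.getsizeof(json_bytes)

print(json_length, object_size)
```

Neither number is what Pub/Sub counts; the answers below describe what the subscriber and the billing actually measure.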

For a sample message such as the following, which I published using a Python publisher client:

{
  "data": "Test_message",
  "attributes": {
    "dummyField1": "dummyFieldValue1",
    "dummyField2": "dummyFieldValue2"
  }
}

I get a size of 101 from message.size in the following callback function in the subscription client:

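def callback(message):
    # Print the payload (bytes) and any attributes of the received message
    print(f"Received {message.data}.")
    if message.attributes:
        print("Attributes:")
        for key, value in message.attributes.items():
            print(f"{key}: {value}")
    # Size of the received message as reported by the subscriber client
    print(message.size)
    # Acknowledge so that Pub/Sub does not redeliver the message
    message.ack()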

However, the size displayed in Cloud Console Monitoring is around 79 B.

So these are my questions:

Why are the sizes different for the same message?
Is the output of message.size in bytes?
How do I view the size of a message before publishing, using the Python client?
How do I view the size of a single message in the Cloud Console, rather than an aggregated measure of size over a given timeframe, which is all I could find in the Monitoring section?

Answer 1

In order to further contribute to the community, I am summarising our discussion as an answer.

Regarding message.size, it is an attribute of a message in the subscriber client. According to the documentation, its definition is:

Returns the size of the underlying message, in bytes

Thus you would not be able to use it before publishing.

On the other hand, message_size is a metric in Google Cloud Metrics, which is used by Cloud Monitoring (documented here).

Finally, the last topic we discussed was that your aim is to monitor your quota expenditure so that you can stay within the free tier. For this, the best option would be to use Cloud Monitoring and set up alerts based on metrics such as pubsub.googleapis.com/topic/byte_cost. Here are some links where you can find more about it: Quota utilisation, Alert event based, Alert Policies.
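As a rough sketch of the monitoring approach (not something from the discussion itself), the snippet below builds a Cloud Monitoring filter for the byte_cost metric of one topic and totals the returned values against a quota. It assumes the pubsub_topic resource exposes a topic_id label and that byte_cost points are per-interval byte counts; the helper names, point values and quota are hypothetical. The actual API call needs the google-cloud-monitoring package and credentials, so it appears only as a comment.

```python
from typing import Iterable

BYTE_COST_METRIC = "pubsub.googleapis.com/topic/byte_cost"


def byte_cost_filter(topic_id: str) -> str:
    """Monitoring filter for byte_cost on a single topic (assumes a topic_id resource label)."""
    return (
        f'metric.type = "{BYTE_COST_METRIC}" '
        f'AND resource.labels.topic_id = "{topic_id}"'
    )


def quota_fraction_used(point_values: Iterable[int], quota_bytes: int) -> float:
    """Share of a hypothetical byte quota consumed, assuming each point is a per-interval byte count."""
    return sum(point_values) / quota_bytes


# Fetching the points (requires google-cloud-monitoring and credentials; not run here):
#   from google.cloud import monitoring_v3
#   client = monitoring_v3.MetricServiceClient()
#   series = client.list_time_series(
#       request={
#           "name": f"projects/{project_id}",
#           "filter": byte_cost_filter(topic_id),
#           "interval": interval,  # a monitoring_v3.TimeInterval for the period of interest
#           "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
#       }
#   )
#   values = [point.value.int64_value for ts in series for point in ts.points]

# Made-up values for illustration only.
print(quota_fraction_used([1000, 2500, 1500], quota_bytes=10_000))  # 0.5
```

Polling like this is only a stopgap; an alert policy on the same metric, as suggested above, raises the warning without any client code.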

Answer 2

Regarding your third question about viewing the message size before publishing: the billable message size is the sum of the message data, the attributes (key plus value), 20 bytes for the timestamp, and some bytes for the message_id. See the Cloud Pub/Sub Pricing guide. Note that a minimum of 1000 bytes per request is billable regardless of message size, so if your messages may be smaller than 1000 bytes, it's important to have good batch settings. The message_id is assigned server-side and is not guaranteed to be a particular size, but the publish call returns it through a future, so you can inspect real examples. This should let you get a fairly accurate estimate of message cost within the publisher client. You can also use the monitoring client library to read Cloud Monitoring metrics from within Python.
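Following the formula in this answer, an estimate could be computed in the publisher along these lines. This is a sketch, not an official client helper: the function names are hypothetical, the 20-byte timestamp and 1000-byte per-request minimum come from the pricing description above, and the message_id in the example is made up (a real one comes from the publish future, as the comment shows).

```python
from typing import Iterable, Mapping

TIMESTAMP_BYTES = 20  # fixed timestamp overhead, per the pricing description
MIN_BILLABLE_REQUEST_BYTES = 1000  # minimum billed per request


def estimate_message_size(data: bytes, attributes: Mapping[str, str], message_id: str) -> int:
    """Approximate billable size: data + attribute keys and values + timestamp + message_id."""
    attribute_bytes = sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in attributes.items()
    )
    return len(data) + attribute_bytes + TIMESTAMP_BYTES + len(message_id.encode("utf-8"))


def billable_request_bytes(message_sizes: Iterable[int]) -> int:
    """Bytes billed for one request: the total size, but never less than the minimum."""
    return max(MIN_BILLABLE_REQUEST_BYTES, sum(message_sizes))


# With google-cloud-pubsub, the real message_id is returned by the publish future:
#   future = publisher.publish(topic_path, data, **attributes)
#   message_id = future.result()
attributes = {"dummyField1": "dummyFieldValue1", "dummyField2": "dummyFieldValue2"}
size = estimate_message_size(b"Test_message", attributes, "1234567890123456")  # hypothetical ID
print(size)  # 12 + 54 + 20 + 16 = 102

# Ten such messages sent one per request vs. batched into a single request.
print(10 * billable_request_bytes([size]))  # 10000
print(billable_request_bytes([size] * 10))  # 1020
```

The last two lines show why batching matters for small messages. In the Python client, batching is configured with pubsub_v1.types.BatchSettings passed to PublisherClient.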

Regarding your fourth question, there's no way to extract single data points from a distribution metric (unless you have published only one message during the time period of the query, in which case the mean tells you the size of that one message).
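To illustrate why individual sizes cannot be recovered, the sketch below computes the count, mean and sum of squared deviations (aggregates that a distribution point keeps) for two different sets of hypothetical message sizes; both sets produce identical summaries. Bucket counts, where present, only narrow each value down to a range. The helper names are illustrative.

```python
from typing import Optional, Sequence, Tuple


def summarize(sizes: Sequence[float]) -> Tuple[int, float, float]:
    """Count, mean and sum of squared deviations of a set of message sizes."""
    count = len(sizes)
    mean = sum(sizes) / count
    return count, mean, sum((s - mean) ** 2 for s in sizes)


def single_message_size(count: int, mean: float) -> Optional[float]:
    """The mean equals a message's size only if exactly one message was aggregated."""
    return mean if count == 1 else None


# Two different sets of (hypothetical) sizes in bytes with the same summary.
print(summarize([10, 50, 60]))  # (3, 40.0, 1400.0)
print(summarize([20, 30, 70]))  # (3, 40.0, 1400.0)
print(single_message_size(1, 79.0))  # 79.0
```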

2 answers

solution1
1 ACCEPTED 2021-02-17 08:58:25

solution2
1 2021-02-18 13:17:08