I have a setup where I publish messages to the Google Cloud Pub/Sub service.
I want to get the size of each individual message that I publish to Pub/Sub. For this, I identified the following approaches (note: I am using the Python clients for publishing and subscribing, following the implementation presented in their documentation line by line):
- message.size in the callback function for the messages that are being pulled from the requested topic.
- sys.getsizeof()
For a sample message such as the following, which I published using a Python publisher client:
    {
        "data": 'Test_message',
        "attributes": {
            'dummyField1': 'dummyFieldValue1',
            'dummyField2': 'dummyFieldValue2'
        }
    }
I get 101 as the message.size output from the following callback function in the subscription client:
    def callback(message):
        print(f"Received {message.data}.")
        if message.attributes:
            print("Attributes:")
            for key in message.attributes:
                value = message.attributes.get(key)
                print(f"{key}: {value}")
        print(message.size)
        message.ack()
However, the size displayed in Cloud Console Monitoring is around 79 B.
So these are my questions:
message.size in bytes?

To contribute further to the community, I am summarising our discussion as an answer.
message.size is an attribute of a message in the subscriber client. According to the documentation, its definition is: "Returns the size of the underlying message, in bytes."
Thus you would not be able to use it before publishing.
message_size is a metric in Google Cloud Metrics, and it is used by Cloud Monitoring.

Finally, the last topic discussed was that your aim is to monitor your quota expenditure so that you can stay in the free tier. For this, the best option is to use Cloud Monitoring and set up alerts based on metrics such as pubsub.googleapis.com/topic/byte_cost. The documentation pages on quota utilisation, event-based alerts, and alert policies cover this in more detail.
Regarding your third question about viewing the message size before publishing: the billable message size is the sum of the message data, the attributes (key plus value), 20 bytes for the timestamp, and some bytes for the message_id. See the Cloud Pub/Sub Pricing guide. Note that a minimum of 1000 bytes is billable regardless of message size, so if your messages may be smaller than 1000 bytes, it is important to have good batch settings. The message_id is assigned server-side and is not guaranteed to be a certain size, but the publish call returns it through a future, so you can inspect examples. This should allow you to get a pretty accurate estimate of message cost within the publisher client. Note that you can also use the monitoring client library to read Cloud Monitoring metrics from within the Python client.
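As a rough illustration of the sum described above, here is a minimal sketch of a size estimate that can run in the publisher before sending. The function name `estimate_billable_size` and the default of 16 bytes for the message ID are assumptions made for this example; the real ID length is decided by the server and may differ.

```python
from typing import Mapping, Optional

# 20 bytes for the timestamp, as stated in the pricing description above.
TIMESTAMP_BYTES = 20


def estimate_billable_size(
    data: bytes,
    attributes: Optional[Mapping[str, str]] = None,
    message_id_bytes: int = 16,  # assumption: IDs are server-assigned, size not guaranteed
) -> int:
    """Approximate the billable size of one Pub/Sub message, in bytes."""
    attributes = attributes or {}
    # Each attribute contributes the encoded length of its key plus its value.
    attribute_bytes = sum(
        len(key.encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in attributes.items()
    )
    return len(data) + attribute_bytes + TIMESTAMP_BYTES + message_id_bytes


# The sample message from the question.
sample_attributes = {
    "dummyField1": "dummyFieldValue1",
    "dummyField2": "dummyFieldValue2",
}
estimate = estimate_billable_size(b"Test_message", sample_attributes)
print(estimate)
```

To refine the guess for `message_id_bytes`, you could publish a few messages, take the ID string that the publish future's `result()` returns, and pass its length instead of the default.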
Regarding your fourth question, there is no way to extract single data points from a distribution metric (unless you published only one message during the time period of the query, in which case the mean would tell you the size of that one message).
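For reading the mean of a distribution metric with the monitoring client library, something along these lines might work. This is only a sketch: it assumes the `google-cloud-monitoring` package is installed and credentials are configured, the metric type `pubsub.googleapis.com/topic/message_sizes` is my assumption for the distribution metric and should be checked against the Pub/Sub metrics list, and `build_filter` and `mean_message_sizes` are hypothetical helper names.

```python
import time

# Assumption: the distribution metric for publish message sizes; verify the exact name.
MESSAGE_SIZES_METRIC = "pubsub.googleapis.com/topic/message_sizes"


def build_filter(metric_type: str, topic_id: str) -> str:
    """Build a Cloud Monitoring filter for one metric on one topic."""
    return (
        f'metric.type = "{metric_type}" '
        f'AND resource.labels.topic_id = "{topic_id}"'
    )


def mean_message_sizes(project_id: str, topic_id: str, window_seconds: int = 3600):
    """Return (end_time, mean) pairs for each point in the query window."""
    # Imported lazily so build_filter can be used without the library installed.
    from google.cloud import monitoring_v3

    client = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {
            "end_time": {"seconds": now},
            "start_time": {"seconds": now - window_seconds},
        }
    )
    results = client.list_time_series(
        request={
            "name": f"projects/{project_id}",
            "filter": build_filter(MESSAGE_SIZES_METRIC, topic_id),
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    # Each point holds a distribution; only aggregates such as the mean are available.
    return [
        (point.interval.end_time, point.value.distribution_value.mean)
        for series in results
        for point in series.points
    ]
```

The mean equals an individual message's size only when a single message falls inside the point's sampling period, as noted above.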