How to receive only the most recent data in Event Hub
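In Event Hub, I have two scripts, a sender and a receiver, that communicate with each other.
The problem is that I seem to receive the data set I sent yesterday together with the one I just sent. I am trying to control the amount of data either by time period or by number of events.
The basic code is as follows (the post calls it sender.py, although it is the receiving side):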
import time
from azure.eventhub import EventHubClient, Offset

# ADDRESS, USER and KEY are assumed to be defined elsewhere in the script.
CONSUMER_GROUP = "$default"
OFFSET = Offset("-1")  # "-1" reads all the event data in the partition
PARTITION = "0"
total = 0
last_sn = -1
last_offset = "-1"
client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
try:
    receiver = client.add_receiver(
        CONSUMER_GROUP, PARTITION, prefetch=0, offset=OFFSET)
    client.run()
    start_time = time.time()
    batch = receiver.receive(timeout=100)
    # Only the last 10 events of this single batch are printed.
    for event_data in batch[-10:]:
        print("Received: {}".format(event_data.body_as_str(encoding='UTF-8')))
        total += 1
    end_time = time.time()
    client.stop()
    run_time = end_time - start_time
    print("Received {} messages in {} seconds".format(total, run_time))
except KeyboardInterrupt:
    pass
finally:
    client.stop()
I just found a solution that uses the offset to control which event data is read.
The first step is to get the offset of each event.
The code is as follows:
import logging
import time
from azure.eventhub import EventHubClient, Offset

logger = logging.getLogger("azure")
ADDRESS = "amqps://xxx.servicebus.windows.net/xxx"
USER = "RootManageSharedAccessKey"
KEY = "xxx"
CONSUMER_GROUP = "$default"
# First, set the offset to -1 to read all the event data.
OFFSET = Offset("-1")
PARTITION = "0"
total = 0
last_sn = -1
last_offset = "-1"
client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
try:
    receiver = client.add_receiver(
        CONSUMER_GROUP, PARTITION, prefetch=5000, offset=OFFSET)
    client.run()
    start_time = time.time()
    print("**begin receive**")
    for event_data in receiver.receive(timeout=100):
        last_offset = event_data.offset.value
        last_sn = event_data.sequence_number
        # Print the offset and sequence number of each event.
        print("Received: {}, last_offset: {}, last_sn: {}".format(
            event_data.body_as_str(encoding='UTF-8'), last_offset, last_sn))
        total += 1
    end_time = time.time()
    client.stop()
    run_time = end_time - start_time
    print("Received {} messages in {} seconds".format(total, run_time))
except KeyboardInterrupt:
    pass
finally:
    client.stop()
After running this, the output lists the offset and sequence number of every event (screenshot omitted).
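If that output is long, picking offsets out by eye gets tedious. A minimal sketch, assuming the lines were printed with exactly the format string used in the code above; the function names `parse_received_line` and `index_offsets` are made up for this illustration and are not part of any SDK:

```python
import re

# Matches the tail of a line printed by the receive loop:
# "Received: <body>, last_offset: <offset>, last_sn: <sequence number>"
_LINE_TAIL = re.compile(r"last_offset: (\S+), last_sn: (-?\d+)\s*$")


def parse_received_line(line):
    """Return (sequence_number, offset) for one printed line, or None."""
    match = _LINE_TAIL.search(line)
    if match is None:
        return None
    offset, sequence_number = match.groups()
    return int(sequence_number), offset


def index_offsets(lines):
    """Map sequence number -> offset for every line that parses."""
    result = {}
    for line in lines:
        parsed = parse_received_line(line)
        if parsed is not None:
            sequence_number, offset = parsed
            result[sequence_number] = offset
    return result
```

The pattern is anchored at the end of the line, so commas inside the message body should not confuse it. Lines that are not event lines, such as the `**begin receive**` banner, are skipped.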
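Now you know the offset of every event. Suppose you want the data from number 40 to number 53. Number 40 has offset 237080, so change the offset in your code to a value just below 237080, for example 237079. By default the receiver returns the events after the given offset, so number 40 is the first one you get. The line to change is OFFSET = Offset("237079").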
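The "one less than the target offset" rule can be written down as a tiny helper. This is only a sketch under two assumptions: offsets are numeric strings, and the receiver returns events whose offset is strictly greater than the value passed in. The name `exclusive_start_offset` and every value in `recorded` except the 40 / 237080 pair are invented for illustration:

```python
def exclusive_start_offset(offsets_by_sn, first_wanted_sn):
    """Return an offset string that makes reading begin at first_wanted_sn.

    Assumes numeric offsets and a strictly-greater-than offset filter.
    Raises KeyError if the sequence number was never recorded.
    """
    target = int(offsets_by_sn[first_wanted_sn])
    return str(target - 1)


# Illustrative values only; 40 -> 237080 is the pair quoted in the answer.
recorded = {39: "237000", 40: "237080", 41: "237160"}
start = exclusive_start_offset(recorded, 40)
print(start)  # prints 237079
```

The resulting string is what would go into `Offset(...)` in place of the hard-coded value.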
The code is as follows:
import logging
import time
from azure.eventhub import EventHubClient, Offset

logger = logging.getLogger("azure")
ADDRESS = "amqps://xxx.servicebus.windows.net/xx"
USER = "RootManageSharedAccessKey"
KEY = "xxx"
CONSUMER_GROUP = "$default"
# Set the offset just below that of the first event you want.
OFFSET = Offset("237079")
PARTITION = "0"
total = 0
last_sn = -1
last_offset = "-1"
client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
try:
    receiver = client.add_receiver(
        CONSUMER_GROUP, PARTITION, prefetch=5000, offset=OFFSET)
    client.run()
    start_time = time.time()
    print("**begin receive**")
    for event_data in receiver.receive(timeout=100):
        last_offset = event_data.offset.value
        last_sn = event_data.sequence_number
        print("Received: {}, last_offset: {}, last_sn: {}".format(
            event_data.body_as_str(encoding='UTF-8'), last_offset, last_sn))
        total += 1
    end_time = time.time()
    client.stop()
    run_time = end_time - start_time
    print("Received {} messages in {} seconds".format(total, run_time))
except KeyboardInterrupt:
    pass
finally:
    client.stop()
After running the code, only the event data after the specified offset is returned (screenshot omitted).
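To avoid looking the offset up by hand on every run, one option is to save the last offset the receiver saw and start from it next time, so that yesterday's events are not read again. The following is a rough sketch and not something the answer itself describes: the file-based checkpoint, the helper names `load_checkpoint` and `save_checkpoint`, and the idea of one file per partition are all assumptions made for illustration.

```python
import os
import tempfile

START_OF_STREAM = "-1"  # the value the answer uses to read all event data


def load_checkpoint(path):
    """Return the saved offset, or START_OF_STREAM if nothing is saved yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            saved = handle.read().strip()
    except FileNotFoundError:
        return START_OF_STREAM
    return saved or START_OF_STREAM


def save_checkpoint(path, offset):
    """Write the offset via a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(str(offset))
    # os.replace swaps the file in one step, so readers never see a partial write.
    os.replace(tmp_path, path)
```

In the receiver script this would mean building the offset from `load_checkpoint(...)` before calling `add_receiver`, and calling `save_checkpoint(..., last_offset)` after the loop, but only when at least one event was received; otherwise `last_offset` is still "-1" and the checkpoint would be reset to the start of the stream.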