[英]get results from kafka for a specific period of time
Here is my code, that uses kafka-python
.这是我的代码,它使用kafka-python
。
now = datetime.now()
month_ago = now - relativedelta(month=1)
topic = 'some_topic_name'
consumer = KafkaConsumer(topic, bootstrap_servers=PROD_KAFKA_SERVER,
security_protocol=PROTOCOL,
group_id=GROUP_ID,
enable_auto_commit=False,
sasl_mechanism=SASL_MECHANISM, sasl_plain_username=SASL_USERNAME,
sasl_plain_password=SASL_PASSWORD)
for msg in consumer:
print(msg)
I want to get results from topic just between now
and month_ago
in a loop.我想循环从now
到month_ago
之间的主题获取结果。 How can I do this?我怎样才能做到这一点?
Thanks for any help!谢谢你的帮助!
Get topic partitions, assigned to your consumer:获取分配给您的消费者的主题分区:
partitions = consumer.assignment()
Get offsets for partitions by datetime:按日期时间获取分区的偏移量:
month_ago_timestamp = int(month_ago.timestamp() * 1000)
partition_to_timestamp = {part: month_ago_timestamp for part in partitions}
mapping = consumer.offsets_for_times(partition_to_timestamp)
Seek partitions to offsets:寻找偏移量的分区:
for partition, offset_and_timestamp in partition_to_offset_and_timestamp.items():
consumer.seek(partition, offset_and_timestamp[0])
Warning, Consumer can return None, set with int zero or block indefinitely in cases like missing topic, missing partition or messages without timestamp警告,在缺少主题、缺少分区或没有时间戳的消息等情况下,消费者可以返回无、设置为零或无限期阻塞
Finally, I do this:) My code looks like this:最后,我这样做了 :) 我的代码如下所示:
topic = 'some_topic_name'
consumer = KafkaConsumer(bootstrap_servers=PROD_KAFKA_SERVER,
security_protocol=PROTOCOL,
group_id=GROUP_ID,
sasl_mechanism=SASL_MECHANISM, sasl_plain_username=SASL_USERNAME,
sasl_plain_password=SASL_PASSWORD)
month_ago = (datetime.now() - relativedelta(months=1)).timestamp()
topic_partition = TopicPartition(topic, 0)
assigned_topic = [topic_partition]
consumer.assign(assigned_topic)
partitions = consumer.assignment()
partition_to_timestamp = {part: int(month_ago * 1000) for part in partitions}
end_offsets = consumer.end_offsets(list(partition_to_timestamp.keys()))
mapping = consumer.offsets_for_times(partition_to_timestamp)
for partition, ts in mapping.items():
end_offset = end_offsets.get(partition)
consumer.seek(partition, ts[0])
for msg in consumer:
value = json.loads(msg.value.decode('utf-8'))
# do something
if msg.offset == end_offset - 1:
consumer.close()
break
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.