
How do Kafka consumers/producers commit messages/partitions?

I have just started working with Kafka and have been experimenting with it on my local machine. I am able to produce and consume messages using the Python Kafka client provided by Confluent.

My understanding so far is:

  1. Kafka clients (i.e. both consumers and producers) maintain a queue of messages.
  2. Producers store every produced message in a local buffer queue, and they need to explicitly push messages from the local buffer to the Kafka cluster.
  3. On the consumer's end, the client somehow fetches messages from the Kafka cluster, stores them in a local buffer queue, and then pulls these messages from the buffer via API calls such as poll().

First of all, am I missing anything here?

Also, I frequently come across the phrase "committing messages/offsets" for both clients. What exactly does committing messages/offsets to Kafka mean?

"Producers store every produced message in a local buffer queue, and they need to explicitly push messages from the local buffer to the Kafka cluster."

You can configure how large each batch of buffered messages is, the acknowledgment level, and so on, but you do not need to "explicitly push" after calling producer.produce(...).
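As a rough illustration of that kind of tuning, here is a sketch of a producer configuration. The property names (linger.ms, batch.size, acks) are librdkafka settings that the Confluent Python client accepts; the broker address and the values are assumptions chosen for the example, not recommendations.

```python
# Hypothetical settings for illustration; tune them for your own workload.
producer_conf = {
    "bootstrap.servers": "localhost:9092",  # assumed local broker address
    "linger.ms": 50,          # wait up to 50 ms so more messages can join a batch
    "batch.size": 64 * 1024,  # upper bound on a batch, in bytes
    "acks": "all",            # wait for all in-sync replicas to acknowledge
}

# With the confluent-kafka package installed and a broker running, this
# dict would be passed to the client, for example:
#   from confluent_kafka import Producer
#   producer = Producer(producer_conf)
```

None of these settings change how you call produce(); they only change when and how the buffered messages go out.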

"On the consumer's end, the client somehow fetches messages from the Kafka cluster, stores them in a local buffer queue, and then pulls these messages from the buffer via API calls such as poll()."

You are correct.
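To make the buffering idea concrete, here is a toy model in plain Python. It is not the real client: ToyConsumer, fetch_size and the list standing in for a partition are all made up for this sketch, and the real client fetches in the background rather than inside poll(). It only shows that several poll() calls can be served from one fetch.

```python
from collections import deque


class ToyConsumer:
    """Toy model of a consumer's local prefetch buffer (not the real client)."""

    def __init__(self, partition_log, fetch_size=3):
        self.partition_log = partition_log  # stands in for a partition on the broker
        self.fetch_size = fetch_size        # messages pulled per simulated fetch
        self.next_fetch_offset = 0          # next offset to request from the "broker"
        self.buffer = deque()               # local queue that poll() reads from
        self.fetch_count = 0                # number of simulated round trips

    def _fetch(self):
        # One simulated round trip that pulls several messages at once.
        start = self.next_fetch_offset
        batch = self.partition_log[start:start + self.fetch_size]
        self.buffer.extend(batch)
        self.next_fetch_offset += len(batch)
        self.fetch_count += 1

    def poll(self):
        # Serve from the local buffer; only go to the "broker" when it is empty.
        if not self.buffer:
            self._fetch()
        return self.buffer.popleft() if self.buffer else None


log = [f"msg-{i}" for i in range(7)]
consumer = ToyConsumer(log)
received = []
while (msg := consumer.poll()) is not None:
    received.append(msg)
```

Here eight poll() calls (seven messages plus the final empty one) need only four simulated fetches, which is why the timing of individual poll() calls is not something you should have to engineer around.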

"First of all, am I missing anything here?"

No. These concepts are important for tuning and performance, and you should take some care in how you configure your producers and consumers, but they should not affect how you code them (e.g. don't play tricks with the timing of your consumer.poll() calls).

On the producing side, there is a notion of transactions, which let you back out of publishing a message (this exists to support two-phase commits), but I am not sure how this works with the Python API. Beyond that, you should consider a message sent once you call producer.produce(). Strictly speaking, produce() only enqueues the message and the client sends it in the background, so calling producer.flush() once before shutting down is still advisable. You should not need to call producer.flush() after every producer.produce().
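A toy model of that behaviour, again in plain Python rather than the real client. ToyProducer and its count-based batch_size are inventions for this sketch; the real client also sends on a timer and from a background thread. The point is only that full batches go out without the caller asking, and flush() handles the leftovers.

```python
class ToyProducer:
    """Toy model of producer-side batching (not the real client)."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size
        self.buffer = []   # local queue filled by produce()
        self.broker = []   # stands in for the Kafka cluster

    def _send_batch(self):
        self.broker.extend(self.buffer)
        self.buffer.clear()

    def produce(self, value):
        # Enqueue locally; a full batch is sent without the caller asking.
        self.buffer.append(value)
        if len(self.buffer) >= self.batch_size:
            self._send_batch()

    def flush(self):
        # Send whatever is still buffered, e.g. right before shutdown.
        self._send_batch()


producer = ToyProducer(batch_size=2)
for i in range(5):
    producer.produce(f"event-{i}")

delivered_before_flush = len(producer.broker)  # two full batches went out
still_buffered = len(producer.buffer)          # one message is still waiting

producer.flush()
delivered_after_flush = len(producer.broker)
```

Calling flush() after every produce() would shrink every batch to a single message, which is exactly the overhead batching is meant to avoid.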

On the consuming side, there is the notion of commits and offsets. At a high level you can think of it this way: when you call poll(), you are asking the API for messages, but for ease of understanding you can think of it as asking for messages from the topic. As you work your way through the messages in the topic, you may want some way to let Kafka know that you have already read and processed some of them and would not like to see them again.

Each message has an offset in the topic (strictly speaking in the partition, but we can keep it simple and say topic). You can think of this offset as the position of the message in the topic (the first message is 0, the second is 1, and so on; each message added has a higher offset than the last). When you explicitly call commit, you are actually committing the offset, telling Kafka that you have read all of the messages up to that point and don't want to see them again.
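Here is a small plain-Python sketch of what a committed offset buys you. ToyGroup and ToyOffsetConsumer are hypothetical names for this example, and a list stands in for a single partition. As in Kafka, the stored value is the offset of the next message to read, so committing the message at offset 1 stores 2.

```python
class ToyGroup:
    """Toy model of committed offsets for one partition (not the real client)."""

    def __init__(self, partition_log):
        self.partition_log = partition_log
        self.committed = 0  # offset of the next message the group should read

    def new_consumer(self):
        # A (re)started consumer resumes from the committed offset.
        return ToyOffsetConsumer(self)


class ToyOffsetConsumer:
    def __init__(self, group):
        self.group = group
        self.position = group.committed

    def poll(self):
        log = self.group.partition_log
        if self.position >= len(log):
            return None
        offset, value = self.position, log[self.position]
        self.position += 1
        return offset, value

    def commit(self, offset):
        # Store offset + 1: everything up to and including `offset` is done.
        self.group.committed = offset + 1


group = ToyGroup(["a", "b", "c", "d", "e"])

first = group.new_consumer()
first.poll()               # reads offset 0
offset, _ = first.poll()   # reads offset 1
first.commit(offset)       # committed offset becomes 2
first.poll()               # reads offset 2, but never commits it

# Simulate a crash and restart: the new consumer starts at the committed offset.
second = group.new_consumer()
replayed = second.poll()   # offset 2 is delivered again
```

The message at offset 2 was read but not committed, so the restarted consumer sees it again. That is the practical meaning of "committing": it moves the point from which a restarted or rebalanced consumer resumes.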
