Not clear about the meaning of auto.offset.reset and enable.auto.commit in Kafka

I am new to Kafka and don't really understand what these Kafka configuration settings mean. Can anyone explain them in a way that is easier to understand?

Here is my code:

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "master:9092,slave1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "GROUP_2017",
  "auto.offset.reset" -> "latest", // earliest or latest
  "enable.auto.commit" -> (true: java.lang.Boolean)
)

What do these settings mean in my code?

I will explain what each setting means, but I highly suggest reading the configuration section of the Kafka web site.

"bootstrap.servers" -> "master:9092,slave1:9092"

Essentially the Kafka cluster configuration: a comma-separated list of host:port pairs that the client uses to make its initial connection to the cluster.

 "key.deserializer" -> classOf[StringDeserializer]
 "value.deserializer" -> classOf[StringDeserializer]

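This SO answer explains their purpose. In short, they tell the consumer how to convert the raw bytes of each record's key and value back into objects (strings here).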

"group.id" -> "GROUP_2017"

A consumer process belongs to a group, identified by its group.id. A group can have multiple consumers, and Kafka assigns each partition to exactly one consumer in the group (for data consumption), although one consumer may own several partitions. If the number of consumers is greater than the number of partitions available, some consumers will be idle.
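The assignment rule can be illustrated with a small sketch in plain Java. It does not use the Kafka client: the class GroupAssignmentSketch and its assign helper are hypothetical, and the round-robin spread is a simplification of what Kafka's real assignors do. It only shows the invariant that each partition has exactly one owner in the group, so surplus consumers stay idle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupAssignmentSketch {

    // Hypothetical helper: spread partitions 0..numPartitions-1 over the consumers
    // round-robin. Real Kafka assignors are more involved, but the invariant is the
    // same: each partition is owned by exactly one consumer in the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three consumers in the group, but the topic has only two partitions.
        Map<String, List<Integer>> assignment = assign(Arrays.asList("c1", "c2", "c3"), 2);
        assignment.forEach((consumer, partitions) ->
            System.out.println(consumer + " -> " + (partitions.isEmpty() ? "idle" : partitions)));
    }
}
```

With three consumers and two partitions, c1 and c2 each own one partition and c3 is reported as idle.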

"enable.auto.commit" -> (true: java.lang.Boolean)

If this flag is true, the consumer periodically commits the last offset it has read, so that after a restart it can resume from that position. With the consumer API used in this code (the one configured through bootstrap.servers), committed offsets are stored in Kafka itself, in the internal __consumer_offsets topic; only the old consumer stored them in Zookeeper. This approach is not the best choice when you want a more robust solution for a production system, because it does not ensure that the records you fetched were correctly processed (by the logic you wrote in your code). If this flag is false and you do not commit offsets manually either, Kafka will not know which offset was read last, so when you restart the process it will start reading from the 'earliest' or the 'latest' offset, depending on the value of the next flag (auto.offset.reset). Finally, this Cloudera article explains in detail how to manage offsets properly.
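The robustness concern comes down to ordering: an offset that is stored before the corresponding record has been processed means that record is skipped if the process crashes in between. The following toy model in plain Java makes that visible. It is not how the Kafka client is implemented; the class CommitOrderingSketch, the method runUntilCrash, and the fixed batch of ten records are assumptions made purely for illustration. The committed value follows Kafka's convention of pointing at the next record to read.

```java
class CommitOrderingSketch {

    // Toy model, not the Kafka client: handle records committed..9 and "crash"
    // while processing record crashAt. Returns the offset that was persisted
    // at the moment of the crash, i.e. where a restarted consumer would resume.
    static long runUntilCrash(long committed, long crashAt, boolean commitBeforeProcessing) {
        for (long offset = committed; offset < 10; offset++) {
            if (commitBeforeProcessing) {
                committed = offset + 1;   // offset stored before the work is done
            }
            if (offset == crashAt) {
                return committed;         // simulated crash while processing this record
            }
            if (!commitBeforeProcessing) {
                committed = offset + 1;   // offset stored only after successful processing
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        // Crash while processing record 5.
        System.out.println("commit first:  restart at " + runUntilCrash(0, 5, true));
        System.out.println("process first: restart at " + runUntilCrash(0, 5, false));
    }
}
```

In this model, committing first resumes at offset 6, so record 5 is never processed, while processing first resumes at offset 5, so record 5 is processed again. Committing manually after processing therefore trades possible loss for possible duplicates.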

"auto.offset.reset" -> "latest"

This flag tells Kafka where to start reading in case you do not have any committed offset yet. In other words, the consumer will start from either the 'earliest' or the 'latest' offset if no offset has been persisted for the group yet (manually or using the enable.auto.commit flag).

Adding more details on the configurations mentioned in the title: "Not clear about the meaning of auto.offset.reset and enable.auto.commit in Kafka"

auto.offset.reset

With the auto.offset.reset configuration you can steer the behavior of your consumer (as part of a consumer group) in situations where the consumer group has never consumed and committed from a particular topic, or where the last committed offset of that consumer group was deleted (e.g. through a cleanup policy).

Each message in a partition of a Kafka topic has a unique identifier, the offset. Offsets are unique per Kafka partition. A consumer usually commits back the offsets on each partition of the topic it consumed. That way, the consumer is able to avoid duplicate reads.

Imagine you have a consumer reading from a topic for the first time (or you changed the consumer group name). The consumer group has therefore never committed any offsets. According to the Config Docs, you can choose between the following behaviors with the configuration auto.offset.reset:

  • earliest: automatically reset the offset to the earliest offset

  • latest: automatically reset the offset to the latest offset

  • none: throw exception to the consumer if no previous offset is found for the consumer's group

  • anything else: throw exception to the consumer.

The default setting is latest.
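The decision described above can be sketched for a single partition as follows. This is a model in plain Java, not Kafka client code: OffsetResetSketch, startingOffset, and the logStart/logEnd parameters are hypothetical names, and the generic exceptions stand in for the ones the real client throws.

```java
import java.util.OptionalLong;

class OffsetResetSketch {

    // Hypothetical helper modelling where a consumer starts reading one partition.
    // committed: the group's committed offset, if any.
    // logStart:  first offset still retained in the partition.
    // logEnd:    offset the next produced record will get.
    static long startingOffset(OptionalLong committed, long logStart, long logEnd,
                               String autoOffsetReset) {
        // A committed offset that still lies within the log wins; the setting is ignored.
        if (committed.isPresent()
                && committed.getAsLong() >= logStart
                && committed.getAsLong() <= logEnd) {
            return committed.getAsLong();
        }
        // Otherwise auto.offset.reset decides.
        switch (autoOffsetReset) {
            case "earliest":
                return logStart;
            case "latest":
                return logEnd;
            case "none":
                throw new IllegalStateException("no valid committed offset for this group");
            default:
                throw new IllegalArgumentException("invalid auto.offset.reset: " + autoOffsetReset);
        }
    }

    public static void main(String[] args) {
        // New group, partition currently retains offsets 100..249.
        System.out.println(startingOffset(OptionalLong.empty(), 100, 250, "earliest")); // 100
        System.out.println(startingOffset(OptionalLong.empty(), 100, 250, "latest"));   // 250
        // Existing group with a valid commit: the setting has no effect.
        System.out.println(startingOffset(OptionalLong.of(180), 100, 250, "latest"));   // 180
    }
}
```

Note that with latest, a new group skips everything already in the partition and only sees records produced after it starts.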

enable.auto.commit

As mentioned above, it is critical to think about your offsets and their commits when consuming messages from Kafka. When enable.auto.commit is set to true, the consumer offsets are committed automatically in the background.

In the JavaDocs of KafkaConsumer you will find a nice example of how to manually commit the offsets in a consumer client, for example:

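// props: consumer configuration with enable.auto.commit set to false
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
// ... poll and process records, then commit the offsets returned by the last poll
consumer.commitSync();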

To emphasize the importance of offset management in your consumer client once more: it is worth reading the whole JavaDocs description or the Confluent Kafka documentation on offset management.

auto.offset.reset is ONLY at play when there is no valid committed offset, such as the first time you start the system, or after a committed offset expires and is deleted because it is too old.

enable.auto.commit is about the choice between having offsets committed automatically in the background and explicit manual control in the foreground.

auto.offset.reset

What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):

  • earliest: automatically reset the offset to the earliest offset
  • latest: automatically reset the offset to the latest offset
  • none: throw exception to the consumer if no previous offset is found for the consumer's group
  • anything else: throw exception to the consumer.

Type: string
Default: latest
Valid Values: [latest, earliest, none]
Importance: medium

enable.auto.commit

If true, the consumer's offset will be periodically committed in the background.

Type: boolean
Default: true
Valid Values:
Importance: medium

auto.commit.interval.ms

The frequency in milliseconds that the consumer offsets are auto-committed to Kafka if enable.auto.commit is set to true.

Type: int
Default: 5000 (5 seconds)
Valid Values: [0,...]
Importance: low
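As a sketch of how these three settings fit together in a Java client, the configuration from the question could be written as a java.util.Properties object like this. The class name ConsumerConfigSketch and the method consumerProps are hypothetical; the values are taken from the question and from the defaults listed above.

```java
import java.util.Properties;

class ConsumerConfigSketch {

    // Builds the consumer configuration from the question as plain string properties.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "master:9092,slave1:9092");
        props.setProperty("group.id", "GROUP_2017");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Only consulted when the group has no valid committed offset.
        props.setProperty("auto.offset.reset", "latest");
        // Commit offsets in the background every auto.commit.interval.ms milliseconds.
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        consumerProps().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

With the kafka-clients library on the classpath, such a Properties object is what gets passed to the KafkaConsumer constructor. To take manual control of commits instead, enable.auto.commit would be set to "false", in which case auto.commit.interval.ms has no effect.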

The full set of consumer configuration parameters is documented on the Apache Kafka web site at https://kafka.apache.org/documentation.html#newconsumerconfigs
