How to handle kafka leader failover with two nodes for a topic with 2 partitions and 2 replicas?

I'm playing with Kafka in a multi-node environment to test how failover works. Actually, I have 2 VMs with 1 Kafka node inside each VM, and only 1 ZooKeeper inside one of the two VMs. I know this is not an optimal production configuration, but it's just to train myself and understand things better.

Here is my configuration: VM1 IP 192.168.64.2 (with only one broker, broker.id=2); VM2 IP 192.168.64.3 (ZooKeeper runs here, plus a broker with broker.id=1).

I start Kafka through podman (this is not a problem with podman; everything is configured correctly).

On VM1:

podman run -e KAFKA_BROKER_ID=2 -e KAFKA_ZOOKEEPER_CONNECT=192.168.64.3:2181 -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9093,PLAINTEXT_HOST://192.168.64.2:29092 -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=2 -e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT_HOST -e UNCLEAN_LEADER_ELECTION_ENABLE=true --pod zookeeper-kafka confluentinc/cp-kafka:latest

On VM2:

podman run -e KAFKA_BROKER_ID=1 -e KAFKA_ZOOKEEPER_CONNECT=localhost:2181 -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092,PLAINTEXT_HOST://192.168.64.3:29092 -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=2 -e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT_HOST -e UNCLEAN_LEADER_ELECTION_ENABLE=true  --pod zookeeper-kafka confluentinc/cp-kafka:latest

Now I create a topic "orders":

./kafka-topics --create --bootstrap-server 192.168.64.2:29092,192.168.64.3:29092 --replication-factor 2 --partitions 2 --topic orders

Then I create a producer:

./kafka-console-producer --broker-list 192.168.64.2:29092,192.168.64.3:29092 --topic orders

And a consumer:

./kafka-console-consumer --bootstrap-server 192.168.64.2:29092,192.168.64.3:29092 --topic orders

Here is what I am trying to do:

  1. Start ZooKeeper and the 2 Kafka nodes, create the "orders" topic, the producer and the consumer (OK, everything works well)
  2. Send a message from my producer and check that the consumer receives it (OK)
  3. Kill the Kafka node on VM2 (OK)
  4. Send another message from my producer and check that the consumer receives it (OK, the broker on VM1 can deliver the message)
  5. Restart the killed Kafka node on VM2 (OK. After that I can see that the 2 partitions have VM1 as the leader)
  6. Send another message from my producer and check that the consumer receives it (OK)
  7. Kill the Kafka node on VM1, which is now the leader of the 2 partitions (OK)
  8. Send another message from my producer and check that the consumer receives it (OK, the broker on VM2 can deliver the message)
  9. Restart the killed Kafka node on VM1 (OK. After that I can see that the 2 partitions have VM2 as the leader)
  10. Send another message from my producer and check that the consumer receives it (OK)
  11. Kill the Kafka node on VM2 again (OK)
  12. Send another message from my producer and check that the consumer receives it (NOT OK): here the producer can't send the message and my consumer never receives it. After a while I get an error in my producer:
ERROR Error when sending message to topic orders with key: null, value: 9 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for orders-0:120000 ms has passed since batch creation

I really don't understand what is happening here. It works well at the beginning, but after a few start/stop/start cycles of the brokers it starts to fail. I should point out that I never kill the 2 brokers at the same time.

Could you please explain what I am missing here?

Thank you all :)


EDIT

To complete the comments below:

@OneCricketeer, I put the answer to your comment here.

At startup, when all is fine:

Topic: orders   TopicId: I3hMNln9TpSuo76xHSpMXQ PartitionCount: 2   ReplicationFactor: 2    Configs:
    Topic: orders   Partition: 0    Leader: 2   Replicas: 2,1   Isr: 2,1
    Topic: orders   Partition: 1    Leader: 1   Replicas: 1,2   Isr: 1,2

After killing VM2:

Topic: orders   TopicId: I3hMNln9TpSuo76xHSpMXQ PartitionCount: 2   ReplicationFactor: 2    Configs:
    Topic: orders   Partition: 0    Leader: 2   Replicas: 2,1   Isr: 2
    Topic: orders   Partition: 1    Leader: 2   Replicas: 1,2   Isr: 2

After killing VM1:

Topic: orders   TopicId: I3hMNln9TpSuo76xHSpMXQ PartitionCount: 2   ReplicationFactor: 2    Configs:
    Topic: orders   Partition: 0    Leader: 1   Replicas: 2,1   Isr: 1
    Topic: orders   Partition: 1    Leader: 1   Replicas: 1,2   Isr: 1

After killing VM2:

Topic: orders   TopicId: I3hMNln9TpSuo76xHSpMXQ PartitionCount: 2   ReplicationFactor: 2    Configs:
    Topic: orders   Partition: 0    Leader: 2   Replicas: 2,1   Isr: 2
    Topic: orders   Partition: 1    Leader: 2   Replicas: 1,2   Isr: 2

(From here on, the producer can't publish messages anymore)
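
For reference, the partition listings above come from the topic describe tool; presumably something like the following command, assuming the same bootstrap servers as the producer and consumer:

./kafka-topics --describe --bootstrap-server 192.168.64.2:29092,192.168.64.3:29092 --topic orders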

After a long time of reading and investigating things about Kafka, I finally found the answer to my problem.

With only 2 brokers, I need the following configuration:

KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=2
KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS=1

The problem was the default number of partitions for the offsets topic (it was 49 or 50, if I remember well).
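
As a sanity check, the internal offsets topic can be inspected like any other topic; a describe call such as the one below (same script and bootstrap servers as above are assumed) shows its partition count and replication factor:

./kafka-topics --describe --bootstrap-server 192.168.64.2:29092,192.168.64.3:29092 --topic __consumer_offsets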

Now, with only one partition and 2 replicas for the offsets topic, everything works well and I can start/stop/start/stop/... my brokers as many times as I want; the other broker takes the lead and continues to handle my messages.
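
For completeness, here is a sketch of the VM1 startup command with both offsets-topic settings, assuming the rest of the original command stays unchanged:

# VM1 (broker.id=2): same command as before, plus KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS=1
podman run \
  -e KAFKA_BROKER_ID=2 \
  -e KAFKA_ZOOKEEPER_CONNECT=192.168.64.3:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9093,PLAINTEXT_HOST://192.168.64.2:29092 \
  -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=2 \
  -e KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS=1 \
  -e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT_HOST \
  -e UNCLEAN_LEADER_ELECTION_ENABLE=true \
  --pod zookeeper-kafka confluentinc/cp-kafka:latest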

I hope this can help someone in the future.
