How to handle Kafka leader failover with two nodes for a topic with 2 partitions and 2 replicas?
I'm playing with Kafka in a multi-node environment to test how failover works. I have 2 VMs, each running 1 Kafka broker, and a single ZooKeeper instance inside one of the two VMs. I know this is not an optimal production configuration, but it's just to train myself and understand things better.
Here is my configuration:
VM1 IP: 192.168.64.2 (only one broker, with broker.id=2)
VM2 IP: 192.168.64.3 (ZooKeeper runs here, plus the broker with broker.id=1)
I start Kafka through podman (this is not a podman problem; everything is configured correctly).
On VM1:
```
podman run \
  -e KAFKA_BROKER_ID=2 \
  -e KAFKA_ZOOKEEPER_CONNECT=192.168.64.3:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9093,PLAINTEXT_HOST://192.168.64.2:29092 \
  -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=2 \
  -e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT_HOST \
  -e UNCLEAN_LEADER_ELECTION_ENABLE=true \
  --pod zookeeper-kafka confluentinc/cp-kafka:latest
```
On VM2:
```
podman run \
  -e KAFKA_BROKER_ID=1 \
  -e KAFKA_ZOOKEEPER_CONNECT=localhost:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092,PLAINTEXT_HOST://192.168.64.3:29092 \
  -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=2 \
  -e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT_HOST \
  -e UNCLEAN_LEADER_ELECTION_ENABLE=true \
  --pod zookeeper-kafka confluentinc/cp-kafka:latest
```
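As a sanity check (not part of my original steps), both broker registrations can be listed straight from ZooKeeper with the zookeeper-shell tool that ships with the Confluent images:

```
./zookeeper-shell 192.168.64.3:2181 ls /brokers/ids
# expected output when both brokers are up: [1, 2]
```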
Now I create a topic "orders":
```
./kafka-topics --create --bootstrap-server 192.168.64.2:29092,192.168.64.3:29092 --replication-factor 2 --partitions 2 --topic orders
```
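To see the resulting partition layout (this is where the outputs further down come from), the same tool can describe the topic:

```
./kafka-topics --describe --bootstrap-server 192.168.64.2:29092,192.168.64.3:29092 --topic orders
```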
Then I create a producer:
```
./kafka-console-producer --broker-list 192.168.64.2:29092,192.168.64.3:29092 --topic orders
```
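As a variant (not part of my original commands), the console producer can also be pinned to acks=all via the standard --producer-property flag, so every send waits for an acknowledgement from all in-sync replicas:

```
./kafka-console-producer --broker-list 192.168.64.2:29092,192.168.64.3:29092 --topic orders \
  --producer-property acks=all
```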
And a consumer:
```
./kafka-console-consumer --bootstrap-server 192.168.64.2:29092,192.168.64.3:29092 --topic orders
```
Here is what happens when I try to produce:
```
ERROR Error when sending message to topic orders with key: null, value: 9 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for orders-0:120000 ms has passed since batch creation
```
I really don't understand what is happening here. It works well at the beginning, but after a start/stop/start cycle of a broker, it starts to fail. To be precise, I never kill the 2 brokers at the same time.
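The 120000 ms in the error is the producer's default delivery.timeout.ms, so each failed send takes two minutes to surface. While experimenting, shrinking the timeouts makes the failures show up faster (a debugging tweak, not part of my original commands; delivery.timeout.ms must stay at least request.timeout.ms plus linger.ms):

```
./kafka-console-producer --broker-list 192.168.64.2:29092,192.168.64.3:29092 --topic orders \
  --producer-property request.timeout.ms=10000 \
  --producer-property delivery.timeout.ms=30000
```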
Could you please explain what I am missing here?
Thank you all :)
EDIT
To follow up on the comments below:
@OneCricketeer, I put the answer to your comment here.
At startup, when everything is fine:
```
Topic: orders  TopicId: I3hMNln9TpSuo76xHSpMXQ  PartitionCount: 2  ReplicationFactor: 2  Configs:
    Topic: orders  Partition: 0  Leader: 2  Replicas: 2,1  Isr: 2,1
    Topic: orders  Partition: 1  Leader: 1  Replicas: 1,2  Isr: 1,2
```
After killing VM2:
```
Topic: orders  TopicId: I3hMNln9TpSuo76xHSpMXQ  PartitionCount: 2  ReplicationFactor: 2  Configs:
    Topic: orders  Partition: 0  Leader: 2  Replicas: 2,1  Isr: 2
    Topic: orders  Partition: 1  Leader: 2  Replicas: 1,2  Isr: 2
```
After killing VM1:
```
Topic: orders  TopicId: I3hMNln9TpSuo76xHSpMXQ  PartitionCount: 2  ReplicationFactor: 2  Configs:
    Topic: orders  Partition: 0  Leader: 1  Replicas: 2,1  Isr: 1
    Topic: orders  Partition: 1  Leader: 1  Replicas: 1,2  Isr: 1
```
After killing VM2:
```
Topic: orders  TopicId: I3hMNln9TpSuo76xHSpMXQ  PartitionCount: 2  ReplicationFactor: 2  Configs:
    Topic: orders  Partition: 0  Leader: 2  Replicas: 2,1  Isr: 2
    Topic: orders  Partition: 1  Leader: 2  Replicas: 1,2  Isr: 2
```
(From here on, the producer can't publish messages anymore.)
After a long time reading and investigating Kafka, I finally found the answer to my problem.
With only 2 brokers, I need the following configuration:
```
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=2
KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS=1
```
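For completeness, the VM1 start command from above then becomes the following (KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=2 was already set; only KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS=1 is new). Note that these settings only apply when the internal offsets topic is first created, so it has to be recreated if it already exists:

```
podman run \
  -e KAFKA_BROKER_ID=2 \
  -e KAFKA_ZOOKEEPER_CONNECT=192.168.64.3:2181 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9093,PLAINTEXT_HOST://192.168.64.2:29092 \
  -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=2 \
  -e KAFKA_OFFSETS_TOPIC_NUM_PARTITIONS=1 \
  -e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT_HOST \
  -e UNCLEAN_LEADER_ELECTION_ENABLE=true \
  --pod zookeeper-kafka confluentinc/cp-kafka:latest
```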
The problem was the default number of partitions for the internal offsets topic, __consumer_offsets (it is 50 by default, if I remember correctly).
Now, with only one partition and 2 replicas, everything works well: I can start/stop/start/stop... my brokers as many times as I want, and the other broker takes the lead and continues to handle my messages.
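To double-check the fix, the internal offsets topic can be described the same way as orders:

```
./kafka-topics --describe --bootstrap-server 192.168.64.2:29092,192.168.64.3:29092 --topic __consumer_offsets
# should now show PartitionCount: 1, ReplicationFactor: 2
```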
Hope this helps someone in the future.