How do I make Kafka broker failover work for consumers?

I am using a 2-broker setup and created a replicated topic like this:

  $KAFKA_HOME/bin/kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --replication-factor 2 \
  --partitions 3 \
  --topic replicated_topic

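An excerpt of the server configuration is shown below (note that both servers are identical except for the port, broker ID, and log directory):

Let's describe my topic with the 2 brokers: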

Topic:replicated_topic  PartitionCount:3    ReplicationFactor:2 Configs:
    Topic: replicated_topic Partition: 0    Leader: 1   Replicas: 1,0   Isr: 1,0
    Topic: replicated_topic Partition: 1    Leader: 0   Replicas: 0,1   Isr: 1,0
    Topic: replicated_topic Partition: 2    Leader: 1   Replicas: 1,0   Isr: 1,0

Let's look at the consumer code: Consumer (implements Callable)

public Void call() throws Exception {
    final Properties props = new Properties();
    // ... consumer properties not shown

    final Consumer<Integer, String> consumer = new KafkaConsumer<>(props);
    // ... topic subscription not shown

    ConsumerRecords<Integer, String> records = null;
    while (!Thread.currentThread().isInterrupted()) {
        records = consumer.poll(1000);
        if (records.isEmpty()) {
            continue; // nothing arrived within the timeout, poll again
        }
        records.forEach(rec -> LOGGER.debug("{}@{} consumed from topic {}, partition {} pair ({},{})",
                ConsumerCallable.class.getSimpleName(), hashCode(), rec.topic(), rec.partition(), rec.key(), rec.value()));
    }

    return null;
}
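
The consumer properties are not shown above. As a rough sketch only, this is the kind of configuration a consumer like this typically needs. The class and method names here are hypothetical, the `auto.offset.reset` value is my assumption, and plain string keys are used so the snippet compiles without the Kafka client on the classpath. The point relevant to failover is that `bootstrap.servers` lists both brokers, so the client can still bootstrap when one of them is down.

```java
import java.util.Properties;

final class ConsumerPropsSketch {

    /** Builds consumer properties for the given broker list and consumer group. */
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        // List every broker so the client can bootstrap even if one is down.
        props.put("bootstrap.servers", bootstrapServers);
        // Each ConsumerCallable in the question uses its own group id.
        props.put("group.id", groupId);
        // Keys are Integer and values are String, matching Consumer<Integer, String>.
        props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Assumption: start from the earliest offset when the group has no committed offset.
        props.put("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("localhost:9092,localhost:9093", "group1");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting `Properties` object can be passed to `new KafkaConsumer<>(props)` as in the code above.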


Main:

private static final String TOPIC_NAME = "replicated_topic";
private static final String BOOTSTRAP_SERVERS = "localhost:9092, localhost:9093";
private static final Logger LOGGER = LoggerFactory.getLogger(Main.class);

public static void main(String[] args) {

    // One consumer per group, so every consumer should see every message.
    ExecutorService executor = Executors.newCachedThreadPool();
    executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group1"));
    executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group2"));
    executor.submit(new ConsumerCallable(TOPIC_NAME, BOOTSTRAP_SERVERS, "group3"));

    try (Producer<Integer, String> producer = createProducer()) {
        Scanner scanner = new Scanner(System.in);
        String line = null;
        LOGGER.debug("Please enter 'k v' on the command line to send to Kafka or 'quit' to exit");
        while (scanner.hasNextLine()) {
            line = scanner.nextLine();
            if (line.trim().toLowerCase().equals("quit")) {
                break;
            }
            String[] elements = line.split(" ");
            Integer key = Integer.parseInt(elements[0]);
            String value = elements[1];
            producer.send(new ProducerRecord<>(TOPIC_NAME, key, value));
        }
    }
    // Interrupt the consumer threads so the JVM can exit.
    executor.shutdownNow();
}

private static Producer<Integer, String> createProducer() {
    Properties props = new Properties();
    props.put(ProducerConfig.CLIENT_ID_CONFIG, "KafkaExampleProducer");
    // ... remaining producer properties not shown
    return new KafkaProducer<>(props);
}


  1. All brokers are up:

Output of kafka-topics --describe:

Topic:replicated_topic  PartitionCount:3    ReplicationFactor:2 Configs:
    Topic: replicated_topic Partition: 0    Leader: 1   Replicas: 1,0   Isr: 1,0
    Topic: replicated_topic Partition: 1    Leader: 0   Replicas: 0,1   Isr: 1,0
    Topic: replicated_topic Partition: 2    Leader: 1   Replicas: 1,0   Isr: 1,0


12:52:30.460 DEBUG Main - Please enter 'k v' on the command line to send to Kafka or 'quit' to exit
1 u
12:52:35.555 DEBUG ConsumerCallable - ConsumerCallable@1241910294 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable@1361430455 consumed from topic replicated_topic, partition 0 pair (1,u)
12:52:35.559 DEBUG ConsumerCallable - ConsumerCallable@186743616 consumed from topic replicated_topic, partition 0 pair (1,u)
2 d
12:52:38.096 DEBUG ConsumerCallable - ConsumerCallable@186743616 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.098 DEBUG ConsumerCallable - ConsumerCallable@1361430455 consumed from topic replicated_topic, partition 2 pair (2,d)
12:52:38.100 DEBUG ConsumerCallable - ConsumerCallable@1241910294 consumed from topic replicated_topic, partition 2 pair (2,d)


  2. Broker 2 is shut down:


Topic:replicated_topic  PartitionCount:3    ReplicationFactor:2 Configs:
    Topic: replicated_topic Partition: 0    Leader: 0   Replicas: 1,0   Isr: 0
    Topic: replicated_topic Partition: 1    Leader: 0   Replicas: 0,1   Isr: 0
    Topic: replicated_topic Partition: 2    Leader: 0   Replicas: 1,0   Isr: 0


3 t
12:57:03.898 DEBUG ConsumerCallable - ConsumerCallable@186743616 consumed from topic replicated_topic, partition 1 pair (3,t)
4 p
12:57:06.058 DEBUG ConsumerCallable - ConsumerCallable@186743616 consumed from topic replicated_topic, partition 1 pair (4,p)

Now only 1 consumer receives data. Let's start broker 2 again. Now the other 2 consumers receive the messages they missed:

12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable@1241910294 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.863 DEBUG ConsumerCallable - ConsumerCallable@1241910294 consumed from topic replicated_topic, partition 1 pair (4,p)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable@1361430455 consumed from topic replicated_topic, partition 1 pair (3,t)
12:57:50.870 DEBUG ConsumerCallable - ConsumerCallable@1361430455 consumed from topic replicated_topic, partition 1 pair (4,p)

  3. Broker 1 is shut down:

Now only 2 consumers receive data:

5 c
12:59:13.718 DEBUG ConsumerCallable - ConsumerCallable@1361430455 consumed from topic replicated_topic, partition 2 pair (5,c)
12:59:13.737 DEBUG ConsumerCallable - ConsumerCallable@1241910294 consumed from topic replicated_topic, partition 2 pair (5,c)
6 s
12:59:16.437 DEBUG ConsumerCallable - ConsumerCallable@1361430455 consumed from topic replicated_topic, partition 2 pair (6,s)
12:59:16.438 DEBUG ConsumerCallable - ConsumerCallable@1241910294 consumed from topic replicated_topic, partition 2 pair (6,s)


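My question is (sorry for writing so much, but I am trying to capture the context): how do I make sure that consumers keep working, i.e. receive all messages normally, no matter which broker I stop?

PS: I tried setting offsets.topic.replication.factor=2 or 3, but it had no effect.

Messages sent to that broker are not ignored when the number of live brokers is lower than the configured number of replicas. Whenever a new Kafka broker joins the cluster, the data is replicated to that node. See https://stackoverflow.com/a/38998062/6274525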




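This property is used to manage offsets and consumer interaction. Kafka automatically creates an internal topic named __consumer_offsets. If this topic has no replicas, the consumer cannot tell whether anything has been pushed to the topic it is listening to.

Details of this property: https://kafka.apache.org/documentation/#brokerconfigs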
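
To see whether a topic really survives the loss of a broker, you can describe it (the same works for __consumer_offsets) and look at the Isr column. Keep in mind that offsets.topic.replication.factor only applies when __consumer_offsets is first created. Changing it afterwards does not add replicas to an existing offsets topic, which may be why the setting in the question seemed to have no effect. Below is a minimal sketch that reads describe output lines and reports the partitions that would be left with no in-sync replica if a given broker stopped. The class and method names are hypothetical, and the parsing assumes the output layout shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DescribeOutputCheck {

    // Matches one partition line of the describe output; group 1 = partition, group 2 = ISR list.
    private static final Pattern PARTITION_LINE = Pattern.compile(
            "Partition:\\s*(\\d+)\\s+Leader:\\s*\\S+\\s+Replicas:\\s*[\\d,]+\\s+Isr:\\s*([\\d,]*)");

    /** Returns the partitions whose in-sync replica set would be empty without brokerId. */
    static List<Integer> partitionsLostWithout(List<String> describeLines, int brokerId) {
        List<Integer> lost = new ArrayList<>();
        for (String line : describeLines) {
            Matcher m = PARTITION_LINE.matcher(line);
            if (!m.find()) {
                continue; // summary line or anything else that is not a partition line
            }
            boolean anotherBrokerInSync = false;
            for (String id : m.group(2).split(",")) {
                if (!id.isEmpty() && Integer.parseInt(id) != brokerId) {
                    anotherBrokerInSync = true;
                }
            }
            if (!anotherBrokerInSync) {
                lost.add(Integer.parseInt(m.group(1)));
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // Describe output from the question, taken while one broker was down.
        List<String> lines = Arrays.asList(
                "Topic:replicated_topic  PartitionCount:3    ReplicationFactor:2 Configs:",
                "    Topic: replicated_topic Partition: 0    Leader: 0   Replicas: 1,0   Isr: 0",
                "    Topic: replicated_topic Partition: 1    Leader: 0   Replicas: 0,1   Isr: 0",
                "    Topic: replicated_topic Partition: 2    Leader: 0   Replicas: 1,0   Isr: 0");
        System.out.println("Lost without broker 0: " + partitionsLostWithout(lines, 0)); // [0, 1, 2]
        System.out.println("Lost without broker 1: " + partitionsLostWithout(lines, 1)); // []
    }
}
```

If the same check on __consumer_offsets reports lost partitions for a broker, consumer groups whose offsets live in those partitions can be expected to stall while that broker is down.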


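  1. If you take down one node that is not the leader, you are fine and the consumer keeps working.

  2. If you take down the leader node, the consumer may or may not keep working (working = continuing to receive published messages).

This is a problem. I am using Kafka 1.1.0.

  1. Further, if you kill leader 0 and observe that the consumer stops working, you will also notice that the new leader is now 1 (or 2).

  2. You bring broker "0" back and observe that the consumer receives the "missed" messages.

  3. Now take down the new leader (1 or 2) and the consumer still works fine. So the problem seems to be killing the original leader.


So the pattern that emerges is: if you kill the FIRST broker that was started when the cluster was brought up, consumers stop receiving messages. I will test more and update. Apparently, shutting down the other brokers does not affect consumers as long as quorum is maintained.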
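
One possible explanation for this pattern, offered as an assumption and not verified on this cluster: each consumer group is tied to one partition of __consumer_offsets, and the broker leading that partition acts as the group's coordinator. If the offsets topic was created while only the first broker was running, or with a replication factor of 1, stopping that broker leaves the affected groups without a coordinator until it comes back. The sketch below reproduces, as far as I know, the mapping the broker uses internally (absolute value of the group id's hash code, modulo the number of offsets partitions). The class and method names are hypothetical, and 50 is the default of offsets.topic.num.partitions.

```java
final class CoordinatorPartitionSketch {

    /** Maps a consumer group id to a partition of the __consumer_offsets topic. */
    static int offsetsPartitionFor(String groupId, int offsetsTopicPartitions) {
        int hash = groupId.hashCode();
        // Math.abs(Integer.MIN_VALUE) is still negative, so that case is mapped to 0.
        int nonNegative = (hash == Integer.MIN_VALUE) ? 0 : Math.abs(hash);
        return nonNegative % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        int partitions = 50; // default value of offsets.topic.num.partitions
        for (String group : new String[] {"group1", "group2", "group3"}) {
            System.out.println(group + " -> __consumer_offsets-" + offsetsPartitionFor(group, partitions));
        }
    }
}
```

Describing __consumer_offsets then shows which broker leads each of those partitions and whether it has more than one in-sync replica.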


