
Where does Kafka store a topic in a multi-node cluster?

Original post: 2015-12-11 11:39:38 3 2 java / apache-kafka

I have a 3-node Kafka cluster and I am creating a topic on one of the nodes with the command below:

bin/kafka-create-topic.sh --zookeeper host1.com:2181,host2.com:2181,host3.com:2181 --replica 1 --partition 1 --topic test

Now, when I push messages to the topic, one of my hosts is getting overloaded with the topic's messages, since Kafka stores messages on disk. I want to know whether there is any configuration I can set to distribute the storage across the cluster.

Thanks,

2 answers

As @om-nom-nom points out, you are creating a topic with a single partition, so that topic will only ever be stored on a single node. Even though you have a 3-node setup, the other two nodes will never be used for it.

Changing your topic to use multiple partitions is how you distribute a Kafka topic across nodes. The Kafka broker doesn't distribute messages to different nodes; it's the producer's responsibility to determine which partition a message goes to. You can decide this yourself, or let the producer use a round-robin approach to distribute messages across partitions, as @om-nom-nom points out.
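To make the effect of the partition count concrete, here is a minimal sketch in plain Java. It is not Kafka code: the class and method names are made up for illustration, and it assumes the simplest possible placement (partition p lives on broker p % numBrokers) with a producer that cycles through partitions round-robin. Real Kafka placement is more involved, but the outcome for the question is the same.

```java
import java.util.Arrays;

// Simplified model, not Kafka code: partitions are spread over brokers
// round-robin, and a producer cycles through the partitions message by message.
class PartitionSpreadSketch {

    // Returns, for each partition index, the (hypothetical) broker id hosting it.
    static int[] assignPartitions(int numPartitions, int numBrokers) {
        int[] owner = new int[numPartitions];
        for (int p = 0; p < numPartitions; p++) {
            owner[p] = p % numBrokers;
        }
        return owner;
    }

    // Counts how many messages end up on each broker when the producer
    // sends messages round-robin over the partitions.
    static int[] messagesPerBroker(int numMessages, int numPartitions, int numBrokers) {
        int[] owner = assignPartitions(numPartitions, numBrokers);
        int[] counts = new int[numBrokers];
        for (int m = 0; m < numMessages; m++) {
            int partition = m % numPartitions;
            counts[owner[partition]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // One partition on three brokers: a single broker stores everything.
        System.out.println(Arrays.toString(messagesPerBroker(9, 1, 3))); // [9, 0, 0]
        // Three partitions on three brokers: the load is spread evenly.
        System.out.println(Arrays.toString(messagesPerBroker(9, 3, 3))); // [3, 3, 3]
    }
}
```

With a single partition the other two brokers stay idle no matter how many messages are sent, which matches the behaviour described in the question.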

In the Kafka producer, a partition key can be specified to indicate the destination partition of the message. By default, a hash-based partitioner is used to determine the partition id from the key, and you can also plug in a custom partitioner.
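As a rough illustration of what "hash-based" and "custom" mean here, the sketch below maps a key to a partition id in plain Java. The `Partitioner` interface, `HASHING` and `VIP_FIRST` are hypothetical names for this example and are not Kafka's API; `String.hashCode()` is only a stand-in for whatever hash the real partitioner uses.

```java
// Illustrative sketch only; the names here are hypothetical, not Kafka's API.
class HashPartitionSketch {

    // A pluggable strategy that maps a key to a partition id.
    interface Partitioner {
        int partition(String key, int numPartitions);
    }

    // Hash-based default: mask off the sign bit so the value is never negative,
    // then take the remainder. String.hashCode() stands in for the real hash.
    static final Partitioner HASHING =
        (key, numPartitions) -> (key.hashCode() & 0x7fffffff) % numPartitions;

    // A custom strategy, as an example: keys starting with "vip-" always go to
    // partition 0, everything else falls back to the hash-based strategy.
    static final Partitioner VIP_FIRST =
        (key, numPartitions) ->
            key.startsWith("vip-") ? 0 : HASHING.partition(key, numPartitions);

    public static void main(String[] args) {
        int partitions = 3;
        // The same key always maps to the same partition.
        System.out.println(HASHING.partition("user-42", partitions)
            == HASHING.partition("user-42", partitions)); // true
        System.out.println(VIP_FIRST.partition("vip-1", partitions)); // 0
    }
}
```

The useful property is that all messages with the same key land in the same partition, so per-key ordering is preserved while different keys spread across partitions (and therefore across brokers).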

To reduce the number of open sockets, in 0.8.0 (https://issues.apache.org/jira/browse/KAFKA-1017), when the partitioning key is not specified or is null, a producer will pick a random partition and stick to it for some time (default is 10 minutes) before switching to another one.
source
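The "pick a random partition and stick to it" behaviour for null keys can be sketched as follows. This is a plain-Java model under stated assumptions, not the producer's actual implementation: the class name, the `partitionForNullKey` method and the explicit `nowMillis` argument (used so the example does not depend on the wall clock) are all invented for illustration.

```java
import java.util.Random;

// Illustrative model only, not Kafka's implementation: for messages without a
// key, choose a random partition and keep using it until a time window expires.
class StickyPartitionSketch {
    private final int numPartitions;
    private final long stickMillis;
    private final Random random;
    private int current = -1;   // -1 means no partition has been chosen yet
    private long chosenAt;      // time (ms) at which 'current' was chosen

    StickyPartitionSketch(int numPartitions, long stickMillis, Random random) {
        this.numPartitions = numPartitions;
        this.stickMillis = stickMillis;
        this.random = random;
    }

    // Returns the partition for a null-key message sent at time nowMillis.
    int partitionForNullKey(long nowMillis) {
        if (current < 0 || nowMillis - chosenAt >= stickMillis) {
            current = random.nextInt(numPartitions);
            chosenAt = nowMillis;
        }
        return current;
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60 * 1000L;
        StickyPartitionSketch sticky =
            new StickyPartitionSketch(3, tenMinutes, new Random());
        int first = sticky.partitionForNullKey(0);
        int later = sticky.partitionForNullKey(tenMinutes - 1);
        // Within the window the producer keeps writing to the same partition.
        System.out.println(first == later); // true
    }
}
```

One consequence worth noting: with a single producer and no keys, all messages go to one partition for the whole window, so a short test run can look unbalanced even when the topic has several partitions.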