
How can I create a consumer group in a Spark Kafka stream and assign consumers to it?

I have one topic named topic_1, created with 4 partitions. I need to read it in parallel in a Kafka Spark stream, so I need to create one consumer group and its consumers.

Can you please help me with how to do this?

Right now my Kafka Spark stream only takes one record from Kafka at a time.

Assuming you are using KafkaUtils from Spark, it will automatically take advantage of a parallelism equal to (number of Spark executors) × (cores per executor).
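As a minimal sketch of what that looks like, here is a direct stream built with the legacy `pyspark.streaming.kafka.KafkaUtils` API (available up to Spark 2.4; removed in Spark 3.x, where Structured Streaming replaces it). The broker address, group name, and batch interval below are assumptions, not values from the question. Note that with the direct approach Spark assigns one stream partition per Kafka partition itself; the `group.id` entry mainly identifies your application for offset tracking rather than driving partition assignment:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # Spark <= 2.4 only

sc = SparkContext(appName="topic_1_reader")
ssc = StreamingContext(sc, 5)  # 5-second batch interval (assumption)

# Consumer configuration -- broker address and group name are placeholders
kafka_params = {
    "metadata.broker.list": "localhost:9092",  # assumed broker address
    "group.id": "topic_1_group",               # hypothetical consumer group name
    "auto.offset.reset": "largest",            # start from the latest offsets
}

# One stream partition is created per Kafka partition of topic_1 (4 here)
stream = KafkaUtils.createDirectStream(ssc, ["topic_1"], kafka_params)
stream.map(lambda kv: kv[1]).pprint()  # records arrive as (key, value) pairs

ssc.start()
ssc.awaitTermination()
```

Because this is essentially consumer configuration plus a stream declaration, it only does anything against a running Spark installation and Kafka broker.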

So, if you have 2 Spark executors with 2 cores each, Spark will automatically consume the 4 topic partitions in parallel.

In the Kafka Spark Streaming integration, the number of input tasks is determined by the number of partitions in the topic. If your topic has 4 partitions, Spark Streaming will spawn 4 tasks for each batch.

If you have 1 executor with 1 core, that core will execute the 4 tasks sequentially (no parallelism). If you have 2 executors with 1 core each, each core will execute 2 tasks sequentially (so the parallelism is 2).
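The rule above can be sketched as a small helper: the number of tasks that run at once is capped by both the total core count and the number of topic partitions (the function name is mine, for illustration):

```python
def effective_parallelism(partitions: int, executors: int, cores_per_executor: int) -> int:
    """Tasks that can run simultaneously: limited by total cores and by partitions."""
    return min(partitions, executors * cores_per_executor)

# 1 executor with 1 core: the 4 tasks run one after another
print(effective_parallelism(4, 1, 1))  # 1

# 2 executors with 1 core each: two tasks at a time
print(effective_parallelism(4, 2, 1))  # 2

# 2 executors with 2 cores each: all 4 partitions consumed in parallel
print(effective_parallelism(4, 2, 2))  # 4
```

Adding cores beyond the partition count does not help, which is why the configurations below all stop at 4 total cores.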

With 4 partitions you should configure any of the following to achieve maximum consumer parallelism:

  • 1 executor with 4 cores
  • 2 executors with 2 cores each
  • 4 executors with 1 core each
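Each of these layouts can be requested at submit time. As a sketch, for the middle option (the application file name is a placeholder; `--num-executors` applies when running on YARN, and resource-manager specifics vary):

```shell
# 2 executors with 2 cores each -> 4 cores total, matching the 4 partitions
spark-submit \
  --num-executors 2 \
  --executor-cores 2 \
  my_streaming_app.py
```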
