
How are Kafka partitions shared in Spark Streaming with Kafka?

I am wondering how the Kafka partitions are shared among the SimpleConsumer instances running inside the executor processes. I know how the high-level Kafka consumers share partitions across the different consumers in a consumer group, but how does that happen when Spark is using the SimpleConsumer? There will be multiple executors for the streaming jobs across machines.
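For reference, the high-level behavior the question describes is that consumers sharing a `group.id` have the topic's partitions divided among them by Kafka's group coordinator. A minimal sketch of that, assuming the modern Java `KafkaConsumer` API (rather than the old high-level consumer) and placeholder broker/topic names:

```scala
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer

object GroupConsumerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")   // assumed broker address
    props.put("group.id", "demo-group")                // all instances with this id share the partitions
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(java.util.Arrays.asList("demo-topic"))  // assumed topic name

    // poll() only returns records from the partitions currently assigned to this
    // instance; starting a second process with the same group.id triggers a
    // rebalance that splits the topic's partitions between the two.
    while (true) {
      val records = consumer.poll(java.time.Duration.ofMillis(500))
      records.forEach(r => println(s"partition=${r.partition()} offset=${r.offset()} value=${r.value()}"))
    }
  }
}
```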

All Spark executors should also be part of the same consumer group. Spark uses roughly the same Java API for Kafka consumers; it is just the scheduling that distributes the work across multiple machines.
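To make the distribution concrete, here is a minimal sketch using the spark-streaming-kafka-0-10 direct stream (a newer integration than the SimpleConsumer-era one the question refers to); the broker address, topic, and group id are placeholder assumptions. With the direct approach, each Kafka partition maps to exactly one Spark partition, and the scheduler spreads those partitions over the available executors:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",   // assumed broker address
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "spark-demo-group"           // assumed group id
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("demo-topic"), kafkaParams))

    // Each batch RDD has exactly one Spark partition per Kafka partition, so the
    // number printed here matches the topic's partition count; the scheduler then
    // distributes those partitions across whatever executors are available.
    stream.foreachRDD(rdd => println(s"Kafka partitions in this batch: ${rdd.getNumPartitions}"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```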
