
Spark streaming applications subscribing to same kafka topic

I am new to spark and kafka and I have a slightly different usage pattern of spark streaming with kafka. I am using:

spark-core_2.10 - 2.1.1
spark-streaming_2.10 - 2.1.1
spark-streaming-kafka-0-10_2.10 - 2.0.0
kafka_2.10 - 0.10.1.1

Continuous event data is being streamed to a kafka topic which I need to process from multiple spark streaming applications. But when I run the spark streaming apps, only one of them receives the data.

    import java.util.Arrays;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    Map<String, Object> kafkaParams = new HashMap<String, Object>();

    kafkaParams.put("bootstrap.servers", "localhost:9092");
    kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    kafkaParams.put("auto.offset.reset", "latest");
    kafkaParams.put("group.id", "test-consumer-group");   // same group id in both applications
    kafkaParams.put("enable.auto.commit", "true");
    kafkaParams.put("auto.commit.interval.ms", "1000");
    kafkaParams.put("session.timeout.ms", "30000");

    Collection<String> topics = Arrays.asList("4908100105999_000005");
    JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
            ssc,                                           // JavaStreamingContext
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

      ... //spark processing

I have two spark streaming applications; usually the first one I submit consumes the kafka messages, and the second application just waits for messages and never proceeds. As I have read, kafka topics can be subscribed to by multiple consumers. Is that not true for spark streaming? Or is there something I am missing about the kafka topic and its configuration?

Thanks in advance.

You can create different streams with the same group ids. Here are more details from the online documentation for the 0.8 integration; there are two approaches:

Approach 1: Receiver-based Approach

Multiple Kafka input DStreams can be created with different groups and topics for parallel receiving of data using multiple receivers.
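
For illustration, here is a minimal sketch of that receiver-based pattern using the 0.8 integration API (org.apache.spark.streaming.kafka.KafkaUtils.createStream); the JavaStreamingContext jssc, the ZooKeeper address, and the group ids are assumptions, not from the question:

    import java.util.Collections;
    import java.util.Map;
    import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    // Receiver-based approach: each call to createStream starts a receiver.
    // topic -> number of receiver threads for that topic
    Map<String, Integer> topicMap = Collections.singletonMap("4908100105999_000005", 1);

    // Two streams with different (illustrative) consumer groups: because the groups
    // differ, each stream independently receives every message of the topic.
    JavaPairReceiverInputDStream<String, String> streamA =
            KafkaUtils.createStream(jssc, "localhost:2181", "app-a-group", topicMap);
    JavaPairReceiverInputDStream<String, String> streamB =
            KafkaUtils.createStream(jssc, "localhost:2181", "app-b-group", topicMap);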

Approach 2: Direct Approach (No Receivers)

No need to create multiple input Kafka streams and union them. With directStream, Spark Streaming will create as many RDD partitions as there are Kafka partitions to consume, which will all read data from Kafka in parallel. So there is a one-to-one mapping between Kafka and RDD partitions, which is easier to understand and tune.
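
A sketch of that direct approach with the 0.8 API, again assuming a JavaStreamingContext jssc; note that this API takes a broker list (metadata.broker.list) rather than a ZooKeeper address, and each Kafka partition of the topic becomes one RDD partition:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;
    import kafka.serializer.StringDecoder;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    // Direct approach: no receivers; offsets are tracked by Spark rather than ZooKeeper.
    Map<String, String> directParams = new HashMap<String, String>();
    directParams.put("metadata.broker.list", "localhost:9092");

    Set<String> topicSet = Collections.singleton("4908100105999_000005");

    JavaPairInputDStream<String, String> directStream = KafkaUtils.createDirectStream(
            jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
            directParams, topicSet);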

You can read more in the Spark Streaming + Kafka Integration Guide 0.8.

From your code it looks like you are using 0.10, so refer to the Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher).

Even though it is using the spark streaming api, everything is controlled by kafka properties, so depending on the group id you specify in the properties file, you can start multiple streams with different group ids.
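
Applied to the code in the question, that means giving each application its own group.id (the names below are illustrative); with distinct groups, Kafka delivers every message of the topic to both applications instead of assigning the partition to only one of them:

    // Application 1
    kafkaParams.put("group.id", "spark-app-1-group");

    // Application 2
    kafkaParams.put("group.id", "spark-app-2-group");

    // With the same group.id ("test-consumer-group") in both applications, Kafka treats
    // them as one consumer group and assigns each partition to exactly one consumer, so
    // with a single-partition topic the second application never receives anything.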

Cheers!

The number of consumers [under a consumer group] cannot exceed the number of partitions in the topic. If you want to consume the messages in parallel, then you will need to introduce a suitable number of partitions and create receivers to process each partition.
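
A sketch of that pattern with the receiver-based API, assuming a topic with four partitions, a JavaStreamingContext jssc, and ZooKeeper on localhost:2181: one receiver stream per partition, all in the same consumer group, unioned into a single DStream:

    import java.util.Collections;
    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    int numPartitions = 4;  // assumed partition count of the topic
    JavaPairDStream<String, String> union = null;

    // One receiver per partition; because all receivers share the consumer group,
    // Kafka splits the partitions across them and they consume in parallel.
    for (int i = 0; i < numPartitions; i++) {
        JavaPairDStream<String, String> part = KafkaUtils.createStream(
                jssc, "localhost:2181", "test-consumer-group",
                Collections.singletonMap("4908100105999_000005", 1));
        union = (union == null) ? part : union.union(part);
    }
    // 'union' now carries records from all partitions of the topic.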

