
How to specify consumer group in Kafka Spark Streaming using direct stream

How do I specify a consumer group id for Kafka Spark Streaming using the direct stream API?

HashMap<String, String> kafkaParams = new HashMap<String, String>();
kafkaParams.put("metadata.broker.list", brokers);
kafkaParams.put("auto.offset.reset", "largest");
kafkaParams.put("group.id", "app1");

JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
        jssc,
        String.class,
        String.class,
        StringDecoder.class,
        StringDecoder.class,
        kafkaParams,
        topicsSet
);

Though I have specified the configuration, I'm not sure if I'm missing something. I'm using Spark 1.3.

kafkaParams.put("group.id", "app1");

The direct stream API uses the low-level Kafka API, and as such doesn't use consumer groups in any way. If you want to use consumer groups with Spark Streaming, you'll have to use the receiver-based API.

Full details are available in the docs!
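For reference, a minimal sketch of the receiver-based API (spark-streaming-kafka 0.8), which does honor a consumer group. The ZooKeeper quorum string, topic name, and thread count below are placeholders:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

// Receiver-based stream: the group id ("app1") is a real consumer group,
// so offsets are tracked in ZooKeeper under that group.
Map<String, Integer> topicMap = new HashMap<String, Integer>();
topicMap.put("mytopic", 1);  // topic -> number of receiver threads (placeholder)

JavaPairReceiverInputDStream<String, String> messages =
        KafkaUtils.createStream(jssc, "zkhost:2181", "app1", topicMap);
```

Unlike the direct stream, this approach reads through a long-running receiver and uses the high-level consumer, so group rebalancing and offset commits work as with any other Kafka consumer.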

createDirectStream in spark-streaming-kafka-0-8 does not support group mode, because it uses the low-level Kafka API.

But spark-streaming-kafka-0-10 does support group mode.

Consumer Configs

In 0.9.0.0 we introduced the new Java consumer as a replacement for the older Scala-based simple and high-level consumers. The configs for both new and old consumers are described below.

The New Consumer Configs section lists the group.id item.

The Spark Streaming integration for Kafka 0.10 uses the new API: https://spark.apache.org/docs/2.1.1/streaming-kafka-0-10-integration.html

The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 Direct Stream approach. It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the new Kafka consumer API instead of the simple API, there are notable differences in usage.

I've tested the group mode in spark-streaming-kafka-0-10, and it does work.
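A sketch of what that looks like with the 0-10 integration, following the official integration guide; the broker address, topic name, and group id are placeholders:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "app1");             // honored by the new consumer API
kafkaParams.put("auto.offset.reset", "latest");  // "largest" was renamed in 0.9+

JavaInputDStream<ConsumerRecord<String, String>> stream =
        KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                        Arrays.asList("mytopic"), kafkaParams));
```

Because this direct stream is built on the new consumer API, the group.id here is a real consumer group: you can inspect its lag with the standard Kafka tooling, and multiple applications with different group ids can read the same topic independently.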
