
Spark streaming Kafka messages not consumed

I want to use Spark (1.6.2) Streaming to receive messages from a topic in Kafka (broker v 0.10.2.1).

I am using the Receiver-based approach. Here is the code:

public static void main(String[] args) throws Exception
{
    SparkConf sparkConf = new SparkConf().setAppName("SimpleStreamingApp");
    JavaStreamingContext javaStreamingContext = new JavaStreamingContext(sparkConf, new Duration(5000));
    //
    Map<String, Integer> topicMap = new HashMap<>();
    topicMap.put("myTopic", 1);
    //
    String zkQuorum = "host1:port1,host2:port2,host3:port3";
    //
    Map<String, String> kafkaParamsMap = new HashMap<>();
    kafkaParamsMap.put("bootstraps.server", zkQuorum);
    kafkaParamsMap.put("metadata.broker.list", zkQuorum);
    kafkaParamsMap.put("zookeeper.connect", zkQuorum);
    kafkaParamsMap.put("group.id", "group_name");
    kafkaParamsMap.put("security.protocol", "SASL_PLAINTEXT");
    kafkaParamsMap.put("security.mechanism", "GSSAPI");
    kafkaParamsMap.put("ssl.kerberos.service.name", "kafka");
    kafkaParamsMap.put("key.deserializer", "kafka.serializer.StringDecoder");
    kafkaParamsMap.put("value.deserializer", "kafka.serializer.DefaultDecoder");
    //
    JavaPairReceiverInputDStream<byte[], byte[]> stream = KafkaUtils.createStream(javaStreamingContext,
                            byte[].class, byte[].class,
                            DefaultDecoder.class, DefaultDecoder.class,
                            kafkaParamsMap,
                            topicMap,
                            StorageLevel.MEMORY_ONLY());

    VoidFunction<JavaPairRDD<byte[], byte[]>> voidFunc = new VoidFunction<JavaPairRDD<byte[], byte[]>> ()
    {
       public void call(JavaPairRDD<byte[], byte[]> rdd) throws Exception
       {
          List<Tuple2<byte[], byte[]>> all = rdd.collect();
          System.out.println("size of red: " + all.size());
       }
    };

    stream.forEach(voidFunc);

    javaStreamingContext.start();
    javaStreamingContext.awaitTermination();
}

Access to Kafka is kerberized. When I launch:

spark-submit --verbose --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" --files jaas.conf,privKey.der --principal <accountName> --keytab <path to keytab file> --master yarn --jars <comma separated path to all jars> --class <fully qualified java main class> <path to jar file containing main class>

  1. The VerifiableProperties class from Kafka logs warning messages for the properties included in the kafkaParams HashMap:
    INFO KafkaReceiver: connecting to zookeeper: <the correct zookeeper quorum provided in kafkaParams map>
    VerifiableProperties: Property auto.offset.reset is overridden to largest
    VerifiableProperties: Property enable.auto.commit is not valid.
    VerifiableProperties: Property sasl.kerberos.service.name is not valid
    VerifiableProperties: Property key.deserializer is not valid
    ...
    VerifiableProperties: Property zookeeper.connect is overridden to ....

I think that because these properties are not accepted, it may be affecting the stream processing.

**These warning messages do not appear when I launch in cluster mode with --master yarn**

  2. After that, I see the following logs repeating every 5 seconds (the configured batch interval):

    INFO BlockRDD: Removing RDD 4 from persistence list

    INFO KafkaInputDStream: Removing blocks of RDD BlockRDD[4] at createStream at ...

    INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()

    INFO ... INFO BlockManager: Removing RDD 4

However, I don't see any actual messages printed on the console.

Question: why doesn't my code print any actual messages?

My gradle dependencies are:

compile group: 'org.apache.spark', name: 'spark-core_2.10', version: '1.6.2'
compile group: 'org.apache.spark', name: 'spark-streaming_2.10', version: '1.6.2'
compile group: 'org.apache.spark', name: 'spark-streaming-kafka_2.10', version: '1.6.2'

stream is an object of JavaPairReceiverInputDStream. Convert it to a DStream and use foreachRDD to print the messages consumed from Kafka.
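A minimal sketch of that suggestion, reusing the stream variable from the question; decoding the value bytes as UTF-8 text is an assumption about the payload:

stream.foreachRDD(new VoidFunction<JavaPairRDD<byte[], byte[]>>()
{
   public void call(JavaPairRDD<byte[], byte[]> rdd) throws Exception
   {
      // collect() pulls the whole batch to the driver; fine for debugging small batches
      for (Tuple2<byte[], byte[]> record : rdd.collect())
      {
         // assumes the Kafka value payload is UTF-8 text
         System.out.println(new String(record._2(), java.nio.charset.StandardCharsets.UTF_8));
      }
   }
});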

Spark 1.6.2 does not support Kafka 0.10; it only supports Kafka 0.8. For Kafka 0.10 you should use Spark 2.
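For reference, a minimal sketch of what the Spark 2 route could look like with the spark-streaming-kafka-0-10 integration (e.g. compile group: 'org.apache.spark', name: 'spark-streaming-kafka-0-10_2.11'); the broker list, topic name and String deserializers here are assumptions, and the Kerberos/SASL settings from the question would still need to be carried over:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class DirectStreamExample
{
    public static void main(String[] args) throws Exception
    {
        SparkConf conf = new SparkConf().setAppName("DirectStreamExample");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(5000));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "host1:9092,host2:9092"); // hypothetical broker list
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "group_name");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);

        // Direct (receiver-less) stream against the new Kafka consumer API
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("myTopic"), kafkaParams));

        // Print how many records arrived in each 5-second batch
        stream.foreachRDD(rdd -> System.out.println("records in batch: " + rdd.count()));

        jssc.start();
        jssc.awaitTermination();
    }
}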
