
Getting empty values while receiving from Kafka with Spark Streaming

I am very new to Spark Streaming, and I am implementing a small exercise: sending XML data from Kafka and receiving that streaming data through Spark Streaming. I have tried every approach I could think of, but every time I get empty values.

There is no problem on the Kafka side; the only problem is receiving the streaming data on the Spark side.

Here is the code showing how I am implementing it:

package com.package;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class SparkStringConsumer {

    public static void main(String[] args) {

        SparkConf conf = new SparkConf()
                .setAppName("kafka-sandbox")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaStreamingContext ssc = new JavaStreamingContext(sc, new Duration(2000));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "localhost:9092");
        Set<String> topics = Collections.singleton("mytopic");

        JavaPairInputDStream<String, String> directKafkaStream = KafkaUtils.createDirectStream(
                ssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);
        directKafkaStream.foreachRDD(rdd -> {
            System.out.println("--- New RDD with " + rdd.partitions().size()
                    + " partitions and " + rdd.count() + " records");
            rdd.foreach(record -> System.out.println(record._2));
        });


        ssc.start();
        ssc.awaitTermination();
    }
}

And I am using the following versions:

Zookeeper 3.4.6

Scala 2.11

Spark 2.0

Kafka 0.8.2

You can do it like this:

    directKafkaStream.foreachRDD(rdd -> {
        rdd.foreachPartition(item -> {
            while (item.hasNext()) {
                System.out.println(">>>>>>>>>>>>>>>>>>>>>>>>>>>" + item.next());
            }
        });
    });

Each element returned by item.next() is a key/value pair, and you can get the value with its ._2 field.

Your Spark Streaming application looks OK. I tested it and it is printing the Kafka messages. You can also try the "Message Received" print statement below to verify the Kafka messages.

    directKafkaStream.foreachRDD(rdd -> {
        System.out.println("Message Received " + rdd.values().take(5));
        System.out.println("--- New RDD with " + rdd.partitions().size()
                + " partitions and " + rdd.count() + " records");
        rdd.foreach(record -> System.out.println(record._2));
    });

If you are using Zookeeper, then set it in the Kafka params as well:

kafkaParams.put("zookeeper.connect","localhost:2181");

I do not see the following import statements in your program, so I am adding them here:

import org.apache.spark.streaming.kafka.KafkaUtils;
import kafka.serializer.StringDecoder;
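
These imports resolve only if the Kafka 0-8 connector is on the classpath. Assuming a Maven build with the Spark 2.0 / Scala 2.11 versions you listed, the dependency would look roughly like:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>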

Please also verify that you can consume messages on the topic "mytopic" using the command-line kafka-console-consumer.
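
For example, with a stock Kafka 0.8.2 installation (the paths and the Zookeeper address are assumptions; adjust them to your setup):

    bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic --from-beginning

If messages show up here but not in Spark, the problem is on the Spark side; if nothing shows up, check the producer and the topic configuration first.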
