
Null value in Spark Streaming from Kafka

I have a simple program that receives data from Kafka. When I start a Kafka producer and send a message, for example "Hello", printing the message gives (null, Hello). I don't know why this null appears. Is there a way to avoid it? I think it comes from the first element of Tuple2<String, String>, but I only want to print the second element. Also, the System.out.println("inside map "+ message); call inside the map never prints anything. Does anyone know why? Thanks.

public static void main(String[] args){

    SparkConf sparkConf = new SparkConf().setAppName("org.kakfa.spark.ConsumerData").setMaster("local[4]");
    // Cassandra contact point: replace 127.0.0.1 with the address of your Cassandra node
    sparkConf.set("spark.cassandra.connection.host", "127.0.0.1");
    // Create the context with 2 seconds batch size
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

    Map<String, Integer> topicMap = new HashMap<>();
    String[] topics = KafkaProperties.TOPIC.split(",");
    for (String topic: topics) {
        topicMap.put(topic, KafkaProperties.NUM_THREADS);
    }
    /* connection to cassandra */
    CassandraConnector connector = CassandraConnector.apply(sparkConf);
    System.out.println("+++++++++++ cassandra connector created ++++++++++++++++++++++++++++");

    /* Receive kafka inputs */
    JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, KafkaProperties.ZOOKEEPER, KafkaProperties.GROUP_CONSUMER, topicMap);
    System.out.println("+++++++++++++ streaming-kafka connection done +++++++++++++++++++++++++++");

    JavaDStream<String> lines = messages.map(
            new Function<Tuple2<String, String>, String>() {
                public String call(Tuple2<String, String> message) {
                    System.out.println("inside map "+ message);
                    return message._2();
                }
            }
    );

    messages.print();
    jssc.start();
    jssc.awaitTermination();
}

Q1) Null values: Messages in Kafka are keyed, meaning each one has a (key, value) structure. You see (null, Hello) because the producer published the record (null, "Hello") to the topic, that is, it sent a value without a key. To drop the key in your job, map the original DStream to its values: kafkaDStream.map(new Function<Tuple2<String, String>, String>() {...})
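To make the (key, value) shape concrete without needing a Kafka or Spark setup, here is a minimal plain-Java sketch. The `Pair` class is a hypothetical stand-in for `scala.Tuple2<String, String>`, and `DropKeyDemo` and `valuesOnly` are names invented for illustration; none of them come from Spark or Kafka.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DropKeyDemo {
    // Hypothetical stand-in for scala.Tuple2<String, String>; not a Spark or Kafka class.
    static final class Pair {
        final String key;
        final String value;
        Pair(String key, String value) { this.key = key; this.value = value; }
        @Override public String toString() { return "(" + key + "," + value + ")"; }
    }

    // Keeps only the value of each (key, value) record, like message._2() in the question.
    static List<String> valuesOnly(List<Pair> records) {
        return records.stream().map(p -> p.value).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A producer that sends only a payload yields records whose key is null.
        List<Pair> records = Arrays.asList(new Pair(null, "Hello"), new Pair(null, "World"));
        System.out.println(records);             // whole pairs, null key included
        System.out.println(valuesOnly(records)); // values only
    }
}
```

Printing the whole pair shows the null key, while printing the mapped result shows only the payload. This mirrors the difference between printing the pair stream and printing a stream of values in the Spark job.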

Q2) System.out.println("inside map "+ message); does not print. There are two classic reasons for this:

  1. Transformations are applied in the executors, so when running on a cluster the output appears in the executor logs and not on the driver.

  2. Operations are lazy, and a DStream must be materialized by an output operation before its transformations are applied.

In this specific case, the JavaDStream<String> lines is never materialized, i.e. it is never used in an output operation, so the map is never executed. The program calls messages.print(), which prints the original (key, value) pairs; calling lines.print() instead would run the map and print only the values.
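The lazy behaviour can be observed without Spark. The sketch below is only an analogy: it uses `java.util.stream`, whose `map` is likewise deferred until a terminal operation runs, in place of a DStream and its output operations. `LazyMapDemo`, `calls`, and `buildPipeline` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LazyMapDemo {
    // Records every invocation of the map function so laziness can be observed.
    static final List<String> calls = new ArrayList<>();

    // Declares a map step but does not run it; no element is processed here.
    static Stream<String> buildPipeline(List<String> input) {
        return input.stream().map(s -> {
            calls.add(s); // plays the role of the println inside the map
            return s.toUpperCase();
        });
    }

    public static void main(String[] args) {
        Stream<String> pipeline = buildPipeline(Arrays.asList("hello", "world"));
        System.out.println("map calls before terminal operation: " + calls.size());

        // The terminal operation is the analogue of a DStream output operation such as print().
        List<String> result = pipeline.collect(Collectors.toList());
        System.out.println("map calls after terminal operation: " + calls.size());
        System.out.println(result);
    }
}
```

Until `collect` is called, the function passed to `map` is never invoked. In the same way, the println in the question cannot run while nothing consumes `lines`.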
