
How to map Kafka topic names and respective records in Spark Streaming

I am streaming from a Kafka topic as shown below:

JavaPairInputDStream<String, String> directKafkaStream = 
    KafkaUtils.createDirectStream(jssc,
                                  String.class, 
                                  String.class,
                                  StringDecoder.class,
                                  StringDecoder.class,
                                  kafkaParams, 
                                  topicSet);

directKafkaStream.print();   

For one topic, the output looks like this:

(null,"04/15/2015","18:44:14")
(null,"04/15/2015","18:44:15")
(null,"04/15/2015","18:44:16")
(null,"04/15/2015","18:44:17")  

How can I map the topic name to each record?
For example, if the topic is "callData", the output should look like the following, and so on:

(callData,"04/15/2015","18:44:14")
(callData,"04/15/2015","18:44:15")
(callData,"04/15/2015","18:44:16")
(callData,"04/15/2015","18:44:17")  

How do I map topic names to their respective records?

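To extract the topic and partition information, you need to use the createDirectStream overload that accepts a Function which receives a MessageAndMetadata<K, V> and returns the type you want to transform each record into.

It looks like this: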

// Starting offsets: begin reading partition 0 of "topicname" at offset 1.
Map<TopicAndPartition, Long> map = new HashMap<>();
map.put(new TopicAndPartition("topicname", 0), 1L);

JavaInputDStream<Map.Entry> stream = KafkaUtils.createDirectStream(
        javaContext,
        String.class,
        String.class,
        StringDecoder.class,
        StringDecoder.class,
        Map.Entry.class, // <--- This is the record return type from the transformation.
        kafkaParams,
        map,
        messageAndMetadata -> 
            new AbstractMap.SimpleEntry<>(messageAndMetadata.topic(),
                                          messageAndMetadata.message()));

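Note that I used Map.Entry as a Java replacement for Scala's Tuple2. You can instead provide your own class with Partition and Message properties and use that for the transformation. Also note that the type of the Kafka input stream is now JavaInputDStream<Map.Entry>, because that is what the transformation returns.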
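The "own class" suggestion can be sketched in plain Java as follows. This is only an illustration under stated assumptions: TopicMessage, RecordStub, HANDLER and render are hypothetical names that do not appear in the answer, and RecordStub merely stands in for Kafka's MessageAndMetadata so the example runs without Spark or Kafka on the classpath. In a real job the lambda would presumably be passed to createDirectStream as the message handler, with TopicMessage.class as the record class.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class TopicMessageSketch {

    /** Hypothetical stand-in for kafka.message.MessageAndMetadata (assumption, not the real class). */
    static final class RecordStub {
        private final String topic;
        private final String message;

        RecordStub(String topic, String message) {
            this.topic = topic;
            this.message = message;
        }

        String topic() { return topic; }
        String message() { return message; }
    }

    /** Hypothetical record type that keeps the topic name next to the payload. */
    static final class TopicMessage implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String topic;
        private final String message;

        TopicMessage(String topic, String message) {
            this.topic = topic;
            this.message = message;
        }

        String getTopic() { return topic; }
        String getMessage() { return message; }

        @Override
        public String toString() {
            return "(" + topic + "," + message + ")";
        }
    }

    // Same shape as the message handler in the answer: metadata in, custom record out.
    // With Spark this would be an org.apache.spark.api.java.function.Function instead.
    static final Function<RecordStub, TopicMessage> HANDLER =
            record -> new TopicMessage(record.topic(), record.message());

    /** Applies the handler to each record and renders the result as text. */
    static List<String> render(List<RecordStub> records) {
        List<String> lines = new ArrayList<>();
        for (RecordStub record : records) {
            lines.add(HANDLER.apply(record).toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<RecordStub> records = Arrays.asList(
                new RecordStub("callData", "\"04/15/2015\",\"18:44:14\""),
                new RecordStub("callData", "\"04/15/2015\",\"18:44:15\""));
        render(records).forEach(System.out::println);
    }
}
```

A dedicated class gives the fields meaningful names and a concrete generic type, at the cost of having to keep it serializable, since Spark may move records between executors.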
