How to get a JavaDStream of an Object in Spark Kafka Connector?

I am using the Spark Kafka connector to fetch data from a Kafka cluster. From it, I get the data as a JavaDStream&lt;String&gt;. How do I get the data as a JavaDStream&lt;EventLog&gt;, where EventLog is a Java bean?
public static JavaDStream<EventLog> fetchAndValidateData(String zkQuorum, String group, Map<String, Integer> topicMap) {
    SparkConf sparkConf = new SparkConf().setAppName("JavaKafkaWordCount");
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));
    JavaPairReceiverInputDStream<String, String> messages =
            KafkaUtils.createStream(jssc, zkQuorum, group, topicMap);
    JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
        @Override
        public String call(Tuple2<String, String> tuple2) {
            return tuple2._2();
        }
    });
    jssc.start();
    jssc.awaitTermination();
    return lines;
}
My goal is to save this data into Cassandra, where the table has the same specification as EventLog. The Spark Cassandra connector accepts a JavaRDD&lt;EventLog&gt; in its insert statement, like this: javaFunctions(rdd).writerBuilder("ks", "event", mapToRow(EventLog.class)).saveToCassandra();. I want to get such a JavaRDD&lt;EventLog&gt; from Kafka.
Use the overloaded createStream method that lets you pass the key/value types and decoder classes.
Example:
createStream(jssc, String.class, EventLog.class, StringDecoder.class, EventLogDecoder.class,
kafkaParams, topicsMap, StorageLevel.MEMORY_AND_DISK_SER_2());
The above should give you a JavaPairDStream&lt;String, EventLog&gt;, which you can then map to a JavaDStream&lt;EventLog&gt;:
JavaDStream<EventLog> lines = messages.map(new Function<Tuple2<String, EventLog>, EventLog>() {
    @Override
    public EventLog call(Tuple2<String, EventLog> tuple2) {
        return tuple2._2();
    }
});
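From there, each micro-batch can be written to Cassandra with the writerBuilder call from the question. A minimal sketch, assuming the spark-cassandra-connector Java API (static imports of javaFunctions and mapToRow from CassandraJavaUtil) and the keyspace "ks" / table "event" names used in the question:

```java
// Persist each micro-batch of the stream to Cassandra. The stream variable
// name (eventLogs) and the keyspace/table are assumptions from the question.
eventLogs.foreachRDD(new VoidFunction<JavaRDD<EventLog>>() {
    @Override
    public void call(JavaRDD<EventLog> rdd) {
        // Same insert call the question quotes, applied per batch RDD.
        javaFunctions(rdd)
                .writerBuilder("ks", "event", mapToRow(EventLog.class))
                .saveToCassandra();
    }
});
```

Note that foreachRDD must be registered before jssc.start(), and the method should return the stream before blocking on awaitTermination(), otherwise the return statement is never reached.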
EventLogDecoder should implement kafka.serializer.Decoder. Below is an example of a JSON decoder.
import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;

import kafka.serializer.Decoder;
import kafka.utils.VerifiableProperties;

public class EventLogDecoder implements Decoder<EventLog> {

    public EventLogDecoder(VerifiableProperties verifiableProperties) {
    }

    @Override
    public EventLog fromBytes(byte[] bytes) {
        ObjectMapper objectMapper = new ObjectMapper();
        try {
            return objectMapper.readValue(bytes, EventLog.class);
        } catch (IOException e) {
            // log the malformed message and drop it
        }
        return null;
    }
}
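For the decoder's objectMapper.readValue(bytes, EventLog.class) call to work, EventLog must be a Jackson-friendly bean: a no-arg constructor plus getters/setters matching the JSON keys. A minimal sketch (the field names eventId and timestamp are assumptions, not from the question):

```java
import java.nio.charset.StandardCharsets;

import com.fasterxml.jackson.databind.ObjectMapper;

// Minimal EventLog bean; the fields here are only an example.
public class EventLog {
    private String eventId;
    private long timestamp;

    public EventLog() { }

    public String getEventId() { return eventId; }
    public void setEventId(String eventId) { this.eventId = eventId; }
    public long getTimestamp() { return timestamp; }
    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }

    // Quick round-trip check of the same ObjectMapper call the decoder uses.
    public static void main(String[] args) throws Exception {
        byte[] bytes = "{\"eventId\":\"e1\",\"timestamp\":42}"
                .getBytes(StandardCharsets.UTF_8);
        EventLog log = new ObjectMapper().readValue(bytes, EventLog.class);
        System.out.println(log.getEventId() + " " + log.getTimestamp());
    }
}
```

If a JSON key has no matching setter, Jackson throws by default; depending on your payload you may want to annotate the class with @JsonIgnoreProperties(ignoreUnknown = true).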