Can't access the data in Kafka Spark Streaming globally
I am trying to stream data from Kafka into Spark:
JavaPairInputDStream<String, String> directKafkaStream = KafkaUtils.createDirectStream(ssc,
        String.class,
        String.class,
        StringDecoder.class,
        StringDecoder.class,
        kafkaParams, topics);
Here I am iterating over the JavaPairInputDStream to process the RDDs:
directKafkaStream.foreachRDD(rdd -> {
    rdd.foreachPartition(items -> {
        while (items.hasNext()) {
            String[] state = items.next()._2.split("\\,");
            System.out.println(state[2] + "," + state[3] + "," + state[4] + "--");
        }
    });
});
I can fetch the data inside foreachRDD, but my requirement is to access the state array globally. When I try to access the state array globally, I get an exception:
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
Any suggestions? Thanks.
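The root cause of the empty array: the lambda passed to foreachPartition is serialized and executed on the executors, so any driver-side variable it mutates is only changed in the executors' copies; the driver's copy stays empty, hence the IndexOutOfBoundsException when you read it "globally". The fix is to express the work as a transformation and return the results explicitly (in Spark, `rdd.map(...).collect()` instead of side effects in foreachPartition). A plain-Java sketch of that restructuring, with hypothetical names and no Spark dependency:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DriverSideCollect {
    // Express the per-record work as a pure transformation that returns
    // its results to the caller. In Spark this corresponds to
    // rdd.map(record -> record.split(",")).collect() on the driver,
    // rather than mutating a captured variable inside foreachPartition.
    static List<String[]> parseAll(List<String> records) {
        return records.stream()
                .map(r -> r.split(","))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> rows = parseAll(Arrays.asList("NY,10,20,30,V1", "CA,11,21,31,V2"));
        for (String[] row : rows) {
            System.out.println(row[2] + "," + row[3] + "," + row[4]); // prints "20,30,V1" then "21,31,V2"
        }
    }
}
```

Note that collect() pulls the whole RDD to the driver, so it is only appropriate when the result fits in driver memory; for large results, join against a lookup table on the cluster instead, as shown next.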
This is really a matter of joining your lookup table with the streaming RDD to get all the items whose 'code' and 'violationCode' fields match. The flow should look like this.
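To picture what the keyed join below does: JavaPairRDD.join is an inner join on the pair key, emitting every pairing of matching values. A minimal plain-Java sketch of those semantics (hypothetical names, no Spark):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PairJoinSketch {
    // Mimics JavaPairRDD.join: for each key present on BOTH sides,
    // emit every pairing of the left and right value lists (inner join).
    // Keys present on only one side are dropped.
    static Map<String, List<String[]>> join(Map<String, List<String>> left,
                                            Map<String, List<String>> right) {
        Map<String, List<String[]>> out = new HashMap<>();
        for (Map.Entry<String, List<String>> e : left.entrySet()) {
            List<String> rightValues = right.get(e.getKey());
            if (rightValues == null) continue; // no match -> excluded
            List<String[]> pairs = new ArrayList<>();
            for (String l : e.getValue()) {
                for (String r : rightValues) {
                    pairs.add(new String[]{l, r});
                }
            }
            out.put(e.getKey(), pairs);
        }
        return out;
    }
}
```

In the streaming code the key is violationCode, the left side is the Kafka batch, and the right side is the Hive lookup table.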
Note: the code below is incomplete. Please complete all the TODO comments.
JavaPairDStream<String, String> streamPair = directKafkaStream.mapToPair(new PairFunction<Tuple2<String, String>, String, String>() {
    @Override
    public Tuple2<String, String> call(Tuple2<String, String> tuple2) throws Exception {
        System.out.println("Tuple2 message is----------" + tuple2._2());
        String[] state = tuple2._2().split("\\,");
        return new Tuple2<>(state[4], tuple2._2()); // pair <violationCode, data>
    }
});
streamPair.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
    JavaPairRDD<String, String> hivePairRdd = null;

    @Override
    public Void call(JavaPairRDD<String, String> stringStringJavaPairRDD) throws Exception {
        if (hivePairRdd == null) {
            hivePairRdd = initHiveRdd();
        }
        JavaPairRDD<String, Tuple2<String, String>> joinedRdd = stringStringJavaPairRDD.join(hivePairRdd);
        System.out.println(joinedRdd.take(10));
        // TODO: process joinedRdd here and save the results.
        joinedRdd.count(); // to trigger an action
        return null;
    }
});
public static JavaPairRDD<String, String> initHiveRdd() {
    JavaRDD<String> hiveTableRDD = null; // TODO: code to create an RDD from the Hive table
    JavaPairRDD<String, String> hivePairRdd = hiveTableRDD.mapToPair(new PairFunction<String, String, String>() {
        @Override
        public Tuple2<String, String> call(String row) throws Exception {
            String code = null; // TODO: process 'row' and extract the 'code' field
            return new Tuple2<>(code, row);
        }
    });
    return hivePairRdd;
}