Spark streaming transform function
I am having compilation errors in the transform function for Spark Streaming. Specifically, I seem to be missing something that finalizes the DStream variable, or similar. I copied the code from the AMPLab tutorials, so I am slightly confused...
Here is the code; the problem is in the transform function towards the end.
Here is the error:
[ERROR] /home/nipun/ngla-stable/online/src/main/java/org/necla/ngla/spark_streaming/Type4ViolationChecker.java:[120,63] error:
no suitable method found for transform(<anonymous Function<JavaPairRDD<Long,Integer>,JavaPairRDD<Long,Integer>>>)
[INFO] 1 error
Code:
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.StorageLevels;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import org.json.JSONObject;
import com.google.common.collect.Lists;
import scala.Tuple2;

public class Type4ViolationChecker {
    private static final Pattern NEWSPACE = Pattern.compile("\n");

    public static Long generateTSKey(String line) throws ParseException {
        JSONObject obj = new JSONObject(line);
        String time = obj.getString("mts");
        DateFormat formatter = new SimpleDateFormat("yyyy / MM / dd HH : mm : ss");
        Date date = formatter.parse(time);
        long since = date.getTime();
        // Round down to the nearest 10-second bucket
        long key = (since / 10000) * 10000;
        return key;
    }

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("Type4ViolationChecker");
        final JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(10000));
        JavaReceiverInputDStream<String> lines = ssc.socketTextStream(
                args[0], Integer.parseInt(args[1]), StorageLevels.MEMORY_AND_DISK_SER);

        JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String x) {
                return Lists.newArrayList(NEWSPACE.split(x));
            }
        });
        words.persist();

        JavaDStream<String> matched = words.filter(new Function<String, Boolean>() {
            public Boolean call(String line) {
                return line.contains("pattern");
            }
        });

        JavaPairDStream<Long, Integer> keyValStream = matched.mapToPair(
                new PairFunction<String, Long, Integer>() {
                    /**
                     * Here we are converting the string to a key-value tuple.
                     * Key -> time bucket calculated using the 1970 GMT date as anchor,
                     *        dividing by the polling interval
                     * Value -> 1 per message, to be summed per bucket
                     */
                    @Override
                    public Tuple2<Long, Integer> call(String arg0) throws Exception {
                        return new Tuple2<Long, Integer>(generateTSKey(arg0), 1);
                    }
                });

        JavaPairDStream<Long, Integer> tsStream = keyValStream.reduceByKey(
                new Function2<Integer, Integer, Integer>() {
                    public Integer call(Integer i1, Integer i2) {
                        return i1 + i2;
                    }
                });

        JavaPairDStream<Long, Integer> sortedtsStream = tsStream.transform(
                new Function<JavaPairRDD<Long, Integer>, JavaPairRDD<Long, Integer>>() {
                    @Override
                    public JavaPairRDD<Long, Integer> call(JavaPairRDD<Long, Integer> longIntegerJavaPairRDD) throws Exception {
                        return longIntegerJavaPairRDD.sortByKey(false);
                    }
                });
        //sortedtsStream.print();

        ssc.start();
        ssc.awaitTermination();
    }
}
Thanks to @GaborBakos for providing the answer... The following seems to work! I had to use transformToPair instead of transform:
JavaPairDStream<Long, Integer> sortedtsStream = tsStream.transformToPair(
        new Function<JavaPairRDD<Long, Integer>, JavaPairRDD<Long, Integer>>() {
            @Override
            public JavaPairRDD<Long, Integer> call(JavaPairRDD<Long, Integer> longIntegerJavaPairRDD) throws Exception {
                return longIntegerJavaPairRDD.sortByKey(true);
            }
        });
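The reason the change works: in Spark's Java API, JavaPairDStream.transform takes a Function that returns a plain JavaRDD<U> (producing a JavaDStream<U>), while transformToPair takes a Function returning a JavaPairRDD, so an anonymous class whose call returns JavaPairRDD<Long, Integer> matches only the latter. The snippet below is a minimal sketch of that overload mismatch using hypothetical plain-Java stand-ins (List as the "RDD", Map as the "pair RDD") rather than Spark's actual classes:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical stand-ins for the two Spark signatures: transform's function must
// return a plain collection, transformToPair's function returns another pair collection.
public class TransformSignatureDemo {

    // Analogue of JavaPairDStream.transform: the result is NOT a pair type.
    static <K, V, U> List<U> transform(Map<K, V> pairs, Function<Map<K, V>, List<U>> f) {
        return f.apply(pairs);
    }

    // Analogue of JavaPairDStream.transformToPair: the pair type is preserved.
    static <K, V, K2, V2> Map<K2, V2> transformToPair(Map<K, V> pairs,
                                                      Function<Map<K, V>, Map<K2, V2>> f) {
        return f.apply(pairs);
    }

    public static void main(String[] args) {
        Map<Long, Integer> counts = new TreeMap<Long, Integer>();
        counts.put(200L, 5);
        counts.put(100L, 3);

        // Does not compile, just like the question's code: the function returns a
        // pair type (Map), but transform expects one returning a List:
        // transform(counts, m -> m);

        // Compiles: transformToPair's signature accepts a pair-returning function.
        Map<Long, Integer> sorted = transformToPair(counts,
                m -> new TreeMap<Long, Integer>(m)); // ascending keys, like sortByKey(true)
        System.out.println(sorted.keySet()); // prints [100, 200]
    }
}
```

The same reasoning explains the compiler's "no suitable method found" message: type inference cannot make JavaPairRDD<Long, Integer> fit the JavaRDD<U> return type that transform requires.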