
Spark streaming transform function

I am getting a compilation error in the transform function for Spark Streaming. It looks as though I am failing to finalize the DStream variable, or something similar. I copied the code from the AMPLab tutorials, so I am slightly confused.

The code is further down; the problem is in the transform call towards the end.

Here is the error:

[ERROR] /home/nipun/ngla-stable/online/src/main/java/org/necla/ngla/spark_streaming/Type4ViolationChecker.java:[120,63] error:
 no suitable method found for transform(<anonymous Function<JavaPairRDD<Long,Integer>,JavaPairRDD<Long,Integer>>>)
[INFO] 1 error

Code:

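import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;

import com.google.common.collect.Lists;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.StorageLevels;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
// Assumption: JSONObject comes from org.json; the original post does not say which library is used.
import org.json.JSONObject;
import scala.Tuple2;

public class Type4ViolationChecker {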

    private static final Pattern NEWSPACE = Pattern.compile("\n");

    public static Long generateTSKey(String line) throws ParseException{

        JSONObject obj = new JSONObject(line);
        String time = obj.getString("mts");
        DateFormat formatter = new SimpleDateFormat("yyyy / MM / dd HH : mm : ss");
        Date date = (Date)formatter.parse(time);

        long since = date.getTime();
        long key = (long)(since/10000) * 10000;

        return key;
    }

    public static void main(String[] args) {

        Type4ViolationChecker obj = new Type4ViolationChecker();

        SparkConf sparkConf = new SparkConf().setAppName("Type4ViolationChecker");
        final JavaStreamingContext ssc = new JavaStreamingContext(sparkConf, new Duration(10000));

        JavaReceiverInputDStream<String> lines = ssc.socketTextStream(args[0], Integer.parseInt(args[1]), StorageLevels.MEMORY_AND_DISK_SER);

        JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String x) {
                return Lists.newArrayList(NEWSPACE.split(x));
            }
        });

        words.persist();

        JavaDStream<String> matched = words.filter(new Function<String, Boolean>() {
            public Boolean call(String line) {
                return line.contains("pattern");
            }});

        JavaPairDStream<Long, Integer> keyValStream = matched.mapToPair(
                new PairFunction<String, Long, Integer>(){

                    /**
                     * Convert each line to a key/value tuple.
                     * Key   -> time bucket: epoch milliseconds (anchored at 1970 GMT) rounded down to the 10-second polling interval
                     * Value -> 1, so that reduceByKey counts the messages per bucket
                     */
                    @Override
                    public Tuple2<Long, Integer> call(String arg0)
                            throws Exception {
                        return new Tuple2<Long,Integer>(generateTSKey(arg0),1);
                    }

                });

        JavaPairDStream<Long, Integer> tsStream = keyValStream.reduceByKey(
                new Function2<Integer,Integer,Integer>(){
                    public Integer call(Integer i1, Integer i2){
                        return i1+ i2;
                    }});

        JavaPairDStream<Long,Integer> sortedtsStream = tsStream.transform(
                new Function<JavaPairRDD<Long, Integer>, JavaPairRDD<Long,Integer>>() {

                    @Override
                    public JavaPairRDD<Long, Integer> call(JavaPairRDD<Long, Integer> longIntegerJavaPairRDD) throws Exception {
                        return longIntegerJavaPairRDD.sortByKey(false);
                    }
                });

        //sortedtsStream.print();

        ssc.start();
        ssc.awaitTermination();

    }
}
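
As an aside, the key computation in generateTSKey can be checked on its own, without Spark. The following is only a sketch: the class name TimeBucketSketch, the method bucketStart, and the choice of UTC are assumptions made for illustration (the posted code uses the JVM's default time zone), while the date pattern and the 10000 ms bucket size are taken from the code above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical helper, not part of the original post.
final class TimeBucketSketch {
    // Same bucket width as the posted code: 10000 ms.
    static final long BUCKET_MS = 10_000L;

    // Round an epoch-millisecond timestamp down to the start of its bucket.
    static long bucketStart(long epochMillis, long bucketMillis) {
        return Math.floorDiv(epochMillis, bucketMillis) * bucketMillis;
    }

    // Parse a timestamp in the pattern used by the posted code and bucket it.
    // Assumption: timestamps are interpreted as UTC so the result is reproducible.
    static long bucketStart(String timestamp) {
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy / MM / dd HH : mm : ss");
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return bucketStart(formatter.parse(timestamp).getTime(), BUCKET_MS);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable timestamp: " + timestamp, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bucketStart(25_000L, BUCKET_MS));
        System.out.println(bucketStart("1970 / 01 / 01 00 : 00 : 25"));
    }
}
```

Math.floorDiv is used here instead of plain division so that timestamps before 1970 also round down; for non-negative timestamps it gives the same result as (since/10000) * 10000 in the posted code.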

Thanks to @GaborBakos for providing the answer. The following seems to work: I had to use transformToPair instead of transform.

    JavaPairDStream<Long,Integer> sortedtsStream = tsStream.transformToPair(
            new Function<JavaPairRDD<Long, Integer>, JavaPairRDD<Long,Integer>>() {
                @Override
                public JavaPairRDD<Long, Integer> call(JavaPairRDD<Long, Integer> longIntegerJavaPairRDD) throws Exception {
                    return longIntegerJavaPairRDD.sortByKey(true);
                }
            });
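
Why this helps, as far as I understand it: on a pair stream, transform expects a function that returns a plain JavaRDD, whereas transformToPair expects one that returns a JavaPairRDD, and JavaPairRDD is not a subclass of JavaRDD. The sketch below mimics that shape with stand-in classes so it compiles without Spark. Rdd, PairRdd, PairStream and sortedKeys are hypothetical names invented for illustration; they are not Spark's classes, and each "stream" here holds just one batch.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class TransformSketch {

    // Hypothetical stand-in for a plain RDD.
    static final class Rdd<T> {
        final List<T> items;
        Rdd(List<T> items) { this.items = items; }
    }

    // Hypothetical stand-in for a pair RDD; deliberately NOT a subtype of Rdd.
    static final class PairRdd<K extends Comparable<K>, V> {
        final List<Map.Entry<K, V>> entries;
        PairRdd(List<Map.Entry<K, V>> entries) { this.entries = entries; }

        PairRdd<K, V> sortByKey(boolean ascending) {
            List<Map.Entry<K, V>> sorted = new ArrayList<>(entries);
            Comparator<Map.Entry<K, V>> byKey = Map.Entry.comparingByKey();
            sorted.sort(ascending ? byKey : byKey.reversed());
            return new PairRdd<>(sorted);
        }
    }

    // Hypothetical stand-in for a pair stream holding a single batch.
    static final class PairStream<K extends Comparable<K>, V> {
        final PairRdd<K, V> batch;
        PairStream(PairRdd<K, V> batch) { this.batch = batch; }

        // Accepts only functions whose result is a plain Rdd.
        <U> Rdd<U> transform(Function<PairRdd<K, V>, Rdd<U>> f) {
            return f.apply(batch);
        }

        // Accepts functions whose result is a PairRdd.
        <K2 extends Comparable<K2>, V2> PairRdd<K2, V2> transformToPair(
                Function<PairRdd<K, V>, PairRdd<K2, V2>> f) {
            return f.apply(batch);
        }
    }

    // Sort one batch of (bucket, count) pairs by key and return the keys in order.
    static List<Long> sortedKeys(boolean ascending) {
        List<Map.Entry<Long, Integer>> counts = new ArrayList<>();
        counts.add(new SimpleEntry<Long, Integer>(20_000L, 3));
        counts.add(new SimpleEntry<Long, Integer>(0L, 1));
        counts.add(new SimpleEntry<Long, Integer>(10_000L, 2));
        PairStream<Long, Integer> stream =
                new PairStream<Long, Integer>(new PairRdd<Long, Integer>(counts));

        // stream.transform(rdd -> rdd.sortByKey(ascending)) would not compile here,
        // because sortByKey returns a PairRdd and PairRdd is not an Rdd.
        PairRdd<Long, Integer> sorted =
                stream.transformToPair(rdd -> rdd.sortByKey(ascending));

        List<Long> keys = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : sorted.entries) {
            keys.add(e.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys(true));
        System.out.println(sortedKeys(false));
    }
}
```

The same reasoning would explain the "no suitable method found" message in the question: the anonymous Function returns JavaPairRDD<Long,Integer>, which matches transformToPair but none of the transform overloads.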


 