
Calculations on Spark RDD without using Iterations

I'm trying to implement MAP (Mean Average Precision). Everything works so far, but I have reached the stage where I need to perform the calculations on the RDD itself, without iterating over it (rdd.collect() isn't an option).

Here is the final generated RDD (actual and predicted ratings, along with an index) on which I would like to do the calculations:

JavaPairRDD<Tuple2<Double, Double>, Long> actualAndPredictedSorted = actual.join(predictions).mapToPair(
        new PairFunction<Tuple2<Tuple2<Integer, Integer>, Tuple2<Double, Double>>, Double, Double>() {
            public Tuple2<Double, Double> call(Tuple2<Tuple2<Integer, Integer>, Tuple2<Double, Double>> t) {
                return new Tuple2<Double, Double>(t._2._2, t._2._1);
            }
        }).sortByKey(false).zipWithIndex();

Below you can find an image explaining how the calculation is done. For example, an entry is counted (green is considered a hit) if the user's actual rating in the RDD is above 3/5.
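Since the goal is MAP, it may help to pin down the per-list arithmetic first. Below is a minimal sketch of average precision for a single ranked list, assuming a "hit" is an entry whose actual rating passes the 3/5 threshold; the class and method names are hypothetical, not from the question's code:

```java
import java.util.List;

public class AveragePrecision {
    // Average precision for one ranked list: at each hit position k
    // (1-based), accumulate precision@k = hitsSoFar / k, then divide
    // by the total number of hits. Returns 0.0 when there are no hits.
    static double averagePrecision(List<Boolean> hits) {
        int hitCount = 0;
        double sum = 0.0;
        for (int k = 1; k <= hits.size(); k++) {
            if (hits.get(k - 1)) {
                hitCount++;
                sum += (double) hitCount / k;
            }
        }
        return hitCount == 0 ? 0.0 : sum / hitCount;
    }

    public static void main(String[] args) {
        // Hits at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 5/6
        System.out.println(averagePrecision(List.of(true, false, true)));
    }
}
```

MAP is then the mean of this value over all users' ranked lists.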

[image: illustration of the calculation; green entries are hits]

I hope I explained myself!

You need filtering, not iterating.

It can be achieved by:

  1. Filtering (keeping only the ratings that meet the condition).
  2. Summing the kept ratings.
  3. Dividing by the number of kept entries.
