
Calculations on Spark RDD without using Iterations

I'm trying to implement MAP (Mean Average Precision). Everything works so far, but I have reached the stage where I need to perform the calculations on the RDD itself, without iterating over it (rdd.collect() isn't an option).

Here is the final generated RDD (actual and predicted ratings, along with an index) on which I would like to do the calculations:

JavaPairRDD<Tuple2<Double, Double>, Long> actualAndPredictedSorted = actual.join(predictions).mapToPair(
        new PairFunction<Tuple2<Tuple2<Integer, Integer>, Tuple2<Double, Double>>, Double, Double>() {
            public Tuple2<Double, Double> call(Tuple2<Tuple2<Integer, Integer>, Tuple2<Double, Double>> t) {
                return new Tuple2<Double, Double>(t._2._2, t._2._1);
            }
        }).sortByKey(false).zipWithIndex();

Below you can find an image explaining how the calculation is done. For example, an entry is counted (green is considered a hit) if the user's actual rating in the RDD is above 3/5.
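Since the goal is MAP, it may help to pin down the per-list arithmetic first. Below is a minimal sketch of average precision for a single ranked list, assuming a "hit" is an entry whose actual rating passes the 3/5 threshold; the class and method names are hypothetical, not from the question's code:

```java
import java.util.List;

public class AveragePrecision {
    // Average precision for one ranked list: at each hit position k
    // (1-based), accumulate precision@k = hitsSoFar / k, then divide
    // by the total number of hits. Returns 0.0 when there are no hits.
    static double averagePrecision(List<Boolean> hits) {
        int hitCount = 0;
        double sum = 0.0;
        for (int k = 1; k <= hits.size(); k++) {
            if (hits.get(k - 1)) {
                hitCount++;
                sum += (double) hitCount / k;
            }
        }
        return hitCount == 0 ? 0.0 : sum / hitCount;
    }

    public static void main(String[] args) {
        // Hits at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 5/6
        System.out.println(averagePrecision(List.of(true, false, true)));
    }
}
```

MAP is then the mean of this value over all users' ranked lists.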

[image: illustration of the calculation; green entries are hits]

I hope I explained myself!

You need filtering, not iterating.

It can be achieved by:

  1. Filtering (keeping only the ratings that meet the condition).
  2. Summing the kept ratings.
  3. Dividing by the number of kept entries.
