Calculations on Spark RDD without using iterations
I'm trying to implement MAP (Mean Average Precision). So far everything works, but I have reached the stage where I need to do the calculations on the RDD itself, without iterating over it on the driver (rdd.collect() isn't an option).
Here's the final generated RDD (actual and predicted ratings along with an index) on which I would like to do the calculations:
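// Join actual and predicted ratings on their shared key, keep (predicted, actual),
// sort by predicted rating in descending order, then attach each entry's position.
JavaPairRDD<Tuple2<Double, Double>, Long> actualAndPredictedSorted = actual.join(predictions).mapToPair(
    new PairFunction<Tuple2<Tuple2<Integer, Integer>, Tuple2<Double, Double>>, Double, Double>() {
        public Tuple2<Double, Double> call(Tuple2<Tuple2<Integer, Integer>, Tuple2<Double, Double>> t) {
            // t._2._1 is the actual rating, t._2._2 is the predicted rating
            return new Tuple2<Double, Double>(t._2._2, t._2._1);
        }
    }).sortByKey(false).zipWithIndex();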
Below you can also find an image explaining how the calculation is done. For example, an entry is counted (green is considered a hit) if the user's actual rating in the RDD is above 3 out of 5.
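To make the arithmetic concrete, here is a small in-memory sketch. It assumes the standard average-precision definition (precision is taken at each hit and then averaged over the hits) and a hit threshold of 3, as described above. The class name, method name, and sample ratings are hypothetical and only serve the illustration. It works on a plain list, not on an RDD.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class AveragePrecisionSketch {

    // Hypothetical helper: the list holds actual ratings, already ordered by
    // predicted rating from highest to lowest (the same order as the sorted RDD).
    static double averagePrecision(List<Double> actualByPredictedDesc, double hitThreshold) {
        // Keep only the 0-based positions whose actual rating counts as a hit.
        int[] hitPositions = IntStream.range(0, actualByPredictedDesc.size())
                .filter(i -> actualByPredictedDesc.get(i) > hitThreshold)
                .toArray();

        // The j-th hit (0-based) sits at position hitPositions[j], so the
        // precision at that point is (j + 1) / (position + 1).
        return IntStream.range(0, hitPositions.length)
                .mapToDouble(j -> (j + 1.0) / (hitPositions[j] + 1.0))
                .average()
                .orElse(0.0); // assumption: no hits yields 0.0
    }

    public static void main(String[] args) {
        // Made-up ratings: hits (above 3) fall at ranks 1, 3 and 5.
        List<Double> actual = Arrays.asList(5.0, 2.0, 4.0, 1.0, 4.0);
        // Average of 1/1, 2/3 and 3/5.
        System.out.println(averagePrecision(actual, 3.0));
    }
}
```

The point of the sketch is that each hit only needs two numbers: its position among all entries and its position among the hits. Neither requires walking the data by hand.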
I hope I explained myself clearly!
You need filtering, not iterating.
It can be achieved by