
Iteration on RDD data in Apache Spark

I have data that looks like below, consisting of latitude and longitude values:

45.25,23.45
22.15,19.35
33.24,12.45
15.67,21.22

I need to construct a matrix based on the Euclidean distance between each pair of points.

As there are 4 points, we will get a 4x4 matrix:

p1p1 p1p2 p1p3 p1p4
p2p1 p2p2 p2p3 p2p4
p3p1 p3p2 p3p3 p3p4
p4p1 p4p2 p4p3 p4p4

Now the question is how we can perform these iterations in Apache Spark with Java (like the code below, which is implemented in plain Java).

int nrows = latit.size();
int ncols = longit.size();
double[][] w = new double[nrows][ncols];
for (int i = 0; i < nrows; i++) {
    for (int j = 0; j < ncols; j++) {
        // Euclidean distance between point i and point j
        double temp1 = latit.get(i) - latit.get(j);
        double temp2 = longit.get(i) - longit.get(j);
        double temp3 = Math.pow(temp1, 2) + Math.pow(temp2, 2);
        w[i][j] = Math.sqrt(temp3);
    }
}

Please suggest a suitable way to store the data in an RDD and perform these iterations using the Java API.

In Spark you would translate this into a set of transformations and actions. Given pointsRDD containing the location data, you can obtain the Euclidean distances as:

points.cartesian(points).map{case ((x1, y1),(x2,y2)) => math.sqrt((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1))}
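Since the question asks for the Java API, here is a minimal sketch of the same cartesian-plus-map approach using a JavaRDD. The setup (a local JavaSparkContext, the variable names points and distances, and the use of Tuple2 for the coordinate pairs) is assumed for illustration and is not taken from the original post:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class DistanceMatrix {
    public static void main(String[] args) {
        // Assumed setup: a local Spark context and the four sample points from the question.
        JavaSparkContext sc = new JavaSparkContext("local[*]", "distance-matrix");

        JavaRDD<Tuple2<Double, Double>> points = sc.parallelize(Arrays.asList(
                new Tuple2<>(45.25, 23.45),
                new Tuple2<>(22.15, 19.35),
                new Tuple2<>(33.24, 12.45),
                new Tuple2<>(15.67, 21.22)));

        // cartesian() pairs every point with every other point (n*n pairs),
        // then map() computes the Euclidean distance for each pair.
        JavaRDD<Double> distances = points.cartesian(points)
                .map(pair -> {
                    double dLat = pair._1()._1() - pair._2()._1();
                    double dLon = pair._1()._2() - pair._2()._2();
                    return Math.sqrt(dLat * dLat + dLon * dLon);
                });

        distances.collect().forEach(System.out::println);
        sc.stop();
    }
}

Note that this yields a flat RDD of n*n distances rather than a two-dimensional array; if you need the matrix layout, you could index the points (for example with zipWithIndex) and group the results by row index.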
