Spark (Scala): combine three separate RDD[org.apache.spark.mllib.linalg.Vector] into a single RDD[Vector]
I have three separate RDD[org.apache.spark.mllib.linalg.Vector] and I need to combine them into a single RDD[Vector].
val vvv = my_ds.map(x => (
  scaler.transform(Vectors.dense(x(0))),
  Vectors.dense((x(1) / bv_max_2).toArray),
  Vectors.dense((x(2) / bv_max_1).toArray)
))
More info: scaler is a StandardScaler model, and bv_max_... is nothing but a DenseVector from the breeze library, used for the x/max(x) normalization.
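For context, a minimal sketch of how that setup might look; the RDD type, the column shapes, and the withMean/withStd flags are assumptions, not from the question:

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors
import breeze.linalg.{DenseVector => BDV}

// Assumed: my_ds is an RDD[Array[Double]] with three numeric columns.
// Fit the scaler on the first column only, and keep the maxima of the
// other two columns as length-1 breeze DenseVectors for the x/max(x) step.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(my_ds.map(x => Vectors.dense(x(0))))
val bv_max_2 = BDV(my_ds.map(x => x(1)).max())
val bv_max_1 = BDV(my_ds.map(x => x(2)).max())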
Now I need to make them all one: I get ([1.0],[2.0],[3.0]) and [[1.0],[2.0],[3.0]], but I need [1.0, 2.0, 3.0] as a single vector.
Finally I found a way ... I don't know if it is the best.
I had a 3-D data set and needed to perform x/max(x) normalization on two dimensions and apply a StandardScaler to the other dimension. My problem was that in the end I had three separate vectors, e.g. [[1.0],[4.0],[5.0]] [[2.0],[5.0],[6.0]] ... but I needed [1.0, 4.0, 5.0], which can be passed to KMeans. I changed the above code to:
val vvv = dsx.map(x =>
  scaler.transform(Vectors.dense(x.days_d)).toArray ++
  (x.freq_d / bv_max_freq).toArray ++
  (x.food_d / bv_max_food).toArray
).map(x => Vectors.dense(x(0), x(1), x(2)))
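With that, the combined RDD[Vector] can go straight into clustering. A short sketch of the follow-up step (k = 3 and 20 iterations are placeholder values, not from the question):

import org.apache.spark.mllib.clustering.KMeans

// vvv is the RDD[Vector] built above; cache it since KMeans is iterative
vvv.cache()
val model = KMeans.train(vvv, 3, 20)

As a small simplification, the final .map(x => Vectors.dense(x(0), x(1), x(2))) could also be written as .map(a => Vectors.dense(a)), since Vectors.dense accepts the concatenated Array[Double] directly.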