Spark (Scala): combine three separate RDD[org.apache.spark.mllib.linalg.Vector] into a single RDD[Vector]
I have three separate RDD[org.apache.spark.mllib.linalg.Vector] and I need to combine them into a single RDD[Vector].
val vvv = my_ds.map(x => (
  scaler.transform(Vectors.dense(x(0))),
  Vectors.dense((x(1) / bv_max_2).toArray),
  Vectors.dense((x(2) / bv_max_1).toArray)
))
More info: scaler is a StandardScaler model, and bv_max_... is nothing but a DenseVector from the breeze library, used for the x/max(x) normalization.
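For context, a minimal sketch of how that setup might look; the RDD type, the column shapes, and the withMean/withStd flags are assumptions, not from the question:

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors
import breeze.linalg.{DenseVector => BDV}

// Assumed: my_ds is an RDD[Array[Double]] with three numeric columns.
// Fit the scaler on the first column only, and keep the maxima of the
// other two columns as length-1 breeze DenseVectors for the x/max(x) step.
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(my_ds.map(x => Vectors.dense(x(0))))
val bv_max_2 = BDV(my_ds.map(x => x(1)).max())
val bv_max_1 = BDV(my_ds.map(x => x(2)).max())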
Now I need to make them all one: I get ([1.0],[2.0],[3.0]) and [[1.0],[2.0],[3.0]], but I need [1.0, 2.0, 3.0] as a single vector.
Finally I found a way ... I don't know if it is the best.
I had a 3-D data set and needed to perform x/max(x) normalization on two dimensions and apply a StandardScaler to the other dimension. My problem was that in the end I had three separate vectors, e.g. [[1.0],[4.0],[5.0]] [[2.0],[5.0],[6.0]] ... but I needed [1.0, 4.0, 5.0], which can be passed to KMeans. I changed the above code to:
val vvv = dsx.map(x =>
  scaler.transform(Vectors.dense(x.days_d)).toArray ++
  (x.freq_d / bv_max_freq).toArray ++
  (x.food_d / bv_max_food).toArray
).map(x => Vectors.dense(x(0), x(1), x(2)))
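With that, the combined RDD[Vector] can go straight into clustering. A short sketch of the follow-up step (k = 3 and 20 iterations are placeholder values, not from the question):

import org.apache.spark.mllib.clustering.KMeans

// vvv is the RDD[Vector] built above; cache it since KMeans is iterative
vvv.cache()
val model = KMeans.train(vvv, 3, 20)

As a small simplification, the final .map(x => Vectors.dense(x(0), x(1), x(2))) could also be written as .map(a => Vectors.dense(a)), since Vectors.dense accepts the concatenated Array[Double] directly.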