
spark (scala): three separated RDD[org.apache.spark.mllib.linalg.Vector] to a single RDD[Vector]

I have three separated RDD[mllib....Vectors] and I need to combine them into a single RDD[mllib Vector].

val vvv = my_ds.map(x => (
  scaler.transform(Vectors.dense(x(0))),      // standardized dimension
  Vectors.dense((x(1) / bv_max_2).toArray),   // x/max(x) normalized dimension
  Vectors.dense((x(2) / bv_max_1).toArray)    // x/max(x) normalized dimension
))

More info: scaler is a StandardScaler, and bv_max_... is nothing but a DenseVector from the breeze lib, used for the normalization (x / max(x)).
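For context, here is a minimal sketch of how such a scaler and the breeze max vectors could be built. This is only an assumption about the setup: the row type of my_ds and its column layout are not given in the question.

import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.linalg.Vectors
import breeze.linalg.{DenseVector => BDV}

// hypothetical input: RDD[Array[Double]] with three columns per row;
// fit the scaler on the column that will be standardized (column 0)
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(my_ds.map(x => Vectors.dense(x(0))))

// per-column maxima wrapped as one-element breeze vectors, so that a
// one-element breeze vector divided by them is an elementwise x / max(x)
val bv_max_2 = BDV(my_ds.map(x => x(1)).max())
val bv_max_1 = BDV(my_ds.map(x => x(2)).max())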

Now I need to make them all one: currently I get ([1.],[2.],[3.]) or [[1.],[2.],[3.]], but I need [1.,2.,3.] as one vector.

Finally I found a solution... I don't know if this is the best.

I had a 3D data set and I needed to perform x/max(x) normalization on two dimensions and apply StandardScaler to the other dimension. My problem was that in the end I had 3 separated Vectors like, e.g., [[1.0],[4.0],[5.0]], [[2.0],[5.0],[6.0]], ... but I needed [1.0,4.0,5.0], which can be passed to KMeans. I changed the above code to:

val vvv = dsx.map(x =>
    scaler.transform(Vectors.dense(x.days_d)).toArray ++  // standardized dimension
    (x.freq_d / bv_max_freq).toArray ++                    // x/max(x) normalized
    (x.food_d / bv_max_food).toArray                       // x/max(x) normalized
  )
  .map(x => Vectors.dense(x(0), x(1), x(2)))  // one 3-element Vector per record
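Since each record is now a single Vector, the result can be fed straight to KMeans. A minimal usage sketch (k and the iteration count below are arbitrary, not from the question):

import org.apache.spark.mllib.clustering.KMeans

vvv.cache()  // KMeans makes several passes over the data
val model = KMeans.train(vvv, 3, 20)  // k = 3, maxIterations = 20 (assumed)

As a side note, the first map already produces an Array[Double] of length 3, so the second map could simply be .map(x => Vectors.dense(x)), since Vectors.dense also accepts an Array[Double] directly.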

