PySpark: convert RDD[DenseVector] to dataframe
I have the following RDD:
rdd.take(5) gives me:
[DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699]),
DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699]),
DenseVector([5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0, 4.0, 9.0]),
DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699]),
DenseVector([9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699])]
I would like to make it a DataFrame which should look like:
-------------------------------------------------------------------
| features |
-------------------------------------------------------------------
| [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
| [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
| [5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0, 4.0, 9.0] |
|-----------------------------------------------------------------|
| [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
| [9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
Is this possible? I tried to use
df_new = sqlContext.createDataFrame(rdd, ['features'])
but it didn't work. Does anyone have any suggestions? Thanks!
Map to tuples first:
rdd.map(lambda x: (x, )).toDF(["features"])
Just keep in mind that as of Spark 2.0 there are two different Vector implementations, and ml algorithms require pyspark.ml.Vector.