![](/img/trans.png)
[英]How to convert ArrayType to DenseVector in PySpark DataFrame?
[英]PySpark: convert RDD[DenseVector] to dataframe
我有以下RDD:
rdd.take(5)给我:
[DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699]),
DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699]),
DenseVector([5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0, 4.0, 9.0]),
DenseVector([9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699]),
DenseVector([9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699])]
我想使它成为一个数据框架,看起来像:
-------------------------------------------------------------------
| features |
-------------------------------------------------------------------
| [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
| [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
| [5.0, 20.0, 0.3444, 0.3295, 54.3122, 4.0, 4.0, 9.0] |
|-----------------------------------------------------------------|
| [9.2463, 1.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
| [9.2463, 2.0, 0.392, 0.3381, 162.6437, 7.9432, 8.3397, 11.7699] |
|-----------------------------------------------------------------|
这可能吗? 我尝试使用df_new = sqlContext.createDataFrame(rdd,['features'])
,但是没有用。 有人有什么建议吗? 谢谢!
首先映射到tuples
:
rdd.map(lambda x: (x, )).toDF(["features"])
请记住,从Spark 2.0开始, ml
算法需要pyspark.ml.Vector
实现两种不同的Vector
实现。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.