How to get rid of quotes at the start and end of column values of a Spark dataframe?
I exported a dataframe to CSV format. Some of its columns changed datatype from vector to string: the column values changed from [0.350562388776,0.203056015074,-0.313145598397] to '[0.350562388776,0.203056015074,-0.313145598397]'.
I tried to convert it back to a vector, for which I used:
from pyspark.ml.linalg import Vectors, VectorUDT
from pyspark.sql.functions import udf
list_to_vector_udf = udf(lambda l: Vectors.dense(l), VectorUDT())
vectors = df.select(
list_to_vector_udf(df["result1"]).alias("res1"),
list_to_vector_udf(df["result2"]).alias("res2")
)
The column's datatype has changed from string to vector, but when I applied VectorAssembler it gave an error: ValueError: could not convert string to float: [0.389866781754-0.180391363533-0.212950805169]. I searched for solutions to this error and tried the ones I found, but nothing worked for me.
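(A quick way to confirm what the column actually holds, before trying to rebuild the vector, is to look at the raw values and the schema; this is a small sketch, not part of the original question:)

df.select("result1").show(5, truncate=False)   # should print strings like '[0.35...,-0.31...]'
df.printSchema()                               # result1/result2 should appear as string columns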
This is generally not a good approach; nevertheless, try to just eval the result (the result is kind of trusted, right?):
>>> a = eval('[1,2,3]')
>>> print(a)
[1, 2, 3]
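A safer variant of the same idea is ast.literal_eval, which only parses Python literals and cannot execute arbitrary code (a suggestion on top of the answer, not from it):

>>> import ast
>>> a = ast.literal_eval('[0.350562388776,0.203056015074,-0.313145598397]')
>>> print(a)
[0.350562388776, 0.203056015074, -0.313145598397]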
Be aware, though, that you are probably using this lib in the wrong way.
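Tying this back to the UDF in the question: a minimal sketch (assuming the stored strings really look like '[0.35...,0.20...,-0.31...]' and reusing the column names from the question) would parse the string inside the UDF before building the dense vector, so that VectorAssembler receives actual vectors rather than strings:

import ast
from pyspark.ml.linalg import Vectors, VectorUDT
from pyspark.sql.functions import udf

# Parse the bracketed string into a list of floats, then wrap it in a dense vector.
string_to_vector_udf = udf(lambda s: Vectors.dense(ast.literal_eval(s)), VectorUDT())

vectors = df.select(
    string_to_vector_udf(df["result1"]).alias("res1"),
    string_to_vector_udf(df["result2"]).alias("res2")
)

If the stored values instead run the numbers together without commas, as the error message suggests they might, the parsing step would have to match whatever format the CSV export actually produced.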