How to get rid of quotes at the start and end of column values of a Spark dataframe?
I exported a dataframe to CSV format. Some of its columns changed datatype from vector to string: the column values changed from [0.350562388776,0.203056015074,-0.313145598397] to '[0.350562388776,0.203056015074,-0.313145598397]'.
I tried to convert it back to a vector, for which I used:
from pyspark.ml.linalg import Vectors, VectorUDT
from pyspark.sql.functions import udf
list_to_vector_udf = udf(lambda l: Vectors.dense(l), VectorUDT())
vectors = df.select(
list_to_vector_udf(df["result1"]).alias("res1"),
list_to_vector_udf(df["result2"]).alias("res2")
)
The column's datatype has changed from string to vector, but when I applied VectorAssembler it gave an error: ValueError: could not convert string to float: [0.389866781754-0.180391363533-0.212950805169]. I searched for solutions to this error and tried the ones I found, but nothing worked for me.
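(A quick way to confirm what the column actually holds, before trying to rebuild the vector, is to look at the raw values and the schema; this is a small sketch, not part of the original question:)

df.select("result1").show(5, truncate=False)   # should print strings like '[0.35...,-0.31...]'
df.printSchema()                               # result1/result2 should appear as string columns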
This is generally not a good approach; nevertheless, try to just eval the result (the result is kind of trusted, right?):
>>> a = eval('[1,2,3]')
>>> print(a)
[1, 2, 3]
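A safer variant of the same idea is ast.literal_eval, which only parses Python literals and cannot execute arbitrary code (a suggestion on top of the answer, not from it):

>>> import ast
>>> a = ast.literal_eval('[0.350562388776,0.203056015074,-0.313145598397]')
>>> print(a)
[0.350562388776, 0.203056015074, -0.313145598397]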
Be aware, though, that you are probably using this lib in the wrong way.
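Tying this back to the UDF in the question: a minimal sketch (assuming the stored strings really look like '[0.35...,0.20...,-0.31...]' and reusing the column names from the question) would parse the string inside the UDF before building the dense vector, so that VectorAssembler receives actual vectors rather than strings:

import ast
from pyspark.ml.linalg import Vectors, VectorUDT
from pyspark.sql.functions import udf

# Parse the bracketed string into a list of floats, then wrap it in a dense vector.
string_to_vector_udf = udf(lambda s: Vectors.dense(ast.literal_eval(s)), VectorUDT())

vectors = df.select(
    string_to_vector_udf(df["result1"]).alias("res1"),
    string_to_vector_udf(df["result2"]).alias("res2")
)

If the stored values instead run the numbers together without commas, as the error message suggests they might, the parsing step would have to match whatever format the CSV export actually produced.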