
How to get rid of quotes at the start and end of column values in a Spark DataFrame?

I exported a DataFrame to CSV format. The datatype of some of its columns changed from vector to string. The column values changed from [0.350562388776,0.203056015074,-0.313145598397] to '[0.350562388776,0.203056015074,-0.313145598397]'

I tried to convert them back to vectors, for which I used:

from pyspark.ml.linalg import Vectors, VectorUDT
from pyspark.sql.functions import udf
list_to_vector_udf = udf(lambda l: Vectors.dense(l), VectorUDT())
vectors = df.select(
    list_to_vector_udf(df["result1"]).alias("res1"),
    list_to_vector_udf(df["result2"]).alias("res2")
)

The column's datatype changed from string to vector, but when I applied VectorAssembler it gave an error: ValueError: could not convert string to float: [0.389866781754-0.180391363533-0.212950805169]. I searched for solutions to this error and found several, but none of them worked for me.

This is generally not a good approach; nevertheless, try to just eval the result (the result is kind of trusted, right?)

>>> a = eval('[1,2,3]')
>>> print(a)
[1, 2, 3]

Be aware, though, that you are probably using this library in the wrong way.
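If the strings are not fully trusted, a safer variant of the same idea might be to parse them with `ast.literal_eval`, which only accepts Python literals and does not execute arbitrary code. The sketch below is a minimal illustration, not the answerer's method: the helper name `parse_vector_string` is hypothetical, and it assumes each value looks like the quoted list shown in the question.

```python
import ast


def parse_vector_string(text):
    """Turn a string such as "'[0.1,0.2,-0.3]'" into a list of floats.

    Hypothetical helper: assumes the value is a bracketed,
    comma-separated list, optionally wrapped in quotes.
    """
    # Drop surrounding whitespace and any stray single or double quotes
    # left over from the CSV round trip.
    cleaned = text.strip().strip("'\"")
    # literal_eval parses Python literals only, unlike eval.
    values = ast.literal_eval(cleaned)
    return [float(v) for v in values]


print(parse_vector_string("'[0.350562388776,0.203056015074,-0.313145598397]'"))
```

In Spark, a function like this could presumably be plugged into the UDF from the question, for example `udf(lambda s: Vectors.dense(parse_vector_string(s)), VectorUDT())`, so that the string is converted to a list of floats before `Vectors.dense` sees it. That wiring is an untested assumption here.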
