Column is not iterable - apache spark dataframe - python
I have a column int_rate of type string in my Spark DataFrame, and all of its values look like 9.5%, 7.0%, etc.
Here is an image of how the column looks.
I know there is a way to convert a string to a float in Python, but it only works when the value is 9.5, without the % symbol. I tried the following method:
df.int_rate = [x.strip('%') for x in df.int_rate]
given on this link to remove the % symbol, but it throws an error saying:
Column is not iterable
I also tried the other methods listed on the link, but nothing seems to work. Can someone please help me get rid of the % symbol and convert my column to type float?
One possible solution:
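from pyspark.sql.functions import expr
# Assumes `spark` is an existing SparkSession; build a small example DataFrame
df = spark.createDataFrame(["9.5%", "7.0%"], "string").toDF("int_rate")
# Trim the trailing '%' with the SQL rtrim function, then cast the result to float
df.withColumn("int_rate", expr("rtrim('%', int_rate)").cast("float")).show()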
And another:
from pyspark.sql.functions import regexp_replace
# Remove a trailing '%' with a regular expression, then cast the result to float
df.withColumn("int_rate", regexp_replace("int_rate", "%$", "").cast("float"))
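For intuition, here is a plain-Python sketch of the per-value logic that the Spark expressions above apply to each row: drop a trailing % and convert what remains to a float. It does not use Spark at all, and the helper name percent_to_float is hypothetical, chosen only for this illustration. A list comprehension works here because rates is an ordinary Python list. A Spark Column is an expression evaluated by Spark rather than a Python sequence, which is why iterating over df.int_rate raises "Column is not iterable".

```python
import re


def percent_to_float(value):
    """Hypothetical helper: turn a string such as '9.5%' into the float 9.5."""
    # Remove one trailing '%', using the same "%$" pattern as the
    # regexp_replace call in the answer above.
    cleaned = re.sub(r"%$", "", value)
    return float(cleaned)


# An ordinary Python list can be iterated, unlike a Spark Column.
rates = ["9.5%", "7.0%"]
converted = [percent_to_float(r) for r in rates]
print(converted)
```

In Spark itself, the same transformation is better expressed with column functions such as regexp_replace and cast, as in the answers above, so that the work is done by Spark and not in a Python loop.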