Unable to assign new value to a column in pyspark dataframe using column attribute
I have a pyspark dataframe event1. It has many columns, one of which is eventAction, holding categorical values like 'conversion', 'check-out', etc.
I want to transform this column so that 'conversion' becomes 1 and every other category becomes 0 in the eventAction column.
This is what I tried:
event1.eventAction = event1.select(F.when(F.col('eventAction') == 'conversion', 1).otherwise(0))
event1.show()
But I don't see any change in the eventAction column when .show() is executed.
Spark dataframes are immutable, so you cannot change a column in place using the . (attribute) notation; assigning to event1.eventAction has no effect on the underlying data. Also note that event1.select(...) returns a brand-new dataframe, not a column. Instead, create a new dataframe that replaces the existing column using withColumn, and rebind the name to it:
import pyspark.sql.functions as F

event1 = event1.withColumn(
    'eventAction',
    F.when(F.col('eventAction') == 'conversion', 1).otherwise(0)
)