
Unable to assign new value to a column in pyspark dataframe using column attribute

I have a pyspark dataframe event1. It has many columns, one of which is eventAction, containing categorical values like 'conversion', 'check-out', etc.

I want to transform this column so that 'conversion' becomes 1 and every other category becomes 0 in the eventAction column.

This is what I tried:

event1.eventAction = event1.select(F.when(F.col('eventAction') == 'conversion', 1).otherwise(0))
event1.show()

But I don't see any change in the eventAction column when .show() is executed.

Spark dataframes are immutable, so you cannot change a column directly using the . notation; assigning to event1.eventAction only rebinds a Python attribute on the dataframe object and leaves the underlying data untouched. You need to create a new dataframe that replaces the existing column using withColumn.

import pyspark.sql.functions as F

# Overwrite eventAction: 1 where it equals 'conversion', 0 otherwise
event1 = event1.withColumn(
    'eventAction',
    F.when(F.col('eventAction') == 'conversion', 1).otherwise(0)
)
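
As a quick sanity check, here is a minimal sketch that builds a tiny dataframe (the sample rows below are made up for illustration) and applies the same withColumn transformation before showing the result:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data; only the eventAction column matters here
event1 = spark.createDataFrame(
    [('conversion',), ('check-out',), ('conversion',)],
    ['eventAction'],
)

# Replace the column with a 1/0 indicator for 'conversion'
event1 = event1.withColumn(
    'eventAction',
    F.when(F.col('eventAction') == 'conversion', 1).otherwise(0)
)

event1.show()
# With this toy data, the eventAction column comes out as 1, 0, 1.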
