This line of code is not working the way I thought it would:
val df2 = df1
  .withColumn("email_age", when('age_of_email <= 60, 1))
  .withColumn("email_age", when('age_of_email <= 120, 2))
  .withColumn("email_age", when('age_of_email <= 180, 3).otherwise(4))
df1 has thousands of rows where age_of_email is at most 60 or at most 120, but every row ends up categorized as 3 or 4.
Any insight into why this is happening?
As people have said in the comments, calling withColumn with a column name that already exists in the DataFrame replaces that column. Only the last of your three calls therefore takes effect: rows with age_of_email up to 180 get 3 and everything else gets 4.
To get what you want, either use a different column name for each categorization or chain the when() calls into a single column expression, like this:
val df2 = df1.withColumn("email_age",
  when('age_of_email <= 60, 1)
    .when('age_of_email <= 120, 2)
    .when('age_of_email <= 180, 3)
    .otherwise(4))
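Note that the conditions overlap: any value that is <= 60 is also <= 120 and <= 180. Chained when() clauses are evaluated in order and the first match wins, so the narrowest condition has to come first, as it does above.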
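To see the difference side by side, here is a minimal sketch that runs both versions on a tiny DataFrame. The sample values (30, 90, 150, 400), the object name EmailAgeSketch, the helper names overwritten and chained, and the local SparkSession settings are all assumptions made for illustration; none of them come from the question. It uses col("age_of_email") instead of the 'age_of_email symbol syntax only so that the helpers do not depend on spark.implicits._.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object EmailAgeSketch {
  // Repeated withColumn on the same name: each call replaces the previous
  // column, so only the last expression (<= 180 -> 3, otherwise 4) survives.
  def overwritten(df: DataFrame): DataFrame = df
    .withColumn("email_age", when(col("age_of_email") <= 60, 1))
    .withColumn("email_age", when(col("age_of_email") <= 120, 2))
    .withColumn("email_age", when(col("age_of_email") <= 180, 3).otherwise(4))

  // One chained expression: clauses are checked in order, first match wins.
  def chained(df: DataFrame): DataFrame = df.withColumn("email_age",
    when(col("age_of_email") <= 60, 1)
      .when(col("age_of_email") <= 120, 2)
      .when(col("age_of_email") <= 180, 3)
      .otherwise(4))

  def main(args: Array[String]): Unit = {
    // Local session for the demo only (assumed setup, not from the question).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("email-age-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, one row per category.
    val df1 = Seq(30, 90, 150, 400).toDF("age_of_email")

    overwritten(df1).orderBy("age_of_email").show() // email_age: 3, 3, 3, 4
    chained(df1).orderBy("age_of_email").show()     // email_age: 1, 2, 3, 4

    spark.stop()
  }
}
```

With the repeated withColumn calls, the rows for 30 and 90 land in category 3, which matches the behaviour described in the question; the chained version puts them in 1 and 2.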