
Error With Multiple withColumn in Apache Spark

This line of code is not working the way I thought it would:

val df2 = df1
  .withColumn("email_age", when('age_of_email <= 60, 1))
  .withColumn("email_age", when('age_of_email <= 120, 2))
  .withColumn("email_age", when('age_of_email <= 180, 3).otherwise(4))

I have thousands of rows in df1 with an age_of_email of 60 or less, or 120 or less, but all of them are getting categorized as 3 or 4.

Any insight into why this is happening?

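As people have said in the comments, calling withColumn with a column name that already exists in the DataFrame replaces that column. Each of your three withColumn calls therefore overwrites email_age, and only the last one (when('age_of_email <= 180, 3).otherwise(4)) survives. That is why every row ends up as 3 or 4.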
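To see why only the last call matters, here is a minimal plain-Scala model (no Spark involved) of what the three withColumn calls do to a single row. It assumes a row can be pictured as a Map from column name to value, and it models the null that Spark's when() returns for non-matching rows (when there is no otherwise) as None. The names `WithColumnOverwrite`, `RowModel` and `categorize` are hypothetical and exist only for this illustration.

```scala
// Plain-Scala model (no Spark) of the three chained withColumn calls.
// Assumption for illustration: a row is a Map from column name to value,
// and the null that when() yields for non-matching rows is modeled as None.
object WithColumnOverwrite {
  type RowModel = Map[String, Option[Int]]

  // Adding a key that already exists replaces the old entry,
  // mirroring how withColumn replaces an existing column of the same name.
  def withColumn(row: RowModel, name: String, value: Option[Int]): RowModel =
    row + (name -> value)

  def categorize(age: Int): RowModel = {
    val row0: RowModel = Map("age_of_email" -> Some(age))
    // when('age_of_email <= 60, 1): null (None) when the condition fails
    val row1 = withColumn(row0, "email_age", if (age <= 60) Some(1) else None)
    // when('age_of_email <= 120, 2): overwrites the previous email_age
    val row2 = withColumn(row1, "email_age", if (age <= 120) Some(2) else None)
    // when('age_of_email <= 180, 3).otherwise(4): overwrites again, always defined
    withColumn(row2, "email_age", Some(if (age <= 180) 3 else 4))
  }

  def main(args: Array[String]): Unit =
    for (age <- Seq(30, 90, 150, 200))
      println(s"age_of_email=$age -> email_age=${categorize(age)("email_age").get}")
}
```

Running `main` prints 3 for ages 30, 90 and 150 and 4 for 200: the earlier assignments of 1 and 2 are simply replaced, which matches the symptom in the question.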

For what you want to achieve, you can either use a different column name for each categorization or chain the when() calls into a single column expression, like this:

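import org.apache.spark.sql.functions.when
import spark.implicits._ // assumes a SparkSession named spark; enables the 'age_of_email syntax

// Conditions are checked in order and the first one that matches wins
val df2 = df1.withColumn("email_age", when('age_of_email <= 60, 1)
                                     .when('age_of_email <= 120, 2)
                                     .when('age_of_email <= 180, 3)
                                     .otherwise(4))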

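Note that the conditions for categories 1 and 2 are subsets of the condition for category 3 (any age of 60 or less, or 120 or less, is also 180 or less). That is why the order of the when() calls matters: Spark returns the value of the first condition that matches, so the narrowest range has to come first.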
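The effect of ordering can be checked with a small plain-Scala model of a when(...).when(...).otherwise(...) chain, where branches are tried top to bottom and the first matching condition decides the value (the same semantics as SQL CASE WHEN). This is a sketch, not Spark code: `WhenOrder`, `Branch`, `caseWhen`, `narrowFirst` and `widestFirst` are hypothetical names used only here.

```scala
// Plain-Scala model (no Spark) of a chained when(...).when(...).otherwise(...):
// branches are tried in order and the first condition that holds wins.
object WhenOrder {
  type Branch = (Int => Boolean, Int)

  def caseWhen(age: Int, branches: Seq[Branch], otherwise: Int): Int =
    branches
      .collectFirst { case (cond, value) if cond(age) => value }
      .getOrElse(otherwise)

  // Narrowest range first, as in the answer's chained when().
  val narrowFirst: Seq[Branch] = Seq(
    ((a: Int) => a <= 60) -> 1,
    ((a: Int) => a <= 120) -> 2,
    ((a: Int) => a <= 180) -> 3
  )

  // Same branches, widest range first: "<= 180" now catches every smaller age.
  val widestFirst: Seq[Branch] = narrowFirst.reverse

  def main(args: Array[String]): Unit =
    for (age <- Seq(30, 90, 150, 200))
      println(s"$age -> narrowFirst=${caseWhen(age, narrowFirst, 4)}, widestFirst=${caseWhen(age, widestFirst, 4)}")
}
```

With the narrowest range first, ages 30, 90, 150 and 200 map to 1, 2, 3 and 4; with the widest range first, the first three all map to 3.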
