I am new to PySpark but have some experience with R.
Question: I want to assign a label to each height value (a number) stored in one column. I started with the code below:
w = Window.partitionBy("student_id")
df_enc_hw = df_enc_hw.withColumn("stuname", \
when(lower(col("height")) <= 4, "under_ht")
.when(lower(col("height")) > 4 < 5, "ok_ht")
.when(lower(col("height")) >=5 < 6, "normal_ht")
.when(lower(col("height")) >=6, "abnor_ht"))
But I get the following error:
633
634 def __nonzero__(self):
--> 635 raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
636 "'~' for 'not' when building DataFrame boolean expressions.")
637 __bool__ = __nonzero__
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
Thank you for your help K
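Python expands a chained comparison such as x > 4 < 5 into (x > 4) and (4 < 5), and the "and" keyword tries to convert the Column to a bool, which raises this error. Write each bound as its own comparison, wrap each one in parentheses, and combine them with & like this: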
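from pyspark.sql.functions import col, lower, when

# Each bound is a separate comparison; the parentheses are required because & binds tighter than < and >=.
df_enc_hw = df_enc_hw.withColumn("stuname",
when(lower(col("height")) <= 4, "under_ht")
.when((lower(col("height")) > 4) & (lower(col("height")) < 5), "ok_ht")
.when((lower(col("height")) >= 5) & (lower(col("height")) < 6), "normal_ht")
.when(lower(col("height")) >= 6, "abnor_ht"))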
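To see why the chained form fails while the & form works, here is a minimal sketch in plain Python that runs without Spark. FakeColumn is a hypothetical stand-in written only for this illustration and is not part of PySpark. It copies just one relevant behaviour of a Column: comparisons return a new expression object instead of True or False, and asking for its truth value raises an error.

```python
class FakeColumn:
    """Hypothetical stand-in for a Spark Column (illustration only, not PySpark)."""

    def __init__(self, expr):
        self.expr = expr

    def __gt__(self, other):
        # A comparison builds a new expression instead of returning True/False.
        return FakeColumn(f"({self.expr} > {other})")

    def __lt__(self, other):
        return FakeColumn(f"({self.expr} < {other})")

    def __and__(self, other):
        # The & operator combines two expressions into one.
        return FakeColumn(f"({self.expr} AND {other.expr})")

    def __bool__(self):
        # An expression has no single truth value, so refuse the conversion.
        raise ValueError("Cannot convert column into bool")


height = FakeColumn("height")

# Chained form: Python evaluates `height > 4 < 5` as `(height > 4) and (4 < 5)`.
# The `and` keyword needs the truth value of `(height > 4)`, which raises.
try:
    _ = height > 4 < 5
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

# Explicit form: two separate comparisons joined with &, so bool() is never called.
combined = (height > 4) & (height < 5)

print(chained_error)   # Cannot convert column into bool
print(combined.expr)   # ((height > 4) AND (height < 5))
```

The parentheses around each comparison matter in the explicit form: & has higher precedence than < and >, so without them Python would try to evaluate 4 & height first.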