
How to give multiple conditions in a PySpark dataframe filter?

I have to apply a filter with multiple conditions using OR on a PySpark dataframe.

I am trying to create a separate dataframe: the Date value must be less than max_date, or Date must be None.

How do I do it?

I tried the 3 options below, but they all failed.

df.filter(df['Date'] < max_date or df['Date'] == None).createOrReplaceTempView("Final_dataset")

final_df = df.filter(df['Date'] != max_date | df['Date'] is None)

final_df = df.filter(df['Date'] != max_date or df['Date'] is None)
Regular logical Python operators (and, or, not) don't work on PySpark column conditions; you need the bitwise operators & and | instead. They also bind more tightly than comparisons in Python, so each condition needs its own parentheses to disambiguate the expression:

final_df = df.filter((df.Date < max_date) | (df.Date.isNull()))

Have a look here: Boolean operators vs Bitwise operators
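
For context, here is a minimal, self-contained sketch of this approach; the SparkSession setup, sample rows, column names, and the concrete max_date value are made up for illustration and are not from the original question:

from datetime import date
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("filter-example").getOrCreate()

# Hypothetical sample data: row with id 3 has a null Date.
df = spark.createDataFrame(
    [(1, date(2020, 1, 1)), (2, date(2021, 6, 1)), (3, None)],
    ["id", "Date"],
)
max_date = date(2021, 1, 1)  # assumed value, for illustration only

# Each comparison gets its own parentheses because the bitwise operator |
# binds more tightly than < or == in Python.
final_df = df.filter((df.Date < max_date) | (df.Date.isNull()))
final_df.show()
# Keeps id 1 (Date earlier than max_date) and id 3 (Date is null).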

