I have to apply a filter with multiple OR conditions on a PySpark DataFrame.
I am trying to create a separate DataFrame where the Date value is either less than max_date or None.
How can I do this?
I tried the 3 options below, but they all failed:
df.filter(df['Date'] < max_date or df['Date'] == None).createOrReplaceTempView("Final_dataset")
final_df = df.filter(df['Date'] != max_date | df['Date'] is None)
final_df = df.filter(df['Date'] != max_date or df['Date'] is None)
The following works:
final_df = df.filter((df.Date < max_date) | (df.Date.isNull()))
Regular Python logical operators (and, or, is) don't work in PySpark conditions; you need the bitwise operators &, |, and ~ instead, and a None check must use isNull() rather than == None or is None. The bitwise operators also bind tighter than comparisons, so you may need extra parentheses to disambiguate the expression, as in the working line above.
Have a look here: Boolean operators vs Bitwise operators
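The precedence pitfall is not Spark-specific; it comes from Python's grammar, where | binds tighter than comparison operators. A minimal sketch with plain integers (no Spark required) shows how the unparenthesized form parses differently from what was intended:

```python
# In Python, | (bitwise OR) binds tighter than < and ==, so a Spark
# condition like  df.Date < max_date | df.Date.isNull()  is parsed as
# df.Date < (max_date | df.Date.isNull()).  The same grammar applies
# to plain integers, which makes the effect easy to see:
a, b, c = 5, 2, 4

# Parsed as a < (b | c): 2 | 4 == 6, and 5 < 6 is True.
without_parens = a < b | c

# What was actually intended: (5 < 2) | (5 < 4) is False.
with_parens = (a < b) | (a < c)

print(without_parens, with_parens)  # True False
```

With Spark Column objects the unparenthesized form doesn't just give a surprising result, it typically raises an error, because the right-hand side gets combined before the comparison is applied.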