
How to give multiple conditions in a PySpark dataframe filter?

I have to apply a filter with multiple conditions combined with OR on a PySpark dataframe.

I am trying to create a separate dataframe. The Date value must be less than max_date, or Date must be None.

How can I do it?

I tried the 3 options below, but they all failed.

df.filter(df['Date'] < max_date or df['Date'] == None).createOrReplaceTempView("Final_dataset")

final_df = df.filter(df['Date'] != max_date | df['Date'] is None)

final_df = df.filter(df['Date'] != max_date or df['Date'] is None)

final_df = df.filter((df.Date < max_date) | (df.Date.isNull()))

Regular logical Python operators (and, or, not) don't work in PySpark conditions; you need to use the bitwise operators (&, |, ~) instead. They can also be a bit tricky: they bind more tightly than comparisons, so you might need extra parentheses to disambiguate the expression.
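To see why `or` fails on columns while `|` works, here is a minimal toy sketch (a hypothetical Column class, not the real pyspark implementation): Python's `or` asks its left operand for a truth value, which a lazy, unevaluated column expression cannot provide, whereas `|` can be overloaded to build a bigger expression.

```python
# Toy stand-in for a PySpark-style column (hypothetical, for illustration only).
class Column:
    def __init__(self, expr):
        self.expr = expr

    def __lt__(self, other):
        # Comparisons build a new (unevaluated) expression
        return Column(f"({self.expr} < {other!r})")

    def __or__(self, other):
        # `|` also just builds a bigger expression tree
        return Column(f"({self.expr} OR {other.expr})")

    def isNull(self):
        return Column(f"({self.expr} IS NULL)")

    def __bool__(self):
        # `or`/`and` call bool() on their operands; a symbolic expression
        # has no truth value yet, so raise -- as real PySpark columns do.
        raise ValueError("Cannot convert column into bool; use '|' for 'or'")


date = Column("Date")

# Works: `|` composes the two sub-conditions into one expression
cond = (date < "2024-01-01") | date.isNull()
print(cond.expr)  # ((Date < '2024-01-01') OR (Date IS NULL))

# Fails: `or` needs a truth value for the left-hand column
try:
    (date < "2024-01-01") or date.isNull()
except ValueError as e:
    print(e)
```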

Have a look here: Boolean operators vs Bitwise operators
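The precedence pitfall is visible in plain Python, without PySpark at all: bitwise | binds more tightly than comparison operators, which is why the working answer wraps each sub-condition in its own parentheses.

```python
# Without parentheses, | is applied before the comparison:
print(2 < 1 | 4)    # parsed as 2 < (1 | 4), i.e. 2 < 5 -> True
# With parentheses, the comparison happens first:
print((2 < 1) | 4)  # False | 4 -> 4
```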
