How to filter the dates in a pyspark dataframe

I have a pyspark dataframe:

Year    Month
2021    06/01/2021
2021    06/01/2021
2021    07/01/2021
2021    07/01/2021
2021    0/01/2021
2021    0/01/2021

I need a dataframe containing specific months along with '0/01/2021'. I tried the code below:

df=df.filter((col('Month')=='07/01/2021') & (col('Month')=='0/01/2021'))
display(df)

My required dataframe is:

Year    Month
2021    07/01/2021
2021    07/01/2021
2021    0/01/2021
2021    0/01/2021

But I'm getting "Query returned no results". The 'Month' column is in string format. How do I filter for these dates?

That's expected. You are asking for each row where the value equals both 07/01/2021 AND ( & ) 0/01/2021, which no single value can satisfy.
What you want are the rows where Month = 07/01/2021 OR ( | ) 0/01/2021:

from pyspark.sql.functions import col

a = [
    (2021, "06/01/2021"),
    (2021, "06/01/2021"),
    (2021, "07/01/2021"),
    (2021, "07/01/2021"),
    (2021, "0/01/2021"),
    (2021, "0/01/2021"),
]

b = "Year", "Month"

df = spark.createDataFrame(a, b)
# Keep the rows whose Month matches either value (| is OR)
df = df.filter((col("Month") == "07/01/2021") | (col("Month") == "0/01/2021"))
df.show()
+----+----------+
|Year|     Month|
+----+----------+
|2021|07/01/2021|
|2021|07/01/2021|
|2021| 0/01/2021|
|2021| 0/01/2021|
+----+----------+

You can also write it like this:

df.filter(col("Month").isin("07/01/2021", "0/01/2021")).show()
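
To see why the & version returns nothing while the | and isin versions return four rows, the same three predicates can be checked in plain Python, without a Spark session. This is only a sketch of the boolean logic, not of how Spark evaluates the filter; the names rows and wanted are made up for this illustration.

```python
# Same rows as in the question, as plain Python tuples (no Spark needed).
rows = [
    (2021, "06/01/2021"),
    (2021, "06/01/2021"),
    (2021, "07/01/2021"),
    (2021, "07/01/2021"),
    (2021, "0/01/2021"),
    (2021, "0/01/2021"),
]

# Hypothetical name for the Month values to keep.
wanted = ("07/01/2021", "0/01/2021")

# AND: one string can never equal two different values, so nothing matches.
and_rows = [r for r in rows if r[1] == wanted[0] and r[1] == wanted[1]]

# OR: a row is kept when it matches either value.
or_rows = [r for r in rows if r[1] == wanted[0] or r[1] == wanted[1]]

# Membership test: the plain-Python counterpart of isin.
isin_rows = [r for r in rows if r[1] in wanted]

print(len(and_rows), len(or_rows), len(isin_rows))  # 0 4 4
```

The OR and membership forms select the same rows; isin tends to stay more readable as the list of values grows.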
