How to filter dates in a PySpark dataframe

I have a pyspark dataframe:

Year    Month
2021    06/01/2021
2021    06/01/2021
2021    07/01/2021
2021    07/01/2021
2021    0/01/2021
2021    0/01/2021

I need a dataframe containing only specific months, along with the '0/01/2021' rows. I tried the code below:

df=df.filter((col('Month')=='07/01/2021') & (col('Month')=='0/01/2021'))
display(df)

My required dataframe is:

Year    Month
2021    07/01/2021
2021    07/01/2021
2021    0/01/2021
2021    0/01/2021

But I'm getting Query returned no results. The 'Month' column is a string. How can I filter for these dates?

That's normal: your filter asks for rows where the value equals both 07/01/2021 AND ( & ) 0/01/2021 at the same time, which no row can satisfy. What you want is the rows where Month equals 07/01/2021 OR ( | ) 0/01/2021:

from pyspark.sql.functions import col

# Rebuild the sample dataframe
data = [
    (2021, "06/01/2021"),
    (2021, "06/01/2021"),
    (2021, "07/01/2021"),
    (2021, "07/01/2021"),
    (2021, "0/01/2021"),
    (2021, "0/01/2021"),
]
cols = ["Year", "Month"]

df = spark.createDataFrame(data, cols)

# Keep the rows where Month is 07/01/2021 OR 0/01/2021
df = df.filter((col("Month") == "07/01/2021") | (col("Month") == "0/01/2021"))
df.show()
+----+----------+                                                               
|Year|     Month|
+----+----------+
|2021|07/01/2021|
|2021|07/01/2021|
|2021| 0/01/2021|
|2021| 0/01/2021|
+----+----------+

You can also write it more compactly with isin:

df.filter(col("Month").isin("07/01/2021", "0/01/2021")).show()
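Since Month is stored as a string, another option is to parse it into a real date with to_date and filter on the month number. A minimal sketch of that approach, assuming the values follow a M/dd/yyyy pattern (to_date and month are standard pyspark.sql.functions):

from pyspark.sql.functions import col, month, to_date

# Parse the string into a proper date, then filter on the month number.
# Caveat: "0/01/2021" is not a valid calendar date, so with default Spark
# settings to_date() returns null for those rows and this filter would
# silently drop them.
df_parsed = df.withColumn("parsed", to_date(col("Month"), "M/dd/yyyy"))
df_parsed.filter(month(col("parsed")) == 7).show()

Since you need to keep the '0/01/2021' rows, the plain string comparison or isin above is the safer choice for this data.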
