I have a pyspark dataframe:
Year Month
2021 06/01/2021
2021 06/01/2021
2021 07/01/2021
2021 07/01/2021
2021 0/01/2021
2021 0/01/2021
I need a dataframe containing specific months along with '0/01/2021'. I tried the code below:
df=df.filter((col('Month')=='07/01/2021') & (col('Month')=='0/01/2021'))
display(df)
My required dataframe is:
Year Month
2021 07/01/2021
2021 07/01/2021
2021 0/01/2021
2021 0/01/2021
But I'm getting: Query returned no results
The 'Month' column is a string. How do I filter for these dates?
That's expected. You are asking for rows where the value equals both 07/01/2021 AND ( `&` ) 0/01/2021 at the same time, which no row can satisfy.
What you want are the rows where Month equals 07/01/2021 OR ( `|` ) 0/01/2021:
from pyspark.sql.functions import col
a = [
(2021, "06/01/2021"),
(2021, "06/01/2021"),
(2021, "07/01/2021"),
(2021, "07/01/2021"),
(2021, "0/01/2021"),
(2021, "0/01/2021"),
]
b = ["Year", "Month"]
df = spark.createDataFrame(a, b)
df = df.filter((col("Month") == "07/01/2021") | (col("Month") == "0/01/2021"))
df.show()
+----+----------+
|Year| Month|
+----+----------+
|2021|07/01/2021|
|2021|07/01/2021|
|2021| 0/01/2021|
|2021| 0/01/2021|
+----+----------+
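To see why the `&` version returns nothing, the same comparison logic can be sketched in plain Python (no Spark required); the sample values mirror the question's data:

```python
# Sample "Month" values, mirroring the question's data (plain Python, no Spark).
months = ["06/01/2021", "06/01/2021", "07/01/2021",
          "07/01/2021", "0/01/2021", "0/01/2021"]

# AND: a single value would have to equal two different strings at once -- impossible.
and_matches = [m for m in months if m == "07/01/2021" and m == "0/01/2021"]

# OR: a value matching either string is kept.
or_matches = [m for m in months if m == "07/01/2021" or m == "0/01/2021"]

print(and_matches)  # []
print(or_matches)   # ['07/01/2021', '07/01/2021', '0/01/2021', '0/01/2021']
```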
You can also write it like this:
df.filter(col("Month").isin("07/01/2021", "0/01/2021")).show()