How to filter the dates in a pyspark dataframe
I have a pyspark dataframe:
Year Month
2021 06/01/2021
2021 06/01/2021
2021 07/01/2021
2021 07/01/2021
2021 0/01/2021
2021 0/01/2021
I need a dataframe for specific months along with '0/01/2021'. I tried the code below:
df=df.filter((col('Month')=='07/01/2021') & (col('Month')=='0/01/2021'))
display(df)
My required dataframe is:
Year Month
2021 07/01/2021
2021 07/01/2021
2021 0/01/2021
2021 0/01/2021
But I'm getting "Query returned no results" as the result. The 'Month' column is in string format. How do I filter for these dates?
That's normal. You are asking for each row whose value equals both 07/01/2021 AND (&) 0/01/2021, and no single value can match both at once.
What you want are the rows where Month = 07/01/2021 OR (|) 0/01/2021:
from pyspark.sql.functions import col
a = [
(2021, "06/01/2021"),
(2021, "06/01/2021"),
(2021, "07/01/2021"),
(2021, "07/01/2021"),
(2021, "0/01/2021"),
(2021, "0/01/2021"),
]
b = "Year", "Month"
df = spark.createDataFrame(a, b)
df = df.filter((col("Month") == "07/01/2021") | (col("Month") == "0/01/2021"))
#
df.show()
+----+----------+
|Year|     Month|
+----+----------+
|2021|07/01/2021|
|2021|07/01/2021|
|2021| 0/01/2021|
|2021| 0/01/2021|
+----+----------+
You can also write it like this:
df.filter(col("Month").isin("07/01/2021", "0/01/2021")).show()
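If it helps to see why the & version comes back empty, here is a minimal sketch that applies the same three predicates row by row in plain Python, so it runs without a Spark session. The names rows, with_and, with_or, wanted and with_isin are just illustrative and are not part of the original code; it is only an analogue of what the Spark filters do, not how Spark evaluates them.

```python
# Plain-Python analogue of the Spark filters above (illustrative names only).
rows = [
    (2021, "06/01/2021"),
    (2021, "06/01/2021"),
    (2021, "07/01/2021"),
    (2021, "07/01/2021"),
    (2021, "0/01/2021"),
    (2021, "0/01/2021"),
]

# AND: one string can never equal two different values at the same time,
# so no row passes this test.
with_and = [r for r in rows if r[1] == "07/01/2021" and r[1] == "0/01/2021"]

# OR: a row is kept when either comparison holds.
with_or = [r for r in rows if r[1] == "07/01/2021" or r[1] == "0/01/2021"]

# Membership test, the counterpart of isin() on a column.
wanted = {"07/01/2021", "0/01/2021"}
with_isin = [r for r in rows if r[1] in wanted]

print(len(with_and))         # 0
print(len(with_or))          # 4
print(with_or == with_isin)  # True
```

The OR form and the membership form select the same rows; isin tends to read better once there are more than two values to match.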