
Filter a dataframe based on a string date input in Spark Scala

I have a table with a column 'date' whose format is yyyyMMdd. I need to filter this dataframe and return only the rows with dates greater than an input, e.g. return all rows where the date is greater than "20180715". I did the following.

scala> df.groupBy("date").count.show(50,false)  
+--------+----------+                                                              
|date    |count     |  
+--------+----------+  
|20180707|200       |  
|20180715|1429586969| 
|20180628|1425490080| 
|20180716|1429819708|  
+--------+----------+ 

scala> var con = df.filter(to_date(df("date"),"yyyyMMdd").gt(lit("20180715")))

scala> con.count
res4: Long = 0

scala> var con = df.filter(to_date(df("date"),"yyyyMMdd").gt(lit("20170715")))

scala> con.count
res1: Long = 4284896957 

When I input the date as "20170715", the filter matches all the records, whereas with "20180715" it matches nothing. What is the correct way to compare against a string date?

Changing the format of the input string passed to the `lit` function solved this issue. `to_date` parses the column into a date, which Spark renders in the ISO yyyy-MM-dd form, so the literal it is compared against must use that same format; with the yyyyMMdd literal the comparison silently went wrong (matching every row for "20170715" and none for "20180715").

scala> var con = df.filter(to_date(df("date"),"yyyyMMdd").gt(lit("2018-07-15"))) 

scala> con.count 
res6: Long = 1429819708
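Since fixed-width yyyyMMdd strings order exactly the same way as the dates they encode, an alternative is to skip `to_date` and compare the strings directly. A minimal sketch of the idea in plain Scala (the sample dates are taken from the groupBy output above):

```scala
// yyyyMMdd strings sort exactly like the dates they encode
// (fixed width, most significant fields first), so plain
// lexicographic string comparison is safe here
val dates = Seq("20180707", "20180715", "20180628", "20180716")
val later = dates.filter(_ > "20180715")
println(later) // List(20180716)
```

Applied to the dataframe itself this would be `df.filter(df("date") > "20180715")`, which avoids the date parse (and the format mismatch) entirely.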
