
Filter a dataframe based on a string date input in Spark Scala

I have a table with a column 'date' whose format is yyyyMMdd. I need to filter this dataframe and return only the rows with dates greater than an input, e.g. return all rows where the date is greater than "20180715". I did the following.

scala> df.groupBy("date").count.show(50,false)  
+--------+----------+                                                              
|date    |count     |  
+--------+----------+  
|20180707|200       |  
|20180715|1429586969| 
|20180628|1425490080| 
|20180716|1429819708|  
+--------+----------+ 

scala> var con = df.filter(to_date(df("date"),"yyyyMMdd").gt(lit("20180715")))

scala> con.count
res4: Long = 0

scala> var con = df.filter(to_date(df("date"),"yyyyMMdd").gt(lit("20170715")))

scala> con.count
res1: Long = 4284896957 

When I input the date as "20170715", the filter matches all the records, whereas with "20180715" it matches nothing. What is the correct way to compare against a string date?

Changing the format of the input string passed to the `lit` function solved this issue. `to_date` parses the column into a date, which Spark renders in the ISO yyyy-MM-dd form, so the literal it is compared against must use that same format; with the yyyyMMdd literal the comparison silently went wrong (matching every row for "20170715" and none for "20180715").

scala> var con = df.filter(to_date(df("date"),"yyyyMMdd").gt(lit("2018-07-15"))) 

scala> con.count 
res6: Long = 1429819708
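Since fixed-width yyyyMMdd strings order exactly the same way as the dates they encode, an alternative is to skip `to_date` and compare the strings directly. A minimal sketch of the idea in plain Scala (the sample dates are taken from the groupBy output above):

```scala
// yyyyMMdd strings sort exactly like the dates they encode
// (fixed width, most significant fields first), so plain
// lexicographic string comparison is safe here
val dates = Seq("20180707", "20180715", "20180628", "20180716")
val later = dates.filter(_ > "20180715")
println(later) // List(20180716)
```

Applied to the dataframe itself this would be `df.filter(df("date") > "20180715")`, which avoids the date parse (and the format mismatch) entirely.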
