
Scala: filter a string date by an hour range?

I am using Scala and trying to filter a dataset on a string column that holds a date and time value. I have looked at several posts and tried SimpleDateFormat techniques, to no avail.

My sample data is:

+----------------------+
|my_date_str           |
+----------------------+
|12/11/2018 08:01:55 AM|
|12/11/2018 08:33:22 PM|
|12/13/2018 09:25:28 PM|
|12/17/2018 07:27:36 PM|
+----------------------+

I'd like to keep rows between 7pm and 9pm (the date does not matter, only the time). I would expect to keep these two rows of the four:

12/17/2018 07:27:36 PM
12/11/2018 08:33:22 PM

I can hack this together using substring functions, but I imagine there is a better way using to_date or a unix function (I tried converting to seconds with unix_timestamp(), then extracting the time somehow?): isolating the time and checking the hour value.

// Filter down to rows between 7 and 9 PM by comparing the time part as a string.
// substring is 1-based: characters 12-19 hold "hh:mm:ss", the last two hold AM/PM.
my_data.withColumn("hour_str", substring($"my_date_str", 12, 8))
    .filter( (substring($"my_date_str", -2, 2) === "PM") && ($"hour_str" >= "07:00:00") && ($"hour_str" <= "09:00:00") )
    .show(truncate=false)

There were too many failed attempts to include, but these are a couple of the posts I used:
How to convert unix timestamp to date in Spark
How to convert String to date time in Scala?

If it's not clear, the question is: how do I effectively filter a string date by an hour range?

You need a "normal" timestamp, not unix_timestamp.
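To illustrate the idea outside Spark, here is a minimal plain-Scala sketch using java.time: parse the string into a date-time value, then read the hour on a 24-hour clock. The names HourRangeSketch, hourOf and inHourRange are made up for this example, and the range 19 to 20 assumes "between 7pm and 9pm" means 19:00:00 through 20:59:59.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.Locale

object HourRangeSketch {
  // Pattern for strings such as "12/11/2018 08:33:22 PM".
  // Locale.ENGLISH keeps "AM"/"PM" parseable whatever the JVM default locale is.
  private val fmt =
    DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss a", Locale.ENGLISH)

  // Hour of day on a 24-hour clock (0 to 23).
  def hourOf(s: String): Int = LocalDateTime.parse(s, fmt).getHour

  // True when the hour of day lies in [fromHour, toHour], both ends inclusive.
  def inHourRange(s: String, fromHour: Int, toHour: Int): Boolean = {
    val h = hourOf(s)
    h >= fromHour && h <= toHour
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "12/11/2018 08:01:55 AM",
      "12/11/2018 08:33:22 PM",
      "12/13/2018 09:25:28 PM",
      "12/17/2018 07:27:36 PM"
    )
    // Keeps the 08:33:22 PM and 07:27:36 PM rows (hours 20 and 19).
    sample.filter(inHourRange(_, 19, 20)).foreach(println)
  }
}
```

Once the value is a real timestamp, the AM/PM handling is done by the parser, so no string comparison is needed.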

Something like this should work, though I am somewhat rusty on the exact incantations:

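import org.apache.spark.sql.functions.{hour, to_timestamp}

// Parse the string into a timestamp, extract the hour (0-23),
// and keep hours 19 and 20, i.e. 7:00:00 PM through 8:59:59 PM.
df
  .withColumn(
    "hour",
    hour(to_timestamp($"my_date_str", "MM/dd/yyyy hh:mm:ss a"))
  )
  .filter($"hour".between(19, 20))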
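For completeness, here is an end-to-end sketch of the same approach run against the sample data from the question. It assumes a local Spark setup with spark-sql on the classpath; the object name FilterByHour and the helper keepHours are hypothetical, and pattern handling may differ slightly between Spark versions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, hour, to_timestamp}

object FilterByHour {
  // Keep rows whose parsed hour of day (0-23) lies in [fromHour, toHour].
  // The pattern matches strings such as "12/11/2018 08:33:22 PM".
  def keepHours(df: DataFrame, column: String, fromHour: Int, toHour: Int): DataFrame =
    df.filter(
      hour(to_timestamp(col(column), "MM/dd/yyyy hh:mm:ss a"))
        .between(fromHour, toHour)
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("filter-by-hour")
      .getOrCreate()
    import spark.implicits._

    val myData = Seq(
      "12/11/2018 08:01:55 AM",
      "12/11/2018 08:33:22 PM",
      "12/13/2018 09:25:28 PM",
      "12/17/2018 07:27:36 PM"
    ).toDF("my_date_str")

    keepHours(myData, "my_date_str", 19, 20).show(truncate = false)
    spark.stop()
  }
}
```

Note that between is inclusive on both ends and works on whole hours here, so a row at exactly 09:00:00 PM (hour 21) is excluded. If that instant should be kept, the filter would need to compare minutes and seconds as well.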
