
Spark Scala: split a timestamp column into a date column and a time column

I have a problem splitting a timestamp column into Date and Time columns. First, the time doesn't use the 24-hour format. Second, the date is wrong and I don't understand why.

Here is my output:

+----------+----------+-------------------+---------+
|      Date| Timestamp|               Time|EventTime|
+----------+----------+-------------------+---------+
|2018-00-30|1540857600|2018-10-30 00:00:00| 12:00:00|
|2018-00-30|1540857610|2018-10-30 00:00:10| 12:00:10|
|2018-00-30|1540857620|2018-10-30 00:00:20| 12:00:20|
|2018-00-30|1540857630|2018-10-30 00:00:30| 12:00:30|
|2018-00-30|1540857640|2018-10-30 00:00:40| 12:00:40|
|2018-00-30|1540857650|2018-10-30 00:00:50| 12:00:50|
|2018-01-30|1540857660|2018-10-30 00:01:00| 12:01:00|
|2018-01-30|1540857670|2018-10-30 00:01:10| 12:01:10|
|2018-01-30|1540857680|2018-10-30 00:01:20| 12:01:20|
|2018-01-30|1540857690|2018-10-30 00:01:30| 12:01:30|
|2018-01-30|1540857700|2018-10-30 00:01:40| 12:01:40|
+----------+----------+-------------------+---------+

And my code:

  val df = data_input
    .withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))
    .withColumn("Date", date_format(col("Time"), "yyyy-mm-dd"))
    .withColumn("EventTime", date_format(col("Time"), "hh:mm:ss"))

First I convert the Unix Timestamp column to a Time column, and then I want to split Time.

Thank you in advance.

You are using the wrong format codes. In your date pattern, "mm" means minutes, not month, and "hh" means the hour on a 12-hour clock (1-12). You want "MM" (month) and "HH" (hour of day, 0-23) instead, like this:

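import org.apache.spark.sql.functions._

val df = data_input
    .withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))  // epoch seconds -> timestamp
    .withColumn("Date", date_format(col("Time"), "yyyy-MM-dd"))        // MM = month (mm = minutes)
    .withColumn("EventTime", date_format(col("Time"), "HH:mm:ss"))     // HH = 0-23 (hh = 1-12)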

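For reference, the format codes are documented in Java's SimpleDateFormat, which Spark 2.x uses. Spark 3.0 and later have their own datetime pattern reference, based on java.time.format.DateTimeFormatter. The letters used here (y, M, d, H, h, m, s) mean the same thing in both.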
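To see both mix-ups outside Spark, here is a minimal plain-Scala sketch using java.time.format.DateTimeFormatter. Its pattern letters M/m and H/h have the same meanings as in SimpleDateFormat and Spark's datetime patterns. The sketch assumes the UTC time zone. The question's output is consistent with UTC, but Spark actually formats in the session time zone. The object name FormatLetters and the helper fmt are illustrative only.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object FormatLetters {
  // Format epoch seconds with the given pattern, in UTC
  // (assumption: Spark would use the session time zone instead).
  def fmt(epochSeconds: Long, pattern: String): String =
    DateTimeFormatter.ofPattern(pattern)
      .withZone(ZoneOffset.UTC)
      .format(Instant.ofEpochSecond(epochSeconds))

  def main(args: Array[String]): Unit = {
    val ts = 1540857660L // 2018-10-30 00:01:00 UTC, a row from the question
    println(fmt(ts, "yyyy-mm-dd")) // minute in the middle: 2018-01-30
    println(fmt(ts, "yyyy-MM-dd")) // month in the middle:  2018-10-30
    println(fmt(ts, "hh:mm:ss"))   // 12-hour clock:        12:01:00
    println(fmt(ts, "HH:mm:ss"))   // 24-hour clock:        00:01:00
  }
}
```

The first and third lines reproduce the wrong Date and EventTime values shown for the 00:01:00 row in the question.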

You can avoid the confusion entirely with simple casting:

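import org.apache.spark.sql.functions._
import spark.implicits._ // enables the $"..." column syntax; assumes a SparkSession named spark

val df = data_input
    .withColumn("Time", $"Timestamp".cast("timestamp"))      // numeric epoch seconds -> timestamp
    .withColumn("Date", $"Time".cast("date"))                 // keep only the calendar date
    .withColumn("EventTime", date_format($"Time", "H:m:s"))   // 24-hour time, no zero padding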

+----------+-------------------+----------+---------+
| Timestamp|               Time|      Date|EventTime|
+----------+-------------------+----------+---------+
|1540857600|2018-10-30 00:00:00|2018-10-30|    0:0:0|
|1540857610|2018-10-30 00:00:10|2018-10-30|   0:0:10|
|1540857620|2018-10-30 00:00:20|2018-10-30|   0:0:20|
|1540857630|2018-10-30 00:00:30|2018-10-30|   0:0:30|
|1540857640|2018-10-30 00:00:40|2018-10-30|   0:0:40|
|1540857650|2018-10-30 00:00:50|2018-10-30|   0:0:50|
|1540857660|2018-10-30 00:01:00|2018-10-30|    0:1:0|
|1540857670|2018-10-30 00:01:10|2018-10-30|   0:1:10|
|1540857680|2018-10-30 00:01:20|2018-10-30|   0:1:20|
|1540857690|2018-10-30 00:01:30|2018-10-30|   0:1:30|
|1540857700|2018-10-30 00:01:40|2018-10-30|   0:1:40|
+----------+-------------------+----------+---------+
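The casting approach treats the number as seconds since the epoch and then keeps the date part and the time-of-day part separately. Below is a minimal plain-Scala sketch of that split with java.time. It assumes UTC, whereas Spark would use spark.sql.session.timeZone. The names SplitEpoch and split are illustrative.

```scala
import java.time.{Instant, LocalDate, LocalTime, ZoneOffset}
import java.time.format.DateTimeFormatter

object SplitEpoch {
  // Split epoch seconds into a date part and a time-of-day part, in UTC
  // (assumption: Spark would use the session time zone instead).
  def split(epochSeconds: Long): (LocalDate, LocalTime) = {
    val dt = Instant.ofEpochSecond(epochSeconds).atOffset(ZoneOffset.UTC).toLocalDateTime
    (dt.toLocalDate, dt.toLocalTime)
  }

  def main(args: Array[String]): Unit = {
    val (date, time) = split(1540857670L)
    println(date)                                                 // 2018-10-30
    println(time.format(DateTimeFormatter.ofPattern("H:m:s")))    // 0:1:10, like the output above
    println(time.format(DateTimeFormatter.ofPattern("HH:mm:ss"))) // 00:01:10, zero-padded
  }
}
```

If you prefer zero-padded times such as 00:01:10, pass "HH:mm:ss" instead of "H:m:s" to date_format.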

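Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.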

 
© 2020-2024 STACKOOM.COM