简体   繁体   中英

spark scala split timestamp column in date column and time column

I have a problem with split of a timestamp column into Date and time columns. First the time doesn't consider 24h format ... Second the date is false and i don't understand why

here is my output

+----------+----------+-------------------+---------+
|      Date| Timestamp|               Time|EventTime|
+----------+----------+-------------------+---------+
|2018-00-30|1540857600|2018-10-30 00:00:00| 12:00:00|
|2018-00-30|1540857610|2018-10-30 00:00:10| 12:00:10|
|2018-00-30|1540857620|2018-10-30 00:00:20| 12:00:20|
|2018-00-30|1540857630|2018-10-30 00:00:30| 12:00:30|
|2018-00-30|1540857640|2018-10-30 00:00:40| 12:00:40|
|2018-00-30|1540857650|2018-10-30 00:00:50| 12:00:50|
|2018-01-30|1540857660|2018-10-30 00:01:00| 12:01:00|
|2018-01-30|1540857670|2018-10-30 00:01:10| 12:01:10|
|2018-01-30|1540857680|2018-10-30 00:01:20| 12:01:20|
|2018-01-30|1540857690|2018-10-30 00:01:30| 12:01:30|
|2018-01-30|1540857700|2018-10-30 00:01:40| 12:01:40|

and my code :

  val df = data_input
    .withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))
    .withColumn("Date", date_format(col("Time"), "yyyy-mm-dd"))
    .withColumn("EventTime", date_format(col("Time"), "hh:mm:ss"))

first I convert the unix Timestamp column to Time column, and then i want to split Time..

Thank you in advance

You are using the wrong format codes. Specifically "mm" in your date is for minutes, and "hh" is for 12 hour values. Instead you want "MM" and "HH". Like this:

val df = data_input
    .withColumn("Time", to_timestamp(from_unixtime(col("Timestamp"))))
    .withColumn("Date", date_format(col("Time"), "yyyy-MM-dd"))
    .withColumn("EventTime", date_format(col("Time"), "HH:mm:ss"))

For reference, here are the date format codes that you can use: SimpleDateFormat

You can avoid the confusion with simple casting

import org.apache.spark.sql.functions._

val df = data_input
    .withColumn("Time", $"Timestamp".cast("timestamp"))
    .withColumn("Date", $"Time".cast("date"))
    .withColumn("EventTime", date_format($"Time", "H:m:s"))

+----------+-------------------+----------+---------+
|Timestamp |               Time|      Date|EventTime|
+----------+-------------------+----------+---------+
|1540857600|2018-10-30 00:00:00|2018-10-30|    0:0:0|
|1540857610|2018-10-30 00:00:10|2018-10-30|   0:0:10|
|1540857620|2018-10-30 00:00:20|2018-10-30|   0:0:20|
|1540857630|2018-10-30 00:00:30|2018-10-30|   0:0:30|
|1540857640|2018-10-30 00:00:40|2018-10-30|   0:0:40|
|1540857650|2018-10-30 00:00:50|2018-10-30|   0:0:50|
|1540857660|2018-10-30 00:01:00|2018-10-30|    0:1:0|
|1540857670|2018-10-30 00:01:10|2018-10-30|   0:1:10|
|1540857680|2018-10-30 00:01:20|2018-10-30|   0:1:20|
|1540857690|2018-10-30 00:01:30|2018-10-30|   0:1:30|
|1540857700|2018-10-30 00:01:40|2018-10-30|   0:1:40|
+----------+-------------------+----------+---------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM