
Get the date difference between two DataFrame columns in seconds - Spark Scala

I have a dataframe with two date columns. Now I need to get the difference between them, and the result should be in seconds:

UNIX_TIMESTAMP(SUBSTR(date1, 1, 19)) - UNIX_TIMESTAMP(SUBSTR(date2, 1, 19)) AS delta

This is the Hive query I am trying to convert into a DataFrame query using Scala:

df.select(col("date").substr(1,19)-col("poll_date").substr(1,19))

From here I am not able to convert the result into seconds. Can anybody help with this? Thanks in advance.

Using the DataFrame API, you can calculate the date difference in seconds simply by applying unix_timestamp to each column and subtracting one from the other:

import org.apache.spark.sql.functions.unix_timestamp
import spark.implicits._  // for toDF and the $"colName" syntax

val df = Seq(
  ("2018-03-05 09:00:00", "2018-03-05 09:01:30"),
  ("2018-03-06 08:30:00", "2018-03-08 15:00:15")
).toDF("date1", "date2")

df.withColumn("tsdiff", unix_timestamp($"date2") - unix_timestamp($"date1")).
  show

// +-------------------+-------------------+------+
// |              date1|              date2|tsdiff|
// +-------------------+-------------------+------+
// |2018-03-05 09:00:00|2018-03-05 09:01:30|    90|
// |2018-03-06 08:30:00|2018-03-08 15:00:15|196215|
// +-------------------+-------------------+------+
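This works because unix_timestamp parses each string (default pattern yyyy-MM-dd HH:mm:ss) into epoch seconds, so the subtraction already yields whole seconds. As a sanity check, here is a plain-Scala sketch of the same arithmetic using java.time, with no Spark required; the take(19) mirrors the SUBSTR(date, 1, 19) in your original Hive query, which trims any fractional seconds. UTC is assumed below, whereas Spark uses the session time zone, but the difference of two timestamps is the same under either zone.

```scala
import java.time.LocalDateTime
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter

// Same pattern that unix_timestamp uses by default.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

// Epoch seconds for a timestamp string; take(19) mirrors
// SUBSTR(date, 1, 19) and drops fractional seconds if present.
def epochSeconds(s: String): Long =
  LocalDateTime.parse(s.take(19), fmt).toEpochSecond(ZoneOffset.UTC)

val delta = epochSeconds("2018-03-05 09:01:30.123") - epochSeconds("2018-03-05 09:00:00")
// delta == 90, matching the first tsdiff row above
```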

You could perform the calculation in Spark SQL as well, if necessary:

df.createOrReplaceTempView("dfview")

spark.sql("""
  select date1, date2, (unix_timestamp(date2) - unix_timestamp(date1)) as tsdiff
  from dfview
""").show
