
Spark Scala - calculating dynamic timestamp interval

I have a DataFrame with a timestamp column (TimestampType) called "maxTmstmp" and another column called "WindowHours" that holds a number of hours as integers. I would like to subtract the hours in "WindowHours" from "maxTmstmp" row by row to get a lower timestamp.

My data and the desired result (the "minTmstmp" column):

+-----------+-------------------+-------------------+
|WindowHours|          maxTmstmp|          minTmstmp|
|           |                   |(maxTmstmp - Hours)|
+-----------+-------------------+-------------------+
|          1|2016-01-01 23:00:00|2016-01-01 22:00:00|
|          2|2016-03-01 12:00:00|2016-03-01 10:00:00|
|          8|2016-03-05 20:00:00|2016-03-05 12:00:00|
|         24|2016-04-12 11:00:00|2016-04-11 11:00:00|
+-----------+-------------------+-------------------+

root
 |-- WindowHours: integer (nullable = true)
 |-- maxTmstmp: timestamp (nullable = true)

I have already found a solution that uses an hour interval expression, but it isn't dynamic: the number of hours is hard-coded in the interval. So the code below does not do what I need:

standards
      .withColumn("minTmstmp", $"maxTmstmp" - expr("INTERVAL 10 HOURS"))
      .show()

I am working with Spark 2.4 and Scala.

One simple way is to convert maxTmstmp to Unix time (seconds since the epoch), subtract the value of WindowHours expressed in seconds, and convert the result back to a Spark timestamp, as shown below:

import java.sql.Timestamp
import org.apache.spark.sql.functions._
// Assumes a SparkSession named `spark`, as in spark-shell
import spark.implicits._

val df = Seq(
  (1, Timestamp.valueOf("2016-01-01 23:00:00")),
  (2, Timestamp.valueOf("2016-03-01 12:00:00")),
  (8, Timestamp.valueOf("2016-03-05 20:00:00")),
  (24, Timestamp.valueOf("2016-04-12 11:00:00"))
).toDF("WindowHours", "maxTmstmp")

df.withColumn("minTmstmp",
    // epoch seconds of maxTmstmp minus the window length in seconds;
    // from_unixtime returns a string, so cast it back to a timestamp
    from_unixtime(unix_timestamp($"maxTmstmp") - ($"WindowHours" * 3600))
      .cast("timestamp")
  ).show
// +-----------+-------------------+-------------------+
// |WindowHours|          maxTmstmp|          minTmstmp|
// +-----------+-------------------+-------------------+
// |          1|2016-01-01 23:00:00|2016-01-01 22:00:00|
// |          2|2016-03-01 12:00:00|2016-03-01 10:00:00|
// |          8|2016-03-05 20:00:00|2016-03-05 12:00:00|
// |         24|2016-04-12 11:00:00|2016-04-11 11:00:00|
// +-----------+-------------------+-------------------+
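To sanity-check the arithmetic outside Spark, here is a minimal plain-Scala sketch of the same formula (epoch seconds of maxTmstmp minus WindowHours * 3600) using java.time. The object and helper names (`MinTmstmpSketch`, `minTmstmp`) are hypothetical, and fixing the offset to UTC is an assumption made so that no daylight-saving shifts are involved; this is not the answer's Spark code, only an illustration of its arithmetic on the sample rows.

```scala
import java.time.{LocalDateTime, ZoneOffset}

object MinTmstmpSketch {
  // Hypothetical helper mirroring unix_timestamp(maxTmstmp) - WindowHours * 3600.
  // Assumes UTC, so every hour is exactly 3600 seconds of wall-clock time.
  def minTmstmp(maxTmstmp: LocalDateTime, windowHours: Int): LocalDateTime = {
    val maxSeconds = maxTmstmp.toEpochSecond(ZoneOffset.UTC)    // seconds since epoch
    val minSeconds = maxSeconds - windowHours.toLong * 3600L    // subtract the window
    LocalDateTime.ofEpochSecond(minSeconds, 0, ZoneOffset.UTC)  // back to a date-time
  }

  def main(args: Array[String]): Unit = {
    // Same rows as the sample data in the question
    val rows = Seq(
      (1, "2016-01-01T23:00:00"),
      (2, "2016-03-01T12:00:00"),
      (8, "2016-03-05T20:00:00"),
      (24, "2016-04-12T11:00:00")
    )
    for ((hours, max) <- rows) {
      val maxTs = LocalDateTime.parse(max)
      println(s"$hours\t$maxTs\t${minTmstmp(maxTs, hours)}")
    }
  }
}
```

Because the subtraction is done on epoch seconds, it measures elapsed hours. If the Spark session time zone observes daylight saving time, the wall-clock difference shown across a transition can differ from WindowHours by an hour; the UTC assumption above sidesteps that case.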

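Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.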

© 2020-2024 STACKOOM.COM