Spark 2.3 (Scala) - Convert a timestamp column from UTC to timezone specified in another column

I have a data frame with data like this:

    +----------------------+------------+
    | utc_timestamp        | tz_locale  |
    +----------------------+------------+
    |2021-07-16T10:00:00Z  | US/Eastern |
    |2021-07-19T15:00:00Z  | US/Central |
    +----------------------+------------+

I want to convert the timestamps from UTC (offset 0) to local time based on the value in the tz_locale column:

    +----------------------+------------+
    | utc_timestamp        | tz_locale  |
    +----------------------+------------+
    |2021-07-16T06:00:00Z  | US/Eastern |
    |2021-07-19T10:00:00Z  | US/Central |
    +----------------------+------------+

I tried writing it like this:

val new_df = df.withColumn("utc_timestamp", from_utc_timestamp(df.col("utc_timestamp"), df.col("tz_locale")))

It appears that from_utc_timestamp wants a String constant for the second argument, so it apparently can only convert the entire column to the same timezone. But I need to convert each row dynamically, based on the value of another column in that row.

I think this is possible in newer versions of Spark (from_utc_timestamp is overloaded with a version that takes (Column, Column)), but I am on 2.3 and upgrading is not an option. How can this be done in Spark 2.3? It seems like a fairly common task, but I can't figure it out and couldn't find anything by searching.

For Spark 2.3 or older, you can use the less type-constrained SQL expression via expr, which accepts a column for the timezone argument:

import org.apache.spark.sql.functions.expr
df.withColumn("utc_timestamp", expr("from_utc_timestamp(utc_timestamp, tz_locale)")).show

+-------------------+----------+
|      utc_timestamp| tz_locale|
+-------------------+----------+
|2021-07-15 23:00:00|US/Eastern|
|2021-07-19 03:00:00|US/Central|
+-------------------+----------+
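
To sanity-check what a per-row conversion should produce for the question's sample rows, independent of Spark and of any session time zone, here is a minimal sketch using only java.time. The object and the helper name `utcToLocal` are hypothetical (not part of Spark), and the sketch assumes the input is formatted as `yyyy-MM-dd HH:mm:ss` with no zone suffix.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

object UtcToLocal {
  // Assumed input/output format: wall-clock time without a zone suffix.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Hypothetical helper: interpret `utc` as a UTC wall-clock time and
  // return the wall-clock time in the zone named by `zone`.
  def utcToLocal(utc: String, zone: String): String =
    LocalDateTime.parse(utc, fmt)
      .atZone(ZoneOffset.UTC)                 // attach UTC to the parsed time
      .withZoneSameInstant(ZoneId.of(zone))   // same instant, target zone
      .toLocalDateTime
      .format(fmt)
}
```

For the sample rows, `UtcToLocal.utcToLocal("2021-07-16 10:00:00", "US/Eastern")` and `UtcToLocal.utcToLocal("2021-07-19 15:00:00", "US/Central")` give the local times the question asks for. The output printed above differs from those values; one possible explanation, which the source does not confirm, is that the Z-suffixed strings were cast to timestamps under a non-UTC session time zone before from_utc_timestamp was applied.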
