Change the timestamp to UTC format in Spark using Scala

This question is similar to: Change the timestamp to UTC format in Pyspark

Basically, I need to convert an ISO 8601 timestamp string with an offset to a UTC timestamp string (2017-08-01T14:30:00+05:30 -> 2017-08-01T09:00:00+00:00) using Scala.

I am fairly new to Scala/Java. I checked the Spark library, and it does not seem to offer a way to convert without knowing the time zone. I do not know the time zone unless I parse it out of the string in an ugly way or use a Java/Scala library. Can someone help?

UPDATE: A better way to do this: set the session time zone in Spark (spark.sql.session.timeZone) and cast the timestamp column with .cast(DataTypes.TimestampType) to do the time zone shift.

You can use the java.time primitives to parse and convert your timestamp.

scala> import java.time.{OffsetDateTime, ZoneOffset}
import java.time.{OffsetDateTime, ZoneOffset}

scala> val datetime = "2017-08-01T14:30:00+05:30"
datetime: String = 2017-08-01T14:30:00+05:30

scala> OffsetDateTime.parse(datetime).withOffsetSameInstant(ZoneOffset.UTC)
res44: java.time.OffsetDateTime = 2017-08-01T09:00Z
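
Note that the default toString output above is 2017-08-01T09:00Z, not the 2017-08-01T09:00:00+00:00 form asked for in the question. If that exact string is needed, a minimal sketch with an explicit formatter could look like the following. The object and method names (UtcFormat, toUtcString) are made up for illustration, and the pattern assumes second precision is enough.

```scala
import java.time.{OffsetDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of any library.
object UtcFormat {
  // Lowercase "xxx" prints a zero offset as "+00:00" rather than "Z".
  private val outputFormat =
    DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ssxxx")

  // Parses an ISO 8601 string that carries its own offset,
  // shifts it to UTC, and renders it with an explicit +00:00 offset.
  def toUtcString(isoWithOffset: String): String =
    OffsetDateTime
      .parse(isoWithOffset)
      .withOffsetSameInstant(ZoneOffset.UTC)
      .format(outputFormat)
}
```

Calling UtcFormat.toUtcString("2017-08-01T14:30:00+05:30") should return 2017-08-01T09:00:00+00:00. Because the offset is read from the string itself, no time zone has to be known in advance.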

org.apache.spark.sql.functions.to_utc_timestamp:

 def to_utc_timestamp(ts: Column, tz: String): Column 

Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1' would yield '2017-07-14 01:40:00.0'.
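
To make the semantics described above concrete, here is a rough plain java.time sketch of the same interpretation: treat a local timestamp as wall-clock time in the given zone, then render it in UTC. This is only an illustration, not Spark's implementation. The names ToUtcSketch and toUtc are hypothetical, and the fractional-second part of the timestamp is left out for brevity.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical illustration of "interpret in zone tz, render in UTC".
object ToUtcSketch {
  private val fmt = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss")

  def toUtc(ts: String, tz: String): String =
    LocalDateTime
      .parse(ts, fmt)                        // wall-clock time, no zone yet
      .atZone(ZoneId.of(tz))                 // interpret it in the given zone
      .withZoneSameInstant(ZoneOffset.UTC)   // same instant, viewed in UTC
      .format(fmt)
}
```

With this sketch, ToUtcSketch.toUtc("2017-07-14 02:40:00", "GMT+1") gives 2017-07-14 01:40:00, matching the documented example. Unlike the OffsetDateTime approach, the zone has to be supplied separately here, which is exactly what the question was trying to avoid.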
