
Spark SQL converting string to timestamp

I'm new to Spark SQL and am trying to convert a string to a timestamp in a Spark DataFrame. I have a string that looks like '2017-08-01T02:26:59.000Z' in a column called time_string.

My code to convert this string to timestamp is

CAST (time_string AS Timestamp)

But this gives me a timestamp of 2017-07-31 19:26:59

Why is it changing the time? Is there a way to do this without changing the time?

Thanks for any help!

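The time changes because the trailing Z marks the string as UTC. CAST parses it as a UTC instant and then displays it in the Spark session time zone (in Spark 2.2+ this is spark.sql.session.timeZone, which defaults to the JVM's local zone; the seven-hour shift here suggests UTC-7). To keep the original wall-clock time, you could use the unix_timestamp function with a pattern that matches the Z as a literal character: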

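import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.TimestampType
import spark.implicits._ // assumes a SparkSession named spark; enables toDF and $"..."
val df2 = Seq(("a3fac", "2017-08-01T02:26:59.000Z")).toDF("userid", "eventTime")
// 'Z' is quoted, so it is matched as plain text and no time-zone conversion is applied
df2.withColumn("eventTime", unix_timestamp($"eventTime", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").cast(TimestampType)).show(false)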

Output:

+-------------+---------------------+
|userid       |eventTime            |
+-------------+---------------------+
|a3fac        |2017-08-01 02:26:59.0|
+-------------+---------------------+

Hope this helps!
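To see where the seven-hour shift in the question comes from, here is a Spark-free sketch in plain Java using java.time. It is not the Spark API; it only mimics the two readings of the same string. It assumes a session time zone of UTC-07:00 (the question does not say which zone was in effect; this is simply the offset that reproduces its output). castLike honors the Z as UTC, like the CAST, while literalZ matches Z as plain text, like the quoted 'Z' in the pattern above.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class ZoneShiftDemo {
    // Assumed session time zone; the question does not state which one Spark used.
    static final ZoneOffset SESSION_ZONE = ZoneOffset.ofHours(-7);

    // Like CAST(... AS TIMESTAMP): the trailing Z is honored, so the string is a UTC
    // instant, which is then shown as wall-clock time in the session zone.
    static LocalDateTime castLike(String s) {
        return LocalDateTime.ofInstant(Instant.parse(s), SESSION_ZONE);
    }

    // Like the 'Z'-quoted pattern: Z is matched as literal text, no zone is parsed,
    // and the wall-clock fields are kept exactly as written.
    static LocalDateTime literalZ(String s) {
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        return LocalDateTime.parse(s, f);
    }

    public static void main(String[] args) {
        String s = "2017-08-01T02:26:59.000Z";
        System.out.println(castLike(s)); // 2017-07-31T19:26:59
        System.out.println(literalZ(s)); // 2017-08-01T02:26:59
    }
}
```

Keep in mind that the literal-Z approach only works because the input really is UTC and you want to display it as if it were local time. Also, unix_timestamp returns whole seconds, so any fractional part (here .000) is dropped on the Spark side.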

Solution in Java

There are some Spark SQL functions that let you work with date formats.

Conversion example : 20181224091530 -> 2018-12-24 09:15:30

Solution (Spark SQL statement) :

SELECT
 ...
 to_timestamp(cast(DECIMAL_DATE as string),'yyyyMMddHHmmss') as `TIME STAMP DATE`,
 ...
FROM some_table

You can run SQL statements through an instance of org.apache.spark.sql.SparkSession. For example, to execute an SQL statement, Spark provides the following:

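import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
...
// You have to create an instance of SparkSession first
Dataset<Row> result = sparkSession.sql(sqlStatement); // runs the query and returns a DataFrame
...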

Notes:

  • You have to cast the decimal to a string first; only then can to_timestamp parse it into a timestamp.
  • You can change the format pattern to get whatever output format you want.
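Why the cast to string matters: the pattern is applied to text, position by position, so the number first has to become its digit string. The sketch below shows the same two steps in plain Java with java.time rather than Spark; the method name parseDecimalDate and the use of BigDecimal to stand in for the decimal column are illustrative assumptions.

```java
import java.math.BigDecimal;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class DecimalDateDemo {
    static final DateTimeFormatter IN = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    static final DateTimeFormatter OUT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Step 1 (like cast(DECIMAL_DATE as string)): turn the number into its digits.
    // Step 2 (like to_timestamp(..., 'yyyyMMddHHmmss')): read the fields by position.
    static LocalDateTime parseDecimalDate(BigDecimal decimalDate) {
        String digits = decimalDate.toPlainString();
        return LocalDateTime.parse(digits, IN);
    }

    public static void main(String[] args) {
        LocalDateTime ts = parseDecimalDate(new BigDecimal("20181224091530"));
        System.out.println(OUT.format(ts)); // 2018-12-24 09:15:30
    }
}
```

This assumes the decimal has no fractional digits; a scaled value would render with a decimal point and no longer match the pattern.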
In Spark SQL you can use to_timestamp and then format the result however you need:

select date_format(to_timestamp(<string_column>, 'yyyy/MM/dd HH:mm:ss'), "yyyy-MM-dd HH:mm:ss") as <alias> from <table>

Here 'timestamp' is a StringType column in the 'event' table, holding values such as 2019/02/23 12:00:00. To convert it to TimestampType, apply to_timestamp(timestamp, 'yyyy/MM/dd HH:mm:ss'). Make sure the pattern matches the layout of your column values exactly. Then apply date_format to render the timestamp in whatever format you need.

> select date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp from event
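The parse-then-format idea in the query above, and why the pattern must match the column's layout, can be sketched in plain Java with java.time (not Spark). The reformat helper is hypothetical; returning null on a mismatch only loosely mirrors Spark, where to_timestamp typically yields NULL for non-matching strings when ANSI mode is off.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class ReformatDemo {
    // Same patterns as the query: parse with the column's layout, print with the target layout.
    static final DateTimeFormatter SOURCE = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss");
    static final DateTimeFormatter TARGET = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns null when the input does not follow SOURCE exactly.
    static String reformat(String value) {
        try {
            return TARGET.format(LocalDateTime.parse(value, SOURCE));
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(reformat("2019/02/23 12:00:00")); // 2019-02-23 12:00:00
        System.out.println(reformat("2019-02-23 12:00:00")); // null: '-' where SOURCE expects '/'
    }
}
```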

Using SQL syntax:

select date_format(to_timestamp(ColumnTimestamp, "MM/dd/yyyy hh:mm:ss a"), "yyyy-MM-dd") as ColumnDate
from database_name.table_name
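The key parts of that pattern are hh (12-hour clock, 01 to 12) together with the AM/PM marker a. Spark 3.x builds its datetime parser on java.time.DateTimeFormatter, so a plain-Java sketch shows how those letters combine. This sketch uses a single a, which is what java.time accepts for the AM/PM marker; the sample value is made up, and Locale.US is an assumption to pin the marker text to AM/PM.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class AmPmDemo {
    // hh = hour on a 12-hour clock, a = AM/PM marker; together they resolve to a 24-hour time.
    static final DateTimeFormatter SOURCE =
            DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss a", Locale.US);
    // Like date_format(..., "yyyy-MM-dd"): keep only the date part.
    static final DateTimeFormatter DATE_ONLY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static LocalDateTime parse(String value) {
        return LocalDateTime.parse(value, SOURCE);
    }

    public static void main(String[] args) {
        LocalDateTime ts = parse("02/23/2019 01:05:00 PM");
        System.out.println(ts.getHour());         // 13
        System.out.println(DATE_ONLY.format(ts)); // 2019-02-23
    }
}
```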
