
Spark SQL converting string to timestamp

I'm new to Spark SQL and am trying to convert a string to a timestamp in a Spark data frame. I have a string that looks like '2017-08-01T02:26:59.000Z' in a column called time_string.

My code to convert this string to a timestamp is

CAST (time_string AS Timestamp)

But this gives me a timestamp of 2017-07-31 19:26:59.

Why is it changing the time? Is there a way to do this without changing the time?

Thanks for any help!

You could use the unix_timestamp function to convert the UTC-formatted date to a timestamp:

import org.apache.spark.sql.functions.unix_timestamp
import org.apache.spark.sql.types.TimestampType
import spark.implicits._

val df2 = Seq(("a3fac", "2017-08-01T02:26:59.000Z")).toDF("id", "eventTime")

df2.withColumn("eventTime1", unix_timestamp($"eventTime", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").cast(TimestampType))
  .show(false)

Output:

+-----+------------------------+---------------------+
|id   |eventTime               |eventTime1           |
+-----+------------------------+---------------------+
|a3fac|2017-08-01T02:26:59.000Z|2017-08-01 02:26:59.0|
+-----+------------------------+---------------------+
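The shift the question observed with CAST is a time-zone effect: the trailing 'Z' marks the string as an instant in UTC, and Spark renders timestamps in the session time zone. A minimal java.time sketch reproduces the same 7-hour shift, assuming a session zone of America/Los_Angeles (an assumed zone, UTC-7 in August) rather than whatever zone the asker's cluster actually used:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimezoneShift {
    public static void main(String[] args) {
        // The trailing 'Z' marks the string as an instant in UTC.
        Instant instant = Instant.parse("2017-08-01T02:26:59.000Z");

        // Rendering that same instant in a session time zone such as
        // America/Los_Angeles (UTC-7 in August) reproduces the shifted
        // value from the question.
        String local = instant.atZone(ZoneId.of("America/Los_Angeles"))
                .format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
        System.out.println(local); // prints 2017-07-31 19:26:59
    }
}
```

The instant itself never changes; only its rendering does, which is why parsing with an explicit pattern that treats 'Z' as a literal (as the answer above does) keeps the wall-clock digits intact.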

Hope this helps!

Solution in Java

There are some Spark SQL functions that let you play with the date format.

Conversion example: 20181224091530 -> 2018-12-24 09:15:30

Solution (Spark SQL statement):

SELECT
 ...
 to_timestamp(cast(DECIMAL_DATE as string),'yyyyMMddHHmmss') as `TIME STAMP DATE`,
 ...
FROM some_table
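The parsing step the statement performs can be shown outside Spark with plain java.time, using the same yyyyMMddHHmmss pattern on the example value from above:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DecimalDateParse {
    public static void main(String[] args) {
        // Same yyyyMMddHHmmss pattern the SQL uses: cast the numeric
        // value to a string, then parse it as a local date-time.
        long decimalDate = 20181224091530L;
        LocalDateTime parsed = LocalDateTime.parse(
                Long.toString(decimalDate),
                DateTimeFormatter.ofPattern("yyyyMMddHHmmss"));
        String formatted = parsed.format(
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
        System.out.println(formatted); // prints 2018-12-24 09:15:30
    }
}
```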

You can run SQL statements through an instance of org.apache.spark.sql.SparkSession. For example, to execute an SQL statement, Spark provides the following:

...
// You have to create an instance of SparkSession
sparkSession.sql(sqlStatement); 
...

Notes:

  • You have to convert the decimal to a string before you can parse it to the timestamp format.
  • You can play with the format to get whatever output format you want.
  1. In Spark SQL you can use to_timestamp and then format it as you require (the placeholders below stand for column, alias, and table names that were stripped from the original): select date_format(to_timestamp(<column>,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as <alias> from <table>

  2. Here 'timestamp' is a StringType column in the 'event' table with the value 2019/02/23 12:00:00. To convert it to TimestampType, apply to_timestamp(timestamp, 'yyyy/MM/dd HH:mm:ss'). Make sure the format string matches your column's values. Then apply date_format to convert it to whatever layout you require.

> select date_format(to_timestamp(timestamp,'yyyy/MM/dd HH:mm:ss'),"yyyy-MM-dd HH:mm:ss") as timeStamp from event
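The two-step parse-then-format pipeline in this query can be checked in isolation with java.time, using the example value from the note above:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ReformatTimestamp {
    public static void main(String[] args) {
        // Parse the slash-separated string with its actual layout,
        // then re-render it in the dash-separated target layout.
        String raw = "2019/02/23 12:00:00";
        LocalDateTime ts = LocalDateTime.parse(
                raw, DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss"));
        String out = ts.format(
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
        System.out.println(out); // prints 2019-02-23 12:00:00
    }
}
```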

Using SQL syntax:

select date_format(to_timestamp(ColumnTimestamp, "MM/dd/yyyy hh:mm:ss aa"), "yyyy-MM-dd") as ColumnDate 
from database_name.table_name
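The same conversion sketched with java.time, on a made-up input value chosen only to match the pattern above; note that java.time takes a single 'a' for the AM/PM marker where the Spark pattern above uses "aa":

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class AmPmToDate {
    public static void main(String[] args) {
        // Hypothetical input matching the MM/dd/yyyy hh:mm:ss AM/PM layout.
        String raw = "08/01/2017 02:26:59 AM";
        // Locale.ENGLISH ensures "AM"/"PM" parse regardless of the
        // default locale of the JVM.
        LocalDateTime ts = LocalDateTime.parse(raw,
                DateTimeFormatter.ofPattern("MM/dd/yyyy hh:mm:ss a",
                        Locale.ENGLISH));
        // Keep only the date portion, as the SQL above does.
        String dateOnly = ts.format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
        System.out.println(dateOnly); // prints 2017-08-01
    }
}
```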

