简体   繁体   中英

Spark convert TimestampType to String of format yyyyMMddHHmm

In my dataframe I have a column of TimestampType format of '2019-03-16T16:54:42.968Z' I would like to convert this to a StringType column with a format of '201903161654' instead of a generic timestamp column. This is because I need to partition several directories based on the string formatted timestamp, if I partition on the timestamp column it creates special characters when creating the directory.

Is there any api I can use in spark to convert the Timestamp column to a string type with the format above?

Use the date_format function: date_format(date/timestamp/string ts, string fmt) .

Converts a date/timestamp/string to a value of string in the format specified by the date format fmt . Supported formats are Java SimpleDateFormat formats . The second argument fmt should be constant. Example: date_format('2015-04-08', 'y') is "2015" .

Using unix_timestamp and date_format functions

scala> val dt = Seq("2019-03-16T16:54:42.968Z").toDF("ts_str")
dt: org.apache.spark.sql.DataFrame = [ts_str: string]

scala> dt.select(unix_timestamp('ts_str,"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")).show
+----------------------------------------------------+
|unix_timestamp(ts_str, yyyy-MM-dd'T'HH:mm:ss.SSS'Z')|
+----------------------------------------------------+
|                                          1552735482|
+----------------------------------------------------+


scala> dt.select(date_format(unix_timestamp('ts_str,"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").cast("timestamp"), "yyyyMMddHHmm").as("res") ).show
+------------+
|         res|
+------------+
|201903161654|
+------------+


scala>

or just using to_timestamp and date_format functions

scala> val dt = Seq("2019-03-16T16:54:42.968Z").toDF("ts_str")
dt: org.apache.spark.sql.DataFrame = [ts_str: string]

scala> dt.withColumn("ts_str",to_timestamp('ts_str,"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'")).show(false)
+-------------------+
|ts_str             |
+-------------------+
|2019-03-16 16:54:42|
+-------------------+


scala> dt.withColumn("ts_str",date_format(to_timestamp('ts_str,"yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"), "yyyyMMddHHmm" )).show(false)
+------------+
|ts_str      |
+------------+
|201903161654|
+------------+


scala>

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM