简体   繁体   中英

Python Spark Dataframe: Conversion of string column into timestamp

I have time column which has object data-type and contains value in the following format:

0 days 01:30:00.0

I want to change the above time column into the format of HH:mm:ss, so that it looks like in following format:

01:30:00

I used the following code to convert into the desired format:

df.withColumn('TIME_timestamp',F.to_timestamp("time", "HH:mm:ss")).show()

However, it returned the null column. Output produced by the above code is:

 time|              TIME_timestamp|
+-------------------+--------------+
0 days 00:00:00.0 |          null|
0 days 00:30:00.0 |          null|
0 days 01:00:00.0 |          null|
0 days 01:30:00.0 |          null|
0 days 02:00:00.0 |          null|
0 days 02:30:00.0 |          null|
+-------------------+---------------+

Could anyone guide where am I making the mistake?

You can use substring to get your timevalues into the to_timestamp function. Timestamp will give its default date, as date is needed for timestamp.

df.withColumn("Time_timestamp", F.to_timestamp(F.substring("time",7,9),"HH:mm:ss")).show()

+-----------------+-------------------+
|             time|     Time_timestamp|
+-----------------+-------------------+
|0 days 00:00:00.0|1970-01-01 00:00:00|
|0 days 00:30:00.0|1970-01-01 00:30:00|
|0 days 01:00:00.0|1970-01-01 01:00:00|
|0 days 01:30:00.0|1970-01-01 01:30:00|
+-----------------+-------------------+

I would suggest you keep it in seconds format using unix_timestamp to lose the date part.

df.withColumn("Time_timestamp", F.unix_timestamp(F.substring("time",7,9),"HH:mm:ss")).show()

+-----------------+--------------+
|             time|Time_timestamp|
+-----------------+--------------+
|0 days 00:00:00.0|             0|
|0 days 00:30:00.0|          1800|
|0 days 01:00:00.0|          3600|
|0 days 01:30:00.0|          5400|
+-----------------+--------------+

Or just keep it as string:

df.withColumn("Time_timestamp", F.substring("time",7,9)).show()

+-----------------+--------------+
|             time|Time_timestamp|
+-----------------+--------------+
|0 days 00:00:00.0|      00:00:00|
|0 days 00:30:00.0|      00:30:00|
|0 days 01:00:00.0|      01:00:00|
|0 days 01:30:00.0|      01:30:00|
+-----------------+--------------+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM