I have a PySpark dataframe like the following:
df.show(5)
+----------+
| t_start|
+----------+
|1506125172|
|1506488793|
|1506242331|
|1506307472|
|1505613973|
+----------+
I would like to get the hour and the day of each Unix timestamp. This is what I am doing:
df = df.withColumn("datetime", F.from_unixtime("t_start", "dd/MM/yyyy HH:mm:ss"))
df = df.withColumn("hour", F.date_trunc('hour',F.to_timestamp("datetime","yyyy-MM-dd HH:mm:ss")))
df.show(5)
+----------+-------------------+----+
| t_start| datetime|hour|
+----------+-------------------+----+
|1506125172|23/09/2017 00:06:12|null|
|1506488793|27/09/2017 05:06:33|null|
|1506242331|24/09/2017 08:38:51|null|
|1506307472|25/09/2017 02:44:32|null|
|1505613973|17/09/2017 02:06:13|null|
+----------+-------------------+----+
And I get null in the hour column.
You can use the hour() function to extract the hour from a timestamp column. Also, fix the format string you pass to to_timestamp: your datetime column is in dd/MM/yyyy HH:mm:ss, not yyyy-MM-dd HH:mm:ss, so the parse fails and returns null.
from pyspark.sql import functions as F
df.withColumn("hour", F.hour(F.to_timestamp("datetime", "dd/MM/yyyy HH:mm:ss"))).show()
+----------+-------------------+----+
| t_start| datetime|hour|
+----------+-------------------+----+
|1506125172|23/09/2017 00:06:12| 0|
|1506488793|27/09/2017 05:06:33| 5|
|1506242331|24/09/2017 08:38:51| 8|
|1506307472|25/09/2017 02:44:32| 2|
|1505613973|17/09/2017 02:06:13| 2|
+----------+-------------------+----+
Alternatively, you can simply combine the hour function with from_unixtime, skipping the string round-trip entirely:
from pyspark.sql import functions as F
df.withColumn('hour', F.hour(F.from_unixtime('t_start'))).show()
+----------+----+
| t_start|hour|
+----------+----+
|1506125172| 0|
|1506488793| 5|
|1506242331| 8|
|1506307472| 2|
|1505613973| 2|
+----------+----+
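For the day part of the question, PySpark's dayofmonth() works the same way as hour() (there are also dayofweek() and dayofyear()). As a sanity check on the expected values, the sample timestamps can be decoded with plain Python; this assumes from_unixtime ran under a UTC session timezone (spark.sql.session.timeZone), since Spark renders Unix timestamps in the session's local time:

```python
from datetime import datetime, timezone

# Unix timestamps from the example dataframe
timestamps = [1506125172, 1506488793, 1506242331, 1506307472, 1505613973]

for ts in timestamps:
    # Decode in UTC; Spark's from_unixtime uses the session timezone instead
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    print(ts, dt.hour, dt.day)  # hours: 0, 5, 8, 2, 2 — days: 23, 27, 24, 25, 17
```

In Spark the equivalent would be `df.withColumn('day', F.dayofmonth(F.from_unixtime('t_start')))`, mirroring the hour extraction above.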