Get average date value from pyspark dataframe
I have a df with product data with the following schema:
root
|-- Creator: string (nullable = true)
|-- Created_datetime: timestamp (nullable = true)
|-- Last_modified_datetime: timestamp (nullable = true)
|-- Product_name: string (nullable = true)
The Created_datetime column looks like the following:
+-------------------+
| Created_datetime|
+-------------------+
|2019-10-12 17:09:18|
|2019-12-03 07:02:07|
|2020-01-16 23:10:08|
+-------------------+
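For reproducibility, a minimal sketch that builds this sample column (it assumes a local SparkSession named spark; the values are taken from the excerpt above):

from datetime import datetime
from pyspark.sql import SparkSession

# Assumption: a local SparkSession for testing
spark = SparkSession.builder.getOrCreate()

# Single timestamp column with the three sample rows shown above
df = spark.createDataFrame(
    [
        (datetime(2019, 10, 12, 17, 9, 18),),
        (datetime(2019, 12, 3, 7, 2, 7),),
        (datetime(2020, 1, 16, 23, 10, 8),),
    ],
    ["Created_datetime"],
)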
Now I would like to extract the average value (or the existing value closest to the average) from the Created_datetime column. How can this be achieved?
When you calculate the average of a timestamp column, you get the average unix timestamp (long) value. Cast it back to a timestamp:
from pyspark.sql import functions as F

# Cast the timestamp to a unix timestamp (seconds since the epoch) so it can
# be averaged, then cast the mean back to a timestamp
df.agg(
    F.avg(F.col("Created_datetime").cast("long")).cast("timestamp").alias("avg_created_datetime")
).show()
+--------------------+
|avg_created_datetime|
+--------------------+
| 2019-11-30 23:27:11|
+--------------------+
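The question also asks for the existing value closest to the average. One way to get it is to collect the average unix timestamp to the driver and order the rows by their absolute distance from it; a minimal sketch (avg_ts and closest are illustrative names, not from the original answer):

from pyspark.sql import functions as F

# Average as a unix timestamp (seconds since the epoch), collected to the driver
avg_ts = df.agg(F.avg(F.col("Created_datetime").cast("long")).alias("avg_ts")).first()["avg_ts"]

# Keep the row whose Created_datetime is nearest to that average
closest = (
    df.withColumn("diff", F.abs(F.col("Created_datetime").cast("long") - F.lit(avg_ts)))
      .orderBy("diff")
      .limit(1)
      .drop("diff")
)
closest.show()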