Get average date value from pyspark dataframe
I have a df with product data with the following schema:
root
|-- Creator: string (nullable = true)
|-- Created_datetime: timestamp (nullable = true)
|-- Last_modified_datetime: timestamp (nullable = true)
|-- Product_name: string (nullable = true)
The Created_datetime column looks like the following:
+-------------------+
| Created_datetime|
+-------------------+
|2019-10-12 17:09:18|
|2019-12-03 07:02:07|
|2020-01-16 23:10:08|
+-------------------+
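For reproducibility, a minimal sketch that builds this sample column (it assumes a local SparkSession named spark; the values are taken from the excerpt above):

from datetime import datetime
from pyspark.sql import SparkSession

# Assumption: a local SparkSession for testing
spark = SparkSession.builder.getOrCreate()

# Single timestamp column with the three sample rows shown above
df = spark.createDataFrame(
    [
        (datetime(2019, 10, 12, 17, 9, 18),),
        (datetime(2019, 12, 3, 7, 2, 7),),
        (datetime(2020, 1, 16, 23, 10, 8),),
    ],
    ["Created_datetime"],
)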
Now I would like to extract the average value (or the existing value closest to the average) from the Created_datetime column. How can this be achieved?
When you calculate the average of a timestamp column, you get the average unix timestamp (long) value. Cast it back to a timestamp:
from pyspark.sql import functions as F

# Cast the timestamp to a unix timestamp (seconds since the epoch) so it can
# be averaged, then cast the mean back to a timestamp
df.agg(
    F.avg(F.col("Created_datetime").cast("long")).cast("timestamp").alias("avg_created_datetime")
).show()
+--------------------+
|avg_created_datetime|
+--------------------+
| 2019-11-30 23:27:11|
+--------------------+
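The question also asks for the existing value closest to the average. One way to get it is to collect the average unix timestamp to the driver and order the rows by their absolute distance from it; a minimal sketch (avg_ts and closest are illustrative names, not from the original answer):

from pyspark.sql import functions as F

# Average as a unix timestamp (seconds since the epoch), collected to the driver
avg_ts = df.agg(F.avg(F.col("Created_datetime").cast("long")).alias("avg_ts")).first()["avg_ts"]

# Keep the row whose Created_datetime is nearest to that average
closest = (
    df.withColumn("diff", F.abs(F.col("Created_datetime").cast("long") - F.lit(avg_ts)))
      .orderBy("diff")
      .limit(1)
      .drop("diff")
)
closest.show()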