
Get average date value from pyspark dataframe

I have a df with product data with the following schema:

root
 |-- Creator: string (nullable = true)
 |-- Created_datetime: timestamp (nullable = true)
 |-- Last_modified_datetime: timestamp (nullable = true)
 |-- Product_name: string (nullable = true)

The column Created_datetime looks like the following:

+-------------------+
|   Created_datetime|
+-------------------+
|2019-10-12 17:09:18|
|2019-12-03 07:02:07|
|2020-01-16 23:10:08|
+-------------------+
Now I would like to extract the average value (or the closest existing value to the average) in the Created_datetime column. How can this be achieved?

Averaging a timestamp column amounts to averaging its unix timestamp (seconds since the epoch), which yields a numeric value. Cast the result back to a timestamp:

from pyspark.sql import functions as F

# avg() requires a numeric input, so cast the timestamp to epoch
# seconds first, then cast the averaged value back to a timestamp
df.agg(
    F.avg(F.col("Created_datetime").cast("long"))
     .cast("timestamp")
     .alias("avg_created_datetime")
).show()

+--------------------+
|avg_created_datetime|
+--------------------+
| 2019-11-30 23:27:11|
+--------------------+
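The question also asks for the closest *existing* value to the average, which the answer above does not cover. The selection logic can be sketched in plain Python with the three sample values from the question (a sketch of the idea, not Spark code): average the epoch seconds, then pick the row with the smallest absolute difference from that average.

```python
from datetime import datetime

# The three sample values from the question
rows = [
    datetime(2019, 10, 12, 17, 9, 18),
    datetime(2019, 12, 3, 7, 2, 7),
    datetime(2020, 1, 16, 23, 10, 8),
]

# Average of the epoch seconds -- the same quantity Spark's avg() computes
avg_epoch = sum(dt.timestamp() for dt in rows) / len(rows)

# The existing value nearest to that average
closest = min(rows, key=lambda dt: abs(dt.timestamp() - avg_epoch))
```

In PySpark itself, the analogous step would be ordering the dataframe by `F.abs(F.col("Created_datetime").cast("long") - avg_epoch)` and taking the first row.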
