How to create datetime columns in a pyspark dataframe?

I have a pyspark dataframe that looks like the following:

    df
    
       year   month   day 
       2017    9       3 
       2015    5      16

I would like to create a datetime column, like the following:

    df
    
       year   month   day           date
       2017    9       3    2017-09-03 00:00:00
       2015    5      16    2015-05-16 00:00:00

You can use concat_ws to concatenate the columns and to_date to convert the result to a date:

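    from pyspark.sql import SparkSession
    from pyspark.sql.functions import concat_ws, to_date
    spark = SparkSession.builder.getOrCreate()
    # The third column is named 'date' here (the question calls it 'day').
    df = spark.createDataFrame([[2017, 9, 3], [2015, 5, 16]], ['year', 'month', 'date'])
    # Join the parts with '-' and parse the resulting string into a date column.
    df = df.withColumn('timestamp', to_date(concat_ws('-', df.year, df.month, df.date)))
    df.show()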

    +----+-----+----+----------+
    |year|month|date| timestamp|
    +----+-----+----+----------+
    |2017|    9|   3|2017-09-03|
    |2015|    5|  16|2015-05-16|
    +----+-----+----+----------+

Schema:

    df.printSchema()
    root
     |-- year: long (nullable = true)
     |-- month: long (nullable = true)
     |-- date: long (nullable = true)
     |-- timestamp: date (nullable = true)

