Convert an isodate string into a date format in PySpark

I'm using PySpark to develop a machine learning project. I have a lot of records with a field that stores a date taken from MongoDB. This date is a string, but it contains a date in ISODate (ISO 8601) format.

How can I convert it to one of the date types supported by Apache Spark? If possible, I need to convert the whole column that contains this date field.

Here's an example of this field in JSON format:

"date": "2020-11-09T07:27:57.078Z"

Just cast the column to a timestamp using df.select(F.col('date').cast('timestamp')). If you want a date type, cast to date instead.

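from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Reuse the active session (the pyspark shell already provides `spark`)
spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([['2020-11-09T07:27:57.078Z']]).toDF('date')
# truncate=False prints the full 24-character string, left-aligned as below
df.show(truncate=False)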
+------------------------+
|date                    |
+------------------------+
|2020-11-09T07:27:57.078Z|
+------------------------+

>>> df.printSchema()
root
 |-- date: string (nullable = true)

# cast to timestamp
df2 = df.select(F.col('date').cast('timestamp'))

>>> df2.printSchema()
root
 |-- date: timestamp (nullable = true)

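# timestamps are rendered in the session time zone (spark.sql.session.timeZone); this output assumes UTC
df2.show(truncate=False)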
+-----------------------+
|date                   |
+-----------------------+
|2020-11-09 07:27:57.078|
+-----------------------+

# cast to date
df3 = df.select(F.col('date').cast('date'))

>>> df3.printSchema()
root
 |-- date: date (nullable = true)

df3.show()
+----------+
|      date|
+----------+
|2020-11-09|
+----------+
