
Converting column data type from string to date with PySpark returns null values

I was trying to change the data type of a column (Disponibility) from string to date, but every time the converted column shows null values (for example, 23/01/2022 becomes null).

This is my code:

dfwdate = dfworkers2.withColumn("Disponibility", to_date("Disponibility")) \
    .show(truncate=False)
to_date('Disponibility', 'dd/MM/yyyy')

You have correctly chosen the function to_date. In your case it just needs a second argument: the actual format of your date strings, 'dd/MM/yyyy'. When no format is given, to_date expects strings in the form 'yyyy-MM-dd'. Since your column is not in that format, you get null back.
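The mismatch can also be seen without Spark. The following is only a plain-Python analogy, not Spark code: datetime.strptime stands in for to_date, the helper name parse_or_none is hypothetical, and the directives %d/%m/%Y play the role of Spark's dd/MM/yyyy pattern. Unlike Spark, strptime raises an error on a mismatch, so the helper turns that into None to mimic the null result.

```python
from datetime import date, datetime
from typing import Optional


def parse_or_none(value: str, fmt: str) -> Optional[date]:
    """Hypothetical helper: parse `value` with `fmt`, returning None on mismatch.

    This only mimics the null-on-failure behaviour described above;
    it is plain Python, not Spark.
    """
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError:
        return None


raw = "23/01/2022"

# ISO-style format (analogous to the 'yyyy-MM-dd' default): does not match.
default_result = parse_or_none(raw, "%Y-%m-%d")

# Format that matches the data (analogous to 'dd/MM/yyyy').
explicit_result = parse_or_none(raw, "%d/%m/%Y")

print(default_result)   # None
print(explicit_result)  # 2022-01-23
```

The same reasoning applies to the Spark column: the pattern has to describe the strings as they are actually stored, separators included.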

Full example:

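from pyspark.sql import SparkSession, functions as F

# Reuse the active session if there is one (e.g. in a notebook or shell)
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([('23/01/2022',)], ['Disponibility'])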

df.show()
# +-------------+
# |Disponibility|
# +-------------+
# |   23/01/2022|
# +-------------+
print(df.dtypes)
# [('Disponibility', 'string')]

df = df.withColumn('Disponibility', F.to_date('Disponibility', 'dd/MM/yyyy'))

df.show()
# +-------------+
# |Disponibility|
# +-------------+
# |   2022-01-23|
# +-------------+
print(df.dtypes)
# [('Disponibility', 'date')]

You additionally need to supply the date format to to_date. The available pattern letters are listed on the Spark date pattern documentation page.

Date Conversion Examples

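from io import StringIO

import pandas as pd
from pyspark.sql import SparkSession, functions as F

# `sql` is assumed to be the SparkSession used in this answer
sql = SparkSession.builder.getOrCreate()

s = StringIO("""
date_str
2022-03-01
2022-05-20
2022-06-21
2022-10-22
""")

df = pd.read_csv(s, delimiter=',')

sparkDF = sql.createDataFrame(df)\
             .withColumn('date_parsed', F.to_date(F.col('date_str'), 'yyyy-MM-dd'))\
             .drop('date_str')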

sparkDF.show()

+-----------+
|date_parsed|
+-----------+
| 2022-03-01|
| 2022-05-20|
| 2022-06-21|
| 2022-10-22|
+-----------+

sparkDF.printSchema()

root
 |-- date_parsed: date (nullable = true)
