Parse different formats of date in string format to date format pyspark when clause
Need to convert this format:
name,code,DATE_invoice
Ram,E01,09/29/2018
Mara,E02,07/14/2017
Test,E03,01/01/18
to this:
name,code,DATE_invoice
Ram,E01,2018-09-29
Mara,E02,2017-07-14
Test,E03,2018-01-01
If the column is already a date, this should do the job:
df = df.withColumn('DATE_invoice', date_format(col("DATE_invoice"), "yyyy-MM-dd"))
To parse different date formats you can use to_date together with coalesce: parse the column once per candidate pattern, then coalesce the results so the first successful parse wins. The same approach extends to any number of patterns in your dataset.
input_str = """
Ram,E01,09/29/2018,
Mara,E02,07/14/2017,
Test,E03,01/01/18
""".split(",")
input_values = list(map(lambda x: x.strip() if x.strip() != 'null' else None, input_str))
cols = list(map(lambda x: x.strip() if x.strip() != 'null' else None, "name,code,DATE_invoice".split(",")))
n = len(input_values)
n_col = 3
input_list = [tuple(input_values[i:i+n_col]) for i in range(0,n,n_col)]
sparkDF = sql.createDataFrame(input_list, cols)
sparkDF.show()
+----+----+------------+
|name|code|DATE_invoice|
+----+----+------------+
| Ram| E01| 09/29/2018|
|Mara| E02| 07/14/2017|
|Test| E03| 01/01/18|
+----+----+------------+
# Use the legacy (Spark 2.x / SimpleDateFormat) time parser so the
# non-matching pattern yields null and coalesce can fall through
sql.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

sparkDF.withColumn('p1', F.to_date(F.col('DATE_invoice'), "MM/dd/yyyy"))\
       .withColumn('p2', F.to_date(F.col('DATE_invoice'), "MM/dd/yy"))\
       .withColumn('DATE_invoice_parsed', F.coalesce(F.col('p1'), F.col('p2')))\
       .drop(*['p1', 'p2'])\
       .show(truncate=False)
+----+----+------------+-------------------+
|name|code|DATE_invoice|DATE_invoice_parsed|
+----+----+------------+-------------------+
|Ram |E01 |09/29/2018 |2018-09-29 |
|Mara|E02 |07/14/2017 |2017-07-14 |
|Test|E03 |01/01/18 |0018-01-01 |
+----+----+------------+-------------------+