Pyspark dataframe change format if regex applies
I am currently converting some date data in a dataframe that looks like this:
+---------+--------------+
|first_col|sec_col       |
+---------+--------------+
|a        |28-04-2021    |
|a        |01-03-2017    |
|a        |"Feb 23, 2012"|
|a        |"May 01, 2019"|
+---------+--------------+
I now want to convert the last two rows to a better date format, like this: 23-Feb-2012. I wanted to do it with a regex, but the following code does not work:
from pyspark.sql import functions as f
from pyspark.sql.functions import regexp_replace, regexp_extract
#(a lot of stuff happens here which is not important for the question so I let it out)
input_df = input_df.withColumn("sec_col", input_df.sec_col.cast("String"))
.withColumn("sec_col2",
f.when(input_df.sec_col.rlike("\"\w{3} \d{2}, \d{4}\""),
f.concat(regexp_extract("sec_col","\"(\w{3}) (\d{2}), (\d{4})\"",2),f.lit("-"), regexp_extract("sec_col","\"(\w{3}) (\d{2}), (\d{4})\"",1),f.lit("-"),regexp_extract("sec_col","\"(\w{3}) (\d{2}), (\d{4})\"",3))))
.otherwise(f.col("sec_col"))
Can anyone help?
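Instead of trying to parse the date format with a regex, you can use to_date to convert the column to a date directly, since you already know the date formats you need to parse, and then take the first non-null value.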
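from pyspark.sql import functions as F

# `sql` is assumed to be an existing SparkSession (or SQLContext)
sparkDF = sql.createDataFrame(
    [("28-04-2021",),
     ("01-03-2017",),
     ("Feb 23, 2012",),
     ("May 01, 2019",)],
    ['timestamp'])
sparkDF.show()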
+------------+
| timestamp|
+------------+
| 28-04-2021|
| 01-03-2017|
|Feb 23, 2012|
|May 01, 2019|
+------------+
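# Try each known format; as the output shows, rows that do not match come back as null
sparkDF = sparkDF.withColumn('p1', F.to_date(F.col('timestamp'), "MMM dd, yyyy"))\
                 .withColumn('p2', F.to_date(F.col('timestamp'), "dd-MM-yyyy"))
sparkDF.show()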
+------------+----------+----------+
| timestamp| p1| p2|
+------------+----------+----------+
| 28-04-2021| null|2021-04-28|
| 01-03-2017| null|2017-03-01|
|Feb 23, 2012|2012-02-23| null|
|May 01, 2019|2019-05-01| null|
+------------+----------+----------+
# Keep the first non-null parse and drop the helper columns
sparkDF = sparkDF.withColumn('timestamp_parsed', F.coalesce(F.col('p1'), F.col('p2')))\
                 .drop(*['p1', 'p2'])
sparkDF.show()
+------------+----------------+
| timestamp|timestamp_parsed|
+------------+----------------+
| 28-04-2021| 2021-04-28|
| 01-03-2017| 2017-03-01|
|Feb 23, 2012| 2012-02-23|
|May 01, 2019| 2019-05-01|
+------------+----------------+