Regex pattern for checking all types of date format
I want to find which columns of a dataframe contain date values and convert those columns to datetime. The column type can be object initially, and the dates can be in any format, so I am looking for a regex pattern that matches all date formats.
Can someone please suggest a regex pattern that matches all date formats?
I have tried the code below:
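import re
import pandas as pd

for columnIndex, colName in enumerate(df):
    df2 = pd.DataFrame()
    df2['test'] = df[colName]
    count = 0
    # items() replaces iteritems(), which was removed in pandas 2.0
    for i, j in df2.items():
        for k in j:
            # count the values that look like NN/NN/NNNN
            if re.match("[0-9]{2}/[0-9]{2}/[0-9]{4}", str(k)):
                count = count + 1
    # convert the column once more than 5 of its values match
    if count > 5:
        df[colName] = pd.to_datetime(df[colName])
print(df.dtypes)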
Considering the following dataframe df, with all date formats indicated by OP in the question:
df = pd.DataFrame({'date': ['04/10/2022', '10/04/2022', '2022/04/10', '2022/10/04', '2022-12-20 00:00:00', '04-10-2022']})
[Out]:
                  date
0           04/10/2022
1           10/04/2022
2           2022/04/10
3           2022/10/04
4  2022-12-20 00:00:00
5           04-10-2022
Assuming the goal is to convert to datetime, one can use pandas.to_datetime. It has the parameter infer_datetime_format, which can be used as follows:
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
[Out]:
        date
0 2022-04-10
1 2022-10-04
2 2022-04-10
3 2022-10-04
4 2022-12-20
5 2022-04-10
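For this case, it does the job. Be aware that infer_datetime_format is deprecated as of pandas 2.0, where mixed-format columns like this one are handled by passing format='mixed' instead.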
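When the set of formats is known in advance, an explicit alternative to format inference is to try each candidate format in turn. The following is a minimal sketch using only the standard library. The CANDIDATE_FORMATS list and the parse_date helper are illustrative names, not part of pandas, and the month-first ordering is an assumption chosen to mirror the output above.

```python
from datetime import datetime

# Hypothetical list of formats, tried in order. Month-first is assumed for
# NN/NN/YYYY and NN-NN-YYYY so the results mirror the output shown above.
CANDIDATE_FORMATS = [
    "%m/%d/%Y",
    "%Y/%m/%d",
    "%Y-%m-%d %H:%M:%S",
    "%m-%d-%Y",
]

def parse_date(text):
    """Return a datetime for the first format that fits, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not fit, try the next one
    return None

samples = ['04/10/2022', '10/04/2022', '2022/04/10', '2022/10/04',
           '2022-12-20 00:00:00', '04-10-2022']
parsed = [parse_date(s) for s in samples]
for s, p in zip(samples, parsed):
    print(s, '->', p)
```

Because the formats are listed explicitly, a value that fits none of them comes back as None instead of being guessed at, which makes unexpected formats easy to spot.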
Note:
Why not simply use pandas.to_datetime without providing any format?
for col in df.columns:
    df[col] = pd.to_datetime(df[col])
print(df)
        Col1       Col2       Col3       Col4
0 2022-04-10        NaT        NaT        NaT
1        NaT 2022-10-04        NaT        NaT
2        NaT        NaT 2022-04-10        NaT
3 2022-10-04        NaT        NaT        NaT
4        NaT        NaT        NaT 2022-12-20
5 2022-04-10        NaT        NaT        NaT

where df before the conversion was:

         Col1        Col2        Col3                 Col4
0  04/10/2022         NaN         NaN                  NaN
1         NaN  10/04/2022         NaN                  NaN
2         NaN         NaN  2022/04/10                  NaN
3  2022/10/04         NaN         NaN                  NaN
4         NaN         NaN         NaN  2022-12-20 00:00:00
5  04-10-2022         NaN         NaN                  NaN
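To decide which columns to convert in the first place, one option is to measure what share of a column's values look like dates before calling pd.to_datetime on it. Below is a minimal, pandas-free sketch of that check. The looks_like_date_column helper, the DATE_RE pattern, and the 0.8 threshold are all illustrative assumptions, not something taken from the question or the answers.

```python
import re

# Hypothetical pattern: three digit groups separated by '/' or '-',
# optionally followed by a time of day such as 00:00:00.
DATE_RE = re.compile(
    r"^\d{1,4}[/-]\d{1,2}[/-]\d{1,4}(\s+\d{1,2}:\d{2}(:\d{2})?)?$"
)

def looks_like_date_column(values, threshold=0.8):
    """True if at least `threshold` of the non-missing values match DATE_RE."""
    present = [str(v).strip() for v in values if v is not None]
    if not present:
        return False
    hits = sum(1 for v in present if DATE_RE.match(v))
    return hits / len(present) >= threshold

# Plain lists stand in for dataframe columns here.
columns = {
    "when": ["04/10/2022", "2022/10/04", "2022-12-20 00:00:00", None],
    "name": ["alice", "bob", "carol", "dave"],
}
date_columns = [name for name, vals in columns.items()
                if looks_like_date_column(vals)]
print(date_columns)
```

With a real dataframe, one would presumably pass df[col].dropna() for each object column and convert only the columns for which the helper returns True.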
Here is an idea. With this code you will match all the formats; however, you can't distinguish between day and month if the date is, say, 05/05/2022. But that is an issue that goes beyond the scope of the question.
The regexp I came up with looks for groups of one or more digits ([0-9]+) separated by either a dash or a slash ([/-]), and I used the backslash to escape the special symbols.
dates="""04/10/2022
10/04/2022
2022/04/10
2022/10/04
2022-12-20 00:00:00
04-10-2022"""
import re
# three groups of digits separated by '/' or '-'
dre = re.compile(r"([0-9]+)[\/\-]([0-9]+)[\/\-]([0-9]+)")
for date in dates.split("\n"):
    m = dre.match(date)
    print( m.group(1) , m.group(2) , m.group(3) )
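Building on the regex above, the captured groups can be turned into a date once the position of the year is known. The day/month order remains a guess whenever both fields could be a month. Here is a sketch under those assumptions; the to_date helper and its dayfirst flag are hypothetical names introduced for illustration.

```python
import re
from datetime import date

# Same pattern as above, written without the optional backslashes.
dre = re.compile(r"([0-9]+)[/-]([0-9]+)[/-]([0-9]+)")

def to_date(text, dayfirst=False):
    """Return (date, ambiguous), or None if the text does not match.

    A 4-digit first group is read as year/month/day. Otherwise the year is
    taken from the last group, and `dayfirst` (an assumed flag) picks the
    order of the first two groups. Impossible values such as 99/99/2022
    still raise ValueError from date().
    """
    m = dre.match(text)
    if m is None:
        return None
    a, b, c = m.groups()
    if len(a) == 4:
        return date(int(a), int(b), int(c)), False
    first, second = int(a), int(b)
    # Ambiguous when both fields could be a month and they differ.
    ambiguous = first <= 12 and second <= 12 and first != second
    day, month = (first, second) if dayfirst else (second, first)
    if month > 12:
        # Only one reading is possible, so swap regardless of the flag.
        day, month = month, day
    return date(int(c), month, day), ambiguous

for text in ["04/10/2022", "2022-12-20 00:00:00", "25/12/2022"]:
    print(text, "->", to_date(text))
```

Returning the ambiguous flag alongside the date lets the caller decide what to do with values such as 04/10/2022, rather than silently committing to one reading.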