
Regex pattern for checking all types of date format

I want to find which columns of a dataframe contain date values and convert those columns to datetime. The column dtype can initially be object, and the dates can be in any of the formats below, so I am looking for a regex pattern that matches all of these date formats.

  1. 04/10/2022
  2. 10/04/2022
  3. 2022/04/10
  4. 2022/10/04
  5. 2022-12-20 00:00:00
  6. 04-10-2022

Can someone please suggest a regex pattern that matches all of these date formats?

I have tried the code below:

    for columnIndex, colName in enumerate(df):

        df2 = pd.DataFrame()
        df2['test'] = df[colName]
        count = 0
        for i, j in df2.iteritems():
            for k in j:
                if re.match("[0-9]{2}/[0-9]{2}/[0-9]{4}", str(k)):
                    count = count+1
        if(count>5):
            df[colName] = pd.to_datetime(df[colName])
        print(df.dtypes)

Consider the following dataframe df, which contains all the date formats the OP lists in the question:

df = pd.DataFrame({'date': ['04/10/2022', '10/04/2022', '2022/04/10', '2022/10/04', '2022-12-20 00:00:00', '04-10-2022']})

[Out]:
                  date
0           04/10/2022
1           10/04/2022
2           2022/04/10
3           2022/10/04
4  2022-12-20 00:00:00
5           04-10-2022

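Assuming the goal is to convert to datetime, one can use pandas.to_datetime. It has the parameter infer_datetime_format, which can be used as follows. Note that this parameter is deprecated as of pandas 2.0, where a strict version of format inference is the default and format="mixed" is available for columns that mix several layouts.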

df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)

[Out]:

        date
0 2022-04-10
1 2022-10-04
2 2022-04-10
3 2022-10-04
4 2022-12-20
5 2022-04-10

For this case, it does the job.
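
If relying on inference feels too implicit, one alternative is to try an explicit list of layouts in order. The following is only a sketch using the standard library's datetime.strptime; the CANDIDATE_FORMATS tuple and the parse_date helper are hypothetical names, and the month-first reading of 04/10/2022 is an assumption chosen to agree with the output shown above.

```python
from datetime import datetime

# Assumed list of layouts, tried in order. The order decides how an
# ambiguous value such as 04/10/2022 is read (here: month first).
CANDIDATE_FORMATS = (
    "%m/%d/%Y",
    "%Y/%m/%d",
    "%Y-%m-%d %H:%M:%S",
    "%m-%d-%Y",
)


def parse_date(text, formats=CANDIDATE_FORMATS):
    """Return a datetime for the first format that fits, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this layout does not fit, try the next one
    return None


samples = [
    "04/10/2022",
    "10/04/2022",
    "2022/04/10",
    "2022/10/04",
    "2022-12-20 00:00:00",
    "04-10-2022",
]
parsed = [parse_date(s) for s in samples]
```

Values that fit none of the listed layouts come back as None, so they can be counted or inspected instead of being silently misread.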


Note:

Why not simply use pandas.to_datetime without providing any format?

for col in df.columns:
    df[col] = pd.to_datetime(df[col])

# Output:

print(df)
        Col1       Col2       Col3       Col4
0 2022-04-10        NaT        NaT        NaT
1        NaT 2022-10-04        NaT        NaT
2        NaT        NaT 2022-04-10        NaT
3 2022-10-04        NaT        NaT        NaT
4        NaT        NaT        NaT 2022-12-20
5 2022-04-10        NaT        NaT        NaT

# Input used:

         Col1        Col2        Col3                 Col4
0  04/10/2022         NaN         NaN                  NaN
1         NaN  10/04/2022         NaN                  NaN
2         NaN         NaN  2022/04/10                  NaN
3  2022/10/04         NaN         NaN                  NaN
4         NaN         NaN         NaN  2022-12-20 00:00:00
5  04-10-2022         NaN         NaN                  NaN      
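
The loop above converts every column. If the frame also holds non-date columns, one might first check which columns look date-like, which is what the question's counting loop attempts. Below is a rough sketch of that check, with plain Python lists standing in for DataFrame columns; DATE_LIKE, date_share and the 0.8 threshold are illustrative assumptions, not part of pandas.

```python
import re

# Assumed pattern: three digit groups separated by / or -, with an
# optional time part such as 00:00:00.
DATE_LIKE = re.compile(
    r"^\d{1,4}[/-]\d{1,2}[/-]\d{1,4}(?:[ T]\d{2}:\d{2}(?::\d{2})?)?$"
)


def date_share(values):
    """Fraction of non-missing values that look like dates."""
    # Skip None and NaN (NaN is the only value not equal to itself).
    present = [str(v).strip() for v in values if v is not None and v == v]
    if not present:
        return 0.0
    hits = sum(1 for v in present if DATE_LIKE.match(v))
    return hits / len(present)


# Plain lists stand in for DataFrame columns in this sketch.
columns = {
    "Col1": ["04/10/2022", None, None, "2022/10/04", None, "04-10-2022"],
    "Col4": [None, None, None, None, "2022-12-20 00:00:00", None],
    "Name": ["alice", "bob", None, "carol", "dave", "erin"],
}

date_columns = [name for name, vals in columns.items() if date_share(vals) >= 0.8]
```

Using a share of the non-missing values rather than a fixed count (such as more than 5 matches) keeps the check independent of column length.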

Here is an idea. This code matches all the formats listed, but it cannot tell day from month when both values are 12 or less: 04/10/2022, for example, could be April 10 or October 4. That issue goes beyond the scope of the question.

The regexp I came up with looks for three groups of one or more digits, [0-9]+, separated by either a dash or a slash, [\/\-]; I used the backslash to escape the special symbols.

import re

dates = """04/10/2022
10/04/2022
2022/04/10
2022/10/04
2022-12-20 00:00:00
04-10-2022"""

# Three runs of digits separated by a slash or a dash.
dre = re.compile(r"([0-9]+)[\/\-]([0-9]+)[\/\-]([0-9]+)")

for date in dates.split("\n"):
    m = dre.match(date)
    if m:  # match() returns None when the line does not start with a date
        print(m.group(1), m.group(2), m.group(3))
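
The pattern above accepts digit groups of any length, so it does not say where the year sits. As a possible refinement, one could use two stricter patterns, one for year-first and one for year-last layouts. This is a sketch only; YEAR_FIRST, YEAR_LAST and split_date are hypothetical names, and the day/month ambiguity for year-last values remains.

```python
import re

# Year first, e.g. 2022/04/10 or 2022-12-20 00:00:00.
YEAR_FIRST = re.compile(r"(\d{4})[/-](\d{1,2})[/-](\d{1,2})")
# Year last, e.g. 04/10/2022 or 04-10-2022.
YEAR_LAST = re.compile(r"(\d{1,2})[/-](\d{1,2})[/-](\d{4})")


def split_date(text):
    """Return (year, first, second) as ints, or None if nothing matches.

    'first' and 'second' are the two remaining numbers in the order
    they appear; which one is the day is not decided here.
    """
    m = YEAR_FIRST.match(text)
    if m:
        year, first, second = m.groups()
        return int(year), int(first), int(second)
    m = YEAR_LAST.match(text)
    if m:
        first, second, year = m.groups()
        return int(year), int(first), int(second)
    return None


for sample in ["04/10/2022", "2022/04/10", "2022-12-20 00:00:00", "hello"]:
    print(sample, "->", split_date(sample))
```

Because match() only anchors at the start of the string, a trailing time part such as 00:00:00 is ignored rather than rejected.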
