Parsing messy date strings in Python

Question

R has a very nice workflow that lets the user set the day/month/year order but otherwise handles the messiness of user-input date strings:

date_str = c('05/03/2022', '14/03/2022', '14.03.2022', '14/03.2022')
lubridate::parse_date_time(date_str, orders = 'dmy')
#> [1] "2022-03-05 UTC" "2022-03-14 UTC" "2022-03-14 UTC" "2022-03-14 UTC"

The closest I've found in Python is:

from dateparser import parse
date_str = ['05/03/2022', '14/03/2022', '14.03.2022', '14/03.2022']
list(map(lambda l: parse(l, date_formats = ['dmy']), date_str))
[datetime.datetime(2022, 5, 3, 0, 0),
 datetime.datetime(2022, 3, 14, 0, 0),
 datetime.datetime(2022, 3, 14, 0, 0),
 datetime.datetime(2022, 3, 14, 0, 0)]

which handles the messiness but transposes day and month in the first observation, I think because date_formats prioritises explicitly defined formats and otherwise reverts to the (silly) default US month-day-year format?

Is there a nice implementation in Python that can be relied upon to handle the messiness while also assuming a day/month ordering?

Answer 1

Well, if dateparser otherwise does what you like, you can gently wrap it to prioritize the format you prefer:

import dateparser
import datetime
import re

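# Matches only all-slash day/month/year strings such as "05/03/2022".
dmy_re = re.compile(r"^(?P<day>\d+)/(?P<month>\d+)/(?P<year>\d+)$")


def parse_with_dmy_priority(ds):
    dmy_match = dmy_re.match(ds)
    if dmy_match:
        # The named groups (day, month, year) map directly onto datetime's keyword arguments.
        return datetime.datetime(**{k: int(v) for (k, v) in dmy_match.groupdict().items()})
    # Anything the regex does not match falls through to dateparser.
    return dateparser.parse(ds)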


in_data = ['05/03/2022', '14/03/2022', '14.03.2022', '14/03.2022']
print([parse_with_dmy_priority(d) for d in in_data])

[
  datetime.datetime(2022, 3, 5, 0, 0), 
  datetime.datetime(2022, 3, 14, 0, 0),
  datetime.datetime(2022, 3, 14, 0, 0), 
  datetime.datetime(2022, 3, 14, 0, 0),
]

This generalizes nicely too:

def parse_date(ds, regexps=()):
    for regexp in regexps:
        match = regexp.match(ds)
        if match:
            return datetime.datetime(**{k: int(v) for (k, v) in match.groupdict().items()})
    return dateparser.parse(ds)


print([parse_date(d, regexps=[dmy_re]) for d in in_data])
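
Note that dmy_re above only matches all-slash strings, so something like '05.03.2022' would still fall through to dateparser. If that matters, here is a minimal sketch of the same idea with a separator-tolerant pattern, using only the standard library. The names DMY_ANY_SEP, parse_dmy and fallback are made up for illustration, and the set of accepted separators ("/", "." and "-") plus the four-digit year are assumptions about what "messy" means for your data:

```python
import re
from datetime import datetime

# Assumption: day, month and year are separated by any mix of "/", "." or "-",
# and the year has four digits. Extend the character class as needed.
DMY_ANY_SEP = re.compile(
    r"^\s*(?P<day>\d{1,2})[/.\-](?P<month>\d{1,2})[/.\-](?P<year>\d{4})\s*$"
)


def parse_dmy(ds, fallback=None):
    """Parse a day-first date string; defer to `fallback` if the pattern does not match."""
    match = DMY_ANY_SEP.match(ds)
    if match:
        parts = {k: int(v) for k, v in match.groupdict().items()}
        # datetime() raises ValueError for impossible dates such as 31/02/2022.
        return datetime(**parts)
    # No match: hand the string to the fallback parser, if one was supplied.
    return fallback(ds) if fallback is not None else None


in_data = ['05/03/2022', '14/03/2022', '14.03.2022', '14/03.2022']
print([parse_dmy(d) for d in in_data])
```

Passing fallback=dateparser.parse should keep dateparser in charge of everything the pattern does not recognise, as in the wrapper above.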
