Extract dates in different formats from a string in Python
I have several strings in which I have identified a few date formats, and I would like to recognize the date in each string:
an_2011_02_12_azar.mp3 -> yyyy_mm_dd
20121112_Marcel.mp3 -> yyyymmdd
cdani_270607.mp3 -> ddmmyy
lica_07_03_15.mp3 -> dd_mm_yy
To do so, I have:
import re
import datetime

foo = """
an_2011_02_12_azar.mp3
20121112_Marcel.mp3
cdani_270607.mp3
lica_07_03_15.mp3
"""
try:
    lines = foo.split('\n')
except AttributeError:
    # foo is already a list of lines
    lines = foo
for line in lines:
    print(line)
    # deals with the 2011_02_12 format only
    match = re.search(r'\d{4}_\d{2}_\d{2}', line)
    if match:  # re.search returns None when the format is not found
        date = datetime.datetime.strptime(match.group(), '%Y_%m_%d').date()
        print(date)
How can I apply several regular expressions so that the date is recognized in every one of these formats?
If you remove the underscores:
datestr = line.replace('_', '')
then there are only two date formats to deal with: yyyymmdd or ddmmyy. Furthermore, every date string consists of either 8 or 6 digits, which you can find using the regex pattern r'\d{8}|\d{6}':
datestr = re.search(r'\d{8}|\d{6}', datestr).group()
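To see what these two steps produce for the sample file names, here is a minimal sketch (the names list is copied from the question; digit_runs and swapped are hypothetical names used only for illustration). It also shows why \d{8} is listed before \d{6}: Python's re module tries the alternatives of | from left to right and keeps the first one that matches at a given position, so with the order swapped an 8-digit run would be cut short to its first 6 digits.

```python
import re

names = [
    'an_2011_02_12_azar.mp3',
    '20121112_Marcel.mp3',
    'cdani_270607.mp3',
    'lica_07_03_15.mp3',
]

digit_runs = []
for name in names:
    # dropping underscores turns yyyy_mm_dd into yyyymmdd and dd_mm_yy into ddmmyy
    datestr = name.replace('_', '')
    # \d{8} is tried first, so an 8-digit run is kept whole
    match = re.search(r'\d{8}|\d{6}', datestr)
    digit_runs.append(match.group() if match else None)

print(digit_runs)  # ['20110212', '20121112', '270607', '070315']

# With the alternatives swapped, the 6-digit branch wins on an 8-digit run.
swapped = re.search(r'\d{6}|\d{8}', '20121112Marcel.mp3').group()
print(swapped)  # 201211
```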
The datestr can then be parsed with either
date = DT.datetime.strptime(datestr, '%d%m%y')
or
date = DT.datetime.strptime(datestr, '%Y%m%d')
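Which of the two formats applies does not have to be decided up front, because strptime itself rejects the wrong one, but the order in which they are tried matters. The sketch below uses a hypothetical helper, try_format, that returns the error message instead of raising. It shows that '%d%m%y' consumes at most 6 digits and therefore always raises ValueError on an 8-digit string, whereas '%Y%m%d' is less strict and accepts some 6-digit strings, which is why '%d%m%y' should be tried first.

```python
import datetime as DT

def try_format(datestr, fmt):
    """Return the parsed datetime, or the ValueError message if fmt does not fit."""
    try:
        return DT.datetime.strptime(datestr, fmt)
    except ValueError as err:
        return 'ValueError: {}'.format(err)

# '%d%m%y' consumes at most 6 digits, so an 8-digit string leaves
# unconverted data behind and raises ValueError ...
print(try_format('20121112', '%d%m%y'))
# ... while '%Y%m%d' reads the same string correctly.
print(try_format('20121112', '%Y%m%d'))  # 2012-11-12 00:00:00

# The reverse order would be unsafe: '%Y%m%d' accepts some 6-digit strings,
# e.g. it reads '070315' (7 March 2015) as year 703, month 1, day 5.
print(try_format('070315', '%Y%m%d'))
print(try_format('070315', '%d%m%y'))    # 2015-03-07 00:00:00
```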
The pattern r'\d{8}|\d{6}' would also capture some strings that are not dates, such as digit runs that do not form a valid date (for example, a month of 13). We can deal with those cases by using try..except to catch the resulting ValueError exceptions.
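The try..except idea can also be written as a loop over a tuple of formats, which makes adding another layout a one-line change. A minimal sketch; parse_datestr and FORMATS are hypothetical names, not part of the answer's code:

```python
import datetime as DT

# ddmmyy is tried first because '%Y%m%d' can misread some 6-digit strings
FORMATS = ('%d%m%y', '%Y%m%d')

def parse_datestr(datestr, formats=FORMATS):
    """Return the first successful parse of datestr, or None if every format fails."""
    for fmt in formats:
        try:
            return DT.datetime.strptime(datestr, fmt)
        except ValueError:
            # wrong layout or impossible date (e.g. month 13); try the next format
            continue
    return None

print(parse_datestr('270607'))    # 2007-06-27 00:00:00
print(parse_datestr('20111312'))  # None
```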
import re
import datetime as DT

foo = """\
an_2011_02_12_azar.mp3
20121112_Marcel.mp3
cdani_270607.mp3
lica_07_03_15.mp3
an_2011_13_12_azar.mp3
"""

for line in foo.splitlines():
    datestr = line.replace('_', '')
    datestr = re.search(r'\d{8}|\d{6}', datestr).group()
    try:
        # %y matches 2-digit years
        date = DT.datetime.strptime(datestr, '%d%m%y')
    except ValueError:
        try:
            # %Y matches 4-digit years
            date = DT.datetime.strptime(datestr, '%Y%m%d')
        except ValueError:
            # handle the error case
            date = None
    print('{:23} --> {}'.format(line, date))
yields
an_2011_02_12_azar.mp3  --> 2011-02-12 00:00:00
20121112_Marcel.mp3     --> 2012-11-12 00:00:00
cdani_270607.mp3        --> 2007-06-27 00:00:00
lica_07_03_15.mp3       --> 2015-03-07 00:00:00
an_2011_13_12_azar.mp3  --> None
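The question literally asked how to apply several regular expressions. As an alternative to stripping the underscores, here is a sketch that assumes the four layouts listed in the question are the only ones: each regex is paired with its strptime format, and the first pair that both matches and parses wins. PATTERNS and find_date are hypothetical names; the (?<!\d) and (?!\d) lookarounds keep a pattern from matching inside a longer run of digits.

```python
import re
import datetime as DT

# Hypothetical table of (regex, strptime format) pairs, one per layout in the
# question. Layouts with 4-digit years are listed before the 2-digit ones.
PATTERNS = [
    (r'(?<!\d)\d{4}_\d{2}_\d{2}(?!\d)', '%Y_%m_%d'),  # yyyy_mm_dd
    (r'(?<!\d)\d{8}(?!\d)', '%Y%m%d'),                # yyyymmdd
    (r'(?<!\d)\d{2}_\d{2}_\d{2}(?!\d)', '%d_%m_%y'),  # dd_mm_yy
    (r'(?<!\d)\d{6}(?!\d)', '%d%m%y'),                # ddmmyy
]

def find_date(name):
    """Return the first date in name that matches a pattern and parses, or None."""
    for pattern, fmt in PATTERNS:
        match = re.search(pattern, name)
        if match is None:
            continue
        try:
            return DT.datetime.strptime(match.group(), fmt).date()
        except ValueError:
            # right shape, but not a valid date (e.g. month 13)
            continue
    return None

names = [
    'an_2011_02_12_azar.mp3',
    '20121112_Marcel.mp3',
    'cdani_270607.mp3',
    'lica_07_03_15.mp3',
    'an_2011_13_12_azar.mp3',
]
for name in names:
    print('{:23} --> {}'.format(name, find_date(name)))
```

Unlike the underscore-stripping approach, this keeps the separator as part of each pattern, so a layout such as dd_mm_yy can never be confused with a stray 6-digit number elsewhere in the name.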