简体   繁体   中英

Using Python and Regex to extract different formats of dates

I have the following code to match the dates

import re
date_reg_exp2 = re.compile(r'\d{2}([-/.])(\d{2}|[a-zA-Z]{3})\1(\d{4}|\d{2})|\w{3}\s\d{2}[,.]\s\d{4}')
matches_list = date_reg_exp2.findall("23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015")
print matches_list

The output I expect is

["23-SEP-2015","23-09-2015","23-09-15","Sep 23, 2015"]

What I am getting is:

[('-', 'SEP', '2015'), ('-', '09', '2015'), ('-', '09', '15'), ('', '', '')]

Please check the link for regex here .

The problem you have is that re.findall returns captured texts only excluding Group 0 (the whole match). Since you need the whole match (Group 0), you just need to use re.finditer and grab the group() value:

matches_list = [x.group() for x in date_reg_exp2.finditer("23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015")]

See IDEONE demo

re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings... If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

re.finditer(pattern, string, flags=0)
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string.

You could try this regex

date_reg_exp2 = re.compile(r'(\d{2}(/|-|\.)\w{3}(/|-|\.)\d{4})|([a-zA-Z]{3}\s\d{2}(,|-|\.|,)?\s\d{4})|(\d{2}(/|-|\.)\d{2}(/|-|\.)\d+)')

Then use re.finditer()

for m in re.finditer(date_reg_exp2,"23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"):
print m.group()

The Output will be

23-SEP-2015
23-09-2015
23-09-15
Sep 23, 2015

try this

# The first (\d{2}-([A-Z]{3}|\d{2})-(\d{4}|\d{2})) group tries to match the first three types of dates
# rest will match the last type
dates = "23-SEP-2015 and 23-09-2015 and 23-09-15 and Sep 23, 2015"
for x in re.finditer('((\d{2}-([A-Z]{3}|\d{2})-(\d{4}|\d{2}))|([a-zA-Z]{3}\s\d{1,2},\s\d{4}))', dates):
    print x.group(1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM