Python - Regular Expression to find dates in list

While looping through a csv I'm attempting to normalize dates for a MySQL load (yyyy-mm-dd). I've attempted to search for items that contain two forward slashes, but I've found that's not unique enough to identify a date. Can this be accomplished with a regular expression and looking for items that match a pattern? Any input would be greatly appreciated.

example input:

['1','2','01/02/2015','3','4','1-05-2015','5','Anot/her Ex/ample','6']

example output:

['1','2','2015-01-02','3','4','2015-01-05','5','Anot/her Ex/ample','6']

re.sub(r"(\d+)/(\d+)/(\d+)",r"\3-\1-\2",test_Str)

This should do it for you.

import re

x = ['1','2','01/02/2015','3','4','1-05-2015','5','Anot/her Ex/ample','6']
# only slash-separated dates are rewritten; '1-05-2015' is left as-is
print([re.sub(r"(\d+)/(\d+)/(\d+)",r"\3-\1-\2",i) for i in x])

If you know the order of the fields, you can use datetime.strptime:

from datetime import datetime

l =  ['1','2','01/02/2015','3','4','1-05-2015','5','Another Ex/ample','6']

out = []

for ele in l:
    try:
        # month first, so '01/02/2015' becomes '2015-01-02' as in the expected output
        out.append(datetime.strptime(ele,"%m/%d/%Y").strftime("%Y-%m-%d"))
    except ValueError:
        out.append(ele)

print(out)

I'm not sure how you expect '1-05-2015' to become '2015-01-05', since you are only considering forward slashes for dates.

If you have multiple patterns to test:

out = []

for ele in l:
    # try each known layout (month first, as in the expected output)
    for patt in ["%m/%d/%Y","%m-%d-%Y"]:
        try:
            out.append(datetime.strptime(ele,patt).strftime("%Y-%m-%d"))
            break
        except ValueError:
            # not this layout, try the next one
            continue
    else:
        # no pattern matched: keep the original value
        out.append(ele)

print(out)
['1', '2', '2015-01-02', '3', '4', '2015-01-05', '5', 'Another Ex/ample', '6']

You can also filter on length and only try to parse the correct length strings:

out = []

for ele in l:
    ln = len(ele)
    # m/d/yyyy is 8 characters and mm/dd/yyyy is 10: skip any other length
    if not 8 <= ln <= 10:
        out.append(ele)
        continue
    for patt in ["%m/%d/%Y", "%m-%d-%Y"]:
        try:
            out.append(datetime.strptime(ele,patt).strftime("%Y-%m-%d"))
            break
        except ValueError:
            continue
    else:
        out.append(ele)

A regex is potentially going to match a lot more than just dates, so unless you are 100 percent certain of your input, you should at least try casting what the regex returns to a datetime object before adding it.
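A minimal sketch of that idea, assuming month-first dates as in the question's expected output: a regex picks out candidates, and datetime.strptime then rejects anything that is not a real calendar date. The names DATE_RE and normalize are made up for illustration.

```python
import re
from datetime import datetime

# Hypothetical names; assumes month-first input (mm/dd/yyyy or mm-dd-yyyy).
# The backreference \2 requires the same separator in both positions.
DATE_RE = re.compile(r"(\d{1,2})([/-])(\d{1,2})\2(\d{4})")

def normalize(item):
    """Return item as yyyy-mm-dd if it is a valid date, else return it unchanged."""
    m = DATE_RE.fullmatch(item)
    if m is None:
        return item
    month, _sep, day, year = m.groups()
    try:
        # strptime validates the calendar date (for example, it rejects 04/31/2001)
        parsed = datetime.strptime("{}/{}/{}".format(month, day, year), "%m/%d/%Y")
    except ValueError:
        return item
    return parsed.strftime("%Y-%m-%d")

items = ['1', '2', '01/02/2015', '3', '4', '1-05-2015', '5', 'Anot/her Ex/ample', '6']
print([normalize(i) for i in items])
```

The regex only narrows down the candidates; the decision about whether a string really is a date is left to strptime.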

I feel @PadraicCunningham's solution is the most resilient, and it can easily be extended as follows to handle the other case:

from datetime import datetime

l =  ['1','2','01/02/2015','3','4','1-05-2015','5','Another Ex/ample','6']

out = []

for ele in l:

    try:
        out.append(datetime.strptime(ele,"%m/%d/%Y").strftime("%Y-%m-%d")) 
        continue
    except ValueError:
        pass

    try:
        out.append(datetime.strptime(ele,"%m-%d-%Y").strftime("%Y-%m-%d")) 
    except ValueError:
        out.append(ele)

print(out)

This would now print out:

['1', '2', '2015-01-02', '3', '4', '2015-01-05', '5', 'Another Ex/ample', '6']

You should also consider the following test cases. None of them is a valid month-first date (month 40, month 13, April 31), so they would be left unchanged.

l = ['40/05/2015', '13/01/2000', '04/31/2001']
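
A small check of those cases, assuming the month-first formats used above; the helper name try_parse is hypothetical. It returns None whenever strptime raises ValueError for every format.

```python
from datetime import datetime

def try_parse(text, formats=("%m/%d/%Y", "%m-%d-%Y")):
    """Return the yyyy-mm-dd form of text, or None if no format yields a valid date."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            # wrong layout or impossible date: try the next format
            continue
    return None

for case in ['40/05/2015', '13/01/2000', '04/31/2001']:
    print(case, '->', try_parse(case))
```

Note that '13/01/2000' is only rejected because the month is read first; with a day-first format such as "%d/%m/%Y" it would parse as 13 January 2000, so the field order has to be known in advance.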

You can try the code below:

import re

elements = ['1','2','01/02/2015','3','4','1-05-2015','5','Anot/her Ex/ample','6']
for element in elements:
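    # month, day and four-digit year separated by '/' or '-'
    matches = re.match(r"(\d{1,2})[/-](\d{1,2})[/-](\d{4})$", element)
    if matches is not None:
        month, day, year = matches.groups()
        # zero-pad month and day to two digits and print as yyyy-mm-dd
        print('{}-{:0>2}-{:0>2}'.format(year, month, day))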
