While looping through a CSV, I'm attempting to normalize dates for a MySQL load (yyyy-mm-dd). I tried searching for items that contain two forward slashes, but I've found that's not specific enough to identify a date. Can this be accomplished with a regular expression that looks for items matching a pattern? Any input would be greatly appreciated.
example input:
['1','2','01/02/2015','3','4','1-05-2015','5','Anot/her Ex/ample','6']
example output:
['1','2','2015-01-02','3','4','2015-01-05','5','Anot/her Ex/ample','6']
re.sub(r"(\d+)/(\d+)/(\d+)",r"\3-\1-\2",test_Str)
This should do it for you.
import re
x = ['1','2','01/02/2015','3','4','1-05-2015','5','Anot/her Ex/ample','6']
print([re.sub(r"(\d+)/(\d+)/(\d+)", r"\3-\1-\2", i) for i in x])
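Note that the pattern above only matches slash-separated values and copies the digits through as they are, so '1-05-2015' is left alone and single-digit fields are not zero-padded. A minimal sketch of one way to extend it, assuming month-first input as in the question (the names `DATE_RE` and `to_iso` are made up for illustration):

```python
import re

# Month, a separator, day, the same separator again (\2), four-digit year.
DATE_RE = re.compile(r"(\d{1,2})([/-])(\d{1,2})\2(\d{4})")

def to_iso(value):
    """Return yyyy-mm-dd if value looks like m/d/yyyy or m-d-yyyy, else value unchanged."""
    m = DATE_RE.fullmatch(value)
    if m is None:
        return value
    month, _sep, day, year = m.groups()
    # Zero-pad month and day to two digits.
    return "{}-{:0>2}-{:0>2}".format(year, month, day)

x = ['1','2','01/02/2015','3','4','1-05-2015','5','Anot/her Ex/ample','6']
print([to_iso(i) for i in x])
```

This only checks the shape of the string, not whether the date exists, so something like '40/05/2015' would still be rewritten.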
If you know the order of the fields, you can use datetime.strptime:
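from datetime import datetime

l = ['1','2','01/02/2015','3','4','1-05-2015','5','Another Ex/ample','6']
out = []
for ele in l:
    try:
        # Month first, to match the expected output in the question;
        # swap to "%d/%m/%Y" if your data is day-first.
        out.append(datetime.strptime(ele, "%m/%d/%Y").strftime("%Y-%m-%d"))
    except ValueError:
        # Not a date in this format: keep the original value.
        out.append(ele)
print(out)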
I have no idea how you expect '1-05-2015' to become '2015-01-05', as you are only considering forward slashes for dates.
If you have multiple patterns to test:
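out = []
for ele in l:
    # Month-first formats, matching the expected output in the question.
    for patt in ["%m/%d/%Y", "%m-%d-%Y"]:
        try:
            p1 = datetime.strptime(ele, patt).strftime("%Y-%m-%d")
            out.append(p1)
            break
        except ValueError:
            # This format did not match; try the next one.
            continue
    else:
        # No format matched: keep the original value.
        out.append(ele)
print(out)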
['1', '2', '2015-01-02', '3', '4', '2015-01-05', '5', 'Another Ex/ample', '6']
You can also filter on length and only try to parse the correct length strings:
out = []
for ele in l:
    ln = len(ele)
    # d/m/yyyy is 8 characters and dd/mm/yyyy is 10; skip anything else.
    if not 8 <= ln <= 10:
        out.append(ele)
        continue
    for patt in ["%m/%d/%Y", "%m-%d-%Y"]:
        try:
            p1 = datetime.strptime(ele, patt).strftime("%Y-%m-%d")
            out.append(p1)
            break
        except ValueError:
            # This format did not match; try the next one.
            continue
    else:
        out.append(ele)
A regex is potentially going to match a lot more than just dates, so unless you are 100 percent certain of your input, you should at least try parsing what the regex matches into a datetime object before adding it.
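As a rough sketch of that idea, a regex can act as a cheap pre-filter and strptime can then decide whether the candidate is a real date. This assumes month-first input as in the question; `CANDIDATE` and `normalize` are hypothetical names used only for illustration:

```python
import re
from datetime import datetime

# Pre-filter: digits, a separator, digits, the same separator (\1), four digits.
CANDIDATE = re.compile(r"\d{1,2}([/-])\d{1,2}\1\d{4}")

def normalize(value):
    """Return yyyy-mm-dd for a valid m/d/yyyy or m-d-yyyy string, else value unchanged."""
    m = CANDIDATE.fullmatch(value)
    if m is None:
        return value
    sep = m.group(1)
    try:
        # Build "%m/%d/%Y" or "%m-%d-%Y" from the separator that was found.
        parsed = datetime.strptime(value, "%m{0}%d{0}%Y".format(sep))
    except ValueError:
        # Looked like a date but is not a real one, e.g. 04/31/2001.
        return value
    return parsed.strftime("%Y-%m-%d")

print([normalize(v) for v in ['01/02/2015', '1-05-2015', '04/31/2001', 'Anot/her Ex/ample']])
```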
I feel @PadraicCunningham's solution is the most resilient, and it can easily be extended as follows to cater for the other case:
from datetime import datetime

l = ['1','2','01/02/2015','3','4','1-05-2015','5','Another Ex/ample','6']
out = []
for ele in l:
    try:
        out.append(datetime.strptime(ele, "%m/%d/%Y").strftime("%Y-%m-%d"))
        continue
    except ValueError:
        pass
    try:
        out.append(datetime.strptime(ele, "%m-%d-%Y").strftime("%Y-%m-%d"))
    except ValueError:
        out.append(ele)
print(out)
This would now print out:
['1', '2', '2015-01-02', '3', '4', '2015-01-05', '5', 'Another Ex/ample', '6']
You should also consider the following test cases. These would result in no changes being made.
l = ['40/05/2015', '13/01/2000', '04/31/2001']
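A quick way to check those cases is sketched here, with the two strptime attempts wrapped in a helper (the name `convert` is an assumption, not part of the code above):

```python
from datetime import datetime

def convert(ele):
    """Try month-first formats with '/' and '-'; return ele unchanged if neither parses."""
    for patt in ("%m/%d/%Y", "%m-%d-%Y"):
        try:
            return datetime.strptime(ele, patt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return ele

l = ['40/05/2015', '13/01/2000', '04/31/2001']
print([convert(ele) for ele in l])
# ['40/05/2015', '13/01/2000', '04/31/2001']
```

None of the three is a valid month-first date (month 40, month 13, and April 31), so strptime raises ValueError each time and the strings pass through untouched.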
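You can try the code below:
import re

elements = ['1','2','01/02/2015','3','4','1-05-2015','5','Anot/her Ex/ample','6']
for element in elements:
    # One or two digits, '/' or '-', one or two digits, '/' or '-', four digits, then end of string.
    matches = re.match(r"(\d{1,2})[/-](\d{1,2})[/-](\d{4})$", element)
    if matches is not None:
        month, day, year = matches.groups()
        # Print as yyyy-mm-dd, zero-padding month and day to two digits.
        print('{}-{:0>2}-{:0>2}'.format(year, month, day))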