
Clean regex output required

I am new to regex and can't work around an issue. With this code, I need to extract dates given in multiple formats. The regex is giving me extra quote marks and commas in the output. Is there a way to remove those and extract only the dates?

Code:

import re

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

xx = r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})|([1|2]\d{3})'

matches = re.findall(xx, text)
matches

Output:

[('04/20/2009', ''),
 ('04/20/09', ''),
 ('4/20/09', ''),
 ('4/3/09', ''),
 ('', '2009'),
 ('', '2009'),
 ('', '2009'),
 ('', '2009'),
 ('', '2009')]

This doesn't exactly answer the question, but consider using the dateutil module, which can already parse many different date formats:

import dateutil.parser  # a plain "import dateutil" does not load the parser submodule

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

# Strip the trailing newline and semicolon, then turn the line break into a semicolon
text = text.strip('\n;').replace('\n', ';')

# Parse each date individually
dates = [dateutil.parser.parse(date) for date in text.split(';')]

You can pick the non-empty element out of each tuple in the matches list, then use the join method to concatenate the results into a single string:

date_strings = [date[0] or date[1] for date in matches]
date_string = ' '.join(date_strings)

This builds a new list, date_strings, that holds only the non-empty string from each tuple in matches, then joins those strings into a single string separated by spaces.
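The empty strings appear because re.findall returns one tuple slot per capturing group when the pattern has more than one group. As a minimal sketch, assuming the groups are not needed for anything else, they can be made non-capturing with (?:...) so that findall returns plain strings. The name clean_matches is only illustrative, and [12] is used here in place of the question's [1|2], which also matches a literal | character.

```python
import re

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

# (?:...) groups without capturing, so findall returns each whole match as a string
pattern = r'(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4})|(?:[12]\d{3})'

clean_matches = re.findall(pattern, text)
print(clean_matches)
```

With this input the result is a flat list of strings, so no post-processing of tuples is needed.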

From what I understand, you're generating a list of tuples, but what you want is a text string that lists the results vertically?

You can accomplish that by first joining the individual tuple contents together with an empty string, then joining the list of the resulting strings together with a new-line character:

print("\n".join(map(''.join, matches)))

04/20/2009
04/20/09
4/20/09
4/3/09
2009
2009
2009
2009
2009
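If the pattern itself has to stay unchanged, another option is re.finditer, where group(0) of each match object is the full matched text regardless of how many capturing groups the pattern has. The following is a sketch under that assumption, reusing the question's pattern; full_matches is an illustrative name.

```python
import re

text = '''04/20/2009; 04/20/09; 4/20/09; 4/3/09
Mar-20-2009; Mar 20, 2009; March 20, 2009; Mar. 20, 2009; Mar 20 2009;
'''

# Same pattern as in the question, written as a raw string
xx = r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})|([1|2]\d{3})'

# group(0) is the entire match, so no empty tuple slots show up
full_matches = [m.group(0) for m in re.finditer(xx, text)]
print("\n".join(full_matches))
```

This prints one match per line, the same as joining the tuple contents.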
