
How to extract minimum and maximum dates from string with regex in Python?

I am trying to extract minimum and maximum dates from a string column in pandas. The strings come in two formats.

First one is:


date_from_string = 'My date format is 7-20 November 2019'

And the second one is:


date_from_string_v2 = 'My date format is 7 October and 7 November 2019'

I want to extract the minimum and maximum dates separately. For example, for the first case:

minimum_date = 20191107
maximum_date = 20191120

or for the second type:

minimum_date = 20191007
maximum_date = 20191107

I have tried a date_converter function, and I also tried the dateutils and datefinder modules, but I have not been able to solve this yet. Any help would be appreciated.

Thanks.

Based on your comments, if each string contains only one of the two formats and only a single range of dates, a regex may work better than a date parser. Date parsers are usually geared toward producing a single date, not a range (maybe one of the modules Arkistarvh mentioned handles ranges, but I doubt it).

A regex targeted at the strings you supplied would be something like this:

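import re

re_month = r'(?:January|February|March|April|May|June|July|August|September|October|November|December)'

# Two alternatives: "D-D Month YYYY" or "D Month and D Month YYYY".
# Every fragment is a raw string so that \d is not treated as a string escape.
re_ranges = (r'(?P<range1s>\d{1,2})-(?P<range1e>\d{1,2} +' + re_month + r' +\d{4})'
             r'|(?P<range2s>\d{1,2} +' + re_month + r') +and +(?P<range2e>\d{1,2} +' + re_month + r' +\d{4})')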

which gives:

>>> re.search(re_ranges, date_from_string).groups()
('7', '20 November 2019', None, None)
>>> re.search(re_ranges, date_from_string_v2).groups()
(None, None, '7 October', '7 November 2019')

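The captured pieces can then be passed to a normal date parser. Note that the start of each range is incomplete on its own: in the first format it is only a day ('7'), so the month and year have to be taken from the end date, and in the second format it lacks a year ('7 October'), which also has to come from the end date.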
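To get from the regex match to the minimum_date and maximum_date values asked for in the question, something like the following might work. It is only a sketch: it uses a variant of the pattern above that captures day, month and year separately, the names MONTHS, RANGE_RE and extract_date_range are made up for illustration, and it assumes that in the "and" format the single year applies to both dates.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_RE = "(?:" + "|".join(MONTHS) + ")"

# Variant of the pattern above with day, month and year in separate groups.
RANGE_RE = re.compile(
    r"(?P<d1>\d{1,2})-(?P<d2>\d{1,2}) +(?P<m>" + MONTH_RE + r") +(?P<y>\d{4})"
    r"|(?P<da>\d{1,2}) +(?P<ma>" + MONTH_RE + r") +and +"
    r"(?P<db>\d{1,2}) +(?P<mb>" + MONTH_RE + r") +(?P<yb>\d{4})"
)


def extract_date_range(text):
    """Return (minimum_date, maximum_date) as 'YYYYMMDD' strings, or None."""
    m = RANGE_RE.search(text)
    if m is None:
        return None
    if m.group("d1"):
        # "7-20 November 2019": both days share one month and one year.
        year = int(m.group("y"))
        month = MONTHS.index(m.group("m")) + 1
        start = date(year, month, int(m.group("d1")))
        end = date(year, month, int(m.group("d2")))
    else:
        # "7 October and 7 November 2019": assumption, the year covers both dates.
        year = int(m.group("yb"))
        start = date(year, MONTHS.index(m.group("ma")) + 1, int(m.group("da")))
        end = date(year, MONTHS.index(m.group("mb")) + 1, int(m.group("db")))
    low, high = min(start, end), max(start, end)
    return low.strftime("%Y%m%d"), high.strftime("%Y%m%d")


print(extract_date_range("My date format is 7-20 November 2019"))
print(extract_date_range("My date format is 7 October and 7 November 2019"))
```

Month names are looked up in a list rather than parsed with strptime, so the result does not depend on the locale. An impossible date such as 31 November raises ValueError from date(), and a range that crosses a year boundary (for example "7 December and 7 January 2020") would need extra handling. For a pandas column, the function could be applied element-wise, e.g. with Series.apply.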
