I am trying to extract minimum and maximum dates from a string column in pandas. The strings come in two formats.
First one is:
date_from_string = 'My date format is 7-20 November 2019'
And the second one is:
date_from_string_v2 = 'My date format is 7 October and 7 November 2019'
I want to extract the minimum and maximum dates separately. For example, for the first case:
minimum_date = 20191107
maximum_date = 20191120
and for the second case:
minimum_date = 20191007
maximum_date = 20191107
I have tried a date_converter function (code here). I also tried the dateutils and datefinder modules, but I have not been able to solve this yet. I would appreciate some help with this issue.
Thanks.
Based on your comments, if each string contains just one of these cases and only a single date range, a regex may work better than a date parser. Date parsers are usually geared toward producing a single date, not a range (maybe one of the modules Arkistarvh mentioned handles ranges, but I doubt it).
A regex targeted at the strings you supplied would be something like this:
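import re

re_month = r'(?:January|February|March|April|May|June|July|August|September|October|November|December)'
# Alternative 1 matches "7-20 November 2019"; alternative 2 matches "7 October and 7 November 2019".
# Every piece is a raw string so that \d is not treated as an invalid string escape.
re_ranges = (r'(?P<range1s>\d{1,2})-(?P<range1e>\d{1,2} +' + re_month + r' +\d{4})'
             r'|(?P<range2s>\d{1,2} +' + re_month + r') +and +(?P<range2e>\d{1,2} +' + re_month + r' +\d{4})')
# which gives:
>>> re.search(re_ranges, date_from_string).groups()
('7', '20 November 2019', None, None)
>>> re.search(re_ranges, date_from_string_v2).groups()
(None, None, '7 October', '7 November 2019')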
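These groups can then be parsed by normal date parsers. Note that the start group never contains the year (and in the first format it contains only the day), so the missing parts have to be taken from the end group.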
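As a rough sketch of that last step, the captured groups could be turned into the YYYYMMDD values from the question using only the standard library. The function name `extract_date_range` is made up for this illustration, and the sketch assumes English month names (`%B` in `strptime` is locale-dependent) and that both dates fall in the same year as the end date.

```python
import re
from datetime import datetime

# Same pattern as in the answer, repeated so this sketch is self-contained.
RE_MONTH = (r'(?:January|February|March|April|May|June|July|August|'
            r'September|October|November|December)')
RE_RANGES = (
    r'(?P<range1s>\d{1,2})-(?P<range1e>\d{1,2} +' + RE_MONTH + r' +\d{4})'
    r'|(?P<range2s>\d{1,2} +' + RE_MONTH + r') +and +'
    r'(?P<range2e>\d{1,2} +' + RE_MONTH + r' +\d{4})'
)


def _normalize(s):
    """Collapse runs of spaces so the string matches the '%d %B %Y' layout."""
    return ' '.join(s.split())


def extract_date_range(text):
    """Hypothetical helper: return (minimum, maximum) as YYYYMMDD ints, or None."""
    m = re.search(RE_RANGES, text)
    if m is None:
        return None
    if m.group('range1s') is not None:
        # "7-20 November 2019": the start is only a day number,
        # so it borrows month and year from the end date.
        end = datetime.strptime(_normalize(m.group('range1e')), '%d %B %Y')
        start = end.replace(day=int(m.group('range1s')))
    else:
        # "7 October and 7 November 2019": the start has day and month,
        # so it borrows only the year (assumption: same year as the end).
        end = datetime.strptime(_normalize(m.group('range2e')), '%d %B %Y')
        start = datetime.strptime(
            f"{_normalize(m.group('range2s'))} {end.year}", '%d %B %Y')
    lo, hi = sorted((start, end))
    return int(lo.strftime('%Y%m%d')), int(hi.strftime('%Y%m%d'))


print(extract_date_range('My date format is 7-20 November 2019'))
print(extract_date_range('My date format is 7 October and 7 November 2019'))
```

On a pandas string column, something like `df['text'].apply(extract_date_range)` (with `text` standing in for the real column name) should give one tuple per row, or `None` where no range is found. A range that crosses a year boundary, such as "7 December and 7 January 2020", would need extra handling that this sketch does not attempt.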