I'm trying to separate a string date range (Ex. 1 to 30 of July) into a start and end date as a datetime (Ex. 07/01/2019 and 07/30/2019). How do I convert it?
I've tried breaking the string into pieces, but I believe the only way of doing it is with a regex.
Examples of strings in columns:
"1 to 30 of July" "10 to 12 of August" "20 of January to 10 of February"
I've used ^(\\d{1,2})\\s([a-z]{2})\\s(\\d{1,2})\\s([a-z]{2})\\s(\\w{1,13})
but I'm missing the "D of M to D of M" case.
All of the dates are in 2019.
We can use a regex with Series.str.extractall to extract the day numbers and the month names from your data, then concatenate the strings:
# all day numbers in each row, one column per match
days = df['Date'].str.extractall(r'(\d+)').unstack()
months = '(' + '|'.join(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']) + ')'
# rows that name only one month reuse it for both dates (ffill across columns)
monthnames = df['Date'].str.extractall(months).unstack().ffill(axis=1)
df = days + ' ' + monthnames
df.columns = ['date_start', 'date_end']
Output
   date_start     date_end
0      1 July      30 July
1   10 August    12 August
2  20 January  10 February
If you want them in date format without month names:
df.apply(lambda x: pd.to_datetime(x, format='%d %B').dt.strftime('%m-%d'))
  date_start date_end
0      07-01    07-30
1      08-10    08-12
2      01-20    02-10
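Since the question asks for real datetimes and says every date is in 2019, the same idea can be taken one step further by appending the year before parsing. The following is only a sketch under that assumption; the sample frame and the names `month_pattern`, `names`, `text` and `dates` are made up for illustration. Selecting column `[0]` before `unstack()` just keeps the resulting column labels flat.

```python
import pandas as pd

# Sample data copied from the question (column name 'Date' as in the answer above)
df = pd.DataFrame({'Date': ['1 to 30 of July',
                            '10 to 12 of August',
                            '20 of January to 10 of February']})

month_names = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
month_pattern = '(' + '|'.join(month_names) + ')'

# One column per match: column 0 is the start, column 1 is the end
days = df['Date'].str.extractall(r'(\d+)')[0].unstack()
names = df['Date'].str.extractall(month_pattern)[0].unstack().ffill(axis=1)

# Assumption from the question: every date falls in 2019
text = days + ' ' + names + ' 2019'
dates = text.apply(lambda col: pd.to_datetime(col, format='%d %B %Y'))
dates.columns = ['date_start', 'date_end']
print(dates)
```

Both columns of `dates` should then be datetime64 columns, so they can be formatted with `.dt.strftime('%m/%d/%Y')` if the `07/01/2019` style from the question is wanted.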
The following will extract the days and months:
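# update your month list properly
months = ['January', 'February', 'July', 'August']
m = '|'.join(months)
# pattern: start day, optional "of <month>", "to", end day, end month
pattern = rf'(\d+) (?:of ({m}))?\s?to (\d+).*({m})'
# extract (s is the Series holding the date strings):
s.str.extract(pattern)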
Output:
    0        1   2         3
0   1      NaN  30      July
1  10      NaN  12    August
2  20  January  10  February
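To get from these four columns to a start and end datetime, one option is to fill the missing start month from the end month and then parse. This is a sketch rather than part of the answer above: the named groups `d1`, `m1`, `d2`, `m2` and the `result` frame are hypothetical names, and the year 2019 is taken from the question.

```python
import pandas as pd

# Sample data copied from the question
s = pd.Series(['1 to 30 of July',
               '10 to 12 of August',
               '20 of January to 10 of February'])

months = ['January', 'February', 'July', 'August']
m = '|'.join(months)

# Same pattern as above, with named groups so the columns get readable labels
pattern = rf'(?P<d1>\d+) (?:of (?P<m1>{m}))?\s?to (?P<d2>\d+).*(?P<m2>{m})'
parts = s.str.extract(pattern)

# "1 to 30 of July" has no start month, so borrow the end month
parts['m1'] = parts['m1'].fillna(parts['m2'])

# Assumption from the question: every date falls in 2019
fmt = '%d %B %Y'
result = pd.DataFrame({
    'date_start': pd.to_datetime(parts['d1'] + ' ' + parts['m1'] + ' 2019', format=fmt),
    'date_end': pd.to_datetime(parts['d2'] + ' ' + parts['m2'] + ' 2019', format=fmt),
})
print(result)
```

Rows that do not match the pattern at all would come back as NaN in every group, so on real data it may be worth checking `parts.isna().any(axis=1)` before parsing.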