
How to change string date range to start and end date?

I'm trying to separate a string date range (Ex. 1 to 30 of July) into a start and end date as a datetime (Ex. 07/01/2019 and 07/30/2019). How do I convert it?

I've tried breaking the string into pieces, but I believe the only way of doing it is with a regex.

Examples of strings in columns:

  "1 to 30 of July" "10 to 12 of August" "20 of January to 10 of February" 

I've used ^(\\d{1,2})\\s([a-z]{2})\\s(\\d{1,2})\\s([a-z]{2})\\s(\\w{1,13}) but I'm missing the "D of M to D of M" case.

All of the dates are in 2019.

You can use a regex with Series.str.extractall to extract the day numbers and the month names from your data, then concatenate the strings:

# one column per match: the start day and the end day
days = df['Date'].str.extractall(r'(\d+)').unstack()

months = '('+'|'.join(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])+')'
# rows that name only one month reuse it for the end date via ffill
monthnames = df['Date'].str.extractall(months).unstack().ffill(axis=1)

df = days + ' ' + monthnames
df.columns = ['date_start', 'date_end']

Output

   date_start     date_end
0      1 July      30 July
1   10 August    12 August
2  20 January  10 February

If you want them in date format without month names:

# the format has no year, so parsing defaults to 1900; only month and day are kept
df.apply(lambda x: pd.to_datetime(x, format='%d %B').dt.strftime('%m-%d'))

  date_start date_end
0      07-01    07-30
1      08-10    08-12
2      01-20    02-10
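
The question asks for full dates in 2019 rather than month-day strings. The following is a minimal sketch of one way to get there, assuming a `Date` column holding the three sample strings and that every range falls in 2019, as the question states. The helper name `to_2019` is hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['1 to 30 of July',
                            '10 to 12 of August',
                            '20 of January to 10 of February']})

months = ('January|February|March|April|May|June|July|'
          'August|September|October|November|December')

# One column per match: column 0 is the start, column 1 is the end.
days = df['Date'].str.extractall(r'(\d+)')[0].unstack()
monthnames = df['Date'].str.extractall(f'({months})')[0].unstack()

# Rows that name a single month reuse it for the end date.
monthnames = monthnames.ffill(axis=1)


def to_2019(day, month):
    # Assumption: all dates are in 2019, so the year is appended by hand.
    return pd.to_datetime(day + ' ' + month + ' 2019', format='%d %B %Y')


result = pd.DataFrame({'date_start': to_2019(days[0], monthnames[0]),
                       'date_end': to_2019(days[1], monthnames[1])})
print(result)
```

Both columns of `result` are real datetime columns, so `result['date_start'].dt.strftime('%m/%d/%Y')` would give the 07/01/2019 style shown in the question if a string is needed.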

The following will extract the days and months:

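# update your month list properly
months = ['January', 'February', 'July', 'August']
m = '|'.join(months)

# pattern: start day, optional start month, end day, end month
pattern = rf'(\d+) (?:of ({m}))?\s?to (\d+).*({m})'

# extract (s is the Series holding the date-range strings):
s.str.extract(pattern)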

Output:

    0        1   2         3
0   1      NaN  30      July
1  10      NaN  12    August
2  20  January  10  February
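
To go from these four extracted columns to start and end dates, the missing start month can be filled from the end month before parsing. This is a sketch only, assuming `s` holds the three sample strings and that all dates are in 2019. The column names `start_day`, `start_month`, `end_day` and `end_month` are made up for illustration.

```python
import pandas as pd

s = pd.Series(['1 to 30 of July',
               '10 to 12 of August',
               '20 of January to 10 of February'])

months = ['January', 'February', 'July', 'August']
m = '|'.join(months)
pattern = rf'(\d+) (?:of ({m}))?\s?to (\d+).*({m})'

parts = s.str.extract(pattern)
parts.columns = ['start_day', 'start_month', 'end_day', 'end_month']

# A missing start month means the range stays inside the end month.
parts['start_month'] = parts['start_month'].fillna(parts['end_month'])

# Assumption: every range is in 2019, so the year is appended by hand.
dates = pd.DataFrame({
    'date_start': pd.to_datetime(
        parts['start_day'] + ' ' + parts['start_month'] + ' 2019',
        format='%d %B %Y'),
    'date_end': pd.to_datetime(
        parts['end_day'] + ' ' + parts['end_month'] + ' 2019',
        format='%d %B %Y'),
})
print(dates)
```

Filling the start month from the end month relies on the pattern always capturing the final month name, which holds for the three sample formats but has not been checked against other phrasings.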
