Splitting members of series using regular expressions with strings

I asked a similar question about a week ago, and tried to mess with that code to suit a different purpose, but couldn't seem to make it work.

I want to split a string using month names as the delimiters (so I'd have JAN, FEB, MAR, APR, MAY, JUNE, etc.).

I tried using

df['a'] = [re.split(r'[JUNE|JULY]+', x) for x in df['a']]

as well as some variations on that (adding .group(0) before "for x").

I'm guessing my problem is the syntax of the delimiters. Looking at the documentation for regular expressions, I should be able to use strings as delimiters, but I can only find a way to do it using re.search.

I have also tried

df['a'] = [re.split[(('JUNE', 'JULY'), x).group(0) for x in df['a']]

The data in the series is something like this:

df['a'] = ['ABCJUNE123', 'DEFJULY456', 'DEGJUNE765', 'DEFJUNE345']

and I want:

df['a'] = ['ABC', 'DEF', 'DEG', 'DEF']

What am I missing from my expression?

Your regex should be:

r'JUNE|JULY'

Example:

>>> re.split(r'JUNE|JULY', 'ABCJUNE123')
['ABC', '123']

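The regex [JUNE|JULY]+ doesn't mean "JUNE or JULY". Square brackets define a character class, so it matches one or more of the single characters J, U, N, E, L, Y or |. That also splits on any E or other listed letter elsewhere in the string, which is why 'DEFJULY456' comes out as ['D', 'F', '456'] instead of ['DEF', '456'].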
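
Applied to the sample data from the question, the alternation pattern could be used roughly like this. This is only a sketch: a plain list stands in for df['a'], and the names values, month_pattern and prefixes are made up for the example. Taking element [0] of each split keeps just the part before the month name.

```python
import re

# Plain list standing in for the df['a'] column from the question (assumption).
values = ['ABCJUNE123', 'DEFJULY456', 'DEGJUNE765', 'DEFJUNE345']

# Alternation without brackets: match the whole word JUNE or the whole word JULY.
month_pattern = re.compile(r'JUNE|JULY')

# Split once on the month name and keep only the text before it.
prefixes = [month_pattern.split(x, maxsplit=1)[0] for x in values]
print(prefixes)  # ['ABC', 'DEF', 'DEG', 'DEF']

# For comparison, the character class splits on any run of J, U, N, E, L, Y or '|'.
print(re.split(r'[JUNE|JULY]+', 'DEFJULY456'))  # ['D', 'F', '456']
```

The same list comprehension should work with df['a'] in place of values, assigning the result back to df['a'].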
