Splitting members of series using regular expressions with strings

Question

I asked a similar question about a week ago and tried to adapt that code for a different purpose, but couldn't make it work.

I want to split a string using month names as the delimiters (so I'd have JAN, FEB, MAR, APR, MAY, JUNE, etc.).

I tried using

df['a'] = [re.split(r'[JUNE|JULY]+', x) for x in df['a']]

as well as some variations on that (e.g. adding .group(0) before for x).

I'm guessing my problem is the syntax of the delimiters. From the documentation for regular expressions, I should be able to use strings as delimiters, but I can only find a way to do it using re.search.
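For reference, re.split accepts the same kind of pattern string as re.search; what differs is the return value, which is why .group(0) does not apply to it. A small sketch using one of the sample values from the question (the variable names are illustrative only):

```python
import re

pattern = r'JUNE|JULY'
text = 'ABCJUNE123'

# re.search returns a Match object (or None); .group(0) is the matched text.
match = re.search(pattern, text)
print(match.group(0))  # JUNE

# re.split takes the same pattern string but returns a plain list of pieces,
# so there is no .group() on it; index the list instead.
pieces = re.split(pattern, text)
print(pieces)      # ['ABC', '123']
print(pieces[0])   # ABC
```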

I have also tried

df['a'] = [re.split[(('JUNE', 'JULY'), x).group(0) for x in df['a']]

The data in the series looks something like this:

df['a'] = ['ABCJUNE123', 'DEFJULY456', 'DEGJUNE765', 'DEFJUNE345']

and I want:

df['a'] = ['ABC', 'DEF', 'DEG', 'DEF']

What am I missing from my expression?

Answer 1

Your regex should be:

r'JUNE|JULY'

Example:

>>> import re
>>> re.split(r'JUNE|JULY', 'ABCJUNE123')
['ABC', '123']
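Applied to every value in the question's data, the same idea gives the requested output. A minimal sketch, assuming the values are plain strings and that the delimiters are the month tokens listed in the question; the MONTHS list and the helper prefix_before_month are hypothetical names for illustration. It builds the alternation from the list with "|".join and keeps the piece before the first delimiter:

```python
import re

# Hypothetical list of month tokens; the exact spellings (JUNE vs JUN) are an assumption.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUNE", "JULY",
          "AUG", "SEP", "OCT", "NOV", "DEC"]

# Join the tokens with "|" so the regex means "any one of these words".
# re.escape guards against tokens that contain regex metacharacters.
MONTH_PATTERN = re.compile("|".join(re.escape(m) for m in MONTHS))

def prefix_before_month(text):
    """Return the part of text before the first month token (whole string if none)."""
    # maxsplit=1 stops after the first delimiter; [0] keeps the left-hand piece.
    return MONTH_PATTERN.split(text, maxsplit=1)[0]

data = ['ABCJUNE123', 'DEFJULY456', 'DEGJUNE765', 'DEFJUNE345']
result = [prefix_before_month(x) for x in data]
print(result)  # ['ABC', 'DEF', 'DEG', 'DEF']
```

With pandas, the same helper can be applied with df['a'] = df['a'].apply(prefix_before_month). Note that plain alternation also matches a month token inside a longer word (e.g. MAY in MAYBE), so the prefix part of the data should not contain those letter sequences.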

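The regex [JUNE|JULY]+ doesn't mean "JUNE or JULY". Square brackets define a character class, so it matches a run of one or more single characters from the set J, U, N, E, |, L, Y rather than the whole words. Drop the brackets and use the alternation JUNE|JULY instead.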
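The difference shows up when the same strings are split with both patterns. A short sketch; the string 'ABCNULL1' is made up purely to illustrate a false match:

```python
import re

# Character class: matches a run of characters from {J, U, N, E, |, L, Y}.
wrong = r'[JUNE|JULY]+'
# Alternation: matches the literal word JUNE or the literal word JULY.
right = r'JUNE|JULY'

print(re.split(wrong, 'DEFJULY456'))  # ['D', 'F', '456']  (the E in DEF is in the set)
print(re.split(right, 'DEFJULY456'))  # ['DEF', '456']
print(re.split(wrong, 'ABCNULL1'))    # ['ABC', '1']  (N, U, L, L are all in the set)
print(re.split(right, 'ABCNULL1'))    # ['ABCNULL1']  (no JUNE or JULY present)
```

The bracketed version happens to work on 'ABCJUNE123' only because none of A, B, C is in the set.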

solution1
1 ACCEPTED 2014-07-23 15:25:19