What I would like to do is something like:
df[col].str.split(my_regexp, re.IGNORECASE, expand=True)
However, the pandas Series.str.split
method does not accept regexp flags.
Since I need to expand the results, I cannot do something like
df.apply(lambda x: re.split(my_regexp, x[col], flags=re.IGNORECASE), axis=1, result_type='expand')
because the resulting lists do not all have the same length.
What I need is a way either to make re.split
return lists that all have the same length, or to pass re.IGNORECASE
to the Series.str.split
method. Or is there perhaps an even better way?
Thank you everyone!
Edit: Here is some sample data to explain the problem better:
series = pd.Series([
'First paRt foo second part FOO third part',
'test1 FoO test2',
'hi1 bar HI2',
'This is a Test',
'first baR second BAr third',
'final'
])
With the regexp r'foo|bar', this should return:
                0            1           2
0      First paRt  second part  third part
1           test1        test2        None
2             hi1          HI2        None
3  This is a Test         None        None
4           first       second       third
5           final         None        None
series.apply(lambda x: ', '.join(re.split(r'foo|bar', x, flags=re.IGNORECASE)))\
.str.split(', ', expand=True)
Output
                0            1           2
0      First paRt  second part  third part
1           test1        test2        None
2             hi1          HI2        None
3  This is a Test         None        None
4           first       second       third
5           final         None        None
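The join-then-split round trip above would break if the text itself contained ', '. As a possible alternative, here is a minimal sketch that lets pandas pad the lists of unequal length: it assumes that building a DataFrame from a list of lists fills the missing cells of shorter rows with missing values. The names pattern, rows and result are only illustrative.

```python
import re

import pandas as pd

series = pd.Series([
    'First paRt foo second part FOO third part',
    'test1 FoO test2',
    'hi1 bar HI2',
    'This is a Test',
    'first baR second BAr third',
    'final',
])

# Compile once so that the IGNORECASE flag is attached to the pattern itself.
pattern = re.compile(r'foo|bar', flags=re.IGNORECASE)

# One list of pieces per row; the lists have different lengths.
rows = [pattern.split(text) for text in series]

# The DataFrame constructor pads the shorter rows with missing values.
result = pd.DataFrame(rows, index=series.index)
print(result)
```

The pieces keep their surrounding whitespace, so a strip step may still be needed afterwards.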
As stated in the comments, convert your series to lowercase using str.lower()
and then use str.split (note that the result is lowercased as well):
series.str.lower().str.split(r'foo|bar', expand=True)
Output
                0            1           2
0      first part  second part  third part
1           test1        test2        None
2             hi1          hi2        None
3  this is a test         None        None
4           first       second       third
5           final         None        None
To also remove the whitespace around each piece, strip every column:
series.str.lower().str.split(r'foo|bar', expand=True).apply(lambda x: x.str.strip())
Output
                0            1           2
0      first part  second part  third part
1           test1        test2        None
2             hi1          hi2        None
3  this is a test         None        None
4           first       second       third
5           final         None        None
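Lowercasing the series changes the text that is returned. If the original case has to be preserved, one option worth trying is the inline flag (?i) from Python's re syntax, placed at the start of the pattern. This is only a sketch: it assumes that Series.str.split treats a multi-character pattern as a regular expression, which is the default behaviour as far as I know.

```python
import pandas as pd

series = pd.Series([
    'First paRt foo second part FOO third part',
    'test1 FoO test2',
    'hi1 bar HI2',
    'This is a Test',
    'first baR second BAr third',
    'final',
])

# (?i) turns on case-insensitive matching for the whole pattern,
# so no separate flags argument is needed.
result = series.str.split(r'(?i)foo|bar', expand=True)

# Remove the whitespace left around each piece, column by column.
result = result.apply(lambda col: col.str.strip())
print(result)
```

With this variant the matched separators are removed regardless of case, while the remaining text keeps its original capitalization.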