
Pandas str.split without stripping split pattern

Example code:

In [1]: import pandas as pd

In [2]: serie = pd.Series(['this#is#a#test', 'another#test'])

In [3]: serie.str.split('#', expand=True)
Out[3]:
         0     1     2     3
0     this    is     a  test
1  another  test  None  None

Is it possible to split without stripping the split pattern from the result? The desired output for the example above would be:

Out[3]:
         0     1     2     3
0     this   #is    #a #test
1  another #test  None  None

EDIT 1: The real use case is to keep the matched pattern, for instance:

serie.str.split(r'\n\*\*\* [A-Z]+', expand=True)

The [A-Z]+ matches are processing steps in my case, which I want to keep for further processing.

You can split on a positive lookahead. The split point is then the position just before the text matched by the lookahead expression, so the separator stays at the start of the next piece.

import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'])
print(serie.str.split('(?=#)', expand=True))

OUTPUT

         0      1     2      3
0     this    #is    #a  #test
1  another  #test  None   None
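The same lookahead idea can be applied to the pattern from the question's EDIT. The following is only a minimal sketch: the sample log text and the step names LOAD, CLEAN and SAVE are made up for illustration, and it assumes Python 3.7 or later, where re.split can split on zero-width matches.

```python
import pandas as pd

# Hypothetical sample data; the "*** STEP" markers are invented for this sketch.
# object dtype: the split is performed with Python's re module.
log = pd.Series(
    ["header\n*** LOAD file.csv\n*** CLEAN rows\n*** SAVE out.csv"],
    dtype=object,
)

# The lookahead matches the empty position just before each "\n*** NAME"
# marker, so nothing is consumed and every marker stays with its own piece.
parts = log.str.split(r"(?=\n\*\*\* [A-Z]+)", expand=True)

print(parts.iloc[0].tolist())
# ['header', '\n*** LOAD file.csv', '\n*** CLEAN rows', '\n*** SAVE out.csv']
```

Each resulting column then still starts with its step marker, which can be parsed in a later step.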

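Try str.split('(#[a-z]+)', expand=True). Because the pattern is a capturing group, the matched text is kept in the result.

Ex:

import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'])
print(serie.str.split('(#[a-z]+)', expand=True))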
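One caveat with a capturing group is that the captured separators come back as separate items, with empty strings between adjacent matches. The sketch below shows one possible way to filter those out. The names pieces, cleaned and table are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'], dtype=object)

# Without expand=True each row is a list; the captured '#word' parts are
# included, together with empty strings between adjacent matches.
pieces = serie.str.split(r'(#[a-z]+)')
print(pieces[0])   # ['this', '#is', '', '#a', '', '#test', '']

# Keep only the non-empty items of every row.
cleaned = pieces.apply(lambda items: [s for s in items if s])
print(cleaned[0])  # ['this', '#is', '#a', '#test']

# Build a table from the cleaned lists; shorter rows are padded with missing values.
table = pd.DataFrame(cleaned.tolist())
```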

Simply add the separator back to each column after splitting:

In [1]: import pandas as pd

In [2]: serie = pd.Series(['this#is#a#test', 'another#test'])

In [3]: serie.str.split('#', expand=True) + '#'
Out[3]:
          0      1    2      3
0     this#    is#   a#  test#
1  another#  test#  NaN    NaN

In [4]: '#' + serie.str.split('#', expand=True)
Out[4]:
          0      1    2      3
0     #this    #is   #a  #test
1  #another  #test  NaN    NaN
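Both variants above also change the first column, which differs from the output requested in the question. As a rough sketch (the names parts and result are chosen here for illustration), the prefix could be added to every column except the first:

```python
import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'], dtype=object)
parts = serie.str.split('#', expand=True)

# Prepend '#' to every column after the first one; cells that were
# missing after the split stay missing.
result = parts.copy()
for col in parts.columns[1:]:
    result[col] = '#' + parts[col]

print(result)
```

This only works for a fixed, literal separator such as '#'. For a regular expression whose matches vary, the matched text is lost after the split and cannot be added back this way.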
