Example code:
In [1]: import pandas as pd
In [2]: serie = pd.Series(['this#is#a#test', 'another#test'])
In [3]: serie.str.split('#', expand=True)
Out[3]:
         0     1     2     3
0     this    is     a  test
1  another  test  None  None
Is it possible to split without stripping the delimiter string? The desired output for the example above would be:
Out[3]:
         0      1     2      3
0     this    #is    #a  #test
1  another  #test  None   None
EDIT 1: The real use case is to keep the text matched by the pattern, for instance:
serie.str.split(r'\n\*\*\* [A-Z]+', expand=True)
Here the [A-Z]+ parts are processing steps, which I want to keep for further processing.
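You could split using a positive lookahead. The split point is then the position just before the text that the lookahead expression matches, and the matched text itself is not consumed. Splitting on a zero-width match like this requires Python 3.7 or later.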
import pandas as pd
serie = pd.Series(['this#is#a#test', 'another#test'])
print(serie.str.split('(?=#)', expand=True))
OUTPUT
         0      1     2      3
0     this    #is    #a  #test
1  another  #test  None   None
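The same lookahead idea can be applied to the pattern from the question's edit. The following is only a sketch: the log text and the step names LOAD and CLEAN are made up for illustration, and dtype=object is an assumption chosen so that the split runs through Python's re engine.

```python
import pandas as pd

# Hypothetical input; the step names LOAD and CLEAN are invented for this sketch.
log = pd.Series(
    ["header\n*** LOAD rows=3\n*** CLEAN dropped=1"],
    dtype=object,  # assumption: keep plain Python strings so Python's re engine is used
)

# Wrap the original pattern in (?=...) so the step marker is asserted, not consumed.
steps = log.str.split(r"(?=\n\*\*\* [A-Z]+)", expand=True)

# Each cell after the first still begins with its "\n*** STEP" marker.
print(steps.iloc[0].tolist())
```

Because nothing is consumed, each piece keeps its leading newline; a later .str.lstrip() on the columns would remove it if it is not wanted.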
Try str.split('(#[a-z]+)', expand=True). Wrapping the pattern in a capturing group makes the split return the matched text as well.
Example:
serie = pd.Series(['this#is#a#test', 'another#test'])
print(serie.str.split('(#[a-z]+)', expand=True))
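One thing to watch with the capturing-group approach is that the text between two adjacent matches is empty, so the result contains empty strings. A minimal sketch of how they might be filtered out (the variable names parts and cleaned are illustrative, and dtype=object is an assumption to keep plain Python strings):

```python
import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'], dtype=object)

# Without expand=True each row stays a list, which is easier to post-process.
parts = serie.str.split(r'(#[a-z]+)')

# Adjacent matches leave empty strings between them; drop those pieces.
cleaned = parts.apply(lambda pieces: [p for p in pieces if p])
print(cleaned.tolist())
```

If a table is needed afterwards, pd.DataFrame(cleaned.tolist()) pads the shorter rows with missing values.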
Simply add the delimiter back to every column after splitting:
In [1]: import pandas as pd
In [2]: serie = pd.Series(['this#is#a#test', 'another#test'])
In [3]: serie.str.split('#', expand=True) + '#'
Out[3]:
          0      1    2      3
0     this#    is#   a#  test#
1  another#  test#  NaN    NaN
In [4]: '#' + serie.str.split('#', expand=True)
Out[4]:
          0      1    2      3
0     #this    #is   #a  #test
1  #another  #test  NaN    NaN
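Both variants above also attach the delimiter to a column that never had one (the last column in the first case, the first column in the second). A sketch of one way to prefix only the columns after the first, leaving missing cells untouched; the names rest and result are illustrative, and this only applies when the delimiter is a fixed string rather than a regex match:

```python
import pandas as pd

serie = pd.Series(['this#is#a#test', 'another#test'], dtype=object)
parts = serie.str.split('#', expand=True)

# Prefix '#' to real strings only, so None/NaN cells are passed through unchanged.
rest = parts.iloc[:, 1:].apply(
    lambda col: col.map(lambda s: '#' + s if isinstance(s, str) else s)
)

# Reattach the untouched first column.
result = pd.concat([parts.iloc[:, :1], rest], axis=1)
print(result)
```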