I have a dataframe containing sentences. The first sentence (the title) is followed by the text. They were merged without a space.
I would like to split the text into two parts (sentence 1 and sentence 2) at the last occurrence of a capital letter that directly follows a lowercase letter, with no space in between. Out of curiosity, I would also be interested in a solution based on the first occurrence.
The solution is supposed to be stored in the original dataframe.
I tried
re.findall('(?<!\s)[A-ZÄÖÜ](?:[a-zäöüß\s]|(?<=\s)[A-ZÄÖÜ])*')
but could not work it out.
from pandas import DataFrame

Sentences = {'Sentence': [
    'RnB music all nightI love going out',
    'Example sentence with no meaningThe space is missing.',
    'Third exampleAlso numbers 1.23 and signs -. should appear in column 2.',
    'BestMusic tonightAt 12:00.',
]}
df = DataFrame(Sentences, columns=['Sentence'])
print(df)
Because the split is supposed to be carried out at the last occurrence, the words RnB and BestMusic in the example given must not trigger it. The desired result is:
df['Sentence1'] = ['RnB music all night', 'Example sentence with no meaning', 'Third example', 'BestMusic tonight']
df['Sentence2'] = ['I love going out', 'The space is missing.', 'Also numbers 1.23 and signs -. should appear in column 2.', 'At 12:00.']
Here is one way:
Yourdf=df.Sentence.str.split(r'(.*[a-z])(?=[A-Z])',n=-1,expand=True)[[1,2]]
Yourdf
Out[610]:
   1                                 2
0  RnB music all night               I love going out
1  Example sentence with no meaning  The space is missing.
2  Third example                     Also numbers 1.23 and signs -. should appear i...
3  BestMusic tonight                 At 12:00.
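To see why columns 1 and 2 are selected, it may help to run the same pattern through plain re.split on a couple of the sample strings. This is only an illustrative sketch using the standard library; the names samples and results are made up for the example. The greedy .* backtracks to the last lowercase letter that is followed by a capital, the capture group keeps that matched part in the result, and the piece before the match is an empty string, which is what ends up in column 0.

```python
import re

# Same pattern as in the answer: a captured greedy prefix ending in a
# lowercase letter, followed (lookahead only) by an uppercase letter.
pattern = r'(.*[a-z])(?=[A-Z])'

samples = [
    'RnB music all nightI love going out',
    'BestMusic tonightAt 12:00.',
]

# re.split keeps captured groups in its output, so each result has three
# items: the (empty) text before the match, the captured title, and the rest.
results = [re.split(pattern, s) for s in samples]

for parts in results:
    print(parts)
# ['', 'RnB music all night', 'I love going out']
# ['', 'BestMusic tonight', 'At 12:00.']
```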
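This only works if A-Z covers all of your capital letters (it would miss Ä, Ö and Ü). Note that it splits at the last capital letter in the string, whatever precedes it: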
pattern = r'(?P<Sentence1>.*)(?P<Sentence2>[A-Z].*)$'
df['Sentence'].str.extract(pattern)
gives:
   Sentence1                         Sentence2
0  RnB music all night               I love going out
1  Example sentence with no meaning  The space is missing.
2  Third example                     Also numbers 1.23 and signs -. should appear i...
3  BestMusic tonight                 At 12:00.
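The question also asks for German capitals, for the first-occurrence variant, and for the result to be stored in the original dataframe. A minimal sketch of how that could look follows. It assumes the only letters outside A-Z and a-z are ÄÖÜ, äöü and ß, and the column names First1 and First2 are made up for this example. The only difference between the two patterns is the greedy .* (last lowercase-to-capital boundary) versus the lazy .*? (first such boundary).

```python
import pandas as pd

df = pd.DataFrame({'Sentence': [
    'RnB music all nightI love going out',
    'Example sentence with no meaningThe space is missing.',
    'Third exampleAlso numbers 1.23 and signs -. should appear in column 2.',
    'BestMusic tonightAt 12:00.',
]})

# Assumed character classes: ASCII letters plus the German umlauts and ß.
LOWER = 'a-zäöüß'
UPPER = 'A-ZÄÖÜ'

# Greedy prefix: split at the LAST lowercase letter followed by a capital.
last_pattern = rf'^(?P<Sentence1>.*[{LOWER}])(?P<Sentence2>[{UPPER}].*)$'
# Lazy prefix: split at the FIRST lowercase letter followed by a capital.
first_pattern = rf'^(?P<First1>.*?[{LOWER}])(?P<First2>[{UPPER}].*)$'

# str.extract returns one column per named group; copy them into df.
for pattern in (last_pattern, first_pattern):
    parts = df['Sentence'].str.extract(pattern)
    for name in parts.columns:
        df[name] = parts[name]

print(df[['Sentence1', 'Sentence2']])
print(df[['First1', 'First2']])
```

With the first-occurrence pattern, RnB and BestMusic do trigger the split (giving Rn and Best as the first part), which is why the last-occurrence pattern is the one that matches the desired output. Rows without any lowercase-to-capital boundary would come back as NaN from str.extract.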