Python String split by specific pattern with Indices

Question

I'm trying to split a dialogue string into sentences spoken by different characters, where each sentence starts with its own speaker tag, and store each sentence together with its start and end indices. Speaker names, such as Mike or Steve, can have different lengths. The content can be in multiple languages, such as Chinese or Japanese.

content = "A:Hello.B:How are you?A:I'm fine."

I want the result to look like this:

[0]A:Hello.       , 0:7
[1]B:How are you? , 8:21
[2]A:I'm fine.    ,22:32

Answer 1

You can use re.split as follows:

import re
s = "A:Hello.B:How are you?A:I'm fine."
t = re.split(r'[.?]', s)
print(t)

that gives

['A:Hello', 'B:How are you', "A:I'm fine", '']
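The split above drops the punctuation and does not record positions. One possible variation, offered as a minimal sketch rather than as part of the original answer, is to wrap the delimiter in a capturing group so that re.split returns it, then rebuild each sentence while tracking a running offset. The names parts, result and offset are illustrative, and the sketch assumes every sentence ends with '.' or '?'; any trailing text without such punctuation would be skipped.

```python
import re

s = "A:Hello.B:How are you?A:I'm fine."

# A capturing group makes re.split return the delimiters as separate items:
# ['A:Hello', '.', 'B:How are you', '?', "A:I'm fine", '.', '']
parts = re.split(r'([.?])', s)

result = []
offset = 0  # index in s where the current sentence starts
# Pair each text chunk with the delimiter that followed it.
for text, delim in zip(parts[0::2], parts[1::2]):
    sentence = text + delim
    end = offset + len(sentence) - 1  # inclusive end index
    result.append((sentence, offset, end))
    offset = end + 1

for idx, (sentence, start, end) in enumerate(result):
    print('[{}]{:<15}, {}:{}'.format(idx, sentence, start, end))
```

For the sample string this yields the inclusive index pairs 0:7, 8:21 and 22:32.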

Answer 2

You can use re.finditer for the task:

import re

content = "A:Hello.B:How are you?A:I'm fine."

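# Lazily match up to a '.' or '?' that is followed by a capital letter
# (the start of the next speaker tag) or by the end of the string (\Z).
for idx, m in enumerate(re.finditer(r'(.*?[.?])(?=[A-Z]|\Z)', content)):
    # m.end() is exclusive, so subtract 1 to get an inclusive end index.
    print('[{}]{:<20}, {}:{}'.format(idx, m.group(1), m.start(), m.end()-1))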

Prints:

[0]A:Hello.            , 0:7
[1]B:How are you?      , 8:21
[2]A:I'm fine.         , 22:32
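The pattern above relies on ASCII punctuation and on a capital letter starting each tag, which may not hold for longer names or for Chinese or Japanese text. A rough sketch of one alternative, assuming the set of speaker names is known in advance and each tag is written as the name followed by an ASCII colon: locate the tags themselves and cut the string from one tag to the next. The function split_by_speaker and its speakers parameter are hypothetical names introduced here, not something from the question.

```python
import re

def split_by_speaker(content, speakers):
    """Return (sentence, start, inclusive_end) tuples, one per speaker turn.

    Assumes every turn starts with '<name>:' where <name> is in `speakers`.
    Text before the first tag is ignored.
    """
    # Try longer names first so a short name cannot shadow a longer one.
    names = sorted(speakers, key=len, reverse=True)
    tag = re.compile('(?:' + '|'.join(re.escape(n) for n in names) + '):')
    # Each tag marks the start of a turn; the end of the string closes the last one.
    bounds = [m.start() for m in tag.finditer(content)] + [len(content)]
    return [(content[a:b], a, b - 1) for a, b in zip(bounds, bounds[1:])]

content = "Mike:こんにちは。Steve:お元気ですか？Mike:元気です。"
for idx, (sentence, start, end) in enumerate(split_by_speaker(content, ["Mike", "Steve"])):
    print('[{}]{} , {}:{}'.format(idx, sentence, start, end))
```

Because the cut points come from the tags rather than from the punctuation, this does not depend on which sentence-ending characters a language uses. The trade-off is that a speaker name followed by a colon inside someone's sentence would be treated as a new turn.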
