Here is the code:
import re
text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
splitter = re.split('Sir|Mrs', text)
I want the text to be split by the words 'Sir' or 'Mrs' unless there is the string 'married to' before it.
Current output:
''
'John Doe, married to'
'Jane Doe,'
'Jack Doe,'
'Mary Doe'
Desired output:
''
'John Doe, married to Mrs Jane Doe,'
'Jack Doe,'
'Mary Doe'
I would use an re.findall approach here:
import re
text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
matches = re.findall(r'\b(?:Sir|Mrs) \w+ \w+(?:, married to (?:Mrs|Sir) \w+ \w+)?', text)
print(matches)
This prints:
['Sir John Doe, married to Mrs Jane Doe', 'Sir Jack Doe', 'Mrs Mary Doe']
The regex pattern used here says to match:
\b(?:Sir|Mrs)                           leading Sir/Mrs
 \w+ \w+                                first and last names
(?:
    , married to (?:Mrs|Sir) \w+ \w+    optional 'married to' followed by another name
)?                                      zero or one time
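As a rough sketch, the breakdown above can be written directly into the pattern with re.VERBOSE, so each piece carries its own comment. The names name_pattern and parts are only illustrative. The last two lines show a possible alternative that keeps re.split, assuming a fixed-width negative lookbehind on 'married to ' is acceptable for your data; stripping the pieces is my addition to match the desired output in the question.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Same pattern as in the answer, spelled out with re.VERBOSE.
# In verbose mode unescaped whitespace is ignored, so literal
# spaces are written as [ ].
name_pattern = re.compile(r"""
    \b(?:Sir|Mrs)            # leading Sir/Mrs
    [ ]\w+[ ]\w+             # first and last names
    (?:
        ,[ ]married[ ]to[ ]  # literal ', married to '
        (?:Mrs|Sir)          # title of the spouse
        [ ]\w+[ ]\w+         # first and last names of the spouse
    )?                       # zero or one time
""", re.VERBOSE)

matches = name_pattern.findall(text)
print(matches)

# Alternative (assumption: you still want re.split): do not split on a
# title that is directly preceded by 'married to '.
parts = [p.strip() for p in re.split(r"(?<!married to )\b(?:Sir|Mrs)\b", text)]
print(parts)
```

The findall version returns whole entries including the title, while the split version drops the title it splits on and leaves an empty first element, as in the desired output of the question. Both assume every name is exactly two words.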