
python - re.split a string with a keyword unless there is a specific keyword preceding it

Here is the code:

import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
splitter = re.split('Sir|Mrs', text)

I want the text to be split on the words 'Sir' or 'Mrs', unless the word is preceded by the string 'married to'.

Current output:

''
'John Doe, married to'
'Jane Doe,'
'Jack Doe,'
'Mary Doe'

Desired output:

''
'John Doe, married to Mrs Jane Doe,'
'Jack Doe,'
'Mary Doe'
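A minimal sketch that keeps re.split: a negative lookbehind skips any Sir/Mrs that is directly preceded by 'married to '. It assumes there is exactly one space between 'married to' and the title, because Python's re module only supports fixed-width lookbehinds. The \b anchors keep words such as "Sirius" from being treated as separators, and the pieces are stripped because re.split leaves the surrounding spaces in place.

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# (?<!married to ) is a negative lookbehind: Sir/Mrs only counts as a split
# point when the 11 characters before it are NOT "married to ".
# Assumption: exactly one space separates "married to" from the title.
pattern = r"(?<!married to )\b(?:Sir|Mrs)\b"

# re.split keeps the spaces around each piece, so strip them off.
parts = [part.strip() for part in re.split(pattern, text)]
print(parts)
# ['', 'John Doe, married to Mrs Jane Doe,', 'Jack Doe,', 'Mary Doe']
```

This reproduces the desired output above, including the leading empty string and the trailing commas.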

I would use an re.findall approach here:

import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"
matches = re.findall(r'\b(?:Sir|Mrs) \w+ \w+(?:, married to (?:Mrs|Sir) \w+ \w+)?', text)
print(matches)

This prints:

['Sir John Doe, married to Mrs Jane Doe', 'Sir Jack Doe', 'Mrs Mary Doe']

The regex pattern used here says to match:

\b(?:Sir|Mrs)                         leading Sir/Mrs
  \w+ \w+                             first and last names
(?:
    , married to (?:Mrs|Sir) \w+ \w+  optional 'married to' followed by another name
)?                                    zero or one time
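The annotated breakdown above can also live inside the pattern itself by compiling it with re.VERBOSE. This is a sketch of the same regex, only reformatted: verbose mode ignores unescaped whitespace and treats # as a comment, so the literal spaces are written as [ ].

```python
import re

text = "Sir John Doe, married to Mrs Jane Doe, Sir Jack Doe, Mrs Mary Doe"

# Same pattern as the one-liner, with each part commented.
# Literal spaces are written as [ ] because re.VERBOSE ignores plain whitespace.
pattern = re.compile(r"""
    \b(?:Sir|Mrs)                  # leading Sir/Mrs
    [ ]\w+[ ]\w+                   # first and last names
    (?:
        ,[ ]married[ ]to[ ]        # optional 'married to' ...
        (?:Mrs|Sir)[ ]\w+[ ]\w+    # ... followed by another titled name
    )?                             # zero or one time
""", re.VERBOSE)

matches = pattern.findall(text)
print(matches)
# ['Sir John Doe, married to Mrs Jane Doe', 'Sir Jack Doe', 'Mrs Mary Doe']
```

Like the compact version, it assumes every name is exactly two words, so a middle name (e.g. "Sir John Paul Doe") would be cut short.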
