Unable to print expected name using regex in python

I am trying to print names along with their prefixes, but one of the names (Mrs. Raj) is not matched as expected, as shown below.

Python version 3.7.7

import re

string4 = 'Mr. Venkat Mr Raj Mr.RK Mr T Mrs Venkat Mrs. Raj Ms Githa Ms. Seetha'
re.findall(r'[Mm][r-sR-S].?\s?[a-zA-Z]*\w', string4)

Output ('Mrs. Raj' comes back as just 'Mrs'):

['Mr. Venkat',
 'Mr Raj',
 'Mr.RK',
 'Mr T',
 'Mrs Venkat',
 'Mrs',
 'Ms Githa',
 'Ms. Seetha']

I would use the pattern \bMr?s?\.?\s*\w+\b here:

import re

string4 = 'Mr. Venkat Mr Raj Mr.RK Mr T Mrs Venkat Mrs. Raj Ms Githa Ms. Seetha'
names = re.findall(r'\bMr?s?\.?\s*\w+\b', string4)
print(names)

This prints:

['Mr. Venkat', 'Mr Raj', 'Mr.RK', 'Mr T', 'Mrs Venkat', 'Mrs. Raj', 'Ms Githa', 'Ms. Seetha']

The reason your current pattern

[Mm][r-sR-S].?\s?[a-zA-Z]*\w

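does not match Mrs. Raj is that it leaves room for only one character between the second letter and the optional whitespace. The character class [r-sR-S] matches exactly one letter (r or s), and the unescaped .? matches at most one character of any kind, not specifically a literal dot. In Mrs Venkat that single character is the s, but Mrs. Raj needs both an s and a dot in that position, so the regex backtracks and settles for Mrs alone.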
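To see the backtracking described above, here is a minimal sketch that wraps each piece of the original pattern in a capture group and prints what each piece consumed. The groups and the helper name split_match are added purely for illustration and do not change what the pattern matches.

```python
import re

# The original pattern from the question, with each piece in its own group:
# (title letters)(unescaped .?)(optional whitespace)(name)
PIECES = re.compile(r'([Mm][r-sR-S])(.?)(\s?)([a-zA-Z]*\w)')

def split_match(text):
    """Return what each piece of the pattern consumed at the start of text."""
    m = PIECES.match(text)
    return m.groups() if m else None

print(split_match('Mrs Venkat'))  # ('Mr', 's', ' ', 'Venkat')
print(split_match('Mrs. Raj'))    # ('Mr', '', '', 's')
```

In the first case the unescaped .? absorbs the s; in the second case nothing can absorb both the s and the dot, so the final \w ends up matching the s and the match stops there.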

If the titles may also appear in lower case, the same idea can be written with character classes:

r'\b[Mm][rR]?[sS]?\.?\s*\w+\b'

Bonus: this one also works with Miss:

r'\b[Mm][rR]?[iI]?[sS]{0,2}\.?\s*\w+\b'

import re

string4 = 'Mr. Venkat Mr Raj Mr.RK Mr T Mrs Venkat Mrs. Raj Ms Githa Ms. Seetha Miss. A'
names = re.findall(r'\b[Mm][rR]?[iI]?[sS]{0,2}\.?\s*\w+\b', string4)
print(names)

Result:

['Mr. Venkat', 'Mr Raj', 'Mr.RK', 'Mr T', 'Mrs Venkat', 'Mrs. Raj', 'Ms Githa', 'Ms. Seetha', 'Miss. A']

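Update, based on the comment from @tripleee: to avoid false positives such as M. Name or Mris with my bonus solution, we should list all the possible titles explicitly:

r'\b(?:Mr|Mrs|Ms|Miss)\b\.?\s*\w+\b'

The \b right after the group matters. Without it, the Mr alternative can match the first two letters of Mrs and let \w+ consume the s, so Mrs. Raj would again come back as just Mrs. I find this easier to read than the previous regexes, but if the capitalization of the titles can vary, more alternatives have to be added (or the re.IGNORECASE flag used).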
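As a rough sketch of the explicit-list approach, the snippet below combines the listed titles with re.IGNORECASE so that capitalization does not need extra alternatives. The word boundary after the title group, the constant TITLE_NAME, the helper find_titled_names, and the sample string are assumptions made for this illustration, not part of the original answer.

```python
import re

# Explicit list of titles; \b after the group stops "Mr" from matching
# the start of "Mrs" or "Mris". IGNORECASE covers mr/MR/Mr and so on.
TITLE_NAME = re.compile(r'\b(?:Mr|Mrs|Ms|Miss)\b\.?\s*\w+\b', re.IGNORECASE)

def find_titled_names(text):
    """Return every 'title + name' substring found in text."""
    return TITLE_NAME.findall(text)

sample = 'Mr. Venkat mrs. Raj MS Githa Miss. A M. Name Mris X'
print(find_titled_names(sample))
# ['Mr. Venkat', 'mrs. Raj', 'MS Githa', 'Miss. A']
```

M. Name and Mris X are skipped because neither M nor Mris is one of the listed titles.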
