I am trying to extract names together with their title (Mr, Mrs, Ms), but one name, Mrs. Raj, is not matched as expected, as shown below.
Python version 3.7.7
import re
string4 = 'Mr. Venkat Mr Raj Mr.RK Mr T Mrs Venkat Mrs. Raj Ms Githa Ms. Seetha'
re.findall(r'[Mm][r-sR-S].?\s?[a-zA-Z]*\w', string4)
Output:
['Mr. Venkat',
'Mr Raj',
'Mr.RK',
'Mr T',
'Mrs Venkat',
'Mrs',
'Ms Githa',
'Ms. Seetha']
I would use the pattern \bMr?s?\.?\s*\w+\b here:
import re
string4 = 'Mr. Venkat Mr Raj Mr.RK Mr T Mrs Venkat Mrs. Raj Ms Githa Ms. Seetha'
names = re.findall(r'\bMr?s?\.?\s*\w+\b', string4)
print(names)
This prints:
['Mr. Venkat', 'Mr Raj', 'Mr.RK', 'Mr T', 'Mrs Venkat', 'Mrs. Raj', 'Ms Githa', 'Ms. Seetha']
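To make the pattern easier to follow, here is a sketch of the same regex written with re.VERBOSE so that each piece carries a comment. The name TITLE_NAME is only an illustrative choice, not something from the answer itself.

```python
import re

# Same pattern as above, spelled out piece by piece.
# In re.VERBOSE mode, whitespace and "#" comments inside the pattern are ignored.
TITLE_NAME = re.compile(r"""
    \b        # word boundary: the title must start a word
    M r? s?   # M, then an optional r, then an optional s (Mr, Mrs, Ms)
    \.?       # optional literal period after the title
    \s*       # optional whitespace between title and name
    \w+       # the name: one or more word characters
    \b        # word boundary: the name ends where the word ends
""", re.VERBOSE)

string4 = 'Mr. Venkat Mr Raj Mr.RK Mr T Mrs Venkat Mrs. Raj Ms Githa Ms. Seetha'
names = TITLE_NAME.findall(string4)
print(names)
```

Because r and s are both optional, this pattern also accepts a bare M as the title, which is worth keeping in mind for other inputs.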
The reason your current pattern [Mm][r-sR-S].?\s?[a-zA-Z]*\w does not match Mrs. Raj is that the character class [r-sR-S] matches exactly one letter, not two. After M it consumes the r, so the s can only be taken by .?, and nothing that follows (\s?, [a-zA-Z]*, \w) can match the period. The engine then backtracks and settles for Mrs.
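The behaviour described above can be checked on small inputs. This is a minimal sketch using the pattern from the question; the variable names are only illustrative.

```python
import re

# Pattern from the question, unchanged.
pattern = r'[Mm][r-sR-S].?\s?[a-zA-Z]*\w'

# "Mrs." needs M + r + s + "." before the space, but the pattern only has
# one character class plus ".?" to cover everything after the M.
with_period = re.findall(pattern, 'Mrs. Raj')

# Without the period, ".?" can take the "s" and "\s?" the space.
without_period = re.findall(pattern, 'Mrs Raj')

# With a two-letter title, ".?" is free to take the period.
short_title = re.findall(pattern, 'Ms. Raj')

print(with_period)
print(without_period)
print(short_title)
```

Only the first call loses the name, which matches the 'Mrs' entry in the output shown in the question.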
To accept the title in either upper or lower case, give each letter as a character class:
r'\b[Mm][rR]?[sS]?\.?\s*\w+\b'
Bonus: this one also works with Miss:
r'\b[Mm][rR]?[iI]?[sS]{0,2}\.?\s*\w+\b'
import re
string4 = 'Mr. Venkat Mr Raj Mr.RK Mr T Mrs Venkat Mrs. Raj Ms Githa Ms. Seetha Miss. A'
names = re.findall(r'\b[Mm][rR]?[iI]?[sS]{0,2}\.?\s*\w+\b', string4)
print(names)
Result
['Mr. Venkat', 'Mr Raj', 'Mr.RK', 'Mr T', 'Mrs Venkat', 'Mrs. Raj', 'Ms Githa', 'Ms. Seetha', 'Miss. A']
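Update: based on the comment of @tripleee. To avoid false positives such as M. Name or Mris with my bonus solution, we should list all possible titles explicitly:
r'\b(?:Mr|Mrs|Ms|Miss)\b\.?\s*\w+\b'
The \b after the group is required: without it, Mr would match the start of Mrs. Raj or Mris and the remaining letters would be taken as the name.
For me this is easier to read than the previous regexes, but we have to add more cases (or pass re.IGNORECASE) if the upper/lower case is not fixed.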
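As a rough comparison, the sketch below runs the permissive bonus pattern and an explicit title list over the same text. It assumes a word boundary right after the title group and uses re.IGNORECASE to cover mixed case; the names loose, strict and sample are made up for this illustration.

```python
import re

# Permissive "bonus" pattern: every letter after the M is optional.
loose = re.compile(r'\b[Mm][rR]?[iI]?[sS]{0,2}\.?\s*\w+\b')

# Explicit list of titles. The \b after the group forces the whole title
# to match, so "Mr" cannot match just the start of "Mrs" or "Mris".
strict = re.compile(r'\b(?:Mr|Mrs|Ms|Miss)\b\.?\s*\w+', re.IGNORECASE)

sample = 'mr. venkat MRS Raj Miss. A M. Name Mris X'

loose_hits = loose.findall(sample)
strict_hits = strict.findall(sample)

print(loose_hits)
print(strict_hits)
```

The permissive pattern also picks up 'M. Name' and 'Mris X', while the explicit list keeps only the three real titles. Adding a new title then means adding one alternative to the group.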