
How to extract person name using regular expression?

I am new to regular expressions and I have a kind of phone directory. I want to extract the names from it. I wrote the code below, but it extracts a lot of unwanted text rather than just the names. Can you tell me what I am doing wrong and how to correct it? Here is my code:

import re

directory = '''Mark Adamson
Home: 843-798-6698
(424) 345-7659
265-1864 ext. 4467
326-665-8657x2986
E-mail:madamson@sncn.net
Allison Andrews
Home: 612-321-0047
E-mail: AEA@anet.com
Cellular: 612-393-0029
Dustin Andrews'''


nameRegex = re.compile('''
(
[A-Za-z]{2,25}
\s
([A-Za-z]{2,25})+
)

''',re.VERBOSE)

print(nameRegex.findall(directory)) 

The output it gives is:

[('Mark Adamson', 'Adamson'), ('net\nAllison', 'Allison'), ('Andrews\nHome', 'Home'), ('com\nCellular', 'Cellular'), ('Dustin Andrews', 'Andrews')]

Would be really grateful for help!

Your problem is that \s also matches newlines, so a match can start on one line and finish on the next. Use a literal space instead of \s:

name_regex = re.compile('[A-Za-z]{2,25} [A-Za-z]{2,25}')
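
To see the difference concretely, here is a small sketch that runs both versions over a shortened copy of the directory from the question. The names `with_whitespace` and `with_space` are illustrative and do not come from the original post.

```python
import re

# Shortened copy of the directory from the question.
directory = """Mark Adamson
Home: 843-798-6698
E-mail:madamson@sncn.net
Allison Andrews
Home: 612-321-0047
E-mail: AEA@anet.com
Cellular: 612-393-0029
Dustin Andrews"""

# \s matches any whitespace, including '\n', so a match can span two lines.
with_whitespace = re.compile(r"[A-Za-z]{2,25}\s[A-Za-z]{2,25}")

# A literal space matches only ' ', so both words must sit on the same line.
with_space = re.compile(r"[A-Za-z]{2,25} [A-Za-z]{2,25}")

print(with_whitespace.findall(directory))
print(with_space.findall(directory))
```

The first call still returns cross-line matches such as 'net\nAllison', while the second returns only the three names. Neither pattern here contains a capture group, so findall returns plain strings; the tuples in the question's output come from the two pairs of parentheses in the original pattern.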

This works if the names have exactly two words. If the names have more than two words (middle names or hyphenated last names) then you may want to expand this to something like:

name_regex = re.compile(r"^([A-Za-z \-]{2,25})+$", re.MULTILINE)

This matches one or more words and must stretch from the beginning to the end of a line (e.g., it will not return just 'John Paul' from 'John Paul Jones').
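
As a rough check of how the line-anchored pattern behaves, the sketch below runs it over made-up sample data (the names and numbers are hypothetical, not from the question). It also shows that a single-word line such as 'Cher' is accepted, and includes a stricter variant, `strict_regex`, which is my own assumption rather than part of the answer and requires at least two words.

```python
import re

# Hypothetical sample data, not taken from the question.
sample = """John Paul Jones
Home: 555-0100
Mary Smith-Jones
Cher
E-mail: jpj@example.com"""

# Pattern from the answer above: whole lines of letters, spaces and hyphens.
answer_regex = re.compile(r"^([A-Za-z \-]{2,25})+$", re.MULTILINE)

# Stricter variant (an assumption, not from the answer): at least two words,
# each pair separated by a single space or hyphen.
strict_regex = re.compile(r"^[A-Za-z]+(?:[ -][A-Za-z]+)+$", re.MULTILINE)

# group(0) is the whole match, independent of what the capture group holds.
print([m.group(0) for m in answer_regex.finditer(sample)])
print(strict_regex.findall(sample))
```

Using `finditer` with `group(0)` avoids depending on the repeated capture group, which only keeps the text of its last repetition.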

I suggest trying the following regular expression; it worked for me:

"([A-Z][a-z]+\s[A-Z][a-z]+)"

The following regex works as expected.

Related part of the code:

nameRegex = re.compile(r"^[a-zA-Z]+[',. -][a-zA-Z ]?[a-zA-Z]*$", re.MULTILINE)

print(nameRegex.findall(directory))

Output:

>>> python3 test.py 
['Mark Adamson', 'Allison Andrews', 'Dustin Andrews']

Try:

nameRegex = re.compile(r'^((?:\w+\s*){2,})$', flags=re.MULTILINE)

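This selects only text made up of 'word' characters (letters, digits and underscores) and whitespace that ends at the end of a line. Note that \s* may match nothing or a newline, so the pattern is looser than "two or more names".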
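
A quick way to check what this pattern accepts is to try it on a couple of edge cases. The inputs below are made up for illustration, and the `tight` pattern is an assumed alternative, not part of the answer.

```python
import re

# Pattern from the answer above, written as a raw string.
loose = re.compile(r"^((?:\w+\s*){2,})$", flags=re.MULTILINE)

# Tighter variant (an assumption, not from the answer): words must be
# separated by at least one literal space, so a match cannot cross lines.
tight = re.compile(r"^\w+(?: +\w+)+$", flags=re.MULTILINE)

# Made-up input with two names on consecutive lines.
two_names = "Mark Adamson\nAllison Andrews"

print(loose.findall(two_names))  # both lines come back as a single match
print(tight.findall(two_names))  # one match per line

# A single word also satisfies the loose pattern, because \s* may be empty
# and "Mark" can be split into two runs of word characters.
print(bool(loose.fullmatch("Mark")))
print(bool(tight.fullmatch("Mark")))
```

In the question's directory every name line is followed by a line containing ':' or '-', which is why the cross-line behaviour does not show up there.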


 