
Extract email and phone number

I used this code:

#! python3
import pyperclip, re

#Regex for phone number
phoneRegex = re.compile(r'''

(
# phone number
('+1')?
(\s)?
((\d\d\d) | (\ (\d\d\d)))?  #area code optional
(\s|-) # first separator
\d\d\d # first 3 digits
- # Separator
\d\d\d\d # last 4 digits
)
''', re.VERBOSE)

#email
emailRegex = re.compile (r'''
[a-zA-Z0-9_.+-]+ # name part
@ # @ symbol
[a-zA-Z0-9_.+-]+ # domain part
''', re.VERBOSE)

# Get the text off the clipboard
text = pyperclip.paste()

# Extract the email / phone from text
extractedPhone = phoneRegex.findall(text)
extractedEmail = emailRegex.findall(text)

allPhoneNumbers = []
for phoneNumber in extractedPhone:
    allPhoneNumbers.append(phoneNumber[0])

#print (allPhoneNumbers)
#print (extractedEmail)

# Copy the extracted email/phone to the clipboard
results = '\n'.join(allPhoneNumbers) + '\n'.join(extractedEmail)
pyperclip.copy(results)

and I tried to extract the phone numbers from this list:

+1 (786) 665-5397, +1 (786) 773-7145, +1 (786) 804-8869, +1 (786) 806-5097, +1 (786) 856-7950, +1 (786) 862-2875, +1 (786) 915-7830, +1 (786) 991-4304, +1 (857) 334-1162, +1 (862) 944-0090, +1 (863) 307-5291, +1 (914) 826-4343, +1 (918) 992-1382, +1 (954) 226-7037,

It doesn't print the area code. I spent a few hours trying to find the problem, but without success. I think it's because of the +1. Can you please help me out?

As already mentioned by another answer, whitespace in a pattern is significant in RegEx. You can make RegEx ignore whitespace by prefixing the pattern with (?x), which is the inline form of Python's re.VERBOSE flag.

I'm not entirely sure what your aim with ((\d\d\d) | (\ (\d\d\d))) is. Don't you need to escape the parentheses instead? That is, wouldn't just (\(\d\d\d\)) be sufficient? (I'm not American, so I'm unsure whether I'm missing something.)
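To illustrate the point about escaping, here is a minimal sketch (the test string "(786)" is taken from the question's sample data; the variable names are only illustrative). Unescaped parentheses create a capture group and match no literal character, while escaped ones match the "(" and ")" characters themselves.

```python
import re

area_code = "(786)"

# Unescaped: (\d\d\d) is a capture group around three digits,
# so it cannot account for the literal "(" and ")" in the text.
unescaped = re.fullmatch(r'(\d\d\d)', area_code)

# Escaped: \( and \) match the literal parenthesis characters.
escaped = re.fullmatch(r'\(\d\d\d\)', area_code)

print(unescaped)         # None
print(escaped.group(0))  # (786)
```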

Thus the final pattern would look something like this:

(?x)(
# phone number
(\+1)?
(\s)?
(\(\d\d\d\))?  #area code optional
(\s|-) # first separator
\d\d\d # first 3 digits
- # Separator
\d\d\d\d # last 4 digits
)
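As a rough check, the pattern above can be dropped into Python and run against a few of the numbers from the question. This is only a sketch: the names phone_regex, sample, and numbers are illustrative, and the sample string is shortened from the original list.

```python
import re

# Pattern from the answer above; (?x) switches on verbose mode inline.
phone_regex = re.compile(r'''(?x)(
    (\+1)?            # optional country code
    (\s)?
    (\(\d\d\d\))?     # area code optional
    (\s|-)            # first separator
    \d\d\d            # first 3 digits
    -                 # separator
    \d\d\d\d          # last 4 digits
)''')

sample = "+1 (786) 665-5397, +1 (786) 773-7145, +1 (857) 334-1162,"

# findall returns one tuple per match; element 0 is the outermost group,
# i.e. the whole phone number.
numbers = [groups[0] for groups in phone_regex.findall(sample)]
print(numbers)
# ['+1 (786) 665-5397', '+1 (786) 773-7145', '+1 (857) 334-1162']
```

Because the pattern contains several capture groups, findall yields tuples, which is why the question's code indexes each result with [0].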


Assuming you don't need to capture all the various whitespace, you can additionally turn (\s)? into \s?, or into \s* if you want to allow more space. You can also change (\s|-) into (?:\s|-), where ?: tells RegEx that it's a "non-capturing group".

(?x)(
# phone number
(\+1)?
\s*
(\(\d\d\d\))?  #area code optional
(?:\s|-) # first separator
\d\d\d # first 3 digits
- # Separator
\d\d\d\d # last 4 digits
)
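Taking the same idea one step further (this variant is an assumption of mine, not part of the answer), making every group non-capturing lets re.findall return plain strings instead of tuples. A minimal sketch:

```python
import re

# Hypothetical variant: no capture groups at all, so findall
# returns the full text of each match.
phone_regex = re.compile(r'''(?x)
    (?:\+1)?          # optional country code
    \s*
    (?:\(\d\d\d\))?   # optional area code in parentheses
    (?:\s|-)          # first separator
    \d\d\d            # first 3 digits
    -                 # separator
    \d\d\d\d          # last 4 digits
''')

sample = "+1 (914) 826-4343, +1 (918) 992-1382"

# strip() guards against leading whitespace that \s* or the
# separator may pick up when the country code is absent.
matches = [m.strip() for m in phone_regex.findall(sample)]
print(matches)
# ['+1 (914) 826-4343', '+1 (918) 992-1382']
```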

At the very least, remove the whitespace around the bar in "(\d\d\d) | (\ (\d\d\d))". By default, regex is not space-agnostic: if you have a space in the pattern, you must have it in the string. Only verbose mode (re.VERBOSE or (?x)) ignores unescaped whitespace in the pattern.
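The effect of whitespace around the bar can be seen in a small sketch (the pattern here is simplified from the question's, and the variable names are illustrative). Without re.VERBOSE the spaces are literal characters; with it, they are ignored.

```python
import re

text = "786"
pattern = r'(\d\d\d) | (\d\d\d)'

# Default mode: the alternatives are "three digits then a space"
# and "a space then three digits", so "786" alone does not match.
strict = re.search(pattern, text)

# Verbose mode: unescaped whitespace in the pattern is ignored,
# so either alternative is just three digits.
verbose = re.search(pattern, text, re.VERBOSE)

print(strict)            # None
print(verbose.group(0))  # 786
```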

If your phone numbers are always formatted as in your example, you can easily remove the first 3 characters, the parentheses, and the minus symbols as follows:

import re
phoneNumber = '+1 (347) 442-7698'
# Skip the leading '+1 ', then strip parentheses, whitespace, and hyphens
re.sub(r'[()\s-]', '', phoneNumber[3:])

Which gives you:

'3474427698'
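The same substitution could be applied to the whole comma-separated list from the question. The following is a sketch under the assumption that every entry starts with "+1"; strip_formatting is a hypothetical helper name, not something from the original answer.

```python
import re

def strip_formatting(phone_number):
    """Hypothetical helper: drop a leading '+1', then remove
    parentheses, whitespace, and hyphens."""
    without_prefix = re.sub(r'^\+1\s*', '', phone_number.strip())
    return re.sub(r'[()\s-]', '', without_prefix)

raw = "+1 (786) 665-5397, +1 (954) 226-7037,"

# Split on commas and skip the empty piece after the trailing comma.
cleaned = [strip_formatting(n) for n in raw.split(',') if n.strip()]
print(cleaned)
# ['7866655397', '9542267037']
```

Anchoring the prefix removal with ^\+1 instead of slicing off three characters keeps the helper from cutting into a number that happens to lack the country code.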
