Extract email and phone number
I used this code:
#! python3
import pyperclip, re
#Regex for phone number
phoneRegex = re.compile(r'''
(
# phone number
('+1')?
(\s)?
((\d\d\d) | (\ (\d\d\d)))? #area code optional
(\s|-) # first separator
\d\d\d # first 3 digits
- # Separator
\d\d\d\d # last 4 digits
)
''', re.VERBOSE)
#email
emailRegex = re.compile (r'''
[a-zA-Z0-9_.+-]+ # name part
@ # @ symbol
[a-zA-Z0-9_.+-]+ # domain part
''', re.VERBOSE)
# Get the text off the clipboard
text = pyperclip.paste()
# Extract the email / phone from text
extractedPhone = phoneRegex.findall(text)
extractedEmail = emailRegex.findall(text)
allPhoneNumbers = []
for phoneNumber in extractedPhone:
    allPhoneNumbers.append(phoneNumber[0])
#print (allPhoneNumbers)
#print (extractedEmail)
# Copy the extracted email/phone to the clipboard
results = '\n'.join(allPhoneNumbers) + '\n'.join(extractedEmail)
pyperclip.copy(results)
and I tried to extract the numbers from this list:
+1 (786) 665-5397, +1 (786) 773-7145, +1 (786) 804-8869, +1 (786) 806-5097, +1 (786) 856-7950, +1 (786) 862-2875, +1 (786) 915-7830, +1 (786) 991-4304, +1 (857) 334-1162, +1 (862) 944-0090, +1 (863) 307-5291, +1 (914) 826-4343, +1 (918) 992-1382, +1 (954) 226-7037,
It doesn't print the area code. I spent a few hours trying to find the problem, but with no success. I think it's because of the +1. Can you please help me out?
As another answer already mentioned, whitespace in a pattern is significant in regex by default. You can make the regex engine ignore whitespace by prefixing your pattern with (?x), the inline equivalent of the re.VERBOSE flag you are already passing.
I'm not entirely sure what your aim with ((\d\d\d) | (\ (\d\d\d))) is. Shouldn't you escape your parentheses instead? I.e., wouldn't just (\(\d\d\d\)) be sufficient? (I'm not American, so I'm unsure whether I'm missing something.)
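To make the escaping point concrete, here is a minimal sketch (the variable names are only illustrative) comparing an unescaped group with escaped literal parentheses on an area code written as "(786)":

```python
import re

# Unescaped parentheses only create a capture group; they match no characters.
unescaped = re.compile(r'(\d\d\d)')
# Escaped parentheses match literal "(" and ")" in the text.
escaped = re.compile(r'(\(\d\d\d\))')

area_code = '(786)'
unescaped_hit = unescaped.search(area_code).group()
escaped_hit = escaped.search(area_code).group()

print(unescaped_hit)  # 786
print(escaped_hit)    # (786)
```

With the unescaped form, the surrounding "(" and ")" in the text are never consumed, which is consistent with the area code going missing from the results.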
Thus the final pattern would look something like this:
(?x)(
# phone number
(\+1)?
(\s)?
(\(\d\d\d\))? #area code optional
(\s|-) # first separator
\d\d\d # first 3 digits
- # Separator
\d\d\d\d # last 4 digits
)
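As a rough check, this pattern could be dropped into Python along the following lines. This is a sketch only: the sample text and the variable names are assumptions made for illustration, not part of the original script.

```python
import re

# Pattern from the answer; (?x) is the inline form of re.VERBOSE.
phone_regex = re.compile(r'''(?x)(
    (\+1)?              # optional country code, with the + escaped
    (\s)?
    (\(\d\d\d\))?       # optional area code in literal parentheses
    (\s|-)              # first separator
    \d\d\d              # first 3 digits
    -                   # separator
    \d\d\d\d            # last 4 digits
)''')

sample = '+1 (786) 665-5397, +1 (857) 334-1162'  # hypothetical sample text

# findall returns one tuple per match; index 0 is the outer group (the whole number).
numbers = [groups[0] for groups in phone_regex.findall(sample)]
print(numbers)  # ['+1 (786) 665-5397', '+1 (857) 334-1162']
```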
Assuming you don't need to capture all the various whitespace, you can additionally turn (\s)? into \s?, or \s* if you want to allow more space. You can also change (\s|-) into (?:\s|-), where ?: tells the regex engine that it's a "non-capturing group".
(?x)(
# phone number
(\+1)?
\s*
(\(\d\d\d\))? #area code optional
(?:\s|-) # first separator
\d\d\d # first 3 digits
- # Separator
\d\d\d\d # last 4 digits
)
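The practical effect of dropping capture groups shows up in what the match reports back. A small sketch, assuming the pattern above is compiled as-is (the test string is taken from the question's list):

```python
import re

phone_regex = re.compile(r'''(?x)(
    (\+1)?              # optional country code
    \s*                 # any amount of whitespace, not captured
    (\(\d\d\d\))?       # optional area code
    (?:\s|-)            # first separator, non-capturing
    \d\d\d              # first 3 digits
    -                   # separator
    \d\d\d\d            # last 4 digits
)''')

match = phone_regex.search('+1 (914) 826-4343')
captured = match.groups()
print(captured)  # ('+1 (914) 826-4343', '+1', '(914)')
```

Only three groups remain (the whole number, the country code, and the area code), so the tuples returned by findall are shorter and easier to index.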
At the very least, remove the white spaces around the bar in "(\d\d\d) | (\ (\d\d\d))". Outside verbose mode, regex is not space-agnostic: if you have a space in the pattern, you must have it in the string.
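How much those spaces matter depends on whether the pattern is compiled with re.VERBOSE. A minimal sketch of the difference, using only that sub-pattern (the variable names are illustrative):

```python
import re

spaced = r'(\d\d\d) | (\ (\d\d\d))'

plain = re.compile(spaced)               # spaces are literal characters
verbose = re.compile(spaced, re.VERBOSE)  # unescaped spaces are ignored

# Without VERBOSE the first alternative is "three digits followed by a space".
plain_with_space = plain.fullmatch('786 ') is not None
plain_no_space = plain.fullmatch('786') is not None

# With VERBOSE only the backslash-escaped space still matches a space.
verbose_no_space = verbose.fullmatch('786') is not None
verbose_lead_space = verbose.fullmatch(' 786') is not None

print(plain_with_space, plain_no_space)      # True False
print(verbose_no_space, verbose_lead_space)  # True True
```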
If your phone numbers are always formatted as in your example, you can easily remove the first 3 characters, parentheses, and minus symbols as follows:
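import re
phoneNumber = '+1 (347) 442-7698'
# Raw string avoids invalid-escape warnings; ( ) and a trailing - need no escaping inside [...]
re.sub(r'[()\s-]', '', phoneNumber[3:])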
Which gives you:
'3474427698'
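Applied to a comma-separated list like the one in the question, the same idea might look like the sketch below. The helper name digits_only and the shortened sample string are assumptions for illustration, and it still relies on every entry starting with "+1 ".

```python
import re

def digits_only(phone_number):
    """Drop the leading '+1 ' and strip parentheses, whitespace, and dashes."""
    return re.sub(r'[()\s-]', '', phone_number[3:])

raw = '+1 (786) 665-5397, +1 (954) 226-7037,'  # shortened sample from the question

# Split on commas, skip the empty piece after the trailing comma.
numbers = [digits_only(part.strip()) for part in raw.split(',') if part.strip()]
print(numbers)  # ['7866655397', '9542267037']
```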