
How to get all emails from raw strings

I tried this code:

import re

contents = 'alokm.014@gmail.yahoo.com.....thankyou'
match = re.findall(r'[\w\.-]+@[\w\.-]+', contents)
print(match)

Result:

['alokm.014@gmail.yahoo.com.....thankyou']

I want to remove .....thankyou from the email address.

Is it possible to obtain only alokm.014@gmail.yahoo.com? Also, the real content is much larger, so if possible I would like a fix that only changes the pattern in re.findall(r'[\w\.-]+@[\w\.-]+', contents).

I don't know Python, but languages like Java have libraries that help validate URLs and email addresses. Alternatively, you can use a well-vetted regular expression.

My suggestion would be to keep trimming the end of the string at dots until the string validates. Test the string; if it doesn't validate as an email address, scan from the right until you hit a period, drop that period and everything to its right, and test again.

So you'd loop through like this:

alokm.014@gmail.yahoo.com.....thankyou
alokm.014@gmail.yahoo.com....
alokm.014@gmail.yahoo.com...
alokm.014@gmail.yahoo.com..
alokm.014@gmail.yahoo.com.
alokm.014@gmail.yahoo.com

At which point it would validate as a real email address. Yes, it's slow. Yes, it can be tricked. But it will work most of the time based on the little info (possible strings) given.
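The loop above could be sketched in Python roughly as follows. The VALID_EMAIL pattern and the trim_to_email helper are hypothetical names introduced here for illustration, and the validator is deliberately simple (an assumption, not a complete email check); swap in whatever validation you trust.

```python
import re

# Hypothetical, deliberately simple validator (an assumption for this sketch):
# a local part, "@", then dot-separated domain labels with no empty label.
VALID_EMAIL = re.compile(r'[\w.+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+')

def trim_to_email(candidate):
    """Drop the last '.' and everything after it until the string validates."""
    while candidate:
        if VALID_EMAIL.fullmatch(candidate):
            return candidate
        head, dot, _tail = candidate.rpartition('.')
        if not dot:  # no period left to cut at, give up
            return None
        candidate = head
    return None

print(trim_to_email('alokm.014@gmail.yahoo.com.....thankyou'))
# alokm.014@gmail.yahoo.com
```

As noted, this can be tricked: a string such as user@example.com.thankyou already passes this simple validator and is returned untrimmed.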

Interesting question! Here's a Python regex program that extracts the email address from the contents:

import re

contents = 'alokm.014@gmail.yahoo.com.....thankyou'

emailRegex = re.compile(r'''
[a-zA-Z0-9.]+         # username
@                     # @ symbol
[a-zA-Z0-9.]+\.com    # domain, ending in .com
''', re.VERBOSE)      # re.VERBOSE helps make Regex multi-line with comments for better readability

extractEmail = emailRegex.findall(contents)
print(extractEmail)

Output will be:

['alokm.014@gmail.yahoo.com']

I suggest reading the Python Regular Expression HOWTO to understand what this program does and to come up with a better version that can extract all the emails from your larger text.
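As one possible starting point for such a version, here is a sketch that is not tied to .com. It requires every dot in the address to be followed by at least one letter, digit, or hyphen, so a run of trailing dots is left out of the match. EMAIL_RE and the sample text are made up for illustration, and the pattern is a rough approximation of email syntax rather than a full validator.

```python
import re

# Sketch only: EMAIL_RE is a hypothetical name, and the pattern is an
# approximation of email syntax, not a complete validator.
EMAIL_RE = re.compile(r'''
    [A-Za-z0-9_+-]+ (?: \. [A-Za-z0-9_+-]+ )*   # local part: dot-separated, no empty piece
    @                                           # @ symbol
    [A-Za-z0-9-]+ (?: \. [A-Za-z0-9-]+ )+       # domain: two or more non-empty labels
''', re.VERBOSE)

# Made-up sample text containing several addresses
text = ('alokm.014@gmail.yahoo.com.....thankyou, '
        'also write to bob@example.org. Or jane_doe@mail.example.net!')

print(EMAIL_RE.findall(text))
# ['alokm.014@gmail.yahoo.com', 'bob@example.org', 'jane_doe@mail.example.net']
```

Because the groups are non-capturing, findall returns whole addresses. The pattern still over-matches when text is glued on after a single dot, for example bob@example.org.Thanks, so treat the results as candidates.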
