
Define a function to parse an email address

I need to write a function parse_email that, given an email address s, returns a tuple, (user-id, domain) corresponding to the user name and domain name. For instance, given richie@cc.gatech.edu it should return (richie, cc.gatech.edu).

The function should parse the email only if it exactly matches the email specification. For example, if there are leading or trailing spaces, the function should not match. Also, the address should both start and end with a letter. If it does not, or if the string contains any spaces, the function should raise an error.

I tried the following function:

def parse_email (s):
    """Parses a string as an email address, returning an (id, domain) pair."""
    try:
        return(re.match(r'\S([\w\.+_-]+)@([\w\._-]+)',s).groups())
    except:
        pass

Can someone help me complete the function so that it raises an error if there are spaces at the start of the string?

Regular expressions are entirely the wrong tool here. Python 3.6+ has a library function which does exactly this.

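from email import message_from_string
from email.policy import default as email_default_policy

def parse_emails(s):
    """Yield a (username, domain) pair for each address found in s."""
    # Wrap the input in a To: header so the header parser can handle it.
    msg = message_from_string('To: {}'.format(s), policy=email_default_policy)
    for addr in msg['to'].addresses:
        yield addr.username, addr.domain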

The email.headerregistry module offers a structured representation of a parsed email address. The email.policy.default object is required to enable the 3.6+ email parsing functionality (though the documentation alleges that it will become the default policy eventually, at which point you should no longer need to specify it explicitly).
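As a rough sketch, this is how the approach might be wrapped to return a single pair and reject whitespace, as the question asks. The name parse_email comes from the question. The up-front whitespace check, the single-address check and the choice of ValueError are assumptions added for illustration, not something the email package does for you. The start-and-end-with-a-letter rule is not enforced here.

```python
from email import message_from_string
from email.policy import default as email_default_policy

def parse_email(s):
    """Return (username, domain) for a single address, or raise ValueError."""
    # Assumption: the question wants any whitespace (leading, trailing or
    # embedded) to be an error, so reject it before parsing.
    if any(ch.isspace() for ch in s):
        raise ValueError("whitespace is not allowed: {!r}".format(s))
    msg = message_from_string('To: {}'.format(s), policy=email_default_policy)
    addresses = msg['to'].addresses
    # Assumption: exactly one address with a non-empty domain is required.
    if len(addresses) != 1 or not addresses[0].domain:
        raise ValueError("not a single user@domain address: {!r}".format(s))
    addr = addresses[0]
    return addr.username, addr.domain

print(parse_email('richie@cc.gatech.edu'))
```

Checking for whitespace before building the header also keeps newlines out of the string passed to message_from_string.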

There are demonstrations of RFC822 header parsers in pure regex. The canonical one is roughly a full page of text: Mail::RFC822::Address Regex

Split can be used:

import re

def parse_email(s):
    try:
        x = re.split('@', s)
        return (x[0], x[1])
    except IndexError:
        # No '@' in s, so there is no domain part to return.
        return None

Happy coding :)

I would recommend you simply split the input string on @ after trimming any whitespace. As mentioned in the comments, you can encounter more than one @ symbol in an e-mail address, so it's important to split on the right one.

Building regular expressions for "valid" e-mail addresses is a nightmare, and you're bound to get it wrong. This article explains why:

https://hackernoon.com/the-100-correct-way-to-validate-email-addresses-7c4818f24643

Below is some code, with tests, that shows how this works, but note that it doesn't cope with multiple @ symbols: it always splits on the first one.

import pytest

def parse_email(s):
  parts = s.strip().split('@', 1)
  if len(parts) == 2:
    return (parts[0], parts[1])
  else:
    raise ValueError()

def test_parse_simple_email():
  parts = parse_email("cheese@peas.com")
  assert len(parts) == 2
  assert parts[0] == "cheese"
  assert parts[1] == "peas.com"

def test_invalid_email():
  with pytest.raises(ValueError):
    parts = parse_email("this is not an e-mail address")

def test_parse_email_with_whitespace():
  parts = parse_email(" cheese@peas.com ")
  assert len(parts) == 2
  assert parts[0] == "cheese"
  assert parts[1] == "peas.com"
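To handle addresses with more than one @, one option is to split on the last @ instead of the first. This is only a sketch: the function name parse_email_last_at is made up for illustration, and it assumes the domain part never contains an @, so everything before the final @ is treated as the user part.

```python
def parse_email_last_at(s):
    """Split on the last '@'; raise ValueError if either side is missing."""
    # rpartition returns ('', '', s) when there is no '@' at all.
    local, sep, domain = s.strip().rpartition('@')
    if not sep or not local or not domain:
        raise ValueError("expected user@domain, got {!r}".format(s))
    return (local, domain)

print(parse_email_last_at('"a@b"@example.com'))
```

Like the code above, this trims surrounding whitespace and does not validate the address beyond finding the separator.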

I think the following code should do the job:

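import re

def parse_email(s):
    try:
        # The first and last characters must be letters; whitespace is not allowed anywhere.
        z = re.fullmatch(r'\b([a-zA-Z])([\w.+-]+)@([\w.-]+)([a-zA-Z])\b', s).groups()
        return (z[0] + z[1], z[2] + z[3])
    except AttributeError:
        # fullmatch returned None, so the string is not an acceptable address.
        raise ValueError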
