
Parse email fields

I want to parse email addresses from the To: header of an email.

For example, when looping over the messages in an mbox file:

import mailbox

mbox = mailbox.mbox('test.mbox')
for m in mbox:
    print(m['To'])

I get values like:

info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>

that should be parsed into:

[{"email": "info@test.org", "name": ""},
 {"email": "blah@test.com", "name": "Blahblah"},
 {"email": "another@blah.org", "name": ""},
 {"email": "last@one.com", "name": "Hey"}]

Is there anything built in for this (in mailbox or another module)?

I've read the documentation several times but didn't find anything relevant.

You can use email.utils.getaddresses() for this:

>>> from email.utils import getaddresses
>>> getaddresses(['info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'])
[('', 'info@test.org'), ('Blahblah', 'blah@test.com'), ('', 'another@blah.org'), ('Hey', 'last@one.com')]

(Note that the function expects a list of header values, so you have to wrap the string in [...].)
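To get exactly the list of dicts from the question, the getaddresses() output only needs reshaping. A minimal sketch, assuming Python 3; parse_address_field() and addresses_in_mbox() are hypothetical helper names, and 'test.mbox' is the file from the question:

```python
import mailbox
from email.utils import getaddresses


def parse_address_field(*values):
    """Hypothetical helper: turn raw address-header values into the
    [{"email": ..., "name": ...}, ...] shape asked for in the question."""
    # getaddresses() takes a list of header values and returns
    # (display-name, address) pairs.
    return [{"email": addr, "name": name}
            for name, addr in getaddresses(list(values))]


def addresses_in_mbox(path):
    """Hypothetical helper: yield the parsed To: list of each message."""
    for msg in mailbox.mbox(path):
        # get_all() returns every To: value, or the fallback [] if the
        # message has none.
        yield parse_address_field(*msg.get_all("To", []))


header = 'info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'
print(parse_address_field(header))
```

Usage with the mailbox: `for to_list in addresses_in_mbox('test.mbox'): print(to_list)`. Going through get_all() rather than m['To'] means a malformed message that carries more than one To: header still has all of its addresses collected.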

The email.parser module has what you're looking for. email.message is still relevant, because the parser returns messages built on that class, so that is where you'll read your header data from. But to actually read the files in, email.parser is the way to go.
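On Python 3 the parser can also do the address splitting itself. A minimal sketch, assuming Python 3.6+ and a made-up headers-only message string: with policy=email.policy.default, msg['To'] becomes a structured header whose .addresses attribute holds Address objects. Messages returned by mailbox use the older compat32 policy by default, so m['To'] there is still a plain string.

```python
import email.policy
from email.parser import Parser

# A made-up message containing just a To: header and a short body.
raw = (
    'To: info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>\n'
    "\n"
    "body\n"
)

# With the default (non-compat32) policy, address headers are parsed
# into structured objects instead of plain strings.
msg = Parser(policy=email.policy.default).parsestr(raw)

for address in msg["To"].addresses:
    # display_name is '' when the address has no name part.
    print(repr(address.display_name), address.addr_spec)
```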

As pointed out by @TheSpooniest, the email package has a parser:

import email.utils

s = 'info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'

# Splitting on ',' only works while no display name contains a comma.
for em in s.split(','):
    print(email.utils.parseaddr(em))

gives:

('', 'info@test.org')
('Blahblah', 'blah@test.com')
('', 'another@blah.org')
('Hey', 'last@one.com')
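Splitting on ',' breaks as soon as a quoted display name contains a comma, which is legal in a header, whereas getaddresses() parses the whole field. A small comparison with a made-up header:

```python
from email.utils import getaddresses, parseaddr

# A quoted display name may legally contain a comma.
header = '"Doe, John" <john@example.com>, info@test.org'

# Naive approach: the comma inside the quotes splits one address in two,
# so three fragments are parsed for two addresses.
naive = [parseaddr(part) for part in header.split(",")]

# getaddresses() respects the quoting and returns one pair per address.
robust = getaddresses([header])

print(naive)
print(robust)
```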

Python provides email.header.decode_header() for decoding headers. The function decodes each encoded word and returns a list of (text, charset) tuples, which you still have to decode and join to get the full text.
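A short illustration with a made-up RFC 2047 encoded Subject value. make_header() is the standard-library helper that performs the decode-and-join step; wrapping it in str() gives the readable text:

```python
from email.header import decode_header, make_header

# Made-up header value: a UTF-8 quoted-printable encoded word, then plain text.
raw = "=?utf-8?q?Caf=C3=A9?= opening hours"

# decode_header() returns (chunk, charset) pairs; chunks that came from
# encoded words are still bytes at this point.
parts = decode_header(raw)
print(parts)

# make_header() decodes each chunk with its charset and joins them.
text = str(make_header(parts))
print(text)
```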

For addresses, Python provides email.utils.getaddresses(), which splits the field into a list of (display-name, address) tuples. The display name needs to be decoded too, and the addresses must match the RFC 2822 syntax. The getmailaddresses() function from the tutorial below (it is not part of the standard library) does the whole job.
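A rough sketch of that whole job, assuming Python 3. decode_addresses() is a hypothetical helper, not the tutorial's getmailaddresses(); it only decodes display names and drops entries without an address part, and does no further RFC 2822 validation:

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decode_addresses(*values):
    """Hypothetical helper: split address headers with getaddresses(),
    then decode each RFC 2047 display name into readable text."""
    result = []
    for name, addr in getaddresses(list(values)):
        if not addr:
            # Skip entries without an address part (recent Python versions
            # return ('', '') when the field cannot be parsed).
            continue
        # Encoded names such as '=?utf-8?q?...?=' become text;
        # plain ASCII names pass through unchanged.
        name = str(make_header(decode_header(name))) if name else ""
        result.append((name, addr))
    return result


print(decode_addresses('=?utf-8?q?Caf=C3=A9?= <cafe@example.com>, "Hey" <last@one.com>'))
```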

Here's a tutorial that might help: http://blog.magiksys.net/parsing-email-using-python-header
