Parse email fields

I want to parse email addresses from a To: email field.

When looping over the messages in an mbox:

import mailbox

mbox = mailbox.mbox('test.mbox')
for m in mbox:
    print(m['To'])

we can get things like:

info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>

that should be parsed into:

[{email: "info@test.org", name: ""}, 
 {email: "blah@test.com", name: "Blahblah"},
 {email: "another@blah.org", name: ""},
 {email: "last@one.com", name: "Hey"}]

Is there something already built in for this (in mailbox or another module), or nothing?

I have read this doc a few times, but I didn't find anything relevant.

You can use email.utils.getaddresses() for this:

>>> from email.utils import getaddresses
>>> getaddresses(['info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'])
[('', 'info@test.org'), ('Blahblah', 'blah@test.com'), ('', 'another@blah.org'), ('Hey', 'last@one.com')]

(Note that the function expects a list, so you have to enclose the string in [...].)
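To get from those tuples to the list of dictionaries the question asks for, a small wrapper is enough. This is a minimal sketch; the helper name parse_address_field is made up for illustration and is not part of the standard library.

```python
from email.utils import getaddresses


def parse_address_field(value):
    """Turn one address header value into a list of {email, name} dicts.

    parse_address_field is a hypothetical helper name, not a stdlib function.
    """
    # getaddresses() takes a list of header values and yields (name, address) pairs.
    return [{"email": addr, "name": name} for name, addr in getaddresses([value])]


to_field = 'info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'
for entry in parse_address_field(to_field):
    print(entry)
```

Inside the mbox loop from the question, the same helper could be called as parse_address_field(m['To']) after checking that the header is present.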

email.parser has what you're looking for. email.message is still relevant, because the parser returns messages using this structure, so you'll be getting your header data from it. But to actually read the files in, email.parser is the way to go.
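As a rough illustration of that flow, the sketch below parses a raw message with email.parser.Parser and reads the recipient headers from the resulting message object. The message text and addresses are invented for the example.

```python
from email.parser import Parser
from email.utils import getaddresses

# An invented raw message, used only to demonstrate the API.
raw = (
    "From: sender@example.com\n"
    "To: info@test.org, Blahblah <blah@test.com>\n"
    "Cc: <another@blah.org>\n"
    "Subject: hello\n"
    "\n"
    "Body text.\n"
)

# parsestr() returns an email.message.Message object.
msg = Parser().parsestr(raw)

# get_all() returns every occurrence of a header as a list,
# or the supplied default when the header is absent.
recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
print(recipients)
```

Passing the To and Cc values together works because getaddresses() accepts a list of header values.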

As pointed out by @TheSpooniest, email has a parser:

import email.utils

s = 'info@test.org, Blahblah <blah@test.com>, <another@blah.org>, "Hey" <last@one.com>'

# Note: a plain split(',') breaks on display names that contain commas.
for em in s.split(','):
    print(email.utils.parseaddr(em))

gives:

('', 'info@test.org')
('Blahblah', 'blah@test.com')
('', 'another@blah.org')
('Hey', 'last@one.com')
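One caveat with splitting on commas by hand: a quoted display name may itself contain a comma. The sketch below uses an invented address to compare the manual split with email.utils.getaddresses(), which handles the quoting.

```python
from email.utils import getaddresses, parseaddr

# Invented example: the display name contains a comma inside quotes.
s = '"Doe, John" <john@example.com>, info@test.org'

# The naive split cuts the quoted display name in two, producing three pieces.
naive = [parseaddr(part) for part in s.split(',')]

# getaddresses() respects the quotes and finds the two real addresses.
robust = getaddresses([s])

print(len(naive), "pieces from the manual split")
print(robust)
```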

Python provides email.header.decode_header() (spelled email.Header in older Python 2 releases) for decoding headers. The function decodes each atom and returns a list of (text, encoding) tuples that you still have to decode and join to get the full text.
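The decode-and-join step might look like the following under Python 3. This is only a sketch: decode_header_value is a hypothetical helper name, and falling back to ASCII with replacement characters is an assumption made here, not something the answer prescribes.

```python
from email.header import decode_header


def decode_header_value(value):
    """Decode an RFC 2047 encoded header value into one str.

    decode_header_value is a hypothetical helper name for this sketch.
    """
    parts = []
    for text, charset in decode_header(value):
        if isinstance(text, bytes):
            # Assumption: fall back to ASCII when no charset is given.
            parts.append(text.decode(charset or "ascii", errors="replace"))
        else:
            # Plain, unencoded headers come back as str already.
            parts.append(text)
    return "".join(parts)


print(decode_header_value("=?utf-8?q?Caf=C3=A9?="))
```

The standard library can also do the joining: str(email.header.make_header(decode_header(value))) gives a comparable result.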

For addresses, Python provides email.utils.getaddresses(), which splits the addresses into a list of (display-name, address) tuples. The display-name needs to be decoded too, and the addresses must match the RFC 2822 syntax. The function getmailaddresses(), which is not a standard-library function and appears to come from the tutorial linked below, does the whole job.
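Combining the two steps could look roughly like this. The sketch is not the getmailaddresses() function mentioned above; decoded_addresses is a hypothetical name, the sample address is invented, and no RFC 2822 syntax validation is attempted.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decoded_addresses(values):
    """Return (display-name, address) pairs with display names decoded.

    decoded_addresses is a hypothetical helper, not a stdlib function,
    and it does not validate the address syntax.
    """
    result = []
    for name, addr in getaddresses(values):
        if name:
            # getaddresses() leaves RFC 2047 encoded words untouched,
            # so decode the display name separately.
            name = str(make_header(decode_header(name)))
        result.append((name, addr))
    return result


pairs = decoded_addresses(["=?utf-8?q?Ren=C3=A9?= <rene@example.com>, info@test.org"])
print(pairs)
```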

Here's a tutorial that might help: http://blog.magiksys.net/parsing-email-using-python-header
