Get body text of an email using python imap and email package

I want to retrieve the body (text only) of emails using the python imap and email packages.

As per this SO thread, I'm using the following code:

mail = email.message_from_string(email_body)
bodytext = mail.get_payload()[0].get_payload()

Though it works fine in some cases, sometimes I get a response similar to the following:

[<email.message.Message instance at 0x0206DCD8>, <email.message.Message instance at 0x0206D508>]

The main problem in my case is that a replied or forwarded message is shown as a message instance in the body text.

I solved my problem using the following code:

bodytext = mail.get_payload()[0].get_payload()
if isinstance(bodytext, list):
    bodytext = ','.join(str(v) for v in bodytext)

You are assuming that messages have a uniform structure with one well-defined "main part". That is not the case. A message can have a single part which is not a text part (just an "attachment" of a binary file, and nothing else), or it can be a multipart with multiple textual parts (or, again, none at all), and even if there is only one, it need not be the first part. Furthermore, there are nested multiparts (one or more parts is itself another MIME message, recursively).

In short, you must inspect the MIME structure, then decide which part(s) are relevant for your application. If you only receive messages from a fairly static, small set of clients, you may be able to cut some corners (at least until the next upgrade of Microsoft Plague hits), but in general there simply isn't a hierarchy of any kind, just a collection of (not necessarily always directly related) equally important parts.
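As a rough illustration of what "inspecting the MIME structure" can look like, here is a minimal sketch that prints the tree of content types for a message. The helper name `describe_structure` and the sample message `RAW` are made up for this example; only the standard library `email` package is used.

```python
import email


def describe_structure(msg, depth=0):
    """Return one indented line per MIME part, showing its content type."""
    line = "  " * depth + msg.get_content_type()
    filename = msg.get_filename()
    if filename:
        line += f" (attachment: {filename})"
    lines = [line]
    if msg.is_multipart():
        # For multipart messages, get_payload() is a list of sub-messages.
        for part in msg.get_payload():
            lines.extend(describe_structure(part, depth + 1))
    return lines


# Hypothetical message: a nested multipart/alternative plus a binary attachment.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset="utf-8"

Hello in plain text.
--inner
Content-Type: text/html; charset="utf-8"

<p>Hello in HTML.</p>
--inner--
--outer
Content-Type: application/pdf
Content-Disposition: attachment; filename="report.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--outer--
"""

structure = describe_structure(email.message_from_string(RAW))
print("\n".join(structure))
```

Running this on a few real messages from your own mailbox is probably the quickest way to see which parts your application actually cares about.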

Maybe this post (of mine) can be of help. I receive a newsletter with the prices of different kinds of oil in the US. I fetch emails from Gmail whose titles match a given pattern, then extract the prices from the mail body using a regex. So I have to access the mail body of the last n emails whose titles match the given pattern.

I am also using email.message_from_string(): msg = email.message_from_string(response_part[1])

So maybe it gives you a concrete example of how to use the methods in this python lib.
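For context, `response_part` in the line above presumably comes from looping over the data returned by an `imaplib` fetch call. The sketch below shows that loop under some assumptions: on Python 3 `imaplib` returns bytes, so `email.message_from_bytes()` is used in place of `email.message_from_string()`; the helper name `messages_from_fetch` and the `fake_data` stand-in are invented here so the example runs without a mail server.

```python
import email


def messages_from_fetch(data):
    """Yield a parsed message for each (header line, raw bytes) tuple in a fetch result."""
    for response_part in data:
        # imaplib mixes tuples (header line, raw message) with plain bytes such as b")".
        if isinstance(response_part, tuple):
            yield email.message_from_bytes(response_part[1])


# Stand-in for the `data` in `typ, data = conn.fetch(num, "(RFC822)")`.
# The content is made up for illustration.
fake_data = [
    (
        b"1 (RFC822 {61}",
        b"Subject: Oil prices\r\nContent-Type: text/plain\r\n\r\nWTI: 80.10\r\n",
    ),
    b")",
]

subjects = [msg["Subject"] for msg in messages_from_fetch(fake_data)]
print(subjects)
```

With a real connection, `fake_data` would be replaced by the second element of the tuple that `fetch` returns.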

Basically, you have to iterate over the different text/plain (or text/html) parts of the message to get to the body. There is absolutely no guarantee about the position of the body part (though by convention it is one of the first... in most cases... probably... :)
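One possible way to do that iteration is with `Message.walk()`, which visits every part including nested ones. This is only a sketch: the helper name `get_text_parts`, the choice to skip parts marked as attachments, the UTF-8 fallback charset, and the sample message are all assumptions made for this example.

```python
import email


def get_text_parts(msg, subtype="plain"):
    """Collect the decoded text of every non-attachment text/<subtype> part."""
    texts = []
    for part in msg.walk():  # walk() also descends into nested multiparts
        if part.get_content_type() != f"text/{subtype}":
            continue
        if part.get_content_disposition() == "attachment":
            continue  # e.g. an attached .txt file, not the body
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        try:
            texts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            texts.append(payload.decode("utf-8", errors="replace"))
    return texts


# Hypothetical message where the body is NOT the first part.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="b"

--b
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="data.bin"
Content-Transfer-Encoding: base64

AAEC
--b
Content-Type: text/plain; charset="utf-8"

The actual body.
--b
Content-Type: text/plain; charset="utf-8"
Content-Disposition: attachment; filename="notes.txt"

Attached notes.
--b--
"""

bodies = get_text_parts(email.message_from_string(RAW))
print(bodies)
```

Whether to join several text parts, keep only the first, or prefer text/html when no text/plain part exists depends on the application, so the function returns them all and leaves that decision to the caller.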

As I don't want to duplicate content, please see my answer to a quite similar question here, and adjust it according to your needs.

External lib: https://github.com/ikvk/imap_tools

from imap_tools import MailBox 

# get list of email texts from INBOX folder
with MailBox('imap.mail.com').login('test@mail.com', 'pwd', 'INBOX') as mailbox:
    data = [msg.text for msg in mailbox.fetch()]
