
IMAP4_SSL with Gmail in Python

We are retrieving mail from our Gmail account using IMAP4_SSL and Python. The email body comes back as HTML, and we need to convert it to plain text. Can anyone help us with that?
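For context, here is a minimal sketch of the retrieval step the question describes, using only the standard library (imaplib and email). The function names fetch_latest_raw and extract_html are made up for illustration, the credentials are placeholders, and the choice to fetch only the newest INBOX message is an assumption, not something stated in the question.

```python
import email
import imaplib
from email import policy


def fetch_latest_raw(user, password, host="imap.gmail.com"):
    """Return the raw RFC 822 bytes of the newest INBOX message, or None."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX", readonly=True)  # read-only: do not mark as seen
        typ, data = conn.search(None, "ALL")
        ids = data[0].split()
        if not ids:
            return None
        typ, parts = conn.fetch(ids[-1].decode(), "(RFC822)")
        return parts[0][1]  # second item of the first tuple is the message


def extract_html(raw_bytes):
    """Return the HTML body of a raw message as str, or None if there is none."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    return part.get_content() if part is not None else None
```

Calling extract_html(fetch_latest_raw(user, password)) would then give the HTML string that still needs converting to plain text.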

Stand on the shoulders of giants...
Peter Bengtsson has worked out a solution to this exact problem.
Peter's script uses the awesome BeautifulSoup, by Leonard Richardson,
and Fredrik Lundh's unescape() function.

Using Peter's test case, you get this:

This is a paragraph.

Foobar [1]
http://two.com

Visit http://www.google.com.

Text elsewhere. Elsewhere [2]

[1] http://one.com
[2] http://three.com

...from this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<body>

<div id="main">
<p>This is a paragraph.</p>

<p><a href="http://one.com">Foobar</a>
<br />

<a href="http://two.com">two.com</a>

</p>
  <p>Visit <a href="http://www.google.com">www.google.com</a>.</p>
<br />
Text elsewhere.

<a href="http://three.com">Elsewhere</a>

</div>
</body>
</html>
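
If installing BeautifulSoup is not an option, the same idea can be approximated with the standard library alone. The following is a rough sketch, not Peter's script: the class name FootnoteLinkParser, the function html_to_text, and the rule for deciding when a link is shown inline rather than as a footnote are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class FootnoteLinkParser(HTMLParser):
    """Collect text, turning links into numbered footnotes."""

    BLOCK_TAGS = ("p", "div")

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities are unescaped for us
        self.chunks = []     # pieces of output text
        self.links = []      # footnoted URLs, in order of appearance
        self.href = None     # URL of the link currently open, if any
        self.link_text = []  # text collected inside the open link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.link_text = []
        elif tag == "br":
            self.chunks.append("\n")
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n\n")

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            text = " ".join("".join(self.link_text).split())
            bare = self.href.split("://", 1)[-1].rstrip("/")
            if text.rstrip("/") in (bare, self.href.rstrip("/")):
                # The link text is just the URL: show the full URL inline.
                self.chunks.append(self.href)
            else:
                self.links.append(self.href)
                self.chunks.append("%s [%d]" % (text, len(self.links)))
            self.href = None
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n\n")

    def handle_data(self, data):
        if self.href is not None:
            self.link_text.append(data)
        else:
            # Source line breaks are not meaningful in HTML: collapse them.
            self.chunks.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    parser = FootnoteLinkParser()
    parser.feed(html)
    parser.close()
    lines = [" ".join(line.split()) for line in "".join(parser.chunks).split("\n")]
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
    if parser.links:
        notes = "\n".join("[%d] %s" % (i, url) for i, url in enumerate(parser.links, 1))
        text += "\n\n" + notes
    return text
```

Only p, div, br and a are treated specially here; tables, lists, scripts and styles would need extra handling, which is where a full parser such as BeautifulSoup earns its keep.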

