IMAP4_SSL with Gmail

Question

We are retrieving mail from our Gmail account using IMAP4_SSL and Python. The email body is retrieved in HTML format, and we need to convert it to plain text. Can anyone help us with that?
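The question doesn't show the retrieval code, so the following is only a minimal standard-library sketch of the step that comes before the conversion. It assumes Gmail's IMAP endpoint imap.gmail.com on port 993, IMAP access enabled on the account, and a credential Gmail accepts for IMAP logins (such as an app password). The names fetch_raw_messages and html_body are hypothetical. The email package's get_body() is used to pick the text/html part out of a multipart message.

```python
import email
import imaplib
from email import policy
from email.message import EmailMessage
from typing import List, Optional


def fetch_raw_messages(user: str, app_password: str, limit: int = 5) -> List[bytes]:
    """Hypothetical helper: raw RFC 822 bytes of the newest `limit` INBOX messages.

    Assumes IMAP is enabled on the Gmail account and that `app_password`
    is a credential Gmail accepts for IMAP logins.
    """
    conn = imaplib.IMAP4_SSL("imap.gmail.com", 993)
    try:
        conn.login(user, app_password)
        conn.select("INBOX", readonly=True)  # read-only, so messages are not marked \Seen
        _, data = conn.search(None, "ALL")
        ids = data[0].split()
        raw_messages = []
        for msg_id in (ids[-limit:] if limit > 0 else []):
            _, msg_data = conn.fetch(msg_id, "(RFC822)")
            raw_messages.append(msg_data[0][1])  # msg_data[0] is (header line, raw bytes)
        return raw_messages
    finally:
        conn.logout()


def html_body(raw: bytes) -> Optional[str]:
    """Hypothetical helper: decoded text/html part of a message, or None if absent."""
    # policy.default makes message_from_bytes return an EmailMessage,
    # which provides get_body() and get_content().
    msg: EmailMessage = email.message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    return part.get_content() if part is not None else None
```

Only html_body touches the message structure, so it can be tried on any saved .eml file without a network connection.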

Answer 1

Stand on the shoulders of giants...
Peter Bengtsson has worked out a solution to this exact problem here. Peter's script uses the awesome BeautifulSoup, by Leonard Richardson, and Fredrik Lundh's unescape() function.
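Lundh's unescape() recipe was written for Python 2; its job, turning character references such as &amp; or &#8364; back into characters, is covered in Python 3.4+ by the standard library's html.unescape. A quick illustration with a made-up sample string:

```python
import html

# Named (&eacute;), decimal (&#8364;) and basic (&amp;, &lt;, &gt;) references
# are all decoded by html.unescape.
encoded = "Caf&eacute; &amp; Bar, &#8364;5 &lt;today&gt;"
decoded = html.unescape(encoded)
print(decoded)  # Café & Bar, €5 <today>
```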

Using Peter's test case, you get this:

This is a paragraph.

Foobar [1]
http://two.com

Visit http://www.google.com.

Text elsewhere. Elsewhere [2]

[1] http://one.com
[2] http://three.com

...from this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<body>

<div id="main">
<p>This is a paragraph.</p>

<p><a href="http://one.com">Foobar</a>
<br />

<a href="http://two.com">two.com</a>

</p>
  <p>Visit <a href="http://www.google.com">www.google.com</a>.</p>
<br />
Text elsewhere.

<a href="http://three.com">Elsewhere</a>

</div>
</body>
</html>
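Peter's script isn't reproduced here, and BeautifulSoup is a third-party package. As a rough standard-library alternative (a sketch, not his code), the parser below uses html.parser.HTMLParser in place of BeautifulSoup. Its link rule is inferred from the example output above, not taken from his implementation: a link whose text is just its URL (with or without the scheme) is replaced by the full URL, while any other link keeps its text plus a [n] marker and its URL is listed as a footnote at the end. The tag sets and the names FootnoteTextParser and html_to_text are illustrative assumptions.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional

PARAGRAPH_TAGS = {"p"}  # blank line before and after
BLOCK_TAGS = {"div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}  # own line
SKIP_TAGS = {"script", "style"}  # content never shown to the reader


class FootnoteTextParser(HTMLParser):
    """Hypothetical helper: collects plain text, turning links into footnotes."""

    def __init__(self) -> None:
        # convert_charrefs=True passes already-unescaped text to handle_data,
        # which covers what Lundh's unescape() was used for.
        super().__init__(convert_charrefs=True)
        self.chunks: List[str] = []
        self.links: List[str] = []  # footnote URLs, in order of appearance
        self._href: Optional[str] = None
        self._link_text: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._link_text = []
        elif tag == "br":
            self.chunks.append("\n")
        elif tag in PARAGRAPH_TAGS:
            self.chunks.append("\n\n")
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a" and self._href is not None:
            text = " ".join("".join(self._link_text).split())
            bare_url = self._href.split("://", 1)[-1].rstrip("/")
            if text in (self._href, bare_url):
                # Link text is just the URL: print the full URL inline.
                self.chunks.append(self._href)
            else:
                # Keep the visible text and move the URL to a numbered footnote.
                self.links.append(self._href)
                self.chunks.append(f"{text} [{len(self.links)}]")
            self._href = None
        elif tag in PARAGRAPH_TAGS:
            self.chunks.append("\n\n")
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth:
            return
        data = re.sub(r"\s+", " ", data)  # HTML collapses runs of whitespace
        if self._href is not None:
            self._link_text.append(data)
        else:
            self.chunks.append(data)


def html_to_text(source: str) -> str:
    parser = FootnoteTextParser()
    parser.feed(source)
    parser.close()
    # Strip each line and keep at most one blank line in a row.
    lines: List[str] = []
    for line in "".join(parser.chunks).split("\n"):
        line = line.strip()
        if line or (lines and lines[-1]):
            lines.append(line)
    text = "\n".join(lines).strip()
    if parser.links:
        notes = "\n".join(f"[{n}] {url}" for n, url in enumerate(parser.links, 1))
        text += "\n\n" + notes
    return text
```

Fed the test HTML above, html_to_text produces the same text as the output shown. HTMLParser is used only to keep the sketch dependency-free; the tag sets are minimal and would need extending for real-world mail (tables, lists, images).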

Answer 1 (score 2), posted 2009-06-04 06:05:17