
IMAP4_SSL with Gmail in Python

We are retrieving mail from our Gmail account using IMAP4_SSL and Python. The email body comes back as HTML, and we need to convert it to plain text. Can anyone help us with that?

Stand on the shoulders of giants...
Peter Bengtsson has worked out a solution to this exact problem.
Peter's script uses the awesome BeautifulSoup, by Leonard Richardson,
and Fredrik Lundh's unescape() function.

Using Peter's test case, you get this:

This is a paragraph.

Foobar [1]
http://two.com

Visit http://www.google.com.

Text elsewhere. Elsewhere [2]

[1] http://one.com
[2] http://three.com

...from this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<body>

<div id="main">
<p>This is a paragraph.</p>

<p><a href="http://one.com">Foobar</a>
<br />

<a href="http://two.com">two.com</a>

</p>
  <p>Visit <a href="http://www.google.com">www.google.com</a>.</p>
<br />
Text elsewhere.

<a href="http://three.com">Elsewhere</a>

</div>
</body>
</html>
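
If you would rather avoid a third-party dependency, here is a rough standard-library sketch of the same idea. To be clear, this is not Peter's script and it does not use BeautifulSoup; the names html_body, html_to_text and _FootnoteParser are made up for illustration, and the output may differ in spacing from the example above. It assumes you already have the raw message bytes from an IMAP fetch.

```python
import email
import email.policy
import re
from html.parser import HTMLParser

# Tags treated as line breaks (an assumption; extend as needed).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _FootnoteParser(HTMLParser):
    """Collects visible text and turns links into numbered footnotes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []       # pieces of output text
        self.links = []        # footnote URLs, in order of appearance
        self._in_link = False
        self._href = None
        self._link_text = []
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._link_text = []
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a" and self._in_link:
            self._in_link = False
            self.chunks.append(self._render_link())
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth:
            return
        # Newlines in HTML source are just whitespace, not line breaks.
        data = re.sub(r"\s+", " ", data)
        (self._link_text if self._in_link else self.chunks).append(data)

    def _render_link(self):
        text = "".join(self._link_text).strip()
        href = self._href
        if not href:
            return text
        # If the link text is just the URL (with or without scheme),
        # print the URL inline instead of adding a footnote.
        bare = href.split("://", 1)[-1].rstrip("/")
        if text in (href, bare):
            return href
        self.links.append(href)
        return f"{text} [{len(self.links)}]"


def html_to_text(html):
    """Convert an HTML string to plain text with link footnotes."""
    parser = _FootnoteParser()
    parser.feed(html)
    parser.close()
    raw_lines = "".join(parser.chunks).split("\n")
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in raw_lines]
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
    if parser.links:
        notes = "\n".join(f"[{i}] {url}" for i, url in enumerate(parser.links, 1))
        text = f"{text}\n\n{notes}"
    return text


def html_body(raw_bytes):
    """Return the HTML part of a raw RFC 822 message, or "" if there is none."""
    msg = email.message_from_bytes(raw_bytes, policy=email.policy.default)
    part = msg.get_body(preferencelist=("html",))
    return part.get_content() if part is not None else ""
```

With imaplib, raw_bytes would typically be data[0][1] from conn.fetch(num, "(RFC822)"), so the whole conversion becomes html_to_text(html_body(data[0][1])). For messy real-world mail, a proper HTML library such as BeautifulSoup will likely cope better than this sketch.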
