I am trying to read my emails using a Python script (Python 2.5 and PyPy). Some of my results are not in ASCII, and I get strings like this:
=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=
Is there any way to decode it and convert it to UTF-8 so that I can process it? I tried .decode('ISO-8859-7'), but I got the same string back.
import email.header as eh
# decode_header() yields (byte string, charset) pairs; charset is None for plain text
unicode_data = u''.join(
    str_data.decode(codec or 'ascii')
    for str_data, codec
    in eh.decode_header('=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?='))
# unicode_data now is u'Πεζοπορία στον Κιθαιρώνα'
You should work with unicode_data from here on. However, if you (think you) need a UTF-8 encoded string, you can encode it:
utf8data= unicode_data.encode('utf-8')
Update: I changed the .decode call to cater for cases where the codec is None (e.g. eh.decode_header('plain text')).
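For anyone reading this on Python 3 (an assumption on my part; the question is about Python 2.5), here is a rough sketch of the same idea. There, decode_header() can return either bytes or str for each part, so both cases need handling. The helper name decode_mime_header is made up for this example.

```python
from email.header import decode_header


def decode_mime_header(raw_header):
    """Return a MIME-encoded header value as a text string (Python 3 sketch)."""
    parts = []
    for data, charset in decode_header(raw_header):
        if isinstance(data, bytes):
            # Encoded words come back as bytes plus the declared charset;
            # fall back to ASCII when no charset is given.
            parts.append(data.decode(charset or 'ascii'))
        else:
            # Headers without any encoded words come back as str already.
            parts.append(data)
    return ''.join(parts)


subject = decode_mime_header('=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=')
utf8_bytes = subject.encode('utf-8')  # only if UTF-8 bytes are really needed
```

As above, it is usually simpler to keep working with the decoded text and encode to UTF-8 only at the point where bytes are required.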
Read up on MIME encoding and Base64 encoding. The base64 module will be useful.
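To illustrate why calling .decode('ISO-8859-7') on the whole string changes nothing: the string itself is plain ASCII of the form =?charset?B?payload?=, and the Greek text is inside the Base64 payload. Below is a minimal sketch of decoding one such word by hand, written for Python 3 (an assumption; the question uses 2.5). The function name decode_b_encoded_word is hypothetical, and it handles only a single "B" (Base64) encoded word, not "Q" encoding or headers with several words.

```python
import base64


def decode_b_encoded_word(word):
    """Decode a single '=?charset?B?payload?=' word into text (sketch only)."""
    if not (word.startswith('=?') and word.endswith('?=')):
        raise ValueError('not a MIME encoded-word: %r' % word)
    # Strip the '=?' and '?=' delimiters, then split into the three fields.
    charset, encoding, payload = word[2:-2].split('?', 2)
    if encoding.upper() != 'B':
        raise ValueError('this sketch only handles Base64 ("B") words')
    raw_bytes = base64.b64decode(payload)  # bytes in the declared charset
    return raw_bytes.decode(charset)       # text string


text = decode_b_encoded_word('=?ISO-8859-7?B?0OXm7/Dv8d/hIPP07+0gyuno4enx/u3h?=')
```

In practice email.header.decode_header covers these cases and more, so the manual version is mainly useful for understanding the format.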