I want to decode 'quoted-printable' encoded strings in Python, but I am stuck at one point.
I fetch certain mails from my Gmail account with the following code:
import imaplib
import email
import quopri

mail = imaplib.IMAP4_SSL('imap.gmail.com')
mail.login('mail@gmail.com', '*******')
mail.list()
mail.select('"[Gmail]/All Mail"')
typ, data = mail.search(None, 'SUBJECT', '"{}"'.format('123456'))
data[0].split()
print(data[0].split())
for e_mail in data[0].split():
    typ, data = mail.fetch('{}'.format(e_mail.decode()), '(RFC822)')
    raw_mail = data[0][1]
    email_message = email.message_from_bytes(raw_mail)
    if email_message.is_multipart():
        for part in email_message.walk():
            if part.get_content_type() == 'text/plain':
                body = part.get_payload()
                to = email_message['To']
                utf = quopri.decodestring(to)
                text = utf.decode('utf-8')
                print(text)
.
.
.
If I print 'to', for example, the result looks like this when the recipient contains characters like é, á, ó:
=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=
I can decode the quoted-printable encoded 'body' string successfully using the quopri library, like this:
quopri.decodestring(sometext).decode('utf-8')
But the same logic doesn't work for other parts of the e-mail, such as the To, From and Subject headers.
Does anyone have a hint?
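The header string you have is not plain quoted-printable (i.e. not something quopri can decode on its own). It is an RFC 2047 "encoded word" of the form =?charset?encoding?payload?=, where the encoding letter B means the payload is base64 and Q means a quoted-printable variant. Yours uses B, so the payload is base64-encoded UTF-8. You can decode it with the standard library: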
from email.header import decode_header

result = decode_header('=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=')
# ^ the result is a list of tuples of the form [(decoded_bytes, encoding),]
for data, encoding in result:
    print(data.decode(encoding))
# outputs: Péter Petőcz
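The loop above assumes every part comes back as bytes with a known charset. For a header with no encoded words, decode_header returns a str with encoding None, and in mixed headers the unencoded fragments have encoding None as well. A minimal sketch that covers those cases by letting make_header reassemble the parts is shown below. The helper name decode_mime_header and the sample strings are only illustrative.

```python
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Return a header value with any RFC 2047 encoded words decoded.

    Hypothetical helper: a missing header (None) is passed through.
    """
    if value is None:
        return None
    # make_header() joins the (bytes_or_str, charset) pairs into a Header
    # object, and str() renders that object as ordinary Unicode text.
    return str(make_header(decode_header(value)))


print(decode_mime_header('=?UTF-8?B?UMOpdGVyIFBldMWRY3o=?='))  # base64 word
print(decode_mime_header('Re: =?UTF-8?Q?P=C3=A9ter?='))        # mixed header
print(decode_mime_header('plain subject'))                     # nothing encoded
```

In the question's loop this would replace the quopri call, for example to = decode_mime_header(email_message['To']).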
You are trying to decode Latin characters using UTF-8. The output you are getting is base64. It reads:
No printable characters found, try another source charset, or upload your data as a file for binary decoding.
Give this a try: "Python: Converting from ISO-8859-1/latin1 to UTF-8".
This solves it:
from email.header import decode_header

def mail_header_decoder(header):
    # A missing header (None) falls through and returns None.
    if header is not None:
        mail_header_decoded = decode_header(header)
        # No part declares a charset: the header was never encoded.
        if all(charset is None for _, charset in mail_header_decoded):
            return header
        header_new = []
        for part, charset in mail_header_decoded:
            if isinstance(part, bytes):
                # Use the declared charset; unencoded fragments are ASCII.
                part = part.decode(charset or 'ascii')
            header_new.append(part)
        return ''.join(header_new)  # convert list to string
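As an alternative to decoding each header by hand, the standard library's newer email API can do it while parsing. The sketch below assumes the message is parsed with policy=email.policy.default, in which case header lookups return already-decoded text and get_content() undoes both the transfer encoding (quoted-printable or base64) and the charset of the body. The sample message bytes and the address peter@example.com are made up for illustration; in the question's code, raw_mail would come from mail.fetch.

```python
import email
from email import policy

# Made-up message: an RFC 2047 encoded To/Subject and a
# quoted-printable UTF-8 body.
raw_mail = (
    b"To: =?UTF-8?B?UMOpdGVyIFBldMWRY3o=?= <peter@example.com>\r\n"
    b"Subject: =?UTF-8?B?UMOpdGVyIFBldMWRY3o=?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Szia P=C3=A9ter\r\n"
)

# With policy.default the parser builds EmailMessage objects whose
# headers are decoded on access.
msg = email.message_from_bytes(raw_mail, policy=policy.default)

to_name = msg['To'].addresses[0].display_name
subject = str(msg['Subject'])

# get_body() picks the text/plain part (or the message itself if it is
# not multipart); get_content() returns the decoded text.
body_part = msg.get_body(preferencelist=('plain',))
body = body_part.get_content() if body_part is not None else ''

print(to_name)
print(subject)
print(body)
```

With this approach neither quopri nor decode_header is needed in the fetch loop, at the cost of relying on the parser to interpret the declared encodings.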