Decoding MIME email from Gmail API - \r\n and 3D - Python

Question

I am currently using the Gmail API to read in some HTML emails in Python. I've decoded their body using:

base64.urlsafe_b64decode

When I print the resulting HTML email, "\r\n" and "3D" are scattered throughout it. I can't remove the "\r\n" because the \, r, \ and n seem to register as separate characters, and I'm not sure where the "3D" comes from.

Is there something wrong with how I'm decoding it?

Here is the code:

results = service.users().messages().list(userId='me', q='is:unread').execute()

for index in range(len(results['messages'])):
    message = service.users().messages().get(userId='me', id=results['messages'][index]['id'], format='raw').execute()

    msg_str = base64.urlsafe_b64decode(message['raw'].encode('UTF-8'))

    mime_msg = email.message_from_string(str(msg_str))

    print(mime_msg)

    service.users().messages().modify(userId='me', id=results['messages'][index]['id'], body = {'removeLabelIds': ['UNREAD']}).execute() # mark message as read

Answer 1

I found a solution: I stopped using Python's email library and cast msg_str (which is of type bytes) to a string. From there, I simply deleted '\\r\\n' from the string and replaced '=3D' with '='.
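To see where the two artifacts come from, here is a small sketch. The sample bytes are made up for illustration and stand in for the output of base64.urlsafe_b64decode. Calling str() on a bytes object returns its repr, which is why the CR LF pair shows up as four literal characters, and =3D is the quoted-printable escape for =.

```python
import quopri

# Hypothetical raw bytes, standing in for the result of base64.urlsafe_b64decode
raw = b'<a href=3D"https://example.com">link</a>\r\n'

# str() on bytes returns the repr, so CR LF become literal backslash escapes
as_repr = str(raw)

# Decoding the bytes gives real text with an actual carriage return and newline
as_text = raw.decode("utf-8")

# "=3D" is the quoted-printable escape for "=" (byte 0x3D); quopri reverses it
decoded = quopri.decodestring(raw).decode("utf-8")

print(as_repr)
print(repr(as_text))
print(decoded)
```

The first print shows the b'...' wrapper and the escaped \r\n that the question describes; the last one shows the link with a plain = sign.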

Answer 2

Stripping the characters by hand is not a great solution. Instead, use something like:

for email_part in message.walk(): 
    part_data = email_part.get_payload(decode=True)

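Here message is a Python email.message.Message object. With decode=True, get_payload reverses the part's Content-Transfer-Encoding (quoted-printable or base64) and returns bytes. You could then use something like BeautifulSoup to parse the HTML. Hope that helps!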
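A fuller sketch of this approach, assuming the raw bytes are parsed with email.message_from_bytes rather than str(). The sample message below is invented for illustration; in the question it would be the bytes decoded from message['raw'].

```python
import email

# Hypothetical raw message; in the question this would be the bytes returned by
# base64.urlsafe_b64decode(message['raw'])
raw_bytes = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<a href=3D"https://example.com">link</a>\r\n'
)

# Parse the bytes directly instead of converting them with str()
message = email.message_from_bytes(raw_bytes)

html_parts = []
for email_part in message.walk():
    if email_part.get_content_type() == "text/html":
        # decode=True undoes the quoted-printable transfer encoding and returns bytes
        payload = email_part.get_payload(decode=True)
        charset = email_part.get_content_charset() or "utf-8"
        html_parts.append(payload.decode(charset, errors="replace"))

print(html_parts[0])
```

The fallback to UTF-8 when a part declares no charset is a choice made for this sketch, not something the email library prescribes.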

Answer 3

maksel's solution worked for me, provided the byte string was first decoded with .decode('utf-8'). The original code encoded the byte string instead of decoding it.

Hence, under Python 3.7, the replacement can be done as follows:

msg = msg.replace('\r\n', '').replace('=3D', '=')

Be wary: this solution did not work for all HTML tags in my case.
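One likely reason the replace approach misses cases is that quoted-printable has more than one escape: any byte can appear as =XX, and a line ending in = is a soft line break that must be joined with the next line. The sketch below compares the manual replacement with the standard library's quopri module on a made-up body.

```python
import quopri

# Hypothetical quoted-printable body with a soft line break ("=" at end of line)
# and a non-ASCII escape (=C3=A9 is the UTF-8 encoding of "é")
body = b'<p class=3D"intro">Caf=C3=A9 menu for to=\r\nday</p>'

# Manual approach: only handles "=3D" and leaves the other escapes behind
naive = body.decode("ascii").replace("\r\n", "").replace("=3D", "=")

# quopri handles every =XX escape and removes soft line breaks
proper = quopri.decodestring(body).decode("utf-8")

print(naive)
print(proper)
```

The manual version leaves =C3=A9 and a stray = inside "today", while quopri yields the intended text.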

Answer 4

I might be a bit late. Some of the solutions mentioned above worked, but I am posting this answer for other visitors because it looks a bit cleaner.

When building the mail object, use policy=email.policy.default. This removes the =3D, \r\n, etc. mentioned above.

import email.policy  # import the policy submodule explicitly
mailobject = email.message_from_string(msg_str, policy=email.policy.default)

On Python 3.6+, you can use the get_body and get_content methods.

if mailobject.is_multipart():
    body = mailobject.get_body(('html',))
else:
    body = mailobject.get_body(('plain',))

if body:
    body = body.get_content()

print(body)

The code above is deliberately minimal, just enough to illustrate the answer. It assumes the message is either plain text or HTML. Remember to cater for other situations when handling emails.
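Tying this back to the Gmail case, here is a minimal end-to-end sketch. It assumes the raw field is base64url text as in the question; the sample message is built locally with EmailMessage purely so the example runs without the Gmail API. It uses email.message_from_bytes, which avoids having to turn the bytes into a str first.

```python
import base64
import email
import email.policy
from email.message import EmailMessage

# Build a hypothetical multipart/alternative message standing in for a real email
sample = EmailMessage()
sample["Subject"] = "Demo"
sample.set_content("Plain fallback")
sample.add_alternative(
    '<a href="https://example.com">link</a>',
    subtype="html",
    cte="quoted-printable",  # force the encoding that produces "=3D"
)

# Simulate the base64url string found in message['raw']
raw_field = base64.urlsafe_b64encode(sample.as_bytes()).decode("ascii")

# Decode to bytes and parse them directly with the modern policy
raw_bytes = base64.urlsafe_b64decode(raw_field)
mailobject = email.message_from_bytes(raw_bytes, policy=email.policy.default)

# Prefer the HTML part and fall back to plain text if there is none
body_part = mailobject.get_body(preferencelist=("html", "plain"))
body = body_part.get_content() if body_part is not None else None

print(body)
```

Passing both subtypes in preferencelist lets get_body pick the best available part, so the explicit is_multipart check is not needed in this sketch.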

An additional, loosely related tip:

Since this is an encoding problem, the same approach works in similar situations, for example when parsing AWS SES emails delivered to S3 from an AWS Lambda function (Python). I mention it here because I ran into the same issue while working with those.

In that case, use it like this:

s3_file = object_s3['Body'].read()
mailobject = email.message_from_string(s3_file.decode('utf-8'),  policy=email.policy.default)

solution1
1 ACCEPTED 2017-08-10 21:42:44

solution2
1 2017-10-12 13:38:57

solution3
0 2019-06-26 22:48:26

solution4
0 2021-08-13 15:28:59
