

Decoding MIME email from Gmail API - \r\n and 3D - Python

I am currently using the Gmail API to read some HTML emails in Python. I've decoded their bodies using:

base64.urlsafe_b64decode
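As a quick illustration of what that call produces, here is a minimal sketch that fakes the Gmail raw field locally (the message text is made up, and raw_field is a hypothetical stand-in for message['raw']). The Gmail API returns the whole RFC 2822 message as base64url text, and base64.urlsafe_b64decode returns bytes, not str:

```python
import base64

# Made-up RFC 2822 message; raw_field is a hypothetical stand-in for message['raw'].
original = b"Subject: hi\r\n\r\nHello\r\n"
raw_field = base64.urlsafe_b64encode(original).decode("ascii")

# urlsafe_b64decode accepts an ASCII str or bytes and always returns bytes.
decoded = base64.urlsafe_b64decode(raw_field)

print(type(decoded))        # <class 'bytes'>
print(decoded == original)  # True
```

Because the result is bytes, the CRLF line endings in it are real control characters at this point.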

After printing the resulting HTML email, "\r\n" and "3D" are scattered throughout the HTML. I can't remove the "\r\n" because the \ and r and \ and n seem to register as separate characters (?), and I'm not sure where the "3D" comes from.

Is there something wrong with how I'm decoding it?

Here is the code:

import base64
import email

# service is an authorized Gmail API client object
results = service.users().messages().list(userId='me', q='is:unread').execute()

for msg_ref in results.get('messages', []):
    message = service.users().messages().get(userId='me', id=msg_ref['id'], format='raw').execute()

    # base64url-decode the full RFC 2822 message; the result is bytes
    msg_str = base64.urlsafe_b64decode(message['raw'].encode('UTF-8'))

    # NOTE: str() on a bytes object returns its repr ("b'...'"), not decoded text
    mime_msg = email.message_from_string(str(msg_str))

    print(mime_msg)

    # mark message as read
    service.users().messages().modify(userId='me', id=msg_ref['id'], body={'removeLabelIds': ['UNREAD']}).execute()
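A small offline reproduction may help show where both artifacts come from. It assumes a hand-written quoted-printable HTML message in place of a real Gmail response (msg_bytes and the URL are made up). Calling str() on a bytes object returns its repr, so each CRLF becomes the four characters \ r \ n; the =3D is the quoted-printable escape for "=", which stays in place until the body itself is decoded:

```python
import email

# Hypothetical decoded Gmail message (bytes) with a quoted-printable HTML body.
msg_bytes = (
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<a href=3D"https://example.com">link</a>\r\n'
)

# str() on bytes gives the repr: CR/LF become the four characters \ r \ n.
as_repr = str(msg_bytes)
print(as_repr.startswith("b'"), "\\r\\n" in as_repr)  # True True

# Decoding the bytes instead gives real CR/LF characters.
text = msg_bytes.decode("utf-8")
print("\r\n" in text, "\\r\\n" in text)  # True False

# Printing a parsed message re-serializes the still-encoded body, so =3D stays;
# get_payload(decode=True) undoes the quoted-printable encoding.
parsed = email.message_from_bytes(msg_bytes)
print("=3D" in parsed.as_string())  # True
print(parsed.get_payload(decode=True))
```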

I found the solution: I stopped using Python's email library and cast msg_str (which is of type bytes) to a string. From there, I simply deleted '\\r\\n' from the string and replaced '=3D' with '='.

This is not a great solution; instead, use something like:

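for email_part in message.walk():
    # decode=True undoes the Content-Transfer-Encoding (e.g. quoted-printable) and
    # returns bytes; multipart container parts return None
    part_data = email_part.get_payload(decode=True)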

where message is a Python email.message.Message object. Then perhaps use something like BeautifulSoup to analyse the HTML effectively. Hope that helps!
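A self-contained sketch of that walk() approach, assuming a hand-written multipart/alternative message (the raw bytes and the XYZ boundary are made up). It decodes each text/html part with its declared charset; BeautifulSoup is left out so the example needs only the standard library:

```python
import email

# Hypothetical multipart message: a plain-text part and a quoted-printable HTML part.
raw = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Hello\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<p class=3D"greeting">Hello</p>\r\n'
    b"--XYZ--\r\n"
)

message = email.message_from_bytes(raw)

html_parts = []
for email_part in message.walk():
    # Skip the multipart container and any non-HTML parts.
    if email_part.get_content_type() != "text/html":
        continue
    part_data = email_part.get_payload(decode=True)  # bytes, =3D already decoded
    charset = email_part.get_content_charset() or "utf-8"
    html_parts.append(part_data.decode(charset))

print(html_parts[0].strip())  # <p class="greeting">Hello</p>
```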

maksel's solution worked for me, provided the bytes were decoded with msg_str.decode('utf-8'). The original code encoded instead of decoding the byte string.

Hence, under Python 3.7 we can do the replacement as follows:

msg = msg_str.decode('utf-8')  # msg_str is the bytes from base64.urlsafe_b64decode
msg = msg.replace('\r\n', '').replace('=3D', '=')

Be wary: in my case, this solution did not work for all HTML tags.
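One likely reason the replacement approach misses some tags (an assumption, since the failing HTML isn't shown): quoted-printable has more escapes than =3D, and long lines are wrapped with soft line breaks (=\r\n). This sketch uses a made-up HTML fragment to compare the two replacements with the standard library's quopri.decodestring:

```python
import quopri

# Made-up quoted-printable HTML: a soft line break (=\r\n) splits a long line,
# and =E2=80=99 is a UTF-8 curly apostrophe.
qp_html = (
    '<p class=3D"note">It=E2=80=99s a long line that the sender wrapped wi=\r\n'
    'th a soft break.</p>\r\n'
)

# The string-replacement approach: other escapes survive, a stray "=" is left behind.
naive = qp_html.replace('\r\n', '').replace('=3D', '=')
print(naive)
# <p class="note">It=E2=80=99s a long line that the sender wrapped wi=th a soft break.</p>

# Decoding the quoted-printable handles every escape and removes soft line breaks.
proper = quopri.decodestring(qp_html.encode('ascii')).decode('utf-8')
print(proper)
```

In practice, decoding through the email package (get_payload(decode=True) or get_content()) performs this step for you.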

I might be a bit late. Some of the mentioned solutions worked, but to help others visiting here, I thought I'd post this answer, as it looks a bit cleaner.

When building the mail object, use policy=email.policy.default. Together with the get_body and get_content methods below, this gets rid of the =3D, \r\n, etc. mentioned above.

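import email.policy  # importing the submodule also binds the top-level email package
mailobject = email.message_from_bytes(msg_str, policy=email.policy.default)  # msg_str is bytes, as in the question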
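To illustrate what the policy changes, here is a sketch using a made-up quoted-printable HTML message. With the default compat32 policy you get a plain Message whose get_payload() still contains =3D; with policy=email.policy.default you get an EmailMessage, and it is its get_content() method that actually decodes the body. message_from_bytes is used because the Gmail-decoded msg_str is bytes:

```python
import email
import email.message
import email.policy

# Made-up decoded Gmail message (bytes) with a quoted-printable HTML body.
raw = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b'<a href=3D"https://example.com">link</a>\r\n'
)

# Default policy (compat32): a plain Message; get_payload() returns the
# still-encoded body text.
legacy = email.message_from_bytes(raw)
print(type(legacy).__name__)          # Message
print("=3D" in legacy.get_payload())  # True

# policy.default: an EmailMessage, whose get_content() undoes the transfer
# encoding and applies the declared charset.
modern = email.message_from_bytes(raw, policy=email.policy.default)
print(type(modern).__name__)          # EmailMessage
print(modern.get_content().strip())   # <a href="https://example.com">link</a>
```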

If you're on Python 3.6+, you can use the get_body and get_content methods.

if mailobject.is_multipart():
    body = mailobject.get_body(('html',))
else:
    body = mailobject.get_body(('plain',))

if body:
    body = body.get_content()

print(body)

The code above is deliberately minimal, just enough to answer the question. Here we assume the body is either plain text or HTML; remember to handle other cases when processing emails.
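As one example of those other cases, here is a sketch that prefers HTML but falls back to plain text, and also lists attachments. The message is built in memory (the subject, text and report.pdf attachment are made up), and extract_body_and_attachments is a hypothetical helper name:

```python
import email
import email.policy
from email.message import EmailMessage

# Made-up message built in memory: plain text only, plus a PDF attachment.
msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See attached.")
msg.add_attachment(b"%PDF-1.4 fake", maintype="application", subtype="pdf",
                   filename="report.pdf")

# Round-trip through bytes to mimic parsing a received message.
parsed = email.message_from_bytes(msg.as_bytes(), policy=email.policy.default)


def extract_body_and_attachments(message):
    """Hypothetical helper: prefer HTML, fall back to plain text; list attachment names."""
    body = message.get_body(preferencelist=("html", "plain"))
    text = body.get_content() if body is not None else None
    names = [part.get_filename() for part in message.iter_attachments()]
    return text, names


text, names = extract_body_and_attachments(parsed)
print(text.strip())  # See attached.
print(names)         # ['report.pdf']
```

Passing a single preference list avoids branching on is_multipart(), since get_body also works on a non-multipart text message.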

An additional, unrelated tip:

Since this is an encoding problem, this answer also applies to similar situations, such as parsing AWS SES emails that are pushed to S3 and forwarded with an AWS Lambda function (Python). I mention it here because I ran into this same issue while working with those.

In that case, use it like this:

# object_s3 is presumably a boto3 S3 get_object() response; 'Body' is a streaming body
s3_file = object_s3['Body'].read()  # bytes
mailobject = email.message_from_string(s3_file.decode('utf-8'), policy=email.policy.default)
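A variant of the S3 snippet, sketched under two assumptions: object_s3 is a boto3 get_object() response (simulated here with io.BytesIO, which also has read()), and the stored email may not be UTF-8. Swapping in email.message_from_bytes skips the .decode('utf-8') step, which would raise on the made-up Latin-1 byte below, and lets the declared charset drive decoding:

```python
import email
import email.policy
import io

# Hypothetical stand-in for a boto3 get_object() response: 'Body' only needs .read().
object_s3 = {"Body": io.BytesIO(
    b"Subject: hi\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)}

s3_file = object_s3["Body"].read()  # bytes

# s3_file.decode('utf-8') would raise UnicodeDecodeError on the Latin-1 byte \xe9;
# message_from_bytes leaves charset handling to the declared Content-Type.
mailobject = email.message_from_bytes(s3_file, policy=email.policy.default)
print(mailobject.get_content().strip())  # café
```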

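Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.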

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM