Parsing raw email with Python email library adding unwanted characters
I am using Python's standard email parsing library to parse raw email fetched from the Amazon SES mail service.
Below is my code for this.
import json
import email
from email.Utils import parseaddr

def parse(raw_email):
    message = email.message_from_string(raw_email)
    text_plain = None
    text_html = None
    for part in message.walk():
        if part.get_content_type() == 'text/plain' and text_plain is None:
            text_plain = part.get_payload()
        if part.get_content_type() == 'text/html' and text_html is None:
            text_html = part.get_payload()
    parsed_email_object = {
        'to': parseaddr(message.get('To'))[1],
        'from': parseaddr(message.get('From'))[1],
        'delivered to': parseaddr(message.get('Delivered-To'))[1],
        'subject': message.get('Subject'),
        'text_plain': text_plain,
        'text_html': text_html,
    }
    json_string = json.dumps(parsed_email_object)
    return json_string
When I parse the raw email, it is not parsed 100% correctly; instead it gives me unwanted characters like this:
this is a replyo from the gmail indbo asdf asdf asdfa sdfa=
sd sdfa sdfa fasd
=C2=A0dfa sf asdf
a sdfas
<= div>f asdf=C2=A0
Is there some decoding option I am missing that would parse it correctly?
Posting my comment as an answer so that it gets noticed.
part.get_payload(decode=True).decode(part.get_content_charset())
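This will solve the encoding issue. With decode=True, get_payload undoes the part's Content-Transfer-Encoding (quoted-printable here, which is where the trailing = and the =C2=A0 sequences come from) and returns bytes, which are then decoded to text using the part's charset.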
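As a minimal sketch of that fix in context, assuming Python 3: the helper name decoded_text, its fallback_charset parameter, and the sample message are made up for illustration and are not part of the original code. The fallback is there because get_content_charset() returns None when the part declares no charset, in which case the one-liner above would raise.

```python
import email


def decoded_text(part, fallback_charset="utf-8"):
    """Return the part's body as text, or None for multipart containers."""
    # decode=True undoes quoted-printable/base64 and returns bytes.
    raw_bytes = part.get_payload(decode=True)
    if raw_bytes is None:
        return None
    # Assumption: fall back to UTF-8 when the part declares no charset.
    charset = part.get_content_charset() or fallback_charset
    return raw_bytes.decode(charset, errors="replace")


# Hypothetical quoted-printable message resembling the output in the question.
sample = (
    "From: Alice <alice@example.com>\n"
    "To: Bob <bob@example.com>\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "asdfa sdfa=\n"
    "sd sdfa=C2=A0dfa\n"
)

message = email.message_from_string(sample)
body = decoded_text(message)
print(repr(body))
```

In the parse function from the question, this would replace the two part.get_payload() calls inside the loop; the soft line break (= at end of line) is joined and =C2=A0 becomes a single non-breaking space.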