Converting mailbox messages to PDF: which part?

Question

I am trying to write a script that exports all my messages (mbox mailbox format) to PDF files with pdfkit.

It seems that all messages in my mailbox are multipart, and I'm struggling to figure out which part is the relevant one. If I iterate through all parts with the code below, I typically generate 3 to 5 PDFs per e-mail, and only one of them resembles what I see when I open the e-mail in an e-mail client. The other parts are typically either raw text or something that looks like this: x92O&S\xd2\x0c\xb4e\xee\x0fh\xc68\x1 (hexadecimal?).

I tried to solve the issue by adding a test that filters for HTML ( if bool(BeautifulSoup(html, "html.parser").find()) ), but this does not seem to work.
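The \x.. sequences are what Python prints for non-ASCII bytes inside a bytes repr. The code below wraps part.get_payload(decode=True) in str(); with decode=True that method returns bytes, and str() on bytes gives their printable representation (leading b', escaped bytes) rather than decoded text, so any test such as the BeautifulSoup check then runs on that representation. The sketch uses a made-up payload instead of a real message part; utf-8 is an assumption here, a real part reports its charset through part.get_content_charset().

```python
# Hypothetical stand-in for part.get_payload(decode=True): with decode=True
# the email package returns the decoded body as bytes, not as str.
payload = "<p>Caf\u00e9</p>\n".encode("utf-8")

# What the question's code does: str() on bytes gives the bytes repr,
# with a leading b' and every non-ASCII byte spelled out as \xNN.
as_repr = str(payload)

# What is usually wanted: decode the bytes with the part's charset
# (utf-8 is assumed here; a real part reports it via get_content_charset()).
as_text = payload.decode("utf-8")

print(as_repr)  # b'<p>Caf\xc3\xa9</p>\n'
print(as_text)  # <p>Café</p>
```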

for part in message.walk():
    partcounter +=1
    try:
        html = str(part.get_payload(decode=True))
        if bool(BeautifulSoup(html, "html.parser").find()):
            print(str(messagecounter)+'-'+str(partcounter)+' - '+"payload is HTML")
            filename = 'C:/Email_forwarding/Attachments/'+str(messagecounter)+"-"+str(partcounter)+'.pdf'#this keeps the file only for the last part, which seems to be correct
            pdfkit.from_string(html,filename, configuration=config)
            print(str(messagecounter)+'-'+str(partcounter)+' - '+"created %s" %(filename))
        else:
            print(str(messagecounter)+'-'+str(partcounter)+' - '+"payload is not HTML")
    except:
        print(str(messagecounter)+'-'+str(partcounter)+' - '+"no payload or failed to convert")
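To see why the loop above visits several parts per e-mail, it can help to print the MIME tree of one message. The sketch below builds a made-up message with the standard library's EmailMessage class (subject, texts and attachment are invented) in a shape many mail clients produce, then lists each part's content type and disposition; the same two calls could be made on each message read from the mbox.

```python
from email.message import EmailMessage

# Hypothetical message: a plain-text body, an HTML alternative of the
# same text, and one attachment.
msg = EmailMessage()
msg["Subject"] = "Example"
msg.set_content("Hello in plain text")                       # text/plain
msg.add_alternative("<p>Hello in HTML</p>", subtype="html")  # becomes multipart/alternative
msg.add_attachment(b"\x89PNG fake image bytes", maintype="image",
                   subtype="png", filename="logo.png")       # becomes multipart/mixed

# walk() visits the multipart containers as well as the leaf parts.
parts = [(p.get_content_type(), p.get_content_disposition()) for p in msg.walk()]
for i, (ctype, disposition) in enumerate(parts, start=1):
    print(i, ctype, disposition)

# 1 multipart/mixed None
# 2 multipart/alternative None
# 3 text/plain None
# 4 text/html None
# 5 image/png attachment
```

Of these five parts, only text/html is what a mail client would normally render. The two multipart/* entries are containers, for which get_payload(decode=True) returns None, and the image is attachment data.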

How can I detect which part of a multipart e-mail contains actual, interpretable HTML?

Answer 1

You can use part.get_content_type() to filter the parts of the message and keep only the text/html one:

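for part in message.walk():
    if part.get_content_type() == 'text/html':
        # get_payload(decode=True) returns bytes; decode them with the part's charset
        charset = part.get_content_charset() or 'utf-8'
        html = part.get_payload(decode=True).decode(charset, errors='replace')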
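A somewhat different route than filtering parts by hand: when messages are parsed with the standard library's newer email API (policy=email.policy.default), each one is an EmailMessage, whose get_body(preferencelist=...) method picks the part a mail client would most likely display and skips parts marked as attachments. The sketch below combines that with mailbox.mbox. The names best_html and iter_mbox_html are hypothetical, and falling back to text/plain wrapped in <pre> for mails without an HTML part is an assumption about what the PDFs should contain.

```python
import email
import email.policy
import html
import mailbox


def _parse(fp):
    # Parse with policy.default so each message is an EmailMessage,
    # which has get_body(); mailbox's default mboxMessage class does not.
    return email.message_from_binary_file(fp, policy=email.policy.default)


def best_html(msg):
    """Return an HTML string for one message, or None if no text body is found.

    Prefers the text/html part; falls back to text/plain wrapped in <pre>.
    """
    body = msg.get_body(preferencelist=("html", "plain"))
    if body is None:
        return None
    text = body.get_content()  # str: transfer encoding and charset already decoded
    if body.get_content_subtype() == "html":
        return text
    return "<pre>" + html.escape(text) + "</pre>"


def iter_mbox_html(path):
    """Yield (message_number, html) for each message in the mbox at path."""
    box = mailbox.mbox(path, factory=_parse, create=False)
    try:
        for number, msg in enumerate(box, start=1):
            page = best_html(msg)
            if page is not None:
                yield number, page
    finally:
        box.close()
```

Each yielded page could then be passed to pdfkit.from_string(page, filename, configuration=config) as in the question's loop, with config and the output path set up as before. Images embedded in the mail and referenced through cid: URLs will probably not appear in the PDF, since wkhtmltopdf has no way to resolve them.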

Answer 1 (score 1, accepted 2018-04-05 12:54:15)
