Convert mailbox message to PDF: which part?

I am trying to write a script that exports all my messages (a mailbox in mbox format) to PDF files with pdfkit.

It seems that all messages in my mailbox are multipart, and I'm struggling to figure out which part is the relevant one. If I iterate through all parts with the code below, I typically generate 3 to 5 PDFs per e-mail, and only one of them resembles what I would see if I opened the e-mail in an e-mail client. The other parts are typically either raw text or something that looks like this: x92O&S\\xd2\\x0c\\xb4e\\xee\\x0fh\\xc68\\x1 (hexadecimal?).

I tried to solve the issue by adding a test that filters for HTML ( if bool(BeautifulSoup(html, "html.parser").find()) ), but this does not seem to work.

for part in message.walk():
    partcounter +=1
    try:
        html = str(part.get_payload(decode=True))
        if bool(BeautifulSoup(html, "html.parser").find()):
            print(str(messagecounter)+'-'+str(partcounter)+' - '+"payload is HTML")
            filename = 'C:/Email_forwarding/Attachments/'+str(messagecounter)+"-"+str(partcounter)+'.pdf'#this keeps the file only for the last part, which seems to be correct
            pdfkit.from_string(html,filename, configuration=config)
            print(str(messagecounter)+'-'+str(partcounter)+' - '+"created %s" %(filename))
        else:
            print(str(messagecounter)+'-'+str(partcounter)+' - '+"payload is not HTML")
    except:
        print(str(messagecounter)+'-'+str(partcounter)+' - '+"no payload or failed to convert")

How can I detect which part of a multipart e-mail contains actual, interpretable HTML?

You can use part.get_content_type() to filter the parts of the message and keep only the one whose content type is text/html:

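for part in message.walk():
    if part.get_content_type() == 'text/html':
        # get_payload(decode=True) returns bytes; calling str() on them would
        # yield the "b'...'" repr, so decode with the part's declared charset.
        charset = part.get_content_charset() or 'utf-8'
        html = part.get_payload(decode=True).decode(charset, errors='replace')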
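
To make the filtering concrete, here is a minimal, self-contained sketch using only the standard library. The helper name extract_html and the sample message are made up for illustration and are not part of the original question. Skipping parts whose disposition is "attachment" and falling back to UTF-8 when no charset is declared are assumptions that may need adjusting for a real mailbox. The binary attachment in the sample stands in for the kind of part that showed up as \x92O&S... in the question: those are raw bytes (images, attachments), not HTML.

```python
from email.message import EmailMessage


def extract_html(message):
    """Return the decoded text/html body of a message, or None if there is none."""
    for part in message.walk():
        # Assumption: an HTML file sent as an attachment is not the message body.
        if part.get_content_disposition() == 'attachment':
            continue
        if part.get_content_type() == 'text/html':
            payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
            charset = part.get_content_charset() or 'utf-8'  # fallback is an assumption
            return payload.decode(charset, errors='replace')
    return None


# Build a small multipart message: plain text, an HTML alternative, a binary attachment.
msg = EmailMessage()
msg['Subject'] = 'demo'
msg.set_content('plain text fallback')
msg.add_alternative('<html><body><p>Hello</p></body></html>', subtype='html')
msg.add_attachment(b'\x92O&S\xd2\x0c', maintype='application',
                   subtype='octet-stream', filename='blob.bin')

html = extract_html(msg)
```

The returned string is what would be passed to pdfkit.from_string, giving one PDF per message instead of one per part. Messages yielded by mailbox.mbox are email.message.Message objects, so the same walk() and content-type calls should apply to them.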
