How to detect and correct the Content-Type charset in an email header in Python?

What is the correct way to programmatically detect and correct the Content-Type charset in an email header in Python?

I have thousands of emails extracted to .eml (basically plain text) files, and some are encoded in shift_jis, but the charset in the email header doesn't mention this, so they don't display correctly in any email program. Adding the charset manually to the Content-Type header corrects this.

Was:

Content-Type: text/plain; format=flowed

Needs to be:

Content-Type: text/plain; charset="shift_jis"; format=flowed

What's the correct way to do this in Python while preserving the email body and the other parts of the header?
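A minimal sketch for a single file, assuming the message is read as raw bytes and already has a Content-Type header (as in the example above). It uses the standard library's Message.set_param() to append the charset parameter while leaving format=flowed in place, then writes the message back with BytesGenerator so the undeclared Shift_JIS body bytes come out unchanged. add_charset is a hypothetical helper name, not part of any library. The charset parameter ends up after format=flowed, which is fine because parameter order in Content-Type is not significant.

```python
from email import message_from_bytes
from email.generator import BytesGenerator
from io import BytesIO


def add_charset(raw: bytes, charset: str = "shift_jis") -> bytes:
    """Hypothetical helper: add a charset param to the top-level Content-Type.

    Assumes the message already has a Content-Type header; only that header
    is rewritten, other headers and the body bytes are written back as parsed.
    """
    # Parse from bytes so the undeclared Shift_JIS body survives untouched
    # (non-ASCII bytes are carried internally as surrogate escapes).
    msg = message_from_bytes(raw)
    # set_param() edits only the header; replace=True keeps the header in its
    # original position instead of moving it to the end of the header block.
    msg.set_param("charset", charset, replace=True)
    out = BytesIO()
    # mangle_from_=False stops body lines starting with "From " from being
    # rewritten as ">From ".
    BytesGenerator(out, mangle_from_=False).flatten(msg)
    return out.getvalue()
```

On output, line endings are written as \n under the default compat32 policy and long headers may be refolded, so diff a few results against the originals before overwriting anything.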

Also, is there a way to detect which encoding is used, and only correct the files that use that encoding? I can't just convert them all blindly, since some are iso_2022_jp, and those are already displaying correctly.
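Since the header says nothing, detection has to look at the body bytes. One cheap heuristic, sketched below under the assumption that the bodies are either Japanese text or plain ASCII: ISO-2022-JP is a 7-bit encoding that switches into kanji with escape sequences such as ESC $ B, whereas Shift_JIS uses bytes above 0x7F. guess_japanese_charset is a hypothetical helper; it can misclassify very short or mixed-encoding bodies, and a statistical detector such as chardet or charset-normalizer is an alternative.

```python
from typing import Optional

# ISO-2022-JP switches into JIS X 0208 with ESC $ @ or ESC $ B.
_ISO2022_JP_ESCAPES = (b"\x1b$@", b"\x1b$B")


def guess_japanese_charset(body: bytes) -> Optional[str]:
    """Heuristic guess for an undeclared body: "iso-2022-jp", "shift_jis" or None."""
    if body.isascii():
        # ISO-2022-JP is 7-bit: ASCII bytes plus escape sequences.
        if any(esc in body for esc in _ISO2022_JP_ESCAPES):
            return "iso-2022-jp"
        return None  # plain ASCII, nothing to label
    try:
        body.decode("utf-8")
        return None  # valid UTF-8; do not mislabel it as Shift_JIS
    except UnicodeDecodeError:
        pass
    try:
        body.decode("shift_jis")
        return "shift_jis"
    except UnicodeDecodeError:
        return None  # some other 8-bit encoding (or corrupt data)
```

To get the body bytes, parse the file with message_from_binary_file (or message_from_bytes) and call get_payload(decode=True) on the text part.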

With get_charsets() you can get the charsets already declared in a message, one entry per part. Here's a sample:

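from email import message_from_binary_file

# Open in binary mode: undeclared Shift_JIS bodies are not valid UTF-8 text,
# so reading the file in text mode can raise UnicodeDecodeError.
with open('path.eml', 'rb') as f:
    msg = message_from_binary_file(f)
print(msg.get_charsets())
# e.g. [None, 'gb2312', None]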
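get_charsets() returns one entry per part, in the order Message.walk() visits them, which is why a multipart message yields None for the multipart container itself. The sketch below (part_charsets is a hypothetical helper) pairs each declared charset with its content type, making it easier to see which part lacks the declaration.

```python
from email import message_from_bytes
from typing import List, Optional, Tuple


def part_charsets(raw: bytes) -> List[Tuple[str, Optional[str]]]:
    """List (content type, declared charset) for every part, in walk() order."""
    msg = message_from_bytes(raw)
    # walk() yields the message itself first, then each subpart depth-first;
    # get_charsets() builds its list from the same traversal.
    return [(part.get_content_type(), part.get_content_charset())
            for part in msg.walk()]
```

For a multipart/mixed message with one gb2312 part and one undeclared part, this gives [('multipart/mixed', None), ('text/plain', 'gb2312'), ('text/plain', None)], matching the [None, 'gb2312', None] shape above.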

With this approach you can loop through all the messages and use set_charset() to set the correct charset on the ones that don't have one.
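A sketch of that loop over a directory of .eml files, assuming they all sit in one folder and that only text parts with no declared charset need fixing. It uses set_param() rather than set_charset(): as far as I can tell from the standard email.charset table, shift_jis maps to the output charset iso-2022-jp, so set_charset("shift_jis") would declare charset="iso-2022-jp" and can re-encode the payload, which is not what the question wants. fix_eml_dir and _looks_like_shift_jis are hypothetical names, and files are overwritten in place, so run it on a copy first.

```python
import pathlib
from email import message_from_bytes
from email.generator import BytesGenerator
from io import BytesIO
from typing import List


def _looks_like_shift_jis(body: bytes) -> bool:
    # Hypothetical heuristic: non-ASCII, not valid UTF-8, valid Shift_JIS.
    if body.isascii():
        return False  # also covers ISO-2022-JP, which is 7-bit
    try:
        body.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    try:
        body.decode("shift_jis")
        return True
    except UnicodeDecodeError:
        return False


def fix_eml_dir(directory) -> List[str]:
    """Add charset="shift_jis" where it is missing; return names of changed files."""
    changed = []
    for path in sorted(pathlib.Path(directory).glob("*.eml")):
        msg = message_from_bytes(path.read_bytes())
        modified = False
        for part in msg.walk():
            if part.is_multipart() or part.get_content_maintype() != "text":
                continue
            if part.get_content_charset() is not None or "Content-Type" not in part:
                continue  # already labelled, or no header to edit
            body = part.get_payload(decode=True) or b""
            if _looks_like_shift_jis(body):
                # Edit only the header; set_charset("shift_jis") would declare
                # iso-2022-jp instead and may re-encode the body.
                part.set_param("charset", "shift_jis", replace=True)
                modified = True
        if modified:
            out = BytesIO()
            BytesGenerator(out, mangle_from_=False).flatten(msg)
            path.write_bytes(out.getvalue())
            changed.append(path.name)
    return changed
```

Files whose bodies are ISO-2022-JP are pure ASCII on disk, so the heuristic skips them and they are never rewritten.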
