Content Transfer Encoding 7bit or 8bit

When sending email content, you are required to set the "Content-Transfer-Encoding" header. I looked at the headers of many emails that I have received. Some use "7bit" and some use "8bit".

What is the difference between these two? Which is recommended? Is there any special encoding required for the email body in order to set these headers?

It can be a bit dense to read, but the "Content-Transfer-Encoding" section of RFC 1341 has all of the details:

http://www.w3.org/Protocols/rfc1341/5_Content-Transfer-Encoding.html

The situation kinda goes from bad to worse. Here's my summary:

Background

SMTP, by definition (RFC 821), limits mail to lines of 1000 characters of 7 bits each. That means that none of the bytes you send down the pipe can have the most significant ("highest-order") bit set to "1".

The content that we want to send often does not obey this restriction on its own. Think of an image file, or a text file that contains Unicode characters: the bytes of these files will often have their 8th bit set to "1". SMTP doesn't allow this, so you need to use a "transfer encoding" to describe how you've worked around the mismatch.
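To make the mismatch concrete, here is a small Python sketch that lists which bytes of a string have the high bit set. The helper name `high_bit_bytes` is made up for illustration and is not part of any library.

```python
def high_bit_bytes(data: bytes) -> list[int]:
    """Return the bytes in `data` whose most significant (8th) bit is 1."""
    return [b for b in data if b & 0x80]


ascii_text = "cafe".encode("utf-8")  # pure US-ASCII, every byte is below 128
utf8_text = "café".encode("utf-8")   # 'é' is encoded as two bytes: 0xC3 0xA9

print(high_bit_bytes(ascii_text))                   # []
print([hex(b) for b in high_bit_bytes(utf8_text)])  # ['0xc3', '0xa9']
```

The second string is the kind of content that plain RFC 821 SMTP cannot carry without a transfer encoding.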

The values for the Content-Transfer-Encoding header describe the rule that you've chosen to solve this problem.

7Bit Encoding

7bit simply means "My data consists only of US-ASCII characters, which only use the lower 7 bits for each character." You're basically guaranteeing that all of the bytes in your content already adhere to the restrictions of SMTP, and so it needs no special treatment. You can just read it as-is.

Note that when you choose 7bit, you're agreeing that all of the lines in your content are less than 1000 characters in length.

As long as your content adheres to these rules, 7bit is the best transfer encoding, since there's no extra work necessary; you just read/write the bytes as they come off the pipe. It's also easy to eyeball 7bit content and make sense of it. The idea here is that if you're just writing in "plain English text" you'll be fine. But that assumption didn't hold in 2005 and it doesn't hold today.
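The two 7bit rules can be checked mechanically. The sketch below is a rough validator, not a complete one: `is_7bit_safe` is a hypothetical helper, it assumes lines are separated by CRLF, it treats the 1000-character limit as 998 octets plus the CRLF, and it also rejects NUL bytes (which MIME disallows for 7bit data). It does not check for bare CR or LF characters.

```python
MAX_LINE_OCTETS = 998  # 1000 characters once the trailing CRLF is counted


def is_7bit_safe(body: bytes) -> bool:
    """Rough check: no high-bit bytes, no NULs, and no over-long lines."""
    if any(b == 0 or b & 0x80 for b in body):
        return False
    return all(len(line) <= MAX_LINE_OCTETS for line in body.split(b"\r\n"))


print(is_7bit_safe(b"Hello, world!\r\n"))         # True
print(is_7bit_safe("café\r\n".encode("utf-8")))   # False: high-bit bytes
print(is_7bit_safe(b"a" * 999 + b"\r\n"))         # False: line too long
```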

8Bit Encoding

8bit means "My data may include extended ASCII characters; they may use the 8th (highest) bit to indicate special characters outside of the standard US-ASCII 7-bit characters." As with 7bit, there's still a 1000-character line limit.

8bit, just like 7bit, does not actually do any transformation of the bytes as they're written to or read from the wire. It just means that you're no longer guaranteeing that every byte has its highest bit set to "0".

This seems like a step up from 7bit, since it gives you more freedom in your content. However, RFC 1341 contains this tidbit:

As of the publication of this document, there are no standardized Internet transports for which it is legitimate to include unencoded 8-bit or binary data in mail bodies. Thus there are no circumstances in which the "8bit" or "binary" Content-Transfer-Encoding is actually legal on the Internet.

RFC 1341 came out over 20 years ago. Since then we've gotten 8bit MIME Extensions in RFC 6152. But even then, line limits may still apply:

Note that this extension does NOT eliminate the possibility of an SMTP server limiting line length; servers are free to implement this extension but nevertheless set a line length limit no lower than 1000 octets.
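Whether 8bit content can be sent as-is therefore depends on the server you are talking to. As a hedged sketch, Python's `smtplib` exposes the extensions a server advertised in its EHLO reply; `supports_8bitmime` is a hypothetical helper name, and the host, port, and variables in the commented usage are placeholders.

```python
import smtplib


def supports_8bitmime(smtp) -> bool:
    """Return True if the connected server advertised the 8BITMIME extension."""
    smtp.ehlo_or_helo_if_needed()   # make sure the capability list is populated
    return smtp.has_extn("8bitmime")


# Usage sketch (needs a reachable server, so it is left commented out):
# with smtplib.SMTP("mail.example.com", 587) as smtp:
#     smtp.starttls()
#     if supports_8bitmime(smtp):
#         smtp.sendmail(sender, recipients, raw_message_bytes,
#                       mail_options=["BODY=8BITMIME"])
```

If the server does not advertise the extension, falling back to quoted-printable or base64 keeps the message within the original 7-bit rules.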

Binary Encoding

binary is the same as 8bit, except that there's no line length restriction. You can still include any characters you want, and there's no extra encoding. As with 8bit, RFC 1341 states that it's not really a legitimate transfer encoding. RFC 3030 extended this with BINARYMIME.

Quoted Printable

Before the 8BITMIME extension, there needed to be a way to send content that couldn't be 7bit over SMTP. HTML files (which might have more than 1000-character lines) and files with international characters are good examples of this. The quoted-printable encoding (defined in Section 5.1 of RFC 1341) is designed to handle this. It does two things:

  • Defines how to escape non-US-ASCII characters so that they can be represented using only 7-bit characters. (Short version: each such byte is written as an equals sign plus two hexadecimal digits.)
  • Defines that lines will be no longer than 76 characters, and that a longer line is split with a "soft" line break, an equals sign at the end of the line, which the decoder removes.
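Both behaviours can be seen with Python's standard `quopri` module. This is only an illustration of the encoding; the sample strings are made up.

```python
import quopri

# Escaping: the two UTF-8 bytes of 'é' (0xC3 0xA9) become =C3=A9.
encoded = quopri.encodestring("café".encode("utf-8"))
print(encoded)  # b'caf=C3=A9'

# Line length: a 200-character line is split with soft line breaks ("=" + newline).
long_line = b"x" * 200 + b"\n"
wrapped = quopri.encodestring(long_line)
print(wrapped.decode("ascii"))

# Decoding removes the soft breaks and restores the original bytes.
restored = quopri.decodestring(wrapped)
print(restored == long_line)  # True
```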

Quoted Printable, because of the escaping and short lines, is much harder for a human to read than 7bit or 8bit, but it does support a much wider range of possible content.

Base64 Encoding

If your data is largely non-text (ex: an image file), you don't have many options. 7bit is off the table. 8bit and binary were unsupported prior to the MIME extension RFCs. quoted-printable would work, but is really inefficient (nearly every byte of such data ends up represented by 3 characters).

base64 is a good solution for this type of data. It encodes 3 raw bytes as 4 US-ASCII characters, which is relatively efficient. RFC 1341 further limits the line length of base64-encoded data to 76 characters to fit within an SMTP message, but that's relatively easy to manage when you're just splitting or concatenating arbitrary characters at fixed lengths.
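A short Python sketch with the standard `base64` module shows both the 3-to-4 expansion and the 76-character line wrapping; the payload is arbitrary sample data.

```python
import base64

# 3 raw bytes become 4 US-ASCII characters.
print(base64.b64encode(b"abc"))  # b'YWJj'

# 300 raw bytes become 400 characters, i.e. about 33% overhead.
print(len(base64.b64encode(b"x" * 300)))  # 400

# encodebytes() also inserts newlines so that no line exceeds 76 characters.
payload = bytes(range(256)) * 2           # 512 bytes of arbitrary binary data
wrapped = base64.encodebytes(payload)
print(max(len(line) for line in wrapped.splitlines()))

# Decoding ignores the newlines and restores the original bytes.
print(base64.decodebytes(wrapped) == payload)  # True
```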

The big downside is that base64-encoded data is pretty much entirely unreadable by humans, even if it's just "plain" text underneath.

With content-transfer-encoding: 7bit, the bytes that are used in the body (or, more correctly, within a MIME part's boundaries) should represent ASCII characters but not extended-ASCII characters. This means 0-127 decimal (8th bit not used).

Since the 8th bit is not used, you cannot send non-ASCII text encoded as utf-8 or iso8859-7 bytes, because those encodings use the 8th bit for such characters. Nor can you add binary content.

With content-transfer-encoding: 8bit you can use any possible byte, which means that you can encode your text using utf-8 bytes or iso8859-7 bytes (both assuming that the 8BITMIME extension is used in SMTP). It is however still unsafe to add binary content, because the maximum line length restriction still applies and could break up your bytes with newlines.
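Putting these rules together, a sender could pick a transfer encoding roughly as follows. This is a sketch under stated assumptions, not what any particular mail client does: `choose_cte` is a hypothetical helper, lines are assumed to be CRLF-separated, the 998-octet figure is the 1000-character limit minus the CRLF, and the quoted-printable versus base64 cut-off is a simple size estimate (quoted-printable costs about 2 extra characters per high-bit byte, base64 about one third extra overall).

```python
def choose_cte(body: bytes, server_has_8bitmime: bool) -> str:
    """Pick a Content-Transfer-Encoding value for `body` (illustrative only)."""
    short_lines = all(len(line) <= 998 for line in body.split(b"\r\n"))
    high = sum(1 for b in body if b & 0x80)   # bytes with the 8th bit set
    has_nul = b"\x00" in body                 # NULs are not allowed in 7bit/8bit

    if high == 0 and short_lines and not has_nul:
        return "7bit"
    if short_lines and not has_nul and server_has_8bitmime:
        return "8bit"
    # Needs a real encoding: mostly-text stays readable as quoted-printable,
    # mostly-binary is smaller as base64.
    return "quoted-printable" if high * 6 < len(body) else "base64"


text = "héllo wörld, plain text mostly\r\n".encode("utf-8")
print(choose_cte(b"hello\r\n", False))        # 7bit
print(choose_cte(text, True))                 # 8bit
print(choose_cte(text, False))                # quoted-printable
print(choose_cte(bytes(range(256)), True))    # base64
```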

Now even with 7bit content-transfer-encoding you can still set the content-type's charset param to utf-8, as long as you keep your bytes within the range 0-127.

E.g. a possible way to represent characters outside ASCII using the 7bit content-transfer-encoding could be to use HTML character references (with content-type: text/html).
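As a minimal sketch of that idea, Python's built-in `xmlcharrefreplace` error handler turns every non-ASCII character into a numeric character reference, so the resulting HTML body is pure 7-bit; the sample markup is made up.

```python
import html

markup = "<p>café</p>"

# Non-ASCII characters become numeric character references such as &#233;
body = markup.encode("ascii", errors="xmlcharrefreplace")
print(body)  # b'<p>caf&#233;</p>'

# Every byte is now in the 0-127 range, so the part can be labelled 7bit.
print(all(b < 128 for b in body))  # True

# An HTML-aware reader turns the reference back into the original character.
print(html.unescape(body.decode("ascii")))  # <p>café</p>
```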

Many email clients will set content-transfer-encoding to 7bit or 8bit depending on the case, e.g. 7bit when sending English text and 8bit when sending multilingual text. And there are always the options of quoted-printable and base64, whose characters also do not use the 8th bit, but this is out of the scope of the question.
