
"Chunking" in a Base64-encoded string

Original post: 2018-06-29 01:37:56 · Tags: java / base64 / codec

Some older Base64 encoders insert a carriage return "\r" and/or a line feed "\n" after every 76 characters of the encoded string, a practice known as "chunking". The reason is to accommodate editors that cannot handle longer lines.

The question is: neither "\r" nor "\n" is one of the characters in Base64's alphabet (its "codepage"), so doesn't inserting them make the entire encoded string invalid Base64?

Note that I am not asking whether decoders will tolerate "blank" characters such as \r. I am asking why adding blank characters to a Base64 string is considered OK when those characters are clearly not in the Base64 codepage.

Thanks for your advice on this...

2 Answers

As per the Base64 Javadoc, that Base64 variant is the one intended for MIME.

That said, you have to know the context in which the encoded data will be used.

Fortunately, the Base64 class supports all of the variants:

Basic
Uses "The Base64 Alphabet" as specified in Table 1 of RFC 4648 and RFC 2045 for encoding and decoding operation. The encoder does not add any line feed (line separator) character. The decoder rejects data that contains characters outside the base64 alphabet.
URL and Filename safe
Uses the "URL and Filename safe Base64 Alphabet" as specified in Table 2 of RFC 4648 for encoding and decoding. The encoder does not add any line feed (line separator) character. The decoder rejects data that contains characters outside the base64 alphabet.
MIME
Uses "The Base64 Alphabet" as specified in Table 1 of RFC 2045 for encoding and decoding operation. The encoded output must be represented in lines of no more than 76 characters each and uses a carriage return '\r' followed immediately by a linefeed '\n' as the line separator. No line separator is added to the end of the encoded output. All line separators or other characters not found in the base64 alphabet table are ignored in decoding operation.
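
The encoder side of the three variants described above can be seen in a small sketch. The class name, helper method names, and sample inputs below are made up for illustration; only the `java.util.Base64` factory methods come from the JDK.

```java
import java.util.Base64;

// Hypothetical demo class; not part of the original answer.
class Base64VariantsDemo {

    // Basic encoder: one unbroken line, no line separators.
    static String encodeBasic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // URL and filename safe encoder: '-' and '_' replace '+' and '/'.
    static String encodeUrl(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    // MIME encoder: lines of at most 76 characters, separated by "\r\n".
    static String encodeMime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // Length of the longest line after splitting the output on CRLF.
    static int longestLine(String encoded) {
        int max = 0;
        for (String line : encoded.split("\r\n")) {
            max = Math.max(max, line.length());
        }
        return max;
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // sample input: 100 zero bytes -> 136 Base64 characters

        String basic = encodeBasic(data);
        String mime = encodeMime(data);

        System.out.println(basic.length());        // 136, no separators
        System.out.println(basic.contains("\r"));  // false
        System.out.println(mime.length());         // 138 = 76 + CRLF + 60
        System.out.println(longestLine(mime));     // 76
        System.out.println(mime.endsWith("\r\n")); // false: no trailing separator

        byte[] special = {(byte) 0xfb, (byte) 0xff};
        System.out.println(encodeBasic(special));  // +/8=
        System.out.println(encodeUrl(special));    // -_8=
    }
}
```

If a different line length or separator is needed, `Base64.getMimeEncoder(int lineLength, byte[] lineSeparator)` accepts both as parameters.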

After reading the RFC 2045 material, i.e. the MIME portion of Joop's post, I realized my earlier misunderstanding: the codepage in the RFC 2045 character table is not the whole story.

In addition, the spec clearly states how the encoder should emit line-separator characters on top of the codepage characters, and how the decoder should handle those additional characters, which is what I was missing. That is why those line-break characters are valid per the spec.
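
The decoder side of this can be checked with a short sketch: the same chunked string is rejected by the Basic decoder and accepted by the MIME decoder. The class and helper names here are hypothetical, and the sample strings are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; not part of the original answer.
class ChunkedDecodeDemo {

    // True if the Basic decoder accepts the input without throwing.
    static boolean basicDecoderAccepts(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown for any character outside the Base64 alphabet, including '\r' and '\n'.
            return false;
        }
    }

    // The MIME decoder skips line separators and other non-alphabet characters.
    static byte[] decodeMime(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] original = new byte[100];
        Arrays.fill(original, (byte) 'x');

        // 100 bytes encode to 136 characters, so the MIME encoder inserts one CRLF.
        String chunked = Base64.getMimeEncoder().encodeToString(original);

        System.out.println(basicDecoderAccepts(chunked));                  // false
        System.out.println(Arrays.equals(original, decodeMime(chunked))); // true

        // A hand-chunked "hello" ("aGVsbG8=" split by a CRLF).
        String hello = "aGVs\r\nbG8=";
        System.out.println(basicDecoderAccepts(hello));                    // false
        System.out.println(new String(decodeMime(hello), StandardCharsets.UTF_8)); // hello
    }
}
```

So whether the line breaks are "valid" depends on which variant the producer and consumer agreed on: under the MIME rules they are part of the format, while under the Basic rules they are an error.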