
“Chunking” in a Base64-encoded string

Some older Base64 encoders insert carriage returns ("\r") and/or line feeds ("\n") after every 76 characters of the encoded string, a practice known as "chunking". The reason is to accommodate editors that cannot handle long lines.
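To see what chunking looks like in practice, here is a minimal sketch using the MIME encoder from the JDK's `java.util.Base64` class (Java 8 or later). The class name `ChunkingDemo` and the 100-byte all-zero input are arbitrary choices for illustration. 100 bytes encode to 136 Base64 characters, which the MIME encoder splits into a 76-character line, a CRLF, and a 60-character line.

```java
import java.util.Base64;

// Illustrative sketch: shows the "chunked" output of the JDK's MIME encoder.
class ChunkingDemo {

    // MIME variant: lines of at most 76 characters, separated by "\r\n".
    static String mimeEncode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // arbitrary sample input (all zero bytes)
        String chunked = mimeEncode(data);

        // Print the length of each line between the CRLF separators.
        for (String line : chunked.split("\r\n")) {
            System.out.println(line.length() + ": " + line);
        }
    }
}
```

No separator is appended after the last line, so the encoded string does not end in "\r\n".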

My question is: neither "\r" nor "\n" is one of the characters in Base64's codepage (alphabet), so doesn't that make the entire encoded string invalid Base64?

Note that I am not asking whether decoders will tolerate "blank" characters like \r; I am asking why adding blank characters to a Base64 string is considered OK when those characters are obviously not in the Base64 codepage.

Thanks for your advice on this...

According to the java.util.Base64 Javadoc, that Base64 variant (with line separators) is the one defined for MIME.

That said, you have to know the context in which the encoded data will be used.

Fortunately, the Base64 class supports all three variants:

  • Basic

    Uses "The Base64 Alphabet" as specified in Table 1 of RFC 4648 and RFC 2045 for encoding and decoding operation. The encoder does not add any line feed (line separator) character. The decoder rejects data that contains characters outside the base64 alphabet.

  • URL and Filename safe

    Uses the "URL and Filename safe Base64 Alphabet" as specified in Table 2 of RFC 4648 for encoding and decoding. The encoder does not add any line feed (line separator) character. The decoder rejects data that contains characters outside the base64 alphabet.

  • MIME

    Uses "The Base64 Alphabet" as specified in Table 1 of RFC 2045 for encoding and decoding operation. The encoded output must be represented in lines of no more than 76 characters each and uses a carriage return '\r' followed immediately by a linefeed '\n' as the line separator. No line separator is added to the end of the encoded output. All line separators or other characters not found in the base64 alphabet table are ignored in decoding operation.
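The difference between the variants can be checked directly. The following is a minimal sketch, assuming Java 8 or later; `VariantsDemo` and its `accepts` helper are hypothetical names introduced only for this illustration. It encodes the same 60 bytes (80 Base64 characters) with each encoder, then feeds the chunked MIME output to the basic and MIME decoders.

```java
import java.util.Base64;

// Illustrative sketch: compare the Basic, URL-safe, and MIME variants.
class VariantsDemo {

    // Hypothetical helper: true if the decoder accepts the input,
    // false if it throws IllegalArgumentException.
    static boolean accepts(Base64.Decoder decoder, String encoded) {
        try {
            decoder.decode(encoded);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[60]; // arbitrary sample: 60 bytes -> 80 Base64 chars
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 7);
        }

        String basic = Base64.getEncoder().encodeToString(data);    // single line
        String url = Base64.getUrlEncoder().encodeToString(data);   // '-' and '_' replace '+' and '/'
        String mime = Base64.getMimeEncoder().encodeToString(data); // CRLF after 76 chars

        System.out.println(basic.contains("\r\n")); // false
        System.out.println(mime.contains("\r\n"));  // true
        System.out.println(basic.replace('+', '-').replace('/', '_').equals(url)); // true

        // The basic decoder rejects the CRLF; the MIME decoder ignores it.
        System.out.println(accepts(Base64.getDecoder(), mime));     // false
        System.out.println(accepts(Base64.getMimeDecoder(), mime)); // true
    }
}
```

The same chunked string is therefore "invalid" for the Basic variant and perfectly valid for the MIME variant; which answer is right depends on which variant the data is supposed to follow.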

After reading the RFC 2045 spec, i.e. the MIME portion of Joop's post, I realized my earlier misunderstanding: the codepage in the RFC 2045 character table is not the whole story.

Additionally, the spec clearly states that the encoder must break its output into lines of no more than 76 characters, adding line-separator characters on top of the codepage characters, and that the decoder must ignore line breaks and any other characters not found in the table. That rule is what I was missing, and it is why those line-break characters are valid per the spec.
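The RFC 2045 decoding rule can be sketched by hand: drop every character that is not in Table 1 (keeping the `=` padding), then decode what remains with the strict basic decoder. This is a simplified illustration, not how the JDK implements its MIME decoder; the names `Rfc2045Filter`, `isBase64Char`, and `decodeLenient` are hypothetical, and the sketch does not cover every edge case in the RFC (for example, data appearing after the `=` padding). In real code, `Base64.getMimeDecoder()` already ignores such characters.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical sketch of the RFC 2045 rule: ignore characters outside Table 1.
class Rfc2045Filter {

    // True for characters in the Base64 alphabet (A-Z, a-z, 0-9, '+', '/')
    // plus the '=' padding character.
    static boolean isBase64Char(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '+' || c == '/' || c == '=';
    }

    // Strip line breaks and any other non-alphabet characters,
    // then decode the remaining text with the strict basic decoder.
    static byte[] decodeLenient(String encoded) {
        StringBuilder clean = new StringBuilder(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (isBase64Char(c)) {
                clean.append(c);
            }
        }
        return Base64.getDecoder().decode(clean.toString());
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        Arrays.fill(data, (byte) 0x5A); // arbitrary sample input
        String chunked = Base64.getMimeEncoder().encodeToString(data);
        System.out.println(Arrays.equals(decodeLenient(chunked), data)); // true
    }
}
```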
