
base64.decode: Invalid encoding before padding

I'm working on a Flutter project and I'm currently getting an error with some of the strings I try to decode using the base64.decode() method. I've written a short Dart program that reproduces the problem with a specific string:

import 'dart:convert';

void main() {
  final message = 'RU5UUkVHQUdSQVRJU1==';
  print(utf8.decode(base64.decode(message)));
}

I'm getting the following error message:

Uncaught Error: FormatException: Invalid encoding before padding (at character 19)
RU5UUkVHQUdSQVRJU1==

I've tried decoding the same string with JavaScript and it works fine. I'd be glad if someone could explain why I'm getting this error, and possibly show me a solution. Thanks.

Base64 encoding takes binary data in groups of 3 full bytes (24 bits), splits each group into four 6-bit segments, and represents each segment as a printable ASCII character. It does that in essentially two steps.

The first step is to break the binary string down into 6-bit blocks. Base64 uses only 6 bits per character (2^6 = 64 possible characters) so that the encoded data is printable and human-readable. None of the ASCII control characters are used.

The 64 characters (hence the name Base64) are the 10 digits, 26 lowercase letters, 26 uppercase letters, the plus sign (+) and the forward slash (/). There is also a 65th character known as the pad, which is the equals sign (=). It is appended when the final group of input contains fewer than 3 bytes, so that the last data character is only partly filled.

So RU5UUkVHQUdSQVRJU1== doesn't follow the encoding pattern.
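To make the padding rule concrete, here is a minimal sketch using dart:convert's base64.encode on inputs of three, two and one byte. The sample text 'Man' is only an illustration and does not come from the question.

```dart
import 'dart:convert';

void main() {
  // 3 input bytes fill four 6-bit characters exactly: no padding.
  print(base64.encode(utf8.encode('Man'))); // TWFu
  // 2 input bytes leave the third character partly filled: one '='.
  print(base64.encode(utf8.encode('Ma'))); // TWE=
  // 1 input byte leaves the second character partly filled: two '='.
  print(base64.encode(utf8.encode('M'))); // TQ==
}
```

In each padded case the unused low bits of the last data character are zero, which is what a conforming encoder produces.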

Append the underscore character "_" to reach a multiple of four, then delete the extra decoded bytes

dart:convert's base64.decode rejects this particular string with the "Invalid encoding before padding" error. This happens even if you add the padding with the package's own base64.normalize, which pads the string with the correct padding character =.

= is indeed the correct padding character for Base64 encoding. It is used to fill out Base64 strings when fewer than 24 bits are available in the final input group. See RFC 4648, Section 4.

RFC 4648, Section 5, defines a URL-safe Base64 alphabet in which - and _ replace + and /. Note that _ is not a padding character there: it is the ordinary alphabet character with value 63, and = remains the padding character.

Because base64.decode accepts both alphabets, appending _ until the length is a multiple of four lets it decode the string without error.

Each appended _ adds one extra byte to the decoded output. To decode the resulting list of bytes as UTF-8, you need to delete those trailing bytes, or you will get an "Invalid UTF-8 byte" error. Be aware that this workaround silently discards the non-zero trailing bits of the input.

See the code below. The same code was also posted as a working dartpad.dev example.

import 'dart:convert';

void main() {
  // As of Dart 2.18.2 this generates an "invalid encoding before padding" error:
  // String message = 'RU5UUkVHQUdSQVRJU1==';
  // This also generates the same error:
  // String message = base64.normalize('RU5UUkVHQUdSQVRJU1');

  String message = 'RU5UUkVHQUdSQVRJU1';
  print("Encoded String: $message");
  print("Decoded String: ${decodeB64ToUtf8(message)}");
}

String decodeB64ToUtf8(String message) {
  // Append '_' up to a multiple of four => 'RU5UUkVHQUdSQVRJU1__'
  final padded = padBase64(message);
  // Count only the characters we appended, so that a '_' already present
  // in a URL-safe input is not mistaken for filler.
  final added = padded.length - message.length;
  List<int> dec = base64.decode(padded);
  // Each appended '_' produced one extra trailing byte; remove them.
  dec = dec.sublist(0, dec.length - added);
  return utf8.decode(dec);
}

String padBase64(String rawBase64) {
  final remainder = rawBase64.length % 4;
  return remainder > 0 ? rawBase64 + '_' * (4 - remainder) : rawBase64;
}


The string RU5UUkVHQUdSQVRJU1== is not a compliant Base64 encoding according to RFC 4648, which in section 3.5, "Canonical Encoding," states:

The padding step in base 64 and base 32 encoding can, if improperly implemented, lead to non-significant alterations of the encoded data. For example, if the input is only one octet for a base 64 encoding, then all six bits of the first symbol are used, but only the first two bits of the next symbol are used. These pad bits MUST be set to zero by conforming encoders, which is described in the descriptions on padding below. If this property do not hold, there is no canonical representation of base-encoded data, and multiple base-encoded strings can be decoded to the same binary data. If this property (and others discussed in this document) holds, a canonical encoding is guaranteed.

In some environments, the alteration is critical and therefore decoders MAY chose to reject an encoding if the pad bits have not been set to zero. The specification referring to this may mandate a specific behaviour.

(Emphasis added.)
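As a small sketch of what "decoders MAY choose to reject" looks like in practice, the snippet below wraps base64.decode in a helper that reports rejection instead of throwing. The name tryDecode is hypothetical and introduced only for this illustration. The second input is the question's string; the first is the same text with the pad bits cleared.

```dart
import 'dart:convert';

/// Hypothetical helper: returns the decoded bytes, or null when
/// base64.decode rejects the input with a FormatException.
List<int>? tryDecode(String encoded) {
  try {
    return base64.decode(encoded);
  } on FormatException {
    return null;
  }
}

void main() {
  // Pad bits are zero: accepted.
  print(tryDecode('RU5UUkVHQUdSQVRJUw==') != null);
  // Pad bits are non-zero: rejected, as reported in the question.
  print(tryDecode('RU5UUkVHQUdSQVRJU1==') != null);
}
```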

Here we will manually go through the Base64 decoding process.

Taking your encoded string RU5UUkVHQUdSQVRJU1== and performing the mapping from the Base64 character set (given in "Table 1: The Base 64 Alphabet" of the aforementioned RFC), we have:

  R      U      5      U      U      k      V      H      Q      U      d      S      Q      V      R      J      U      1      =       =
010001 010100 111001 010100 010100 100100 010101 000111 010000 010100 011101 010010 010000 010101 010001 001001 010100 110101 ______ ______

(using __ to represent the padding characters).

Now, grouping these by 8 instead of 6, we get:

01000101 01001110 01010100 01010010 01000101 01000111 01000001 01000111 01010010 01000001 01010100 01001001 01010011 0101____ ________
  E        N        T        R        E        G        A        G        R        A        T        I        S        P

The important part is at the end, where there are some non-zero bits followed by padding. The Dart implementation is correctly determining that the padding doesn't make sense, given that the last four bits of the preceding character are not zeros.
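The regrouping above can be reproduced with a short sketch that does the mapping by hand instead of calling base64.decode. The names alphabet and sextetBits are hypothetical, introduced only for this illustration; the alphabet string follows Table 1 of RFC 4648.

```dart
const alphabet =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

/// Concatenates the 6-bit value of every data character, skipping '='.
String sextetBits(String encoded) {
  final bits = StringBuffer();
  for (final ch in encoded.split('')) {
    if (ch == '=') continue;
    final value = alphabet.indexOf(ch);
    if (value < 0) throw FormatException('Not a Base64 character', ch);
    bits.write(value.toRadixString(2).padLeft(6, '0'));
  }
  return bits.toString();
}

void main() {
  final bits = sextetBits('RU5UUkVHQUdSQVRJU1==');
  final fullBytes = bits.length ~/ 8;
  // Regroup the bit string into complete 8-bit bytes.
  final text = [
    for (var i = 0; i < fullBytes; i++)
      String.fromCharCode(
          int.parse(bits.substring(i * 8, i * 8 + 8), radix: 2))
  ].join();
  print(text); // ENTREGAGRATIS
  // Bits left over after the last full byte; a canonical encoding has only zeros here.
  print(bits.substring(fullBytes * 8)); // 0101
}
```

The leftover 0101 is the part of the character 1 that the padding declares unused, and it is this non-zero remainder that makes the decoder refuse the input.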

As a result, the decoding of RU5UUkVHQUdSQVRJU1== is ambiguous. Is it ENTREGAGRATIS or ENTREGAGRATISP? This is precisely why the RFC states, "These pad bits MUST be set to zero by conforming encoders."

In fact, because of this, I'd argue that an implementation that decodes RU5UUkVHQUdSQVRJU1== to ENTREGAGRATIS without complaint is problematic, because it's silently discarding non-zero bits.

The RFC-compliant encoding of ENTREGAGRATIS is RU5UUkVHQUdSQVRJUw==.

The RFC-compliant encoding of ENTREGAGRATISP is RU5UUkVHQUdSQVRJU1A=.

This further highlights the ambiguity of your input RU5UUkVHQUdSQVRJU1==, which matches neither.
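The two canonical encodings can be checked with dart:convert itself. This is a minimal sketch that simply encodes both candidate strings.

```dart
import 'dart:convert';

void main() {
  for (final text in ['ENTREGAGRATIS', 'ENTREGAGRATISP']) {
    // utf8.encode gives the bytes; base64.encode produces the canonical form.
    print('$text -> ${base64.encode(utf8.encode(text))}');
  }
  // ENTREGAGRATIS -> RU5UUkVHQUdSQVRJUw==
  // ENTREGAGRATISP -> RU5UUkVHQUdSQVRJU1A=
}
```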

I suggest you check your encoder to determine why it's providing you with non-compliant encodings, and make sure you're not losing information as a result.
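If the encoder can't be fixed and you have confirmed that the stray bits carry no information, one possible workaround is to clear them before decoding. The sketch below is an assumption-laden illustration, not part of dart:convert: clearPadBits is a hypothetical helper that assumes the standard alphabet, a length that is a multiple of four, and one or two trailing = characters.

```dart
import 'dart:convert';

const alphabet =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

/// Hypothetical helper: zeroes the unused low bits of the last data
/// character so that a strict decoder accepts the string.
String clearPadBits(String encoded) {
  final firstPad = encoded.indexOf('=');
  if (firstPad <= 0) return encoded; // no padding, nothing to clear
  final padCount = encoded.length - firstPad;
  // "xx==" leaves 4 unused bits in the last data character, "xxx=" leaves 2.
  final unusedBits = padCount == 2 ? 4 : 2;
  final last = firstPad - 1;
  final value = alphabet.indexOf(encoded[last]);
  if (value < 0) {
    throw FormatException('Not a Base64 character', encoded, last);
  }
  final cleared = (value >> unusedBits) << unusedBits;
  return encoded.replaceRange(last, last + 1, alphabet[cleared]);
}

void main() {
  final fixed = clearPadBits('RU5UUkVHQUdSQVRJU1==');
  print(fixed); // RU5UUkVHQUdSQVRJUw==
  print(utf8.decode(base64.decode(fixed))); // ENTREGAGRATIS
}
```

This deliberately throws away the non-zero bits, so it yields ENTREGAGRATIS and never ENTREGAGRATISP. Use it only when you know the shorter result is the intended one.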
