Base64 Decode: Specific String Incorrect Padding (with correct padding)

I am trying to Base64 decode a string (into bytes) using Python's base64.b64decode(str) method:

46oWrWpy2gTEGwNnN6Ayy

and I am making sure it is padded with ='s to a multiple of 4 characters or, out of frustration, trying any of these:

46oWrWpy2gTEGwNnN6Ayy=

46oWrWpy2gTEGwNnN6Ayy==

46oWrWpy2gTEGwNnN6Ayy===

46oWrWpy2gTEGwNnN6Ayy==================================================

and yet I still get "Incorrect Padding" on Python v3.6.1. Other strings are fine.

I showed a colleague; he tried it on Python 2 and observed the same response.

I note that removing the first "4" is enough to make the Base64 decode work.

I have skimmed Python's docs (noting casefold doesn't apply for Base64) and haven't yet ventured further into RFC 3548, but wondered if someone else had encountered something similar before. Anyone have any clues :)? Surely this can't be a bug in Python's Base64 decoder?
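The behaviour described in the question can be reproduced with a short sketch using only the standard library. The helper name `try_decode` is hypothetical, and because the exact error message differs between Python versions, only the exception type (`binascii.Error`) is checked here.

```python
import base64
import binascii


def try_decode(text):
    """Return the decoded bytes, or None if b64decode rejects the input."""
    try:
        return base64.b64decode(text)
    except binascii.Error:
        return None


original = '46oWrWpy2gTEGwNnN6Ayy'

# The string has 21 characters; adding '=' padding does not make it decodable.
for pad in ('', '=', '==', '==='):
    candidate = original + pad
    print(repr(candidate), 'decodes' if try_decode(candidate) is not None else 'rejected')

# Dropping the first character leaves 20 characters, i.e. five complete groups of 4.
shorter = original[1:]
print(len(shorter), 'decodes' if try_decode(shorter) is not None else 'rejected')
```

Removing any single character would have the same effect on the length, which hints that the length itself, not the leading "4", is what the decoder objects to.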

Worked it out.

Each Base64 character carries 6 bits of the raw data, while each raw byte is 8 bits. If the text stops mid-way through a raw byte, some of that byte's remaining bits are missing. The Wikipedia article (and many online answers) seems to treat padding as interchangeable with a '0' byte, which is not the case (in the Base64 alphabet, zero bits are encoded as an A).

Padding is not a substitute for missing data.
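The bit arithmetic behind this can be sketched as follows. The function name `leftover_bits` is made up for illustration; it only counts bits and does not decode anything.

```python
def leftover_bits(n_chars):
    """Return (whole_bytes, spare_bits) carried by n_chars Base64 characters."""
    total_bits = 6 * n_chars  # each Base64 character carries 6 bits
    return total_bits // 8, total_bits % 8


# Size of the final group (1 to 4 characters) and what it can hold.
for n in range(1, 5):
    whole, spare = leftover_bits(n)
    print(f'{n} char(s): {whole} whole byte(s), {spare} spare bit(s)')

# The string from the question has 21 characters.
print(leftover_bits(21))
```

A final group of 2 or 3 characters holds at least one complete byte, so padding it with = is enough. A final group of 1 character holds only 6 bits, which is less than a byte, so a length of 21 (one more than a multiple of 4) cannot be rescued by padding alone.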

#!/usr/bin/env python3

# We use hexlify for debugging.
import binascii

# We use the Base64 library.
import base64

# Base64 works on groups of 4 characters.
# Sometimes the last group has only 1, 2 or 3 characters, and the text
# might stop midway through a byte.
def relaxed_decode_base64(data):

    # If there is already padding we strip it, as we calculate padding ourselves.
    if '=' in data:
        data = data[:data.index('=')]

    # Work out how many characters the final (partial) group holds.
    remainder = len(data) % 4

    # A single leftover character is mid-way through a byte, so supply the
    # missing bits with 'A' (six zero bits) before padding.
    if remainder == 1:
        data += 'A=='
    # Otherwise just add on the correct length of padding.
    elif remainder == 2:
        data += '=='
    elif remainder == 3:
        data += '='

    # Actually perform the Base64 decode.
    return base64.b64decode(data)

# Debugging
print(str(relaxed_decode_base64('46oWrWpy2gTEGwNnN6Ayy')) + '\n')

test_string = ''

for count in range(0, 1024):
    test_string += '/'
    print(str(len(test_string)) + ' - ' + test_string)
    print(binascii.hexlify(relaxed_decode_base64(test_string)))
    # Wait for Enter before trying the next length.
    input()

Seems to be a problem in your data, not related to Python:

$ echo 46oWrWpy2gTEGwNnN6Ayy | base64 -d
㪭jrÚÄg7 2base64: invalid input
$ echo 46oWrWpy2gTEGwNnN6Ayy= | base64 -d
㪭jrÚÄg7 2base64: invalid input
$ echo 46oWrWpy2gTEGwNnN6Ayy== | base64 -d
㪭jrÚÄg7 2base64: invalid input
$ echo 46oWrWpy2gTEGwNnN6Ayy=== | base64 -d
㪭jrÚÄg7 2base64: invalid input
$ echo 46oWrWpy2gTEGwNnN6Ayy==== | base64 -d
㪭jrÚÄg7 2base64: invalid input

I managed to decode it this way (removed the last 'y'):

$ echo 46oWrWpy2gTEGwNnN6Ay | base64 -d
㪭jrÚÄg7 2
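For comparison, here is a rough Python sketch of the two repairs mentioned in this thread: dropping the dangling 21st character, as in the shell session above, versus appending 'A' plus padding, as relaxed_decode_base64 does. The variable names are illustrative only.

```python
import base64

text = '46oWrWpy2gTEGwNnN6Ayy'

# Option 1: drop the dangling 21st character, leaving five complete groups.
truncated = base64.b64decode(text[:20])

# Option 2: append 'A' (six zero bits) and '==' so the last group is complete.
zero_filled = base64.b64decode(text + 'A==')

print(len(truncated), len(zero_filled))
print(zero_filled[:15] == truncated)
print(hex(zero_filled[-1]))
```

Both repairs agree on the first 15 bytes. The second one adds a trailing byte built from the six bits of the final 'y' followed by two zero bits, so it invents data that may never have been in the original; which repair is appropriate depends on whether the string was truncated or simply has a stray character.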
