
Does base64 encoding hash the input?

I am trying to debug why something is not quite working, and observed that b64encode does not seem to work quite as I imagined:

import base64

base64.b64encode(bytes("the cat sat on the mat", "utf-8"))
# b'dGhlIGNhdCBzYXQgb24gdGhlIG1hdA=='

base64.b64encode(bytes("cat sat on the mat", "utf-8"))
# b'Y2F0IHNhdCBvbiB0aGUgbWF0'

The second input string differs from the first only slightly, at the start, so why do the two outputs show virtually no similarity? I would have expected only the start of each output to be a bit different.

Base64 maps 3 input bytes to 4 output bytes.

Since you added 4 input bytes ("the "), and 4 is not a multiple of 3, all of the remaining bytes "shifted" into different locations in the output.

Notice the == (padding) on the first example, which went away on the second.
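As a rough illustration of the padding point, the sketch below counts the trailing = characters for both inputs from the question. The helper name padding_count is made up for this example and is not part of the base64 module.

```python
import base64

def padding_count(data: bytes) -> int:
    """Count the '=' characters at the end of the standard Base64 encoding.

    Hypothetical helper for illustration only.
    """
    encoded = base64.b64encode(data)
    return len(encoded) - len(encoded.rstrip(b"="))

first = b"the cat sat on the mat"   # 22 bytes, 22 % 3 == 1
second = b"cat sat on the mat"      # 18 bytes, 18 % 3 == 0

# One leftover byte in the last group gives two '=' characters;
# a length that is a multiple of 3 needs no padding at all.
print(len(first) % 3, padding_count(first))
print(len(second) % 3, padding_count(second))
```

The two inputs differ in length by 4 bytes, so their lengths leave different remainders modulo 3, which is why the padding changes along with everything else.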

Try adding or removing multiples of 3 input bytes:

   cat sat on the mat
my cat sat on the mat
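A minimal sketch of that experiment, using the two 3-byte prefixes shown above plus the unprefixed string for comparison:

```python
import base64

base = b"cat sat on the mat"
plain = base64.b64encode(base).decode("ascii")

for prefix in (b"", b"   ", b"my "):
    encoded = base64.b64encode(prefix + base).decode("ascii")
    # A 3-byte prefix occupies exactly one 4-character group, so the
    # encoding of the rest of the string is left untouched.
    print(encoded, encoded.endswith(plain))

# Y2F0IHNhdCBvbiB0aGUgbWF0 True
# ICAgY2F0IHNhdCBvbiB0aGUgbWF0 True
# bXkgY2F0IHNhdCBvbiB0aGUgbWF0 True
```

Only the first four output characters change, because the prefix fills one complete group by itself.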

Base64 is a fully deterministic, reversible transformation, but it does not operate on a per-character basis (as you can also observe from the output length not being a whole multiple of the input length).

Rather, groups of three bytes (24 bits) are encoded at a time by turning them into four 6-bit numbers (hence base 64 = 2^6). If the input length is not a multiple of three, the final group is padded, and this is indicated by putting = characters at the end of the output.
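To make the grouping concrete, here is a hand-rolled sketch that encodes a single complete 3-byte group. It assumes the standard Base64 alphabet and deliberately ignores padding; encode_group and ALPHABET are names invented for this example, and in practice you would just call base64.b64encode.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(group: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters (illustrative only)."""
    if len(group) != 3:
        raise ValueError("this sketch only handles complete 3-byte groups")
    n = int.from_bytes(group, "big")  # the 24 bits as a single integer
    # Slice the 24 bits into four 6-bit values, most significant first.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))

print(encode_group(b"cat"))                        # Y2F0
print(base64.b64encode(b"cat").decode("ascii"))    # Y2F0
```

Each output character depends on bits from up to two neighbouring input bytes, which is why shifting the input by one or two bytes changes every character that follows.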

Therefore, common substrings in different inputs will only show up as a common substring in the output if they are aligned on this three-byte frame, and so are grouped into the same triples.

the cat sat on the mat
dGhlIGNhdCBzYXQgb24gdGhlIG1hdA==

he cat sat on the mat
aGUgY2F0IHNhdCBvbiB0aGUgbWF0

e cat sat on the mat
ZSBjYXQgc2F0IG9uIHRoZSBtYXQ=

 cat sat on the mat
IGNhdCBzYXQgb24gdGhlIG1hdA==

Observe that if you remove exactly three characters from the start ("the", leaving the space), the output becomes recognizable again.
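The same comparison can be reproduced with a short loop. This is only a sketch of the check done by eye above: it drops one, two, and three leading bytes and tests whether the shorter encoding still appears at the end of the full one.

```python
import base64

text = b"the cat sat on the mat"
full = base64.b64encode(text).decode("ascii")

for cut in range(1, 4):
    shorter = base64.b64encode(text[cut:]).decode("ascii")
    # Only a cut of 3 bytes keeps the remaining input on the same
    # three-byte frame, so only then is the output a suffix of the original.
    print(cut, shorter, full.endswith(shorter))
```

Only the cut of three bytes reports True; the other two land on a different frame and produce unrelated-looking text.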
