
Does Base64 encoding hash the input?

I am trying to debug why something is not quite working and observed that b64encode does not seem to work quite as I imagined:

import base64

print(base64.b64encode("the cat sat on the mat".encode("utf-8")))
# b'dGhlIGNhdCBzYXQgb24gdGhlIG1hdA=='

print(base64.b64encode("cat sat on the mat".encode("utf-8")))
# b'Y2F0IHNhdCBvbiB0aGUgbWF0'

The second input string differs from the first only slightly at the start, so why do the two outputs have virtually nothing in common? I would have expected only the start of each output to be a bit different.

Base64 maps 3 input bytes to 4 output bytes.

Since you added 4 input bytes ("the "), which is not a multiple of 3, all of the remaining bytes "shifted" into different positions within the 3-byte groups, and so into different locations in the output.

Notice the == (padding) on the first example which went away on the second.

Try adding or removing multiples of 3 input bytes:

   cat sat on the mat
my cat sat on the mat
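
The suggestion above can be checked directly. This is a minimal sketch using only the standard library; the variable names are illustrative and not from the original post:

```python
import base64

# Both inputs differ only in a 3-byte prefix ("   " vs "my "), so the
# rest of the text falls into the same 3-byte groups in each of them.
spaced = base64.b64encode(b"   cat sat on the mat")
prefixed = base64.b64encode(b"my cat sat on the mat")

print(spaced)
print(prefixed)

# Only the first 4 output characters (the first 3-byte group) differ.
print(spaced[4:] == prefixed[4:])  # True
```

Everything after the first four output characters matches the encoding of "cat sat on the mat" from the question.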

Base64 is a fully deterministic, reversible transformation, so it is not a hash. However, it does not operate on a per-character basis (as you can also observe from the output length not being a multiple of the input length).
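
As a quick illustration of those properties, here is a small sketch using the question's first string (the variable names are my own):

```python
import base64
import math

data = "the cat sat on the mat".encode("utf-8")
encoded = base64.b64encode(data)

# Deterministic: encoding the same bytes again gives the same result.
print(base64.b64encode(data) == encoded)  # True

# Reversible: decoding recovers the original bytes, which a hash cannot do.
print(base64.b64decode(encoded) == data)  # True

# The output has 4 characters per 3-byte group, with the last group padded.
print(len(data), len(encoded), 4 * math.ceil(len(data) / 3))  # 22 32 32
```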

Rather, groups of three bytes (24 bits) are encoded at a time by turning them into four 6-bit numbers (hence base 64 = 2^6), each of which is written as one output character. If the input length is not a multiple of three, the final group is padded, and this is indicated by = characters at the end of the output.
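
To make the grouping concrete, here is a rough sketch that encodes a single three-byte group by hand and compares it with the library result. The helper name encode_triple and the ALPHABET constant are hypothetical names for illustration, and padding is deliberately not handled:

```python
import base64

# The standard Base64 alphabet: index 0 is "A", index 63 is "/".
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_triple(triple: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters (no padding)."""
    if len(triple) != 3:
        raise ValueError("expected exactly 3 bytes")
    bits = int.from_bytes(triple, "big")  # one 24-bit integer
    # Slice the 24 bits into four 6-bit numbers, most significant first.
    indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_triple(b"cat"))                     # Y2F0
print(base64.b64encode(b"cat").decode("ascii"))  # Y2F0
```

Each output character depends on bits from up to two adjacent input bytes, which is why shifting the input by one or two bytes changes every character after that point.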

Therefore, common substrings in different inputs will only show up as a common substring in the output if they are aligned on this three-byte frame, and grouped into the same triples.

the cat sat on the mat
dGhlIGNhdCBzYXQgb24gdGhlIG1hdA==

he cat sat on the mat
aGUgY2F0IHNhdCBvbiB0aGUgbWF0

e cat sat on the mat
ZSBjYXQgc2F0IG9uIHRoZSBtYXQ=

 cat sat on the mat
IGNhdCBzYXQgb24gdGhlIG1hdA==

Observe that if you remove exactly three characters from the start ("the", leaving the space), the output becomes recognizable again: it is the first output minus its first four characters.
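
The four examples above can be reproduced with a short loop. This is a sketch only; the results dictionary is just a convenience for the check and not part of any API:

```python
import base64

text = b"the cat sat on the mat"
full = base64.b64encode(text)

results = {}
for cut in range(4):
    # Drop `cut` bytes from the front and encode what is left.
    shortened = base64.b64encode(text[cut:])
    # The shortened encoding is a suffix of the full one only when the
    # removed prefix was a whole number of 3-byte groups.
    results[cut] = full.endswith(shortened)
    print(cut, shortened.decode("ascii"), results[cut])

# The last column prints True, False, False, True for cut = 0, 1, 2, 3.
```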
