
BigQuery + Javascript UDF - Not able to manipulate byte array from input

I'm noticing a discrepancy between a JavaScript function run in Node and the same function run as a JavaScript UDF in BigQuery.

I am running the following in BigQuery:

CREATE TEMP FUNCTION testHash(md5Bytes BYTES)
RETURNS BYTES 
LANGUAGE js AS """
md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;
return md5Bytes
""";

SELECT TO_HEX(testHash(MD5("test_phrase")));

and the output ends up being cb5012e39277d48ef0b5c88bded48591. (This is incorrect.)

Running the same code in Node gives cb5012e39277348eb0b5c88bded48591 (which is the expected value). Notice that two of the characters differ: the d after 77 becomes 3, and the f after 8e becomes b.
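For reference, the Node side can be reproduced with a sketch like the one below. It assumes the digest is held in a Buffer (the question does not show how md5Bytes was created in Node) and starts from the unmodified hex value quoted above rather than recomputing the MD5.

```javascript
// Unmodified digest as reported in the question (MD5 of "test_phrase").
// Assumption: in Node the digest lives in a Buffer, which is a mutable byte array.
const md5Bytes = Buffer.from('cb5012e39277d48ef0b5c88bded48591', 'hex');
const before = md5Bytes.toString('hex');

// The same four operations as in the UDF body.
md5Bytes[6] &= 0x0f; // keep the low nibble of byte 6
md5Bytes[6] |= 0x30; // set the high nibble of byte 6 to 3
md5Bytes[8] &= 0x3f; // clear the top two bits of byte 8
md5Bytes[8] |= 0x80; // set the top two bits of byte 8 to binary 10

const after = md5Bytes.toString('hex');
console.log(before);
console.log(after);
```

Bytes 6 and 8 correspond to hex characters 13-14 and 17-18, which is where the two outputs differ.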

I've narrowed the issue down: BigQuery does not appear to apply the bitwise operators at all, because skipping the following lines in Node produces the same incorrect output as BQ:

md5Bytes[6] &= 0x0f;
md5Bytes[6] |= 0x30;
md5Bytes[8] &= 0x3f;
md5Bytes[8] |= 0x80;

Any ideas why the bitwise operators are not being applied to the md5Bytes input to the UDF?

Bitwise operations in a JavaScript UDF in BigQuery handle only the most significant 32 bits, as mentioned in the limitations section of the JavaScript UDF documentation. MD5 is a hash function that takes an input and converts it into a fixed-length digest of 16 bytes, which is equivalent to 128 bits. Since bitwise operations in a JavaScript UDF apply to only 32 bits, the output is unexpected.
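A separate thing worth ruling out, offered here as an assumption rather than something this answer states: BigQuery's type-mapping table for JavaScript UDFs lists BYTES as being passed in as a base64-encoded string. If that applies here, the UDF body is writing to indexes of a string, and JavaScript strings cannot be modified in place, which would match the unchanged output. The sketch below shows that behaviour, plus one possible workaround that operates on a hex string instead. The helper name setVersionBits is hypothetical.

```javascript
// Indexed writes to a string never change it: they are ignored in sloppy
// mode and throw a TypeError in strict mode, so the attempt is guarded.
let text = 'abc';
try {
  text[1] = 'X';
} catch (e) {
  // strict mode: TypeError, the string is still unchanged
}
const stringUnchanged = text === 'abc';

// Hypothetical helper: applies the question's four bit operations to a
// 32-character hex digest and returns the modified hex string.
function setVersionBits(hex) {
  // Each byte is two hex characters, so byte i starts at offset 2 * i.
  const bytes = [];
  for (let i = 0; i < hex.length; i += 2) {
    bytes.push(parseInt(hex.slice(i, i + 2), 16));
  }

  bytes[6] &= 0x0f;
  bytes[6] |= 0x30;
  bytes[8] &= 0x3f;
  bytes[8] |= 0x80;

  // Re-encode each byte as two lowercase hex characters.
  return bytes
    .map(function (b) { return (b < 16 ? '0' : '') + b.toString(16); })
    .join('');
}

console.log(stringUnchanged);
console.log(setVersionBits('cb5012e39277d48ef0b5c88bded48591'));
```

In BigQuery this would mean declaring the parameter and return type as STRING, calling the function with TO_HEX(MD5("test_phrase")), and wrapping the result in FROM_HEX if BYTES are needed. That wiring is an untested suggestion, not something confirmed in this thread.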
