Bitwise XOR in Javascript compared to C++
I am porting a simple C++ function to Javascript, but it seems I'm running into problems with the way Javascript handles bitwise operators.
In C++:
AnsiString MyClass::Obfuscate(AnsiString source)
{
    int sourcelength = source.Length();
    for (int i = 1; i <= sourcelength; i++)
    {
        source[i] = source[i] ^ 0xFFF;
    }
    return source;
}
Obfuscate("test") yields the temporary int values
-117, -102, -116, -117
Obfuscate("test") yields the string value
‹šŒ‹
In Javascript:
function obfuscate(str)
{
    var obfuscated = "";
    for (var i = 0; i < str.length; i++) {
        var a = str.charCodeAt(i);
        var b = a ^ 0xFFF;
        obfuscated = obfuscated + String.fromCharCode(b);
    }
    return obfuscated;
}
obfuscate("test") yields the temporary int values
3979, 3994, 3980, 3979
obfuscate("test") yields the string value
ྋྚྌྋ
Now, I realize that there are a ton of threads pointing out that Javascript treats all numbers as floats, and that bitwise operations involve a temporary cast to a 32-bit int.
It really wouldn't be a problem, except that I'm obfuscating in Javascript and reversing in C++, and the different results don't match.
How do I transform the Javascript result into the C++ result? Is there some simple shift available?
Judging from the result that XORing 116 with 0xFFF gives -117, we have to emulate 2's-complement 8-bit integers in Javascript:
function obfuscate(str)
{
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        bytes.push((((str.charCodeAt(i) ^ 0xFFF) & 0xFF) ^ 0x80) - 0x80);
    }
    return bytes;
}
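As a quick sanity check, this function reproduces the C++ intermediate values from the question. The sketch below repeats the function so the snippet stands alone:

```javascript
// The answer's function: mask the XOR result down to 8 bits, then
// sign-extend with the ^ 0x80 / - 0x80 trick to get a signed byte.
function obfuscate(str) {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
        bytes.push((((str.charCodeAt(i) ^ 0xFFF) & 0xFF) ^ 0x80) - 0x80);
    }
    return bytes;
}

// 't' is 116; (116 ^ 0xFFF) & 0xFF is 0x8B, which is -117 as a signed byte.
var result = obfuscate("test"); // [-117, -102, -116, -117]
```

These are exactly the temporary int values the C++ version printed.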
OK, these bytes are interpreted in Windows CP-1252, and if they are negative, they are probably just added to 256 to recover the byte value.
var ascii = [
0x0000,0x0001,0x0002,0x0003,0x0004,0x0005,0x0006,0x0007,0x0008,0x0009,0x000A,0x000B,0x000C,0x000D,0x000E,0x000F
,0x0010,0x0011,0x0012,0x0013,0x0014,0x0015,0x0016,0x0017,0x0018,0x0019,0x001A,0x001B,0x001C,0x001D,0x001E,0x001F
,0x0020,0x0021,0x0022,0x0023,0x0024,0x0025,0x0026,0x0027,0x0028,0x0029,0x002A,0x002B,0x002C,0x002D,0x002E,0x002F
,0x0030,0x0031,0x0032,0x0033,0x0034,0x0035,0x0036,0x0037,0x0038,0x0039,0x003A,0x003B,0x003C,0x003D,0x003E,0x003F
,0x0040,0x0041,0x0042,0x0043,0x0044,0x0045,0x0046,0x0047,0x0048,0x0049,0x004A,0x004B,0x004C,0x004D,0x004E,0x004F
,0x0050,0x0051,0x0052,0x0053,0x0054,0x0055,0x0056,0x0057,0x0058,0x0059,0x005A,0x005B,0x005C,0x005D,0x005E,0x005F
,0x0060,0x0061,0x0062,0x0063,0x0064,0x0065,0x0066,0x0067,0x0068,0x0069,0x006A,0x006B,0x006C,0x006D,0x006E,0x006F
,0x0070,0x0071,0x0072,0x0073,0x0074,0x0075,0x0076,0x0077,0x0078,0x0079,0x007A,0x007B,0x007C,0x007D,0x007E,0x007F
];
var cp1252 = ascii.concat([
0x20AC,0xFFFD,0x201A,0x0192,0x201E,0x2026,0x2020,0x2021,0x02C6,0x2030,0x0160,0x2039,0x0152,0xFFFD,0x017D,0xFFFD
,0xFFFD,0x2018,0x2019,0x201C,0x201D,0x2022,0x2013,0x2014,0x02DC,0x2122,0x0161,0x203A,0x0153,0xFFFD,0x017E,0x0178
,0x00A0,0x00A1,0x00A2,0x00A3,0x00A4,0x00A5,0x00A6,0x00A7,0x00A8,0x00A9,0x00AA,0x00AB,0x00AC,0x00AD,0x00AE,0x00AF
,0x00B0,0x00B1,0x00B2,0x00B3,0x00B4,0x00B5,0x00B6,0x00B7,0x00B8,0x00B9,0x00BA,0x00BB,0x00BC,0x00BD,0x00BE,0x00BF
,0x00C0,0x00C1,0x00C2,0x00C3,0x00C4,0x00C5,0x00C6,0x00C7,0x00C8,0x00C9,0x00CA,0x00CB,0x00CC,0x00CD,0x00CE,0x00CF
,0x00D0,0x00D1,0x00D2,0x00D3,0x00D4,0x00D5,0x00D6,0x00D7,0x00D8,0x00D9,0x00DA,0x00DB,0x00DC,0x00DD,0x00DE,0x00DF
,0x00E0,0x00E1,0x00E2,0x00E3,0x00E4,0x00E5,0x00E6,0x00E7,0x00E8,0x00E9,0x00EA,0x00EB,0x00EC,0x00ED,0x00EE,0x00EF
,0x00F0,0x00F1,0x00F2,0x00F3,0x00F4,0x00F5,0x00F6,0x00F7,0x00F8,0x00F9,0x00FA,0x00FB,0x00FC,0x00FD,0x00FE,0x00FF
]);
function toStringCp1252(bytes) {
    var byte, codePoint, codePoints = [];
    for (var i = 0; i < bytes.length; ++i) {
        byte = bytes[i];
        if (byte < 0) {
            byte = 256 + byte;
        }
        codePoint = cp1252[byte];
        codePoints.push(codePoint);
    }
    return String.fromCharCode.apply(String, codePoints);
}
Result:
toStringCp1252(obfuscate("test"))
//"‹šŒ‹"
I assume that AnsiString is, in some form, an array of chars. And this is the problem: in C, a char can typically only hold 8 bits. So when you XOR with 0xfff and store the result in a char, it is the same as XORing with 0xff.
This is not the case in Javascript. Javascript uses Unicode. This is demonstrated by looking at the integer values:
-117 == 0x8b (as a signed 8-bit value) and 3979 == 0xf8b
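The relation between those two values can be checked directly in Javascript. A small sketch, using the common `<< 24 >> 24` idiom to sign-extend a byte:

```javascript
// 3979 is 0xF8B; its low byte is 0x8B (139).
var low = 3979 & 0xFF;

// Shifting left then arithmetic-right replicates the byte's top bit
// across the upper bits, turning 139 into the signed value -117.
var signed = (low << 24) >> 24;

// low === 139, signed === -117
```

In other words, the C++ value -117 is just the low 8 bits of the Javascript value 3979, reinterpreted as a signed char.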
I would recommend XORing with 0xff, as this will work in both languages. Or you can switch your C++ code to use Unicode.
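A minimal sketch of that recommendation, assuming the input contains only 8-bit characters (as the other answers note). Since XOR is its own inverse, the same function also de-obfuscates:

```javascript
// Obfuscate by flipping only the low 8 bits, which C++ char
// arithmetic preserves; applying the function twice round-trips.
function obfuscate8(str) {
    var out = "";
    for (var i = 0; i < str.length; i++) {
        out += String.fromCharCode(str.charCodeAt(i) ^ 0xFF);
    }
    return out;
}

// 't' (116) becomes 116 ^ 0xFF === 139 (0x8B), matching the C++ byte.
// obfuscate8(obfuscate8("test")) === "test"
```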
I'm guessing that AnsiString contains 8-bit characters (since the ANSI character set is 8 bits). When you assign the result of the XOR back to the string, it is truncated to 8 bits, and so the resulting value is in the range [-128...127].
(On some platforms it could be [0..255], and on others the range could be wider, since it's not specified whether char is signed or unsigned, or whether it's 8 bits or larger.)
Javascript strings contain Unicode characters, which can hold a much wider range of values, so the result is not truncated to 8 bits. The result of the XOR has a range of at least 12 bits, [0...4095], hence the large numbers you see there.
Assuming the original string contains only 8-bit characters, changing the operation to a ^ 0xff should give the same results in both languages.
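That equivalence is easy to verify exhaustively: for every 8-bit value, truncating c ^ 0xFFF to 8 bits gives the same byte as c ^ 0xFF. A quick sketch:

```javascript
// Bits 8-11 of 0xFFF only flip bits that an 8-bit char cannot store,
// so after truncation to 8 bits they have no effect at all.
for (var c = 0; c < 256; c++) {
    if (((c ^ 0xFFF) & 0xFF) !== (c ^ 0xFF)) {
        throw new Error("mismatch at " + c);
    }
}
```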
First, convert your AnsiString to wchar_t*. Only then obfuscate its individual characters:
AnsiString MyClass::Obfuscate(AnsiString source)
{
    // allocate a wide-character buffer
    int num_wchars = source.WideCharBufSize();
    wchar_t* UnicodeString = new wchar_t[num_wchars];
    source.WideChar(UnicodeString, source.WideCharBufSize());

    // obfuscate the individual characters
    for (int i = 0; i < num_wchars; i++)
    {
        UnicodeString[i] = UnicodeString[i] ^ 0xFFF;
    }

    // create the obfuscated AnsiString
    AnsiString result = AnsiString(UnicodeString);

    // delete the temporary string
    delete [] UnicodeString;
    return result;
}
Sorry, I'm not an expert on C++ Builder, but my point is simple: in Javascript you have UCS-2 (or UTF-16) characters, so you have to convert the AnsiString to wide chars first.
Try using WideString instead of AnsiString.
I don't know AnsiString at all, but my guess is that this relates to the width of its characters. Specifically, I suspect they're less than 32 bits wide, and of course in bitwise operations the width of what you're operating on matters, particularly when dealing with 2's-complement numbers.
In Javascript, your "t" in "test" is character code 116, which is b00000000000000000000000001110100. 0xFFF (4095) is b00000000000000000000111111111111, and the result you're getting (3979) is b00000000000000000000111110001011. We can readily see that you're getting the right result for the XOR:

 116 = 00000000000000000000000001110100
4095 = 00000000000000000000111111111111
3979 = 00000000000000000000111110001011
So I'm thinking you're getting some truncation or similar in your C++ code, not least because -117 is b10001011 in eight-bit 2's complement... which is exactly what we see as the last eight bits of 3979 above.
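The arithmetic in this answer can be double-checked in a console, using toString(2) for the binary forms:

```javascript
// XOR the character code of 't' with 0xFFF, as the question does.
var xored = 116 ^ 0xFFF;          // 3979

// Its low eight bits are 0x8B (139), i.e. -117 in eight-bit 2's complement.
var lowByte = xored & 0xFF;       // 139

// The binary form of 3979 is "111110001011" (leading zeros dropped).
var bits = xored.toString(2);
```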