
Why different byte array representation in Java and Javascript?

I was trying to see the UTF-8 bytes of 👍 in both Java and Javascript.

In Javascript,

new TextEncoder().encode("👍"); returns => [240, 159, 145, 141]

while in Java,

"👍".getBytes("UTF-8") returns => [-16, -97, -111, -115] "👍".getBytes("UTF-8")返回=> [-16, -97, -111, -115]

I converted those byte arrays to hex strings using methods I found for each language (JS, Java), and both returned F09F918D.

In fact, -16 & 0xFF gives => 240

I am curious to know why the two languages choose different ways of representing byte arrays. It took me a while to figure this out.

In Java, all bytes are signed, so the range of one byte is -128 to 127. In Javascript, though, the returned values are unsigned 8-bit integers (TextEncoder.encode returns a Uint8Array), so they can be represented in decimal using the full range 0 to 255. The underlying bits are identical; only the interpretation differs.
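As a minimal sketch (the class name is mine, not from the original post), the following Java program prints each byte of the UTF-8 encoding both ways: Java's signed view, and the unsigned view that Javascript shows. Masking with 0xFF (or calling Byte.toUnsignedInt) widens the byte to an int holding the unsigned value.

    import java.nio.charset.StandardCharsets;

    public class SignedBytes {
        public static void main(String[] args) {
            byte[] bytes = "👍".getBytes(StandardCharsets.UTF_8);
            for (byte b : bytes) {
                // b prints using Java's signed interpretation (-128..127);
                // b & 0xFF yields the unsigned value (0..255) that
                // Javascript's Uint8Array would display.
                System.out.println(b + " -> " + (b & 0xFF));
            }
            // prints: -16 -> 240, -97 -> 159, -111 -> 145, -115 -> 141
        }
    }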

Therefore, if you convert both results to their one-byte hexadecimal representations, they are the same: F0 9F 91 8D.
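A small Java sketch of that conversion (again, the class name is mine): the mask makes each signed byte format as exactly two unsigned hex digits, rather than a sign-extended int.

    import java.nio.charset.StandardCharsets;

    public class Utf8Hex {
        public static void main(String[] args) {
            StringBuilder hex = new StringBuilder();
            for (byte b : "👍".getBytes(StandardCharsets.UTF_8)) {
                // b & 0xFF widens to an int in 0..255, so %02X prints
                // two hex digits instead of fffffff0 for negative bytes.
                hex.append(String.format("%02X", b & 0xFF));
            }
            System.out.println(hex); // F09F918D
        }
    }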

As for why Java decided to omit unsigned types, that is a separate discussion.
