Equivalent of C++ basic string in Java

I have a function that returns a byte array in both C++ and Java; the logic of the function is the same in each.

Given that the returned byte array is the same, when I print the array in C++ after converting it to a basic string like this:

std::string str(byteArray,byteArray+len)

I am able to see the output properly, but when I do something like this in Java:

new String(byteArray,"UTF-8")

I get some unknown characters on the terminal. How can I get the same output as in C++?

Here's the problem. When you do this:

    new String(byteArray,"UTF-8")

you are saying this to the runtime system:

The byte array contains character data that has been encoded as UTF-8. Convert it into a sequence of Unicode code points (see footnote 1) and give them to me as a Java String.

But the bytes in the byte array are clearly NOT a well-formed UTF-8 sequence, because you are getting stuff that looks like garbage.

So what is going on? Well, I think that there are two possibilities:

  1. The bytes in the array could actually be characters in a different character encoding. It is clearly not pure ASCII data, because pure 7-bit ASCII is also well-formed UTF-8. But the bytes could be encoded in some other character encoding. (If we actually had the byte values, we might be able to make an educated guess as to which encoding was used.)

  2. The bytes in the array could actually be garbled. You say that they were obtained by decrypting AES-encrypted data. But if you somehow got the decryption wrong (e.g. you used the wrong key), then you would end up with garbled stuff.
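One way to tell these possibilities apart is to ask the decoder to fail loudly instead of silently substituting replacement characters, and to dump the raw bytes as hex so they can be compared with the C++ side. The following is only a minimal sketch: the class and method names (`Utf8Check`, `isWellFormedUtf8`, `toHex`) and the sample bytes are made up for illustration and are not from the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the whole array decodes as well-formed UTF-8.
    // Unlike new String(bytes, "UTF-8"), this reports malformed input
    // instead of replacing it with U+FFFD.
    static boolean isWellFormedUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Hex dump of the raw bytes, useful for comparing against the C++ output.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not the bytes from the question.
        byte[] ascii = "hello".getBytes(StandardCharsets.US_ASCII);
        // 0xE9 announces a three-byte UTF-8 sequence, but 0x41 is not a continuation byte.
        byte[] notUtf8 = {(byte) 0xE9, (byte) 0x41};

        System.out.println(isWellFormedUtf8(ascii));   // true
        System.out.println(isWellFormedUtf8(notUtf8)); // false
        System.out.println(toHex(notUtf8));            // e9 41
    }
}
```

If the hex dumps on the two sides differ, the problem lies upstream of the string conversion (for example in the decryption) rather than in the choice of charset.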

Finally, the closest equivalent in Java to std::string str(byteArray,byteArray+len) is this:

  new String(byteArray, "ISO-8859-1")

This is because each encoded byte in a Latin-1 (ISO-8859-1) sequence is equal in value to the corresponding Unicode code point.
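To illustrate that byte-to-code-point correspondence, here is a small sketch that decodes every possible byte value as ISO-8859-1 and encodes the result back again. The class and helper names (`Latin1RoundTrip`, `bytesToString`, `stringToBytes`) are hypothetical and chosen only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Decode bytes one-to-one into chars U+0000..U+00FF.
    static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encode chars U+0000..U+00FF back into the same byte values.
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value, 0x00 through 0xFF.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }

        String s = bytesToString(all);

        // Each char has the same numeric value as the unsigned byte it came from.
        boolean sameValues = true;
        for (int i = 0; i < 256; i++) {
            if (s.charAt(i) != i) {
                sameValues = false;
            }
        }
        System.out.println(sameValues);                          // true

        // Nothing is lost on the way back.
        System.out.println(Arrays.equals(all, stringToBytes(s))); // true
    }
}
```

Note that this only preserves the byte values inside the String. What appears on the terminal still depends on how Java re-encodes the String when printing, whereas C++ writes the bytes of a std::string unchanged; if the goal is simply to emit the raw bytes, writing the array directly with System.out.write(byteArray, 0, byteArray.length) may be closer to what the C++ program does.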

It is unclear whether that would actually work in your case. Certainly, it won't work if the bytes were garbled due to incorrect encryption or decryption, or due to garbling of the encrypted data in transmission.


1 - Actually, UTF-16 code units ... but that's another story.

In Java, I convert a byte array as shown below (this form decodes using the platform's default charset). The explicit "UTF-8" might be causing the problem in your case.

new String(byteArray);

Also try:

 new String(byteArray,"UTF-16");

If neither of the above works, you can try the following. Note that this snippet is C# (.NET), not Java; it decodes the bytes as UTF-16:

 UnicodeEncoding uEncoding = new UnicodeEncoding();
 string stringContent=uEncoding.GetString(byteArray);

For more detail, also read http://www.oracle.com/us/technologies/java/supplementary-142654.html

So, here is the solution. The problem was that the decryption was not going through properly: it was partial rather than complete, so some of the characters made sense and the rest were junk. The blunder I made was using SHA-512 as the message digest algorithm during encryption and MD5 during decryption.
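The post does not say how the digest was used, so the following is only a guess at the mechanism, not the actual code. A common pattern is to hash a passphrase and use part of the hash as the AES key. Under that assumption, this sketch shows that the two digests give different keys for the same passphrase. The `deriveKey` helper, the 16-byte truncation, and the passphrase "secret" are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class DigestMismatch {
    // Hypothetical key derivation (an assumption, not the asker's code):
    // hash the passphrase and keep the first 16 bytes as an AES-128 key.
    static byte[] deriveKey(String passphrase, String digestName) {
        try {
            MessageDigest md = MessageDigest.getInstance(digestName);
            byte[] hash = md.digest(passphrase.getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(hash, 16);
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 and MD5 are standard JDK algorithm names, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encryptKey = deriveKey("secret", "SHA-512"); // digest used when encrypting
        byte[] decryptKey = deriveKey("secret", "MD5");     // digest used when decrypting

        // The keys differ, so the decrypting side does not recover the original bytes.
        System.out.println(Arrays.equals(encryptKey, decryptKey)); // false
    }
}
```

Whatever the real derivation scheme was, the practical check is the same: the digest algorithm, like the key, IV, and cipher mode, has to match on both sides.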

Cheers!!
