How to get 'original' bytes of a Java String when read from DataOutputStream.writeUTF()?

Question

Currently I'm transferring a String across the network using DataInputStream and DataOutputStream. The String I am transferring needs to be converted into a byte array so that it can be decrypted.

However, since the string was written using DataOutputStream.writeUTF("foobar"), its byte array contains Java modified UTF-8 encoded data, which breaks the decryption process.

How can I get the original bytes from the Java modified UTF-8 String?

Answer 1

Unicode has several normalization forms, in which s-with-^ can be either one character or two: s plus combining-^. Java has a Normalizer class to convert to one specific form. See http://docs.oracle.com/javase/tutorial/i18n/text/normalizerapi.html or go straight to the API documentation.

This requires that the original string adheres to one form. One cannot take arbitrary bytes and then interpret them as UTF-8, because some byte sequences are illegal. UTF-8 was designed this way to prevent a decoder from recognizing a wrong character when it starts in the middle of a multi-byte sequence.

String normalizedString = Normalizer.normalize(s, Normalizer.Form.NFD);
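
To make the difference between the forms concrete, here is a minimal sketch that compares the UTF-8 byte length of the same text in NFC and NFD. The class name NormalizeDemo and the helper utf8Bytes are made up for this illustration; only Normalizer and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class NormalizeDemo {
    // Hypothetical helper: normalize s to the given form, then encode as standard UTF-8.
    static byte[] utf8Bytes(String s, Normalizer.Form form) {
        return Normalizer.normalize(s, form).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String composed = "\u015D"; // s with circumflex as a single code point

        // NFC keeps the single code point U+015D, which takes 2 bytes in UTF-8.
        System.out.println(utf8Bytes(composed, Normalizer.Form.NFC).length);

        // NFD splits it into 's' (1 byte) plus combining circumflex U+0302 (2 bytes).
        System.out.println(utf8Bytes(composed, Normalizer.Form.NFD).length);
    }
}
```

Both sides of the connection would have to agree on the same form for the bytes to match.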

Answer 2

If you use http://docs.oracle.com/javase/1.4.2/docs/api/java/io/DataOutputStream.html#write(byte[], int, int)
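
The answer is cut off here, so the following is only one possible reading of it: send the encrypted data as raw bytes with write(byte[], int, int) instead of passing it through writeUTF(), so no string encoding is involved at all. The length prefix via writeInt() and the names RawBytesDemo, writeBytes, readBytes and roundTrip are assumptions made for this sketch, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class RawBytesDemo {
    // Assumed framing: a 4-byte length followed by the payload, written unchanged.
    static void writeBytes(DataOutputStream out, byte[] data) throws IOException {
        out.writeInt(data.length);
        out.write(data, 0, data.length);
    }

    // Reads the length prefix, then exactly that many bytes.
    static byte[] readBytes(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] data = new byte[length];
        in.readFully(data); // blocks until the whole payload has arrived
        return data;
    }

    // In-memory stand-in for the network connection, for demonstration only.
    static byte[] roundTrip(byte[] data) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                writeBytes(out, data);
            }
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return readBytes(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes that are not valid text in any UTF-8 variant, as ciphertext often is.
        byte[] cipherText = {0x00, (byte) 0xC0, (byte) 0x80, (byte) 0xFF};
        System.out.println(Arrays.equals(cipherText, roundTrip(cipherText)));
    }
}
```

On the receiving side, readFully() is used rather than read() because read() may return fewer bytes than requested on a network stream.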
