Convert Erlang UTF-8 encoded string to java.lang.String

The Java node receives an Erlang string encoded in UTF-8. Its class type is OtpErlangString. If I simply call .toString() or .stringValue(), the resulting java.lang.String has invalid code points (basically, every byte of the Erlang string is treated as a distinct character).

Now, I want to use new String(bytes, "UTF-8") when creating the Java String, but how do I get the bytes from the OtpErlangString?

It's strange that you get an OtpErlangString on the Java side when the string contains non-ASCII characters. I get an object of that type only when the string is pure ASCII. As soon as I add at least one non-ASCII character, the resulting type is OtpErlangList (which is logical, since strings are just lists of integers in Erlang), and then I can use its stringValue() method. So after sending a string from Erlang like this:

(waco@host)8> {proc, java1@host} ! "ąćśźżęółńa".
[261,263,347,378,380,281,243,322,324,97]

On the Java node I receive and print it with:

OtpErlangList l = (OtpErlangList) mbox.receive();
System.out.println(l.stringValue());

The output is correct:

ąćśźżęółńa
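
To illustrate why the list representation decodes cleanly: the integers Erlang prints above are Unicode code points, not UTF-8 bytes. The following is a minimal JDK-only sketch of that idea, not Jinterface's actual implementation; the class and method names (CodePointsDemo, fromCodePoints) are made up for illustration.

```java
public class CodePointsDemo {
    // Hypothetical helper: builds a String from Unicode code points,
    // i.e. the kind of integers Erlang prints for the list above.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // The same integers as in the Erlang shell output above.
        int[] fromErlang = {261, 263, 347, 378, 380, 281, 243, 322, 324, 97};
        System.out.println(fromCodePoints(fromErlang));
    }
}
```

Because each list element is already a full code point, no byte-level decoding step is needed in this case.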

However, if that is not the case in your situation, you could try to work around it by forcing the OtpErlangList representation, e.g. by adding an empty tuple as the very first element of the string list:

(waco@wborowiec)11> {proc, java1@wborowiec} ! [{}] ++ "ąćśźżęółńa".
[{},261,263,347,378,380,281,243,322,324,97]

And on the Java side, something like:

OtpErlangList l = (OtpErlangList) mbox.receive();
// drop the leading empty tuple that was added only to force the list representation
OtpErlangObject[] elements = l.elements();
OtpErlangObject[] strArr = Arrays.copyOfRange(elements, 1, elements.length); // needs java.util.Arrays
OtpErlangList l2 = new OtpErlangList(strArr);
System.out.println(l2.stringValue());
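
If the Java side really does end up with a String in which every char holds one UTF-8 byte, as the question describes, another option may be to turn that String back into bytes and decode them as UTF-8. This is only a sketch, assuming every char is in the range 0..255; the class and method names (Utf8Repair, reinterpretAsUtf8) are hypothetical, and only JDK calls are used. The input would be whatever .stringValue() returned.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Repair {
    // Assumption: every char in `raw` is in 0..255 and carries one byte
    // of a UTF-8 sequence. ISO-8859-1 maps such chars 1:1 back to bytes.
    static String reinterpretAsUtf8(String raw) {
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of U+0105 is the two bytes 0xC4 0x85;
        // here they appear as two separate chars, followed by 'a'.
        String raw = "\u00C4\u0085a";
        System.out.println(reinterpretAsUtf8(raw));
    }
}
```

Chars above 255 cannot be represented in ISO-8859-1 and would be replaced during the conversion, so this only helps when the string truly is a byte-per-char view of UTF-8 data.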
