
Serialize Java Object into String using UTF-8

I am trying to write a function that serializes a Java object into a String using UTF-8 encoding. This is my implementation:

public static String serializeToString(DefaultMutableTreeNode tree) {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    try {
        ObjectOutput out = new ObjectOutputStream(byteArrayOutputStream);
        out.writeObject(tree);
        return byteArrayOutputStream.toString("UTF-8");
    } catch (IOException e) {
        return null;
    }
}

However, it doesn't seem to work. I tried to pass the resulting String to a database that only accepts UTF-8 encoding, but it failed with an encoding error.

My questions are:

  1. What is the problem with my implementation?
  2. How can I check whether the resulting String is in UTF-8 or not?

Many thanks

Regards

This is not a good idea: an arbitrary byte array doesn't always translate into a valid UTF-8 sequence. You should rather put the array in the database as a binary blob, or transform the array into a string with something like Base64 encoding.
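As a rough illustration of the Base64 route, here is a minimal sketch, assuming Java 8 or later (for java.util.Base64). The class name Base64Serializer and both method names are made up for this example and are not from the question:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.swing.tree.DefaultMutableTreeNode;

// Hypothetical helper for illustration; names are not from the original post.
final class Base64Serializer {

    // Serializes the object to bytes, then encodes those bytes as Base64 text.
    // The result contains only ASCII letters, digits, '+', '/' and '='.
    static String serializeToBase64(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and therefore flushed) before the bytes are read.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverses serializeToBase64: decodes the text to bytes and deserializes them.
    static Object deserializeFromBase64(String text) {
        byte[] raw = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("root");
        root.add(new DefaultMutableTreeNode("leaf"));

        String text = serializeToBase64(root);
        DefaultMutableTreeNode copy = (DefaultMutableTreeNode) deserializeFromBase64(text);

        System.out.println(text);
        System.out.println(copy.getUserObject() + " has " + copy.getChildCount() + " child(ren)");
    }
}
```

Because the Base64 text is pure ASCII, it is also valid UTF-8, so a text column should accept it. The cost is that the stored value is roughly a third larger than the raw bytes.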

You are bound to get unprintable characters in your string, which the DB won't like at all. The Java ByteArrayOutputStream documentation sort of hints that it might recode the unprintable characters as printable, but, looking at the code, I can't see that it does anything but stop the program with an error. Nor can I see what you would do with such a string in the future.

Only a part of the 256 possible values of a byte (the 95 values from 0x20 through 0x7E) are printable ASCII characters. Most databases won't accept the rest as part of a character string. Hence your error message. (Unicode and UTF-8 have the same problem.)

I did once store binary data in a database by converting every 6 bits to a byte containing a printable character. I used simple ASCII encoding, and I wrote code to convert the characters back to binary. I was then able to store binary data in a database character column and retrieve it later. I was rather forced into it; I wouldn't recommend you do it.

If you want to see what your "character string" looks like, just print out each byte as an integer and compare it to an ASCII table. You'll probably see the problem without needing to consider the fine points of Unicode.
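The byte inspection suggested above, plus a stricter check for the second question, might look like the sketch below. Note that a Java String is held as UTF-16 internally, so "is it UTF-8" is really a question about the bytes, not the String. The class name Utf8Check and its methods are hypothetical names chosen for this example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; names are not from the original post.
final class Utf8Check {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    // CodingErrorAction.REPORT makes the decoder throw instead of silently
    // substituting a replacement character.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Renders each byte as an unsigned integer (0..255), separated by spaces,
    // so the values can be compared against an ASCII table.
    static String dumpBytes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b & 0xFF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A Java serialization stream starts with the magic bytes 0xAC 0xED,
        // followed by the version 0x00 0x05.
        byte[] header = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05};
        System.out.println(dumpBytes(header));
        System.out.println(isValidUtf8(header));
    }
}
```

Running isValidUtf8 on the output of byteArrayOutputStream.toByteArray() from the question's code is one way to confirm that the serialized bytes are not valid UTF-8.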

I am trying to write a function that serializes a Java object into a String using UTF-8 encoding.

Yes ... well, what your code is actually doing is serializing the object to bytes, and then telling the String constructor "these bytes are a valid UTF-8 encoding of some Unicode code points". The problem is that (generally speaking) they are NOT ... and when the UTF-8 decoder attempts to convert them to the UTF-16 representation used in a Java String, it finds sequences that are invalid and replaces them with an "invalid character" code point (U+FFFD, the replacement character).
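To see that substitution happen, a small experiment along these lines may help. It is only a sketch: the class name LossyDecodeDemo and its methods are invented for this example, and it serializes a plain String rather than the tree from the question:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo for illustration; names are not from the original post.
final class LossyDecodeDemo {

    // Plain Java serialization to a byte array.
    static byte[] serialize(Serializable object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decodes the bytes as if they were UTF-8, then encodes the String back.
    // Invalid sequences become U+FFFD during decoding, so the original bytes
    // cannot be recovered afterwards.
    static byte[] roundTripThroughUtf8String(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = serialize("hello");
        byte[] after = roundTripThroughUtf8String(original);
        String decoded = new String(original, StandardCharsets.UTF_8);

        System.out.println("bytes survived the round trip: " + Arrays.equals(original, after));
        System.out.println("decoded text contains U+FFFD: " + (decoded.indexOf('\uFFFD') >= 0));
    }
}
```

The first byte of every serialization stream is 0xAC, which cannot start a UTF-8 sequence, so the damage begins at the very first character.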

If you want to represent arbitrary bytes as a Java String, then you need to use something like Base64 encoding. A better idea would be to put the bytes into the database as a Blob.
