How to convert a Clob to a String with encoding in Java

Question

We are doing a massive batch of XML processing, and the logic to convert a Clob to a String is shown below.

import java.sql.Clob
import org.apache.commons.io.IOUtils

String extractXml(Clob xmlClob) {

    log.info "DefaultCharset: " + groovy.util.CharsetToolkit.getDefaultSystemCharset()

    String sourceXml
    try {
        sourceXml = new String(IOUtils.toByteArray(xmlClob?.getCharacterStream()), encoding)            // 1. Encoding not working
        sourceXml = new String(IOUtils.toByteArray(xmlClob?.getCharacterStream(), encoding), encoding)  // 2. Encoding working
    } catch (Exception e) {
        ...
    }

    return sourceXml
}

My questions:

a. I am not sure why (1) doesn't work even though I am using getCharacterStream() instead of getAsciiStream(), while (2) seems to work fine. Is it because (2) explicitly overrides the system encoding?

b. Solution (2) looks a bit odd, since it specifies the encoding twice (once for the byte array and once for the String creation). I am not sure whether this causes any performance issues, and I wonder whether there is a better way to write it.

c. I thought of dropping the Apache Commons libraries and using a plain Java solution instead. The surprising thing is that I did not give any explicit encoding, yet it seems to work perfectly. Is it because it streams characters straight into a string buffer?

/*
 * Works perfectly and returns the text with the correct encoding
 */
String extractXmlWithoutApacheCommons(Clob xmlClob) {

    log.info "DefaultCharset: " + groovy.util.CharsetToolkit.getDefaultSystemCharset()

    StringBuffer sb = new StringBuffer((int) xmlClob.length())
    try {
        Reader r = xmlClob.getCharacterStream()
        char[] cbuf = new char[2048]
        int n = 0

        while ((n = r.read(cbuf, 0, cbuf.length)) != -1) {
            if (n > 0) {
                sb.append(cbuf, 0, n)
            }
        }

    } catch (Exception e) {
        ...
    }

    return sb.toString()
}

Could you please shed some light on this?

Answer 1

The Clob already has an encoding. It's whatever you've specified in the database, and once you read it on the Java side it'll be a String (with the implicit UTF-16 encoding, not that it matters at all).

Whatever you think you're doing with all those encoding tricks is wrong and useless. You only need to specify an encoding when turning bytes into chars or the other way around. You're dealing with chars only (except in your first example, where for some unknown reason you want to turn them into bytes).
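To make the bytes-versus-chars point concrete, here is a minimal JDK-only sketch. It assumes that variant (1) fails because IOUtils.toByteArray(Reader) without a charset argument encodes with the platform default, which may differ from `encoding`. The class name, method name, and sample text are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Encode text to bytes with one charset, then decode the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Hypothetical sample; \u00eb is a non-ASCII character (e with diaeresis).
        String xml = "<name>Zo\u00eb</name>";

        // Same charset in both directions, like variant (2): the text survives,
        // but the detour through bytes achieves nothing.
        String same = roundTrip(xml, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(xml));  // true

        // Different charsets, as can happen in variant (1) when the platform
        // default is not the charset passed to new String(...): text is garbled.
        String mixed = roundTrip(xml, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(mixed.equals(xml)); // false
    }
}
```

Reading chars straight from the Reader avoids both conversions, which is why the version without Apache Commons needs no encoding at all.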

If you want to use IOUtils, then readFully(Reader input, char[] buffer) would be the method to use.

The platform default encoding has no effect anywhere in this question, since you shouldn't be working with bytes at all.

Edit: A slightly more modern way with the standard JDK classes would be to use Reader.read(CharBuffer target), like

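Reader r = xmlClob.getCharacterStream();
CharBuffer cb = CharBuffer.allocate((int) xmlClob.length());
// Stop once the buffer is full; read() on a full buffer may keep returning 0
while (cb.hasRemaining() && r.read(cb) != -1)
    ;
cb.flip();  // toString() covers position..limit only, so flip before converting
return cb.toString();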

but it doesn't really make a huge difference (it's just a bit nicer looking).
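The CharBuffer approach can be tried without a database. The sketch below is an illustration only: a StringReader stands in for xmlClob.getCharacterStream(), and the class and method names are hypothetical. It guards the loop with hasRemaining() and calls flip() before toString(), because CharBuffer.toString() only returns the chars between the buffer's position and limit.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.CharBuffer;

public class ReaderToString {

    // Reads at most `length` chars from the reader and returns them as a String.
    // No charset is involved: the data stays chars from start to finish.
    static String readAll(Reader reader, int length) throws IOException {
        CharBuffer cb = CharBuffer.allocate(length);
        while (cb.hasRemaining() && reader.read(cb) != -1) {
            // keep filling until the buffer is full or the stream ends
        }
        cb.flip(); // switch from writing to reading: position = 0, limit = chars read
        return cb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample standing in for the Clob content.
        String xml = "<name>Zo\u00eb</name>";
        try (Reader r = new StringReader(xml)) {
            System.out.println(readAll(r, xml.length()).equals(xml)); // true
        }
    }
}
```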

Answer 1 (2, accepted, 2017-07-20 13:56:01)
