Java CharsetDecoder inserts a space after every character

Question

I am trying to strip invalid UTF-8 characters using this code (found on Stack Overflow):

def text = file.text
CharsetDecoder utf8Decoder = Charset.forName("UTF-8").newDecoder();
utf8Decoder.onMalformedInput(CodingErrorAction.IGNORE);
utf8Decoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
ByteBuffer bytes = ByteBuffer.allocate(text.getBytes().length * 2)
CharBuffer cbuf = bytes.asCharBuffer()
cbuf.put(text)
cbuf.flip()
CharBuffer parsed = utf8Decoder.decode(bytes);
println parsed.toString()

The output I get looks like this:

 < d o c u m e n t >
     < t i t l e > S o me  T i t l e   < / t i t l e >
     < s i t e > A S i t e < / s i t e >

Any ideas why it behaves this way?

Answer 1

I am not sure why this does not work, but here is code that fixes it (the code is Groovy, not Java):

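import java.nio.charset.Charset
import java.nio.charset.CharsetDecoder
import java.nio.charset.CodingErrorAction

// Decode the file's raw bytes, silently dropping invalid UTF-8 sequences.
// `file` is the same java.io.File as in the question.
def cleaned = file.withInputStream { stream ->
    CharsetDecoder utf8Decoder = Charset.forName("UTF-8").newDecoder()
    utf8Decoder.onMalformedInput(CodingErrorAction.IGNORE)
    utf8Decoder.onUnmappableCharacter(CodingErrorAction.IGNORE)
    def reader = new BufferedReader(new InputStreamReader(stream, utf8Decoder))
    def line = null

    def sb = new StringBuilder()
    while ((line = reader.readLine()) != null) {
        sb.append(line).append('\n')
    }
    reader.close()
    sb.toString()  // returned from the closure so the cleaned text can be used
}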
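
A likely explanation for the original behaviour, offered as a sketch rather than a definitive diagnosis: `bytes.asCharBuffer()` is a view in which every char occupies two bytes, so `cbuf.put(text)` stores `<` as the byte pair `0x00 0x3C`. Decoding those bytes as UTF-8 then produces a NUL character (U+0000) in front of every ASCII character, and many consoles show NUL as a blank. On top of that, `file.text` has already decoded the file into a String, so the invalid bytes are gone before the decoder ever sees them. The plain-Java sketch below shows both effects; the class and method names (`Utf8Cleaner`, `decodeLenient`, `decodeViaCharView`) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Cleaner {

    // Builds a UTF-8 decoder that drops malformed or unmappable input.
    private static CharsetDecoder lenientDecoder() {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
    }

    // Decodes the raw bytes directly, so invalid sequences are skipped.
    public static String decodeLenient(byte[] raw) throws CharacterCodingException {
        return lenientDecoder().decode(ByteBuffer.wrap(raw)).toString();
    }

    // Reproduces the question's approach: chars written through the
    // asCharBuffer() view take two bytes each (big-endian by default),
    // and those bytes are then misread as UTF-8.
    public static String decodeViaCharView(String text) throws CharacterCodingException {
        ByteBuffer bytes = ByteBuffer.allocate(text.length() * 2);
        bytes.asCharBuffer().put(text);  // '<' becomes 0x00 0x3C
        return lenientDecoder().decode(bytes).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] raw = {'a', (byte) 0xFF, 'b'};  // 0xFF never occurs in valid UTF-8
        System.out.println(decodeLenient(raw));                 // prints "ab"
        System.out.println(decodeViaCharView("<a>").length());  // prints 6: a NUL precedes each char
    }
}
```

If this reading is right, the fix is to hand the decoder the file's raw bytes (for example via `Files.readAllBytes` or an `InputStreamReader` as in the answer above) rather than a String that has already been decoded.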

Solution 1
1 2014-06-26 10:31:05
