
Convert DOM element encoding from CP1251 to UTF-8

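I have simple server-side code that accepts a request XML and inserts it as a string into a CLOB column in an Oracle database. The problem is that the client sends the request XML with CP1251-encoded text, but I need to insert it into Oracle as UTF-8. The code I currently use for CP1251 is: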

        Element soapinElement = (Element) streams.getSoapin().getValue().getAny();  //retrieve request xml      
        Node node = (Node) soapinElement;
        Document document = node.getOwnerDocument();
        DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();         
        LSSerializer serializer = domImplLS.createLSSerializer();
        LSOutput output = domImplLS.createLSOutput();
        output.setEncoding("CP1251");
        Writer stringWriter = new StringWriter();
        output.setCharacterStream(stringWriter);
        serializer.write(document, output);
        String soapinString = stringWriter.toString();

This code handles text encoded in CP1251. The task is to get the same text, still readable, encoded in UTF-8. Any ideas are welcome.

I tried this, but it produced unreadable characters instead of Cyrillic:

        Element soapinElement = (Element)   streams.getSoapin().getValue().getAny();            
        Node node = (Node) soapinElement;
        Document document = node.getOwnerDocument();
        DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();         
        LSSerializer serializer = domImplLS.createLSSerializer();
        LSOutput output = domImplLS.createLSOutput();
        output.setEncoding("CP1251");
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        output.setByteStream(byteArrayOutputStream);
        serializer.write(document, output);
        byte[] result = byteArrayOutputStream.toByteArray();
        InputStream is = new ByteArrayInputStream(result);
        Reader reader = new InputStreamReader(is, "CP1251");
        OutputStream out = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(out, "UTF-8");
        char[] buffer = new char[10];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, read);
        }           
        reader.close();
        writer.close();
        String soapinString = out.toString();
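
One possible cause of the unreadable characters in the attempt above is its last line: `ByteArrayOutputStream.toString()` with no argument decodes the bytes using the platform default charset, which is not necessarily UTF-8. The following is a minimal sketch, not the asker's code; the class name `ExplicitCharsetDemo`, the helper `roundTrip`, and the sample text are assumptions made for illustration. It writes UTF-8 bytes and decodes them with the charset named explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Hypothetical helper: encodes text as UTF-8 bytes, then decodes them again.
    static String roundTrip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        out.write(utf8, 0, utf8.length);
        // Decode with an explicit charset; a bare out.toString() would fall back
        // to the platform default charset and could garble the Cyrillic text.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cyrillic = "\u041F\u0440\u0438\u0432\u0435\u0442"; // sample Cyrillic word
        System.out.println(cyrillic.equals(roundTrip(cyrillic)));
    }
}
```

A Java `String` itself carries no CP1251 or UTF-8 encoding; the charset only matters at the points where characters are converted to or from bytes.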

You can decode the CP1251 data like this:

Charset utf8charset = Charset.forName("UTF-8");
Charset cp1251charset = Charset.forName("CP1251");

// decode CP1251
        CharBuffer data = cp1251charset.decode(ByteBuffer.wrap(result));

then encode it as UTF-8:

// encode UTF-8
        ByteBuffer outputBuffer = utf8charset.encode(data);

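and convert the ByteBuffer to a byte[]:

// UTF-8 value: copy only the encoded bytes, because array() may include unused trailing capacity
        byte[] outputData = new byte[outputBuffer.remaining()];
        outputBuffer.get(outputData);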
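
The three steps above can be combined into one self-contained sketch. The class name `Cp1251ToUtf8` and the method `convert` are hypothetical names chosen for illustration, and the input is assumed to be valid CP1251 bytes. The sketch copies only the `remaining()` bytes of the encoded buffer, since the backing array returned by `array()` can be larger than the encoded content.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1251ToUtf8 {
    // Hypothetical helper: re-encodes CP1251 bytes as UTF-8 bytes.
    static byte[] convert(byte[] cp1251Bytes) {
        Charset cp1251 = Charset.forName("CP1251");
        // Step 1: decode the CP1251 bytes into characters.
        CharBuffer chars = cp1251.decode(ByteBuffer.wrap(cp1251Bytes));
        // Step 2: encode the characters as UTF-8.
        ByteBuffer utf8 = StandardCharsets.UTF_8.encode(chars);
        // Step 3: copy exactly the encoded bytes out of the buffer.
        byte[] result = new byte[utf8.remaining()];
        utf8.get(result);
        return result;
    }

    public static void main(String[] args) {
        String sample = "\u041F\u0440\u0438\u0432\u0435\u0442"; // sample Cyrillic word
        byte[] cp1251Bytes = sample.getBytes(Charset.forName("CP1251"));
        byte[] utf8Bytes = convert(cp1251Bytes);
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8).equals(sample));
    }
}
```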

This should solve your problem.
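
As an alternative that the answer does not state, the DOM serializer from the question could be asked to produce UTF-8 bytes directly, which would avoid the intermediate CP1251 step. This is only a sketch under assumptions: the class `DomToUtf8`, its methods, and the sample document are hypothetical, and it relies on the document's `DOMImplementation` supporting the Load and Save cast the question already uses.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class DomToUtf8 {
    // Hypothetical helper: serializes a DOM document to UTF-8 encoded bytes.
    static byte[] serializeUtf8(Document document) {
        DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");  // encoding used for the byte stream
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);
        serializer.write(document, output);
        return bytes.toByteArray();
    }

    // Builds a small stand-in for the request document (an assumption for the demo).
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("request");
            root.setTextContent("\u041F\u0440\u0438\u0432\u0435\u0442"); // sample Cyrillic word
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = serializeUtf8(sampleDocument());
        // Decode with the same charset that was used for serialization.
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```

The resulting string can then be bound to the CLOB column; how it is stored depends on the database character set and the JDBC driver, which this sketch does not cover.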

