Convert DOM element encoding from CP1251 to UTF-8

I have simple server-side code that takes the request XML and inserts it as a string into a CLOB column in an Oracle database. The problem is that the client sends the request XML with CP1251-encoded text, but I need to insert it into Oracle as UTF-8. The code I currently use for CP1251 is:

        Element soapinElement = (Element) streams.getSoapin().getValue().getAny();  //retrieve request xml      
        Node node = (Node) soapinElement;
        Document document = node.getOwnerDocument();
        DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();         
        LSSerializer serializer = domImplLS.createLSSerializer();
        LSOutput output = domImplLS.createLSOutput();
        output.setEncoding("CP1251");
        Writer stringWriter = new StringWriter();
        output.setCharacterStream(stringWriter);
        serializer.write(document, output);
        String soapinString = stringWriter.toString();

This code handles text encoded in CP1251. I need to do the same thing, but end up with readable text encoded in UTF-8. Any ideas?

I tried this, but it produced unreadable characters instead of Cyrillic:

        Element soapinElement = (Element)   streams.getSoapin().getValue().getAny();            
        Node node = (Node) soapinElement;
        Document document = node.getOwnerDocument();
        DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();         
        LSSerializer serializer = domImplLS.createLSSerializer();
        LSOutput output = domImplLS.createLSOutput();
        output.setEncoding("CP1251");
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        output.setByteStream(byteArrayOutputStream);
        serializer.write(document, output);
        byte[] result = byteArrayOutputStream.toByteArray();
        InputStream is = new ByteArrayInputStream(result);
        Reader reader = new InputStreamReader(is, "CP1251");
        OutputStream out = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(out, "UTF-8");
        char[] buffer = new char[10];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, read);
        }           
        reader.close();
        writer.close();
        String soapinString = out.toString();

You can decode the CP1251 data like this:

        Charset utf8charset = Charset.forName("UTF-8");
        Charset cp1251charset = Charset.forName("CP1251");

        // decode CP1251 (result is the byte[] produced by the serializer)
        CharBuffer data = cp1251charset.decode(ByteBuffer.wrap(result));

Then encode the characters as UTF-8:

        // encode UTF-8
        ByteBuffer outputBuffer = utf8charset.encode(data);

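Finally, copy the ByteBuffer's contents into a byte[]:

        // UTF-8 bytes; copy only the encoded bytes, because the backing array
        // returned by array() can be longer than the buffer's limit
        byte[] outputData = new byte[outputBuffer.remaining()];
        outputBuffer.get(outputData);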
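The decode and encode steps above can be combined into one small helper. This is only a sketch: the class name `Cp1251ToUtf8` and the method `convert` are made up for illustration, and the sample text is written with Unicode escapes so the result does not depend on the source file's encoding. It copies only the bytes up to the buffer's limit, since the array returned by `ByteBuffer.array()` may be longer than the encoded data.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question or answer.
final class Cp1251ToUtf8 {

    // "windows-1251" is the canonical Java name for the CP1251 charset.
    private static final Charset CP1251 = Charset.forName("windows-1251");

    /** Re-encodes CP1251 bytes as UTF-8 bytes. */
    static byte[] convert(byte[] cp1251Bytes) {
        // Step 1: decode the CP1251 bytes into characters.
        CharBuffer chars = CP1251.decode(ByteBuffer.wrap(cp1251Bytes));
        // Step 2: encode those characters as UTF-8.
        ByteBuffer utf8 = StandardCharsets.UTF_8.encode(chars);
        // Step 3: copy only the bytes between position and limit.
        byte[] out = new byte[utf8.remaining()];
        utf8.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Cyrillic sample text, written with escapes to avoid source-encoding issues.
        String sample = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] utf8 = convert(sample.getBytes(CP1251));
        // Decoding with the matching charset gives back the original text.
        System.out.println(sample.equals(new String(utf8, StandardCharsets.UTF_8)));
    }
}
```

Whatever reads `outputData` later has to decode it as UTF-8 explicitly, for example with `new String(outputData, StandardCharsets.UTF_8)`; relying on the default charset is a common source of unreadable characters.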

This should solve your issue.
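An alternative that may avoid the intermediate CP1251 step is to ask the serializer for UTF-8 bytes directly and decode them with an explicit charset. The sketch below assumes the DOM implementation supports DOM Level 3 Load and Save, as the cast in the question already does; the class `DomToUtf8`, the method `serializeUtf8`, and the `request` element are hypothetical names used only for illustration. Note that `ByteArrayOutputStream.toString()` without an argument decodes with the default charset, which could explain the unreadable characters in the second attempt.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, not part of the original question or answer.
final class DomToUtf8 {

    /** Serializes the document to UTF-8 encoded bytes. */
    static byte[] serializeUtf8(Document document) {
        DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");          // encoding of the bytes written below
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);          // byte stream, so the encoding is applied
        serializer.write(document, output);
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny stand-in document with Cyrillic text (escaped in source).
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("request");
        root.setTextContent("\u041f\u0440\u0438\u0432\u0435\u0442");
        doc.appendChild(root);

        // Decode with the same charset that was used for serialization.
        String soapinString = new String(serializeUtf8(doc), StandardCharsets.UTF_8);
        System.out.println(soapinString);
    }
}
```

A Java `String` holds characters rather than bytes in a particular encoding, so the charset only matters at the points where bytes are produced or consumed; naming it explicitly at both points keeps the two sides consistent.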
