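Convert DOM element encoding from CP1251 to UTF-8
I have simple server-side code that accepts a request XML and inserts it as a string into an Oracle database Clob column. The problem is that the client sends the request XML with CP1251-encoded text, but I need to insert it into Oracle as UTF-8. The code I currently use for CP1251 is: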
Element soapinElement = (Element) streams.getSoapin().getValue().getAny(); //retrieve request xml
Node node = (Node) soapinElement;
Document document = node.getOwnerDocument();
DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
LSSerializer serializer = domImplLS.createLSSerializer();
LSOutput output = domImplLS.createLSOutput();
output.setEncoding("CP1251");
Writer stringWriter = new StringWriter();
output.setCharacterStream(stringWriter);
serializer.write(document, output);
String soapinString = stringWriter.toString();
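This code handles the CP1251-encoded text correctly. The task is to get the same text, still readable, encoded as UTF-8. Any ideas are welcome.
I tried this, but it produced unreadable characters instead of Cyrillic: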
Element soapinElement = (Element) streams.getSoapin().getValue().getAny();
Node node = (Node) soapinElement;
Document document = node.getOwnerDocument();
DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
LSSerializer serializer = domImplLS.createLSSerializer();
LSOutput output = domImplLS.createLSOutput();
output.setEncoding("CP1251");
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
output.setByteStream(byteArrayOutputStream);
serializer.write(document, output);
byte[] result = byteArrayOutputStream.toByteArray();
InputStream is = new ByteArrayInputStream(result);
Reader reader = new InputStreamReader(is, "CP1251");
OutputStream out = new ByteArrayOutputStream();
Writer writer = new OutputStreamWriter(out, "UTF-8");
char[] buffer = new char[10];
int read;
while ((read = reader.read(buffer)) != -1) {
    writer.write(buffer, 0, read);
}
reader.close();
writer.close();
String soapinString = out.toString();
You can decode the CP1251 data like this (result is the CP1251 byte array from your code):
Charset utf8charset = Charset.forName("UTF-8");
Charset cp1251charset = Charset.forName("CP1251");
// decode CP1251
CharBuffer data = cp1251charset.decode(ByteBuffer.wrap(result));
then encode it as UTF-8:
// encode UTF-8
ByteBuffer outputBuffer = utf8charset.encode(data);
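and copy the contents of the ByteBuffer into a byte[]:
// UTF-8 value; copy only the encoded bytes, because the backing array
// returned by array() may be longer than the encoded data
byte[] outputData = new byte[outputBuffer.remaining()];
outputBuffer.get(outputData);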
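Put together, the decode and encode steps above might look like the following sketch. The class name Cp1251ToUtf8 and the method convert are made up for illustration. The sketch copies only the bytes between the buffer's position and limit, since the backing array of the encoded ByteBuffer can be longer than the encoded data.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
class Cp1251ToUtf8 {

    /** Re-encodes CP1251 bytes as UTF-8 bytes. */
    static byte[] convert(byte[] cp1251Bytes) {
        Charset cp1251 = Charset.forName("CP1251");
        // Decode the CP1251 bytes into characters.
        CharBuffer chars = cp1251.decode(ByteBuffer.wrap(cp1251Bytes));
        // Encode the characters as UTF-8.
        ByteBuffer utf8 = StandardCharsets.UTF_8.encode(chars);
        // Copy exactly the encoded bytes (position..limit) into a new array.
        byte[] out = new byte[utf8.remaining()];
        utf8.get(out);
        return out;
    }

    public static void main(String[] args) {
        // Sample Cyrillic text written with Unicode escapes so the example
        // does not depend on the source file encoding.
        String sample = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] cp1251Bytes = sample.getBytes(Charset.forName("CP1251"));
        byte[] utf8Bytes = convert(cp1251Bytes);
        // Decode with the matching charset to get the text back.
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8));
    }
}
```

Note that the resulting bytes must be turned back into a String with an explicit charset, for example new String(utf8Bytes, StandardCharsets.UTF_8). Calling toString() on a ByteArrayOutputStream without a charset argument decodes with the platform default, which is one likely way to end up with unreadable characters.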
This should solve your problem.
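As a possible alternative, the DOM can be serialized straight to UTF-8, skipping the CP1251 step entirely. This is only a sketch: the class DomToUtf8 and its methods are hypothetical, and the sample document stands in for the SOAP element from the question. It relies on the fact that a parsed DOM holds characters rather than bytes, so the serializer can write it out in any encoding. How the resulting String ends up encoded in the Clob column depends on the JDBC driver and the database character set, which are not covered here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical helper class, not part of the original code.
class DomToUtf8 {

    /** Serializes the document as UTF-8 encoded bytes. */
    static byte[] toUtf8Bytes(Document document) {
        DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output.setByteStream(bytes);
        serializer.write(document, output);
        return bytes.toByteArray();
    }

    /** Decodes the UTF-8 bytes with the matching charset. */
    static String toXmlString(Document document) {
        return new String(toUtf8Bytes(document), StandardCharsets.UTF_8);
    }

    /** Builds a small stand-in document with a single <request> element. */
    static Document sampleDocument(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("request");
            root.setTextContent(text);
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument("\u041f\u0440\u0438\u0432\u0435\u0442");
        System.out.println(toXmlString(doc));
    }
}
```

In the question's code, document would be node.getOwnerDocument() instead of the sample document.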