Multi-byte character XML entity
I'm having a problem encoding a multi-byte character into an XML document:
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class XmlWriter {
    static final XMLOutputFactory outputFactory = XMLOutputFactory.newFactory();
    static XMLStreamWriter streamWriter;

    public static String Write(String s) throws XMLStreamException, UnsupportedEncodingException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Encodes the output as UTF-16
        streamWriter = outputFactory.createXMLStreamWriter(out, "utf-16");
        streamWriter.writeCharacters(s);
        streamWriter.flush();
        // new String(byte[]) decodes with the platform default charset
        return new String(out.toByteArray());
    }
}
import junit.framework.TestCase;

public class XmlWriterTest extends TestCase {
    public void testWrite() throws Exception {
        System.out.println("Write");
        String s = "\uD803\uDC22";
        String expResult = "𐰢";
        String result = XmlWriter.Write(s);
        assertEquals(expResult, result);
    }
}
I've tried many contortions of charsets, etc., but to no avail; I keep getting an output of:
��
�&#xdc22
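For reference, the test input "\uD803\uDC22" is a surrogate pair: two Java chars (UTF-16 code units) that together encode a single code point, U+10C22, which takes four bytes in both UTF-8 and UTF-16. The short program below makes that visible; SurrogatePairDemo and codePointsOf are hypothetical names used only for illustration.

```java
import java.util.Locale;

public class SurrogatePairDemo {
    /** Formats each code point of s as U+XXXX, separated by spaces. */
    public static String codePointsOf(String s) {
        StringBuilder sb = new StringBuilder();
        // codePoints() combines a valid high/low surrogate pair into one value
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\uD803\uDC22";
        System.out.println(s.length());                       // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));  // 1 (code point)
        System.out.println(codePointsOf(s));                  // U+10C22
    }
}
```

The second output above, which escapes only \uDC22, looks consistent with the two halves of the pair being handled separately rather than as one code point.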
This is part of an application that generates an Excel workbook (*.xlsx), and opening the document in Excel fails because of these characters.
What can I do to get the correct XML entity? I was hoping this would be handled by the XML library (the original code used Apache's StringEscapeUtils.escapeXml()).
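If hand-built XML text is the goal (as with the original StringEscapeUtils.escapeXml() code), one option is to escape whole code points rather than individual chars. XML 1.0 excludes the surrogate range U+D800 to U+DFFF from its Char production, so a reference to a lone surrogate such as &#xdc22; is not well-formed, which may be why Excel rejects the file. The sketch below is a hand-rolled escaper, not the behavior of StringEscapeUtils or of any StAX writer; the class name XmlEntityEscaper is hypothetical, and it does not reject the other characters XML 1.0 forbids (such as most C0 control characters).

```java
import java.util.Locale;

public final class XmlEntityEscaper {
    private XmlEntityEscaper() {}

    /**
     * Escapes the five XML special characters and writes every code point
     * above U+007F as one hexadecimal numeric character reference.
     *
     * @throws IllegalArgumentException if s contains an unpaired surrogate
     */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        // codePoints() merges a valid surrogate pair into a single code point
        // (0x10C22 for the question's input) instead of two separate chars.
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:
                    if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                        // A lone surrogate is not an XML Char, so no reference can represent it.
                        throw new IllegalArgumentException(
                                "Unpaired surrogate U+" + Integer.toHexString(cp).toUpperCase(Locale.ROOT));
                    } else if (cp > 0x7F) {
                        sb.append("&#x")
                          .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                          .append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\uD803\uDC22")); // &#x10C22;
    }
}
```

Don't pass the result to XMLStreamWriter.writeCharacters(): that method escapes & itself, so &#x10C22; would come out as &amp;#x10C22;.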
The String constructor you are using (new String(byte[])) decodes the bytes with the platform default charset. Try specifying the encoding with an alternate constructor, new String(byte[], Charset) or new String(byte[], String), passing the same charset that was given to createXMLStreamWriter.
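A minimal sketch of that suggestion, assuming the JDK's built-in StAX implementation. It differs from the question's XmlWriter in ways that are assumptions, not part of the answer: the names XmlWriterFixed and write are hypothetical, it encodes as UTF-8 rather than "utf-16", and it uses a local writer instead of the shared static streamWriter. The essential change is the last line of write(), which decodes with the same charset that was passed to createXMLStreamWriter.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class XmlWriterFixed {
    private static final XMLOutputFactory OUTPUT_FACTORY = XMLOutputFactory.newFactory();
    // Assumption: UTF-8 instead of "utf-16". The key point is that encoding
    // and decoding use the same charset.
    private static final Charset CHARSET = StandardCharsets.UTF_8;

    public static String write(String s) throws XMLStreamException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A local writer rather than a shared static field, so calls don't interfere.
        XMLStreamWriter writer = OUTPUT_FACTORY.createXMLStreamWriter(out, CHARSET.name());
        try {
            writer.writeCharacters(s);
            writer.flush();
        } finally {
            writer.close(); // does not close the underlying ByteArrayOutputStream
        }
        // Decode with the same charset the writer encoded with.
        return new String(out.toByteArray(), CHARSET);
    }

    public static void main(String[] args) throws XMLStreamException {
        String result = write("\uD803\uDC22");
        System.out.println(result.codePointCount(0, result.length()));
    }
}
```

The test's expResult may also need a second look: whether a writer emits U+10C22 as raw bytes or as a character reference can depend on the StAX implementation and the charset chosen.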