SAXParser not parsing HTML numeric character references; question mark displayed
I am trying to parse an XML document that contains the numeric character references &#8212; and &#8217;. On parsing, the output is "?". It is not only these two: any HTML/XML numeric character reference in the XML causes this issue. Only the predefined entities are accepted by the SAX parser.

I use a DefaultHandler with SAXParser. System.out in the characters method shows me a question mark for the numeric character references.

I did a lot of googling, and everywhere I see that using numeric character references should not cause any issue.

Any help?
> System.out in the characters method shows me a question mark for the numeric character references.
That sounds like a character encoding problem with your output/console. The following works with JSE 7:
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HelloWorldSax {
    public static void main(String[] args) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // DefaultHandler supplies no-op versions of the other ContentHandler methods
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                System.out.println(new String(ch, start, length));
            }
        });
        // A byte stream lets the parser apply the encoding from the XML declaration;
        // a FileReader would decode with the platform default charset instead
        try (InputStream in = new FileInputStream("/tmp/HelloWorld.xml")) {
            reader.parse(new InputSource(in));
        }
    }
}
With XML file:
<?xml version="1.0" encoding="UTF-8"?>
<Test>
Hello World&#8217;
</Test>
Output: Hello World’
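If the console still shows "?", one thing to try is encoding the output explicitly as UTF-8 instead of with the platform default charset. A PrintStream replaces characters its charset cannot map with '?', which matches the symptom. This is a minimal sketch under that assumption. The class and method names (`Utf8Console`, `utf8`) are illustrative, and the terminal or IDE console must itself be set to UTF-8 for the character to render.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Hypothetical helper: wraps a stream so chars are encoded as UTF-8
    // rather than with the platform default charset, where unmappable
    // characters such as U+2019 would be written as '?'.
    public static PrintStream utf8(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        // U+2019 RIGHT SINGLE QUOTATION MARK, the character produced by &#8217;
        out.println("Hello World\u2019");
    }
}
```

In the SAX handler, you would call `out.println(...)` instead of `System.out.println(...)`.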
Have you tried looking at the incoming character array with a debugger?
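Without a debugger, a quick alternative is to print the code point of each character the parser delivers. This sidesteps console fonts and encodings entirely. The sketch below parses an in-memory document with `SAXParser.parse(InputSource, DefaultHandler)` and prints each character as `U+XXXX`. It accumulates text because `characters()` may deliver one text node in several chunks. The class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CodePointDump {
    // Parses the XML and returns all character data concatenated.
    public static String parseText(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // One text node may arrive in several calls, so accumulate
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    // Formats each code point as U+XXXX, independent of console encoding.
    public static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Test>a&#8212;b&#8217;</Test>";
        // prints: U+0061 U+2014 U+0062 U+2019
        System.out.println(describe(parseText(xml)));
    }
}
```

If this prints `U+2019` but your original output shows "?", the parser resolved the reference correctly and the problem is on the output side.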