Preserve encoding after SAX parsing
I have an XML document that contains attributes like the following:
<Tag Body="&lt;p&gt;">
I want to preserve the text in the Body attribute exactly as-is; however, the parser converts the text to "<p>". I want to keep the "&", "l", "t", ";", etc.
I'm using the Java SAX API to parse the XML document like so:
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser saxParser = spf.newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
xmlReader.setContentHandler(new MyHandler());
xmlReader.setErrorHandler(new MyErrorHandler(System.err));
xmlReader.parse(convertToFileURL(myFileName));
The relevant code in MyHandler.java is:
public void startElement(String namespaceURI, String localName, String qName, Attributes atts)
throws SAXException
{
if (qName.equals("Tag")){
String Body = atts.getValue("Body");
char []s = Body.toCharArray(); // s[0] will be "<", but I want it to be "&"
}
}
How can I get the parser to leave the attribute text alone and not try to convert anything?
I'll answer my own question.
I didn't find a way to stop the parser from unescaping the text in the first place, but I did find a workaround (thanks @user1516873) to re-escape it afterwards using Apache Commons:
String Body = atts.getValue("Body");
String Body_escaped = StringEscapeUtils.escapeXml(Body);
This achieves the desired results.
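For anyone who would rather not pull in Apache Commons, the same re-escaping idea can be sketched with the JDK alone. This is only a rough illustration, not a replacement for the library: the class name ReEscapeDemo and the helpers escapeXml and parseBody are made up for this example, and the helper only handles the five predefined XML entities.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; none of these names come from the original post.
class ReEscapeDemo {

    // Re-escapes the five predefined XML entities (&, <, >, ", ').
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parses an in-memory XML string and returns the re-escaped Body
    // attribute of the first Tag element, or null if there is none.
    static String parseBody(String xml) {
        final String[] result = new String[1];
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (qName.equals("Tag") && result[0] == null) {
                        // The parser has already turned &lt;p&gt; into <p> here.
                        String body = atts.getValue("Body");
                        if (body != null) {
                            result[0] = escapeXml(body);
                        }
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // prints &lt;p&gt;
        System.out.println(parseBody("<Tag Body=\"&lt;p&gt;\"/>"));
    }
}
```

One caveat that applies to this sketch and to the Apache Commons call alike: re-escaping gives back an escaped form, not necessarily the original bytes. If the source file had written the same character as &#60;, or had left a ">" unescaped, the output would still be &lt; and &gt;. Also, if you are on Commons Lang 3, I believe escapeXml is deprecated there in favor of escapeXml10 and escapeXml11, so check which version you have.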