Preserve encoding after SAX parsing

Question

I have an XML document that contains attributes like the following:

    <Tag Body="&lt;p&gt;">

I want to preserve the text in the Body attribute exactly as-is; however, the parser converts the text to "<p>". I want to keep the "&", "l", "t", ";", etc.

I'm using the Java SAX API to parse the XML document like so:

    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser saxParser = spf.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();
    xmlReader.setContentHandler(new MyHandler());
    xmlReader.setErrorHandler(new MyErrorHandler(System.err));
    xmlReader.parse(convertToFileURL(myFileName));

The relevant code in MyHandler.java is:

    public void startElement(String namespaceURI, String localName, String qName, Attributes atts)
            throws SAXException
    {
        if (qName.equals("Tag")) {
            String Body = atts.getValue("Body");
            char[] s = Body.toCharArray();  // s[0] will be '<', but I want it to be '&'
        }
    }

How can I get the parser to leave the attribute text alone and not try to convert anything?

Answer 1

I'll answer my own question.

I didn't find a way to stop the parser from unescaping the text in the first place, but I did find a workaround (thanks @user1516873): re-escape it afterwards using Apache Commons:

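    // Assumes Apache Commons Lang 2.x: import org.apache.commons.lang.StringEscapeUtils;
    String Body = atts.getValue("Body");
    String Body_escaped = StringEscapeUtils.escapeXml(Body);  // "<p>" becomes "&lt;p&gt;"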

This achieves the desired result.
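
For anyone who wants to try the workaround without adding a dependency, here is a minimal, self-contained sketch. It is not the code from the question: the class name `ReEscapeDemo` and the methods `escapeXml` and `reEscapedBody` are hypothetical, and the hand-written `escapeXml` is only a stand-in for `StringEscapeUtils.escapeXml` that handles the five predefined XML entities.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ReEscapeDemo {

    // Hypothetical stand-in for StringEscapeUtils.escapeXml:
    // escapes the five predefined XML entities and nothing else.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parses the XML, reads the Body attribute of the first <Tag> element,
    // and returns it re-escaped (or null if no such attribute is found).
    static String reEscapedBody(String xml) throws Exception {
        final String[] result = new String[1];
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals("Tag") && result[0] == null) {
                    String body = atts.getValue("Body"); // already unescaped by the parser
                    if (body != null) {
                        result[0] = escapeXml(body);
                    }
                }
            }
        });
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(reEscapedBody("<Tag Body=\"&lt;p&gt;\"/>"));
    }
}
```

One caveat: re-escaping rebuilds an equivalent escaped form, not necessarily the original bytes. If the source file had written the attribute as `&#60;p&#62;`, the parser would still hand back `<p>`, and re-escaping would produce `&lt;p&gt;`.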

Answer 1 (accepted, score 0): 2013-10-30 18:40:48
