
Preserve encoding after SAX parsing

I have an XML document that contains attributes like the following:

<Tag Body="&lt;p&gt;">

I want to preserve the text in the Body attribute exactly as-is; however, the parser converts the text to "<p>". I want to keep the literal characters "&", "l", "t", ";", and so on.

I'm using the Java SAX API to parse the XML document like so:

    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser saxParser = spf.newSAXParser();
    XMLReader xmlReader = saxParser.getXMLReader();
    xmlReader.setContentHandler(new MyHandler());
    xmlReader.setErrorHandler(new MyErrorHandler(System.err));
    xmlReader.parse(convertToFileURL(myFileName));

The relevant code in MyHandler.java is:

    public void startElement(String namespaceURI, String localName, String qName, Attributes atts)
            throws SAXException
    {
        if (qName.equals("Tag")) {
            String Body = atts.getValue("Body");
            char[] s = Body.toCharArray();  // s[0] will be '<', but I want it to be '&'
        }
    }

How can I get the parser to leave the attribute text alone and not try to convert anything?

I'll answer my own question.

I didn't find a way to stop the parser from unescaping the text in the first place, but I did find a workaround (thanks @user1516873): re-escape it afterwards using Apache Commons:

    String Body = atts.getValue("Body");
    String Body_escaped = StringEscapeUtils.escapeXml(Body);

This achieves the desired result.
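For anyone who wants to try the workaround without adding a dependency, here is a minimal, self-contained sketch. The class `ReEscapeDemo` and its `escapeXml` and `parseBody` methods are hypothetical names made up for this illustration; `escapeXml` is a hand-written stand-in that covers only the five predefined XML entities and is not the Apache Commons method. Note that re-escaping gives back `&lt;p&gt;` for the input above, but it cannot tell how a character was originally written (for example `&#60;` would also come back as `&lt;`).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ReEscapeDemo {

    // Hypothetical helper: re-escapes the five predefined XML entities.
    // This is an assumption-level stand-in, not StringEscapeUtils.escapeXml.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Parses the XML string with SAX and returns the re-escaped Body
    // attribute of the first <Tag> element, or null if none is found.
    public static String parseBody(String xml) {
        final String[] result = new String[1];
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (qName.equals("Tag") && result[0] == null) {
                        // The parser has already unescaped the value here.
                        String body = atts.getValue("Body");
                        if (body != null) {
                            result[0] = escapeXml(body);
                        }
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // prints: &lt;p&gt;
        System.out.println(parseBody("<Tag Body=\"&lt;p&gt;\"/>"));
    }
}
```

The handler still receives "<p>" from `atts.getValue`; the escaping step only rebuilds an equivalent escaped form afterwards.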
