简体   繁体   中英

Disable automatic ampersand escaping in XML?

Consider:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.newDocument();

Element root = doc.createElement("list");
doc.appendChild(root);

for(CorrectionEntry correction : dictionary){
    Element elem = doc.createElement("elem");
    elem.setAttribute("from", correction.getEscapedFrom());
    elem.setAttribute("to", correction.getEscapedTo());
    root.appendChild(elem);
}

(then follows the writing of the document into an XML file)

where getEscapedFrom and getEscapedTo return (in my code) something like finké if the originating word is finké . So as to perform a Unicode escape for the characters that are bigger than 127.

The problem is that the final XML has the following line <elem from="finke" to="fink&amp;#xE9;" /> <elem from="finke" to="fink&amp;#xE9;" /> ( from is finke , to is finké ) where I would like it to be <elem from="finke" to="fink&#xE9;" /> <elem from="finke" to="fink&#xE9;" />

I've tried, following another response in StackOverflow, to disable escaping of ampersands putting the line doc.appendChild(doc.createProcessingInstruction(StreamResult.PI_DISABLE_OUTPUT_ESCAPING, "&")); after the creation of the doc but without success.

How could I "tell XML" to not escape ampersands? Or, conversely, how could I let "XML" to convert from é , or \\\é , to &#xE9; ?

Update

I managed to come to the problem: up until the writing of the file the node (through debug) seems to contain the right string. Once I call transformer.transform(domSource, streamResult); everything goes wild.

DOMSource domSource = new DOMSource(doc);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
StreamResult streamResult = new StreamResult(baos);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(domSource, streamResult);
System.out.println(baos.toString());

The problem seems to be the transformer.

Try setting setOutputProperty("encoding", "us-ascii") on the transformer. That tells the serializer to produce the output using ASCII characters only, which means any non-ASCII character will be escaped. But you can't control whether it will be a decimal or hex escape (unless you use Saxon-PE or higher as your Transformer , in which case there's a serialization option to control this).

It's never a good idea to try to do the serialization "by hand". For at least three reasons: (a) you'll get it wrong (we see a lot of SO questions caused by people producing bad XML this way), (b) you should be working with the tools, not against them, (c) the people who wrote the serializers understand XML better than you do, and they know what's expected of them. You're probably working to requirements written by someone whose understanding of XML is very superficial.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM