
Get the Node value for the first Node

I have the following XML:

<?xml version='1.0' ?>
<foo>A&gt;B</foo>

I just want to get the value of the <foo> node as A&gt;B. If we use getNodeValue, it is converted to A>B, which is not what I need.

Hence I decided to use a Transformer:

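        // getParsedDoc is the asker's own helper that parses the XML above into a DOM Document
        Document doc = getParsedDoc(abovexml);
        // Assumption: "node" is the root <foo> element; the original snippet never defined it
        Node node = doc.getDocumentElement();
        TransformerFactory tranFact = TransformerFactory.newInstance();
        Transformer transfor = tranFact.newTransformer();
        // Leave the <?xml ...?> declaration out of the serialized output
        transfor.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        Source src = new DOMSource(node);
        StringWriter buffer = new StringWriter();
        Result dest = new StreamResult(buffer);
        transfor.transform(src, dest);
        String result = buffer.toString();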

But this puts the whole element into result: <foo>A&gt;B</foo>

It would be helpful if somebody could clarify whether there is an approach that yields A&gt;B without doing string manipulation on the above output (<foo>A&gt;B</foo>).

Since getNodeValue() automatically decodes the string, you can use StringEscapeUtils from Apache Commons Lang to encode it again.

http://commons.apache.org/lang/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
http://commons.apache.org/lang/

String nodeValue = StringEscapeUtils.escapeHtml(getNodeValue());

That would encode it into the format you want. If the output is XML rather than HTML, escapeXml from the same class is the closer match. Note that this adds overhead, because every node value is escaped again after it has been parsed.
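If adding Commons Lang is not an option, the same re-encoding step can be sketched with the JDK alone. This is only an illustration: the class XmlTextEscaper and the method escapeText are hypothetical names, not part of any library, and the helper only handles &, < and >, which is less than StringEscapeUtils covers.

```java
class XmlTextEscaper {
    // Hypothetical helper (not a library API): re-escapes the characters
    // that are commonly escaped in XML text content.
    static String escapeText(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "A>B" stands in for the decoded value returned by the DOM
        System.out.println(escapeText("A>B")); // prints A&gt;B
    }
}
```

The ampersand is handled in the same pass as the brackets, so an already decoded value such as A&B comes out as A&amp;B instead of being left as invalid XML.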

Actually getNodeValue() is not "converting" the string. When the XML is parsed from a file, or produced by a transformation, the resulting information model is that the string is A>B , not A&gt;B . The latter is just a serialization form.

Another legitimate serialization form is A>B, because a right angle bracket in character data only has to be escaped when it follows ]] (so that it cannot be mistaken for the end of a CDATA section). However, there may be compatibility reasons for wanting to produce A&gt;B, especially if your output is intended to be HTML (though you didn't mention that).
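To see that both serializations lead to the same parsed value, here is a small sketch using the JDK's DOM parser. The class ParsedValueDemo and the method rootText are names made up for this illustration, and getTextContent is used because it reads the text directly from the element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParsedValueDemo {
    // Parses an XML string and returns the text content of the root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped so callers in this sketch need no checked-exception handling
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Two different serializations of the same character data
        System.out.println(rootText("<foo>A&gt;B</foo>")); // prints A>B
        System.out.println(rootText("<foo>A>B</foo>"));    // prints A>B
    }
}
```

Once parsed, the DOM has no record of which of the two spellings the input used, which is why the escaped form can only be recovered by serializing or escaping again.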

If you have a good reason for escaping the > , then I agree with @kensen john's answer for getting that done.
