
Get the Node value for the first Node

I have the following XML:

<?xml version='1.0' ?>
<foo>A&gt;B</foo>

I just want to get the node value of the foo element as A&gt;B. If we use getNodeValue(), it is converted to A>B, which is not what I need.

Hence I decided to use the Transformer:

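        Document doc = getParsedDoc(abovexml);
        // Assumption: node is the root <foo> element of the parsed document.
        Node node = doc.getDocumentElement();
        TransformerFactory tranFact = TransformerFactory.newInstance();
        Transformer transfor = tranFact.newTransformer();
        // Leave out the <?xml ...?> declaration in the serialized output.
        transfor.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        Source src = new DOMSource(node);
        StringWriter buffer = new StringWriter();
        Result dest = new StreamResult(buffer);
        // Serialize the node, including its start and end tags, into the buffer.
        transfor.transform(src, dest);
        String result = buffer.toString();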

But this gives the following output in result: <foo>A&gt;B</foo>

It would be helpful if somebody could clarify whether there is an approach that gives A&gt;B without doing string manipulation on the above output ( <foo>A&gt;B</foo> ).

Since getNodeValue() automatically decodes the string, you can use StringEscapeUtils from Apache Commons Lang to encode it again.

http://commons.apache.org/lang/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
http://commons.apache.org/lang/

String nodeValue = StringEscapeUtils.escapeHtml(getNodeValue());

That would encode it into the format you want. It is not very performance-friendly, because the encoding is applied to every node value you read.
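To make the re-encoding step concrete, here is a minimal, dependency-free sketch. It deliberately swaps StringEscapeUtils for a small hand-written helper so it runs with the JDK alone. The class name EscapeNodeText and the methods escapeXmlText and escapedRootText are hypothetical names introduced for illustration, and the helper only handles &, < and >, so treat it as an approximation of what a library escaper does rather than a replacement for it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EscapeNodeText {

    // Hypothetical helper: escapes only the characters &, < and > in text content.
    static String escapeXmlText(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    // Parses the XML, reads the decoded text of the root element, and escapes it again.
    static String escapedRootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // getTextContent() returns the decoded string, i.e. A>B for this input.
            return escapeXmlText(doc.getDocumentElement().getTextContent());
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints A&gt;B
        System.out.println(escapedRootText("<?xml version='1.0' ?><foo>A&gt;B</foo>"));
    }
}
```

Note that the sketch reads the element's text with getTextContent(), because getNodeValue() on an element node returns null; getNodeValue() is what you would call on the text node itself.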

Actually, getNodeValue() is not "converting" the string. When the XML is parsed from a file, or produced by a transformation, the resulting information model holds the string A>B, not A&gt;B. The latter is just a serialization form.

Another legitimate serialization form is A>B (because the right angle bracket does not need to be escaped in most cases). However, there may be compatibility reasons for wanting to produce A&gt;B, especially if your output is intended to be HTML (though you didn't mention that).

If you have a good reason for escaping the >, then I agree with @kensen john's answer for getting that done.
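The point that A&gt;B and A>B are two serializations of the same string can be checked with a short sketch. It parses both forms with the JDK's DOM parser and compares the text the parser reports. The class name SameInfoset and the method rootText are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SameInfoset {

    // Returns the decoded text content of the root element of the given XML.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String escaped = rootText("<foo>A&gt;B</foo>"); // escaped serialization
        String literal = rootText("<foo>A>B</foo>");    // unescaped, still well-formed
        System.out.println(escaped);                    // prints A>B
        System.out.println(escaped.equals(literal));    // prints true
    }
}
```

Because both inputs yield the same in-memory string, the choice between them only comes up again at serialization time, which is why any escaping has to be reapplied on output.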
