How to parse XML with mixed nodes and text in Java?
I have an XML document in the following format:
<root>
<sentence>
first part of the text
<a id="interpolation_1"> </a>
second part of the text
<a id="interpolation_2"> </a>
</sentence>
</root>
Essentially, the <sentence> tag represents a sentence, and the child <a> tags are the interpolated parts of the sentence.
The XPath expression
String sentence = xPath.evaluate("sentence", transUnitElement);
gives the text as
first part of the text second part of the text
i.e. it omits the interpolations.
The XPath expression
NodeList aList = (NodeList) xPath.evaluate("/sentence/a", transUnitElement, XPathConstants.NODESET);
gives the list of the <a> elements.
How can I parse them to get the text of the <sentence> element together with the <a> elements, without losing the order and positions of the <a> elements?
The expected output:
first part of the text {interpolation_1} second part of the text {interpolation_2}
The result you are looking for can be achieved by iterating over the child nodes of sentence and building the target string progressively. For example:
// imports needed by this snippet
import javax.xml.xpath.XPathConstants;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// retrieve <sentence> as Node, not as text
Node sentence = (Node) xPath.evaluate("sentence", transUnitElement, XPathConstants.NODE);
StringBuilder resultBuilder = new StringBuilder();
NodeList children = sentence.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    short nodeType = child.getNodeType();
    switch (nodeType) {
        case Node.TEXT_NODE:
            // text between the <a> elements, without the surrounding line breaks
            String text = child.getTextContent().trim();
            resultBuilder.append(text);
            break;
        case Node.ELEMENT_NODE:
            // an <a> element: emit its id as a {placeholder}
            String id = ((Element) child).getAttribute("id");
            resultBuilder.append(" {").append(id).append("} ");
            break;
        default:
            throw new IllegalStateException("Unexpected node type: " + nodeType);
    }
}
// trim() drops the space left after the last placeholder
// outputs "first part of the text {interpolation_1} second part of the text {interpolation_2}"
System.out.println(resultBuilder.toString().trim());
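To try the child-node iteration end to end, here is a minimal self-contained sketch. The class name SentenceFlattener and the method flatten are made up for illustration, and the XML is assumed to be the sample from the question, passed in as a string. Unlike the snippet above, it collects the pieces in a list and joins them with single spaces, and it silently skips node types other than text and elements (comments, for instance) instead of throwing.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SentenceFlattener {

    // Hypothetical helper: returns the sentence text with each <a> replaced by {id}.
    static String flatten(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Assumes the document has the /root/sentence layout from the question.
        Node sentence = (Node) xPath.evaluate("/root/sentence", doc, XPathConstants.NODE);

        List<String> parts = new ArrayList<>();
        NodeList children = sentence.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getTextContent().trim();
                if (!text.isEmpty()) {
                    parts.add(text); // skip whitespace-only text nodes
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                parts.add("{" + ((Element) child).getAttribute("id") + "}");
            }
            // any other node type is ignored in this sketch
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n<sentence>\nfirst part of the text\n"
                + "<a id=\"interpolation_1\"> </a>\nsecond part of the text\n"
                + "<a id=\"interpolation_2\"> </a>\n</sentence>\n</root>";
        System.out.println(flatten(xml));
    }
}
```

Joining a list avoids having to trim stray spaces around the placeholders afterwards. If the sentence may contain CDATA sections, Node.CDATA_SECTION_NODE would need the same handling as Node.TEXT_NODE.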
Have you thought of doing this with a small XSLT transformation? In XSLT 3.0 it's simply
<xsl:template match="sentence">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="a">{<xsl:value-of select="@id"/>}</xsl:template>
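For completeness, a rough sketch of how such a stylesheet could be run from Java through the standard javax.xml.transform API. Several things here are assumptions rather than part of the answer: the class name XsltSentenceDemo and the method transform are invented, the stylesheet is declared as version 1.0 so that the JDK's built-in processor can run it (these two templates use nothing specific to 3.0), the a template selects @id so that the placeholder carries the attribute value, and the whitespace is collapsed in Java afterwards.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSentenceDemo {

    // Assumption: an XSLT 1.0 wrapper around the two templates, with text output.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"sentence\"><xsl:apply-templates/></xsl:template>"
      + "<xsl:template match=\"a\">{<xsl:value-of select=\"@id\"/>}</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical helper: applies the stylesheet and collapses whitespace.
    static String transform(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        // The raw result keeps the line breaks of the source document;
        // turn every run of whitespace into a single space.
        return out.toString().trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n<sentence>\nfirst part of the text\n"
                + "<a id=\"interpolation_1\"> </a>\nsecond part of the text\n"
                + "<a id=\"interpolation_2\"> </a>\n</sentence>\n</root>";
        System.out.println(transform(xml));
    }
}
```

The whitespace could also be normalized inside the stylesheet instead of in Java; doing it in Java keeps the two templates exactly as short as in the answer.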