
How to parse XML with mixed nodes and text in Java?

I have XML in the following format:

<root>
      <sentence>
           first part of the text 

           <a id="interpolation_1"> </a>

           second part of the text

           <a id="interpolation_2"> </a>
      </sentence>
</root>

Essentially, the <sentence> tag represents a sentence, and the child <a> tags are the interpolated parts of that sentence.

The XPath expression

String sentence = xPath.evaluate("sentence", transUnitElement);

gives the text as "first part of the text second part of the text", i.e. it omits the interpolations.

The XPath expression

NodeList aList = (NodeList) xPath.evaluate("/sentence/a", transUnitElement, XPathConstants.NODESET);

gives the list of <a> elements.

How can I parse them to get the text of the <sentence> element together with the <a> elements, without losing the order and position of the <a> elements?

The expected output is:

first part of the text {interpolation_1} second part of the text {interpolation_2}

You can get this result by iterating over the child nodes of <sentence> and building the target string progressively. For example:

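import javax.xml.xpath.XPathConstants;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// retrieve <sentence> as a Node, not as its string value
Node sentence = (Node) xPath.evaluate("sentence", transUnitElement, XPathConstants.NODE);

StringBuilder resultBuilder = new StringBuilder();
NodeList children = sentence.getChildNodes();

// visit the children in document order so text and <a> elements stay interleaved
for (int i = 0; i < children.getLength(); i++) {
  Node child = children.item(i);
  short nodeType = child.getNodeType();
  switch (nodeType) {
    case Node.TEXT_NODE:
    case Node.CDATA_SECTION_NODE:
      // drop the indentation and line breaks around each text chunk
      String text = child.getTextContent().trim();
      resultBuilder.append(text);
      break;
    case Node.ELEMENT_NODE:
      // replace each <a> with its id in braces
      String id = ((Element) child).getAttribute("id");
      resultBuilder.append(" {").append(id).append("} ");
      break;
    default:
      throw new IllegalStateException("Unexpected node type: " + nodeType);
  }
}
// trim() removes the space appended after the last "{interpolation_2}"
// prints "first part of the text {interpolation_1} second part of the text {interpolation_2}"
System.out.println(resultBuilder.toString().trim());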
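To try the child-node approach end to end, here is a self-contained sketch that parses the sample XML from a string. The class name MixedContentExample, the SAMPLE constant and the flatten/render helpers are hypothetical names chosen for illustration, and the document element <root> stands in for the question's transUnitElement. One small design difference from the snippet: every chunk is padded with spaces and runs of whitespace are collapsed at the end, so two adjacent <a> elements do not produce a double space.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MixedContentExample {

  // Sample input modeled on the question (hypothetical constant).
  static final String SAMPLE =
      "<root>\n"
          + "  <sentence>\n"
          + "    first part of the text\n"
          + "    <a id=\"interpolation_1\"> </a>\n"
          + "    second part of the text\n"
          + "    <a id=\"interpolation_2\"> </a>\n"
          + "  </sentence>\n"
          + "</root>";

  // Parses the XML, selects the first <sentence> under <root> and flattens it.
  static String flatten(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      XPath xPath = XPathFactory.newInstance().newXPath();
      // <root> plays the role of transUnitElement from the question.
      Node sentence = (Node) xPath.evaluate(
          "sentence", doc.getDocumentElement(), XPathConstants.NODE);
      return render(sentence);
    } catch (Exception e) {
      throw new IllegalStateException("Failed to parse or query the XML", e);
    }
  }

  // Walks the children in document order: text is kept, elements become "{id}".
  static String render(Node sentence) {
    StringBuilder sb = new StringBuilder();
    NodeList children = sentence.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      switch (child.getNodeType()) {
        case Node.TEXT_NODE:
        case Node.CDATA_SECTION_NODE:
          sb.append(' ').append(child.getTextContent().trim()).append(' ');
          break;
        case Node.ELEMENT_NODE:
          sb.append(" {").append(((Element) child).getAttribute("id")).append("} ");
          break;
        default:
          // comments and processing instructions are skipped in this sketch
          break;
      }
    }
    // Collapse the padding and source indentation into single spaces.
    return sb.toString().trim().replaceAll("\\s+", " ");
  }

  public static void main(String[] args) {
    // prints "first part of the text {interpolation_1} second part of the text {interpolation_2}"
    System.out.println(flatten(SAMPLE));
  }
}
```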

Have you considered doing this with a small XSLT transformation? In XSLT 3.0 it's simply:

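<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="sentence">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="a">{<xsl:value-of select="@id"/>}</xsl:template>
</xsl:stylesheet>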
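If you take the XSLT route, you can still drive it from Java through the standard javax.xml.transform API. The sketch below is an assumption-laden illustration: the class name XsltMixedContentExample and its constants are hypothetical, and the stylesheet declares version="1.0" because the processor bundled with the JDK implements XSLT 1.0 (the two templates use nothing newer). Note that the templates select @id, since the string value of each <a> element is just a space. Text output keeps the source indentation, so the sketch collapses whitespace before returning.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltMixedContentExample {

  // The two templates wrapped in a minimal stylesheet. Braces in literal
  // text are copied as-is, so each <a> becomes "{id}".
  static final String STYLESHEET =
      "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"text\"/>"
          + "<xsl:template match=\"sentence\"><xsl:apply-templates/></xsl:template>"
          + "<xsl:template match=\"a\">{<xsl:value-of select=\"@id\"/>}</xsl:template>"
          + "</xsl:stylesheet>";

  // Sample input modeled on the question (hypothetical constant).
  static final String SAMPLE =
      "<root>\n"
          + "  <sentence>\n"
          + "    first part of the text\n"
          + "    <a id=\"interpolation_1\"> </a>\n"
          + "    second part of the text\n"
          + "    <a id=\"interpolation_2\"> </a>\n"
          + "  </sentence>\n"
          + "</root>";

  static String transform(String xml) {
    try {
      Transformer transformer = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
      StringWriter out = new StringWriter();
      transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
      // Collapse the indentation carried over from the source document.
      return out.toString().trim().replaceAll("\\s+", " ");
    } catch (TransformerException e) {
      throw new IllegalStateException("XSLT transformation failed", e);
    }
  }

  public static void main(String[] args) {
    // prints "first part of the text {interpolation_1} second part of the text {interpolation_2}"
    System.out.println(transform(SAMPLE));
  }
}
```

Unlike the DOM loop, this version keeps whatever spacing the source has around each <a>, so "a<a id="x"/>b" becomes "a{x}b".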

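Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.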

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM