Having difficulty parsing nested tags in XML with Java

I am parsing definitions from a dictionary API. I have this line of XML:

<dt>:any of a small genus (<it>Apteryx</it>) of flightless New Zealand birds with rudimentary wings, stout legs, a long bill, and grayish brown hairlike plumage</dt>

How would I get the full content of the dt element? My problem is that it doesn't work when it reaches the (Apteryx) part, because there are additional tags inside the element. How would I get the whole dt element as a single string? Here is my current code:

Element def = (Element) element.getElementsByTagName("def").item(0);
System.out.println(getValue("dt",def).replaceAll("[^\\p{L}\\p{N} ]", ""));

Where def is the element that holds the dt element.

And here is my getValue code:

private static String getValue(String tag, Element element)
{
    NodeList nodes = element.getElementsByTagName(tag).item(0).getChildNodes();
    Node node = (Node) nodes.item(0);
    return node.getNodeValue();
}

Sometimes there are multiple nested tags within the dt element.
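To make the failure concrete: in the DOM, the dt element does not hold one string. It holds a text node, then the it element, then another text node, and getValue only reads the first of these. The sketch below lists those children for a shortened copy of the sample line. The class and method names (DtChildrenDemo, describeChildren) are made up for illustration, and the XML is parsed from a string rather than from the dictionary API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the question's code.
class DtChildrenDemo {
    // Lists the direct children of the first <dt> element, one entry per node.
    static List<String> describeChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getElementsByTagName("dt").item(0).getChildNodes();
            List<String> out = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.TEXT_NODE) {
                    out.add("text: " + child.getNodeValue());
                } else {
                    // getNodeValue() is null for element nodes such as <it>
                    out.add("element: " + child.getNodeName());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<def><dt>:any of a small genus (<it>Apteryx</it>) of flightless birds</dt></def>";
        describeChildren(xml).forEach(System.out::println);
    }
}
```

Since nodes.item(0) in getValue is only the first text node, everything from the it element onward is never read.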

Combining https://stackoverflow.com/a/5948326/145757 with the answer to "Get a node's inner XML as String in Java DOM", we get:

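import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Serializes every child of the node and concatenates the results (the node's "inner XML").
public static String getInnerXml(Node node)
{
    DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    // Leave out the <?xml ...?> declaration from each serialized fragment.
    lsSerializer.getDomConfig().setParameter("xml-declaration", false);
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < childNodes.getLength(); i++)
    {
        sb.append(lsSerializer.writeToString(childNodes.item(i)));
    }
    return sb.toString();
}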

Combined with what I suggested in my comments, this gives:

getInnerXml(document.getElementsByTagName("dt").item(0));

With result:

:any of a small genus (<it>Apteryx</it>) of flightless New Zealand birds...
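If the markup itself is not needed (the code in the question strips punctuation anyway), a simpler option may be Node.getTextContent(), which returns the text of the element and all of its descendants with the tags dropped. A minimal sketch, assuming the XML is available as a string; DtTextDemo and firstDtText are names I made up for this example:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the question's code.
class DtTextDemo {
    // Returns the text of the first <dt> element, including the text inside
    // nested elements such as <it>, but without the tags themselves.
    static String firstDtText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node dt = doc.getElementsByTagName("dt").item(0);
            return dt.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<def><dt>:any of a small genus (<it>Apteryx</it>) of flightless New Zealand birds</dt></def>";
        // prints ":any of a small genus (Apteryx) of flightless New Zealand birds"
        System.out.println(firstDtText(xml));
    }
}
```

In the question's code that would amount to calling getTextContent() on the dt element instead of reading the value of its first child node.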

Hope this helps...
