
Having difficulty parsing nested XML tags in Java

I am parsing definitions from a dictionary API. I have this line of XML:

<dt>:any of a small genus (<it>Apteryx</it>) of flightless New Zealand birds with rudimentary wings, stout legs, a long bill, and grayish brown hairlike plumage</dt>

How would I get the full text of the dt element? My code stops working when it reaches (Apteryx), because there are additional tags nested inside the element. How can I get the whole dt element as a single string? Here is my current code:

Element def = (Element) element.getElementsByTagName("def").item(0);
System.out.println(getValue("dt",def).replaceAll("[^\\p{L}\\p{N} ]", ""));

Where def is the element that holds the dt element.

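And here is my getValue code:

private static String getValue(String tag, Element element) {
    // Child nodes of the first matching element; with nested tags these are text, element, text, ...
    NodeList nodes = element.getElementsByTagName(tag).item(0).getChildNodes();
    // Only the first child node is read, so the value ends at the first nested tag
    Node node = nodes.item(0);
    return node.getNodeValue();
}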

Sometimes there are multiple nested tags within the dt element.
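
To make the failure concrete: the dt element has mixed content, so its child nodes are a text node, an it element, and another text node, and getValue reads only the first of them. The following is a minimal sketch of that behavior, assuming the dt line sits inside a def element as in the question. The class DtChildren and its helper methods are hypothetical names used only for illustration. It also shows Node.getTextContent(), a standard DOM Level 3 method, as one possible way to get the concatenated text with the nested tags dropped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DtChildren {
    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Mirrors what getValue does: reads only the first child node of the tag.
    static String firstChildValue(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getFirstChild().getNodeValue();
    }

    // Concatenates the text of all descendants; nested tags such as <it> are dropped.
    static String fullText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<def><dt>:any of a small genus (<it>Apteryx</it>) "
                + "of flightless New Zealand birds</dt></def>";
        Element def = parse(xml).getDocumentElement();
        System.out.println(firstChildValue(def, "dt")); // text before <it> only
        System.out.println(fullText(def, "dt"));        // whole sentence, tags removed
    }
}
```

Running main should print the truncated text ":any of a small genus (" first, and then the full sentence with Apteryx inlined. If the nested markup itself must be kept, getTextContent() is not enough and the inner XML has to be serialized instead.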

Combining https://stackoverflow.com/a/5948326/145757 with the answer to "Get a node's inner XML as String in Java DOM", we get:

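public static String getInnerXml(Node node) {
    // Obtain the DOM Load and Save (LS) implementation from the node's document
    DOMImplementationLS lsImpl = (DOMImplementationLS)node.getOwnerDocument().getImplementation().getFeature("LS", "3.0");
    LSSerializer lsSerializer = lsImpl.createLSSerializer();
    // Do not emit an <?xml ...?> declaration in front of each serialized child
    lsSerializer.getDomConfig().setParameter("xml-declaration", false);
    NodeList childNodes = node.getChildNodes();
    StringBuilder sb = new StringBuilder();
    // Serialize every child (text nodes and nested elements alike) and concatenate
    for (int i = 0; i < childNodes.getLength(); i++) {
        sb.append(lsSerializer.writeToString(childNodes.item(i)));
    }
    return sb.toString();
}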

Adding my comments, this gives:
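
A minimal, self-contained way to exercise getInnerXml on the dt line from the question might look like the sketch below. The class name InnerXmlDemo, the parse helper, and the def wrapper element in the sample string are assumptions made for illustration; only getInnerXml itself comes from the answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

class InnerXmlDemo {
    // Serializes all children of the node, keeping nested markup such as <it>.
    static String getInnerXml(Node node) {
        DOMImplementationLS lsImpl = (DOMImplementationLS) node.getOwnerDocument()
                .getImplementation().getFeature("LS", "3.0");
        LSSerializer lsSerializer = lsImpl.createLSSerializer();
        // Suppress the XML declaration that would otherwise precede each fragment
        lsSerializer.getDomConfig().setParameter("xml-declaration", false);
        NodeList childNodes = node.getChildNodes();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < childNodes.getLength(); i++) {
            sb.append(lsSerializer.writeToString(childNodes.item(i)));
        }
        return sb.toString();
    }

    // Hypothetical helper: parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<def><dt>:any of a small genus (<it>Apteryx</it>) "
                + "of flightless New Zealand birds</dt></def>";
        // Take the first dt element and print everything between its tags
        Node dt = parse(xml).getElementsByTagName("dt").item(0);
        System.out.println(getInnerXml(dt));
    }
}
```

Because the children are serialized as XML, characters such as & in the text come back escaped, which may or may not be what the caller wants.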


With result:

:any of a small genus (<it>Apteryx</it>) of flightless New Zealand birds...

Hope this helps...
