
How to extract specific tag content in XML file if there are other tags with the same name inside another tag in Java?

I'm currently parsing XML files in Java using DOM, but I've run into a problem: how do I extract the content of a specific tag when tags with the same name also appear inside other tags, as in the following scenario:

<file>
    <sub-file>
        <a> ....</a>
        <b> ....</b>
        <c> ....</c>
    </sub-file>

    <a> ..... some data here ....</a>
    <b> ..... some data here ....</b>
    <c> ..... some data here ....</c>

    <image>
        <a> ....</a>
        <b> ....</b>
        <c> ....</c>
    </image>
</file>

So how can I extract only the a, b and c tags that are direct children of file, and not the ones nested inside sub-file or image? This is the code I have tried so far:

    File xmlfile = new File(path);
    factory = DocumentBuilderFactory.newInstance();
    builder = factory.newDocumentBuilder();
    document = builder.parse(xmlfile);
    document.getDocumentElement().normalize();
    filelist = document.getElementsByTagName("file");
    for (int o = 0; o < filelist.getLength(); o++) {
        Node nNode = filelist.item(o);

        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) nNode;
            for (int a = 0; a < element.getElementsByTagName("file").getLength(); a++) {
                // getElementsByTagName searches every descendant of <file>, so these
                // lookups also see the <a>/<b>/<c> nested in <sub-file> and <image>
                tagA = element.getElementsByTagName("a").item(a).getTextContent();
                tagB = element.getElementsByTagName("b").item(a).getTextContent();
                tagC = element.getElementsByTagName("c").item(a).getTextContent();
            }
        }
    } // loop

This code returns the a, b and c tags three times (from file, sub-file and image).

Don't use getElementsByTagName(). Instead, walk the DOM tree yourself:

    Node fileNode = filelist.item(o);
    // Visit only the immediate children of <file>; nested elements are never reached
    for (Node child = fileNode.getFirstChild(); child != null; child = child.getNextSibling()) {
        if (child.getNodeType() == Node.ELEMENT_NODE) {
            switch (child.getNodeName()) {
                case "a":
                    tagA = child.getTextContent();
                    break;
                case "b":
                    tagB = child.getTextContent();
                    break;
                case "c":
                    tagC = child.getTextContent();
                    break;
                default:
                    // ignore <sub-file>, <image> and anything else
                    break;
            }
        }
    }
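For reference, the sibling walk above can be wrapped into a self-contained program. This is a minimal sketch, assuming the XML is available as a string; the class name DirectChildren and the sample text values (top-a, sub-a and so on) are hypothetical and only mirror the structure from the question.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DirectChildren {

    // Hypothetical sample with the question's structure; the text values are invented.
    static final String SAMPLE =
          "<file>"
        + "<sub-file><a>sub-a</a><b>sub-b</b><c>sub-c</c></sub-file>"
        + "<a>top-a</a><b>top-b</b><c>top-c</c>"
        + "<image><a>img-a</a><b>img-b</b><c>img-c</c></image>"
        + "</file>";

    // Collects the text of the a, b and c elements that are direct children of <file>.
    static Map<String, String> extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        Element fileNode = document.getDocumentElement(); // the root <file> element

        Map<String, String> result = new LinkedHashMap<>();
        // Walk only the immediate children; nested <a>/<b>/<c> are never visited.
        for (Node child = fileNode.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                switch (child.getNodeName()) {
                    case "a":
                    case "b":
                    case "c":
                        result.put(child.getNodeName(), child.getTextContent());
                        break;
                    default:
                        // <sub-file>, <image>, etc. are skipped without descending into them
                        break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract(SAMPLE)); // {a=top-a, b=top-b, c=top-c}
    }
}
```

Because the loop never descends into sub-file or image, only the top-level values are collected.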

Alternatively, you can use XPath:

    XPathFactory xpathFactory = XPathFactory.newInstance();
    XPath xpath = xpathFactory.newXPath();

    // Relative paths are evaluated against fileNode, so "a" selects only its <a> children
    tagA = xpath.evaluate("a", fileNode);
    tagB = xpath.evaluate("b", fileNode);
    tagC = xpath.evaluate("c", fileNode);
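A runnable version of the XPath variant might look like the following. It is a sketch under the same assumptions as before (XML parsed from a string, hypothetical class name XPathChildren and invented sample values). XPath.evaluate(String, Object) returns the string value of the first node the expression matches, or an empty string if nothing matches.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathChildren {

    // Hypothetical sample with the question's structure; the text values are invented.
    static final String SAMPLE =
          "<file>"
        + "<sub-file><a>sub-a</a><b>sub-b</b><c>sub-c</c></sub-file>"
        + "<a>top-a</a><b>top-b</b><c>top-c</c>"
        + "<image><a>img-a</a><b>img-b</b><c>img-c</c></image>"
        + "</file>";

    // Returns the text of the direct <a>, <b> and <c> children of the root <file>.
    static String[] extract(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node fileNode = document.getDocumentElement();

        XPath xpath = XPathFactory.newInstance().newXPath();
        // "a" is a relative path (child::a) evaluated with fileNode as the context,
        // so it never reaches the <a> nested inside <sub-file> or <image>.
        String tagA = xpath.evaluate("a", fileNode);
        String tagB = xpath.evaluate("b", fileNode);
        String tagC = xpath.evaluate("c", fileNode);
        return new String[] {tagA, tagB, tagC};
    }

    public static void main(String[] args) throws Exception {
        System.out.println(String.join(",", extract(SAMPLE))); // top-a,top-b,top-c
    }
}
```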

Element.getElementsByTagName(String) returns all descendant elements with the given tag name, not just the immediate children. To look only one level down, either iterate over the NodeList returned by getChildNodes(), or start at getFirstChild() and follow getNextSibling().
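To make the difference concrete, the sketch below compares getElementsByTagName with a getChildNodes() loop on the question's structure. The helper directChildTexts and the class name ChildNodesDemo are hypothetical, and the sample values are invented; getChildNodes() also returns text nodes (such as whitespace), which is why the loop checks for ELEMENT_NODE.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildNodesDemo {

    // Hypothetical sample with the question's structure; the text values are invented.
    static final String SAMPLE =
          "<file>"
        + "<sub-file><a>sub-a</a><b>sub-b</b><c>sub-c</c></sub-file>"
        + "<a>top-a</a><b>top-b</b><c>top-c</c>"
        + "<image><a>img-a</a><b>img-b</b><c>img-c</c></image>"
        + "</file>";

    static Element parseRoot(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return document.getDocumentElement();
    }

    // Hypothetical helper: text of every direct child element of parent named tagName.
    static List<String> directChildTexts(Element parent, String tagName) {
        List<String> texts = new ArrayList<>();
        NodeList children = parent.getChildNodes(); // immediate children only
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip text/comment nodes and elements with a different name
            if (child.getNodeType() == Node.ELEMENT_NODE && tagName.equals(child.getNodeName())) {
                texts.add(child.getTextContent());
            }
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        Element file = parseRoot(SAMPLE);
        // getElementsByTagName searches the whole subtree: all three <a> elements
        System.out.println(file.getElementsByTagName("a").getLength()); // 3
        // getChildNodes looks one level down: only the top-level <a>
        System.out.println(directChildTexts(file, "a")); // [top-a]
    }
}
```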

If you are not limited to plain DOM traversal, you can also use XPath to select the appropriate nodes, e.g. //file/a, which matches only a elements whose parent is a file element.
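One way to get every match of an expression like //file/a, rather than only the first string value, is to request XPathConstants.NODESET, which makes evaluate return a DOM NodeList. This is a minimal sketch; the class name XPathNodeSet, the helper select and the sample values are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathNodeSet {

    // Hypothetical sample with the question's structure; the text values are invented.
    static final String SAMPLE =
          "<file>"
        + "<sub-file><a>sub-a</a><b>sub-b</b><c>sub-c</c></sub-file>"
        + "<a>top-a</a><b>top-b</b><c>top-c</c>"
        + "<image><a>img-a</a><b>img-b</b><c>img-c</c></image>"
        + "</file>";

    // Returns the text content of every node matched by expression.
    static List<String> select(String xml, String expression) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // NODESET asks for all matches, returned as a NodeList
        NodeList nodes = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Only <a> elements whose parent is <file>; <sub-file> and <image> children are excluded
        System.out.println(select(SAMPLE, "//file/a")); // [top-a]
        // For contrast, //a matches every <a> in the document
        System.out.println(select(SAMPLE, "//a").size()); // 3
    }
}
```

In this document the absolute path /file/a would select the same node, since file is the root element.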
