
How to extract specific tag content from an XML file in Java when tags with the same name also appear inside other tags?

Currently, I'm parsing XML files in Java using DOM. But I've run into a problem: how do I extract the content of specific tags from an XML file when tags with the same name also appear inside other tags, as in the following scenario:

<file>
    <sub-file>
        <a> ....</a>
        <b> ....</b>
        <c> ....</c>
    </sub-file>

    <a> ..... some data here ....</a>
    <b> ..... some data here ....</b>
    <c> ..... some data here ....</c>

    <image>
        <a> ....</a>
        <b> ....</b>
        <c> ....</c>
    </image>
</file>

So how can I extract only the a, b, c tags that aren't nested inside another tag (i.e. not inside sub-file or image)? This is the code I've tried so far:

    File xmlfile = new File(path);
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document document = builder.parse(xmlfile);
    document.getDocumentElement().normalize();
    NodeList filelist = document.getElementsByTagName("file");
    for (int o = 0; o < filelist.getLength(); o++) {
        Node nNode = filelist.item(o);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) nNode;
            // getElementsByTagName() returns ALL descendants, so these lists also
            // contain the <a>, <b>, <c> nested inside <sub-file> and <image>
            NodeList aList = element.getElementsByTagName("a");
            NodeList bList = element.getElementsByTagName("b");
            NodeList cList = element.getElementsByTagName("c");
            for (int a = 0; a < aList.getLength(); a++) {
                String tagA = aList.item(a).getTextContent();
                String tagB = bList.item(a).getTextContent();
                String tagC = cList.item(a).getTextContent();
                System.out.println(tagA + " " + tagB + " " + tagC);
            }
        }
    }

This code prints the a, b, c tags three times (once each from file, sub-file, and image).

Don't use getElementsByTagName(). Instead, navigate the DOM tree yourself:

Node fileNode = filelist.item(o);
for (Node child = fileNode.getFirstChild(); child != null; child = child.getNextSibling()) {
    if (child.getNodeType() == Node.ELEMENT_NODE) {
        switch (child.getNodeName()) {
            case "a":
                tagA = child.getTextContent();
                break;
            case "b":
                tagB = child.getTextContent();
                break;
            case "c":
                tagC = child.getTextContent();
                break;
            default:
                // ignore
        }
    }
}
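For reference, here is a self-contained sketch of the sibling-walking loop above. It assumes the question's sample document is held in a string, with the elided "...." contents replaced by made-up placeholder values (top-a, sub-a, img-a, and so on); the class name SiblingWalk and its helper methods are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SiblingWalk {
    // Placeholder text values stand in for the "...." in the question's XML
    static final String XML =
        "<file>"
        + "<sub-file><a>sub-a</a><b>sub-b</b><c>sub-c</c></sub-file>"
        + "<a>top-a</a><b>top-b</b><c>top-c</c>"
        + "<image><a>img-a</a><b>img-b</b><c>img-c</c></image>"
        + "</file>";

    /** Collects the text of the a, b, c elements that are direct children of fileNode. */
    static Map<String, String> directABC(Node fileNode) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Node child = fileNode.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes, comments, etc.
            }
            switch (child.getNodeName()) {
                case "a":
                case "b":
                case "c":
                    result.put(child.getNodeName(), child.getTextContent());
                    break;
                default:
                    // <sub-file> and <image> are skipped, never descended into
                    break;
            }
        }
        return result;
    }

    /** Parses an XML string, wrapping checked parser exceptions. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node fileNode = parse(XML).getDocumentElement(); // the <file> element
        System.out.println(directABC(fileNode)); // {a=top-a, b=top-b, c=top-c}
    }
}
```

Because the loop only follows getNextSibling() starting from the first child of <file>, it never looks inside <sub-file> or <image>.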

As an alternative, you can also look into using XPath:

XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();

tagA = xpath.evaluate("a", fileNode);
tagB = xpath.evaluate("b", fileNode);
tagC = xpath.evaluate("c", fileNode);
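A runnable sketch of the XPath variant, again with placeholder text values in place of the elided data (the class RelativeXPath and its methods are hypothetical). A relative path such as a is evaluated against the context node and is shorthand for child::a, so it only matches direct children. XPath.evaluate(String, Object) returns the string value of the first matching node in document order; the .//a call is included only for contrast, since it searches all descendants.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RelativeXPath {
    // Placeholder text values stand in for the "...." in the question's XML
    static final String XML =
        "<file>"
        + "<sub-file><a>sub-a</a><b>sub-b</b><c>sub-c</c></sub-file>"
        + "<a>top-a</a><b>top-b</b><c>top-c</c>"
        + "<image><a>img-a</a><b>img-b</b><c>img-c</c></image>"
        + "</file>";

    /** Parses the XML and returns the root <file> element. */
    static Node fileNode(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Evaluates an XPath expression relative to the given context node. */
    static String childText(Node context, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, context);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Node file = fileNode(XML);
        System.out.println(childText(file, "a"));   // top-a (direct child only)
        System.out.println(childText(file, ".//a")); // sub-a (first descendant in document order)
    }
}
```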

Element.getElementsByTagName(String) returns all descendant elements with the given tag name, not just the immediate children. To look at immediate children only, navigate the tree yourself: either call getChildNodes() and iterate over the returned NodeList, or call getFirstChild() and then iterate with getNextSibling().
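The difference is easy to check. This sketch (hypothetical class ChildNodesDemo, placeholder text values in place of the elided data) counts the <a> elements found by getElementsByTagName() and compares that with a getChildNodes() loop that keeps only direct child elements:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodesDemo {
    // Placeholder text values stand in for the "...." in the question's XML
    static final String XML =
        "<file>"
        + "<sub-file><a>sub-a</a><b>sub-b</b><c>sub-c</c></sub-file>"
        + "<a>top-a</a><b>top-b</b><c>top-c</c>"
        + "<image><a>img-a</a><b>img-b</b><c>img-c</c></image>"
        + "</file>";

    /** Parses the XML and returns the root <file> element. */
    static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Direct child elements of parent whose tag name equals name (no recursion). */
    static List<Element> childElements(Element parent, String name) {
        List<Element> matches = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                matches.add((Element) child);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Element file = root(XML);
        // getElementsByTagName() searches every descendant: 3 <a> elements
        System.out.println(file.getElementsByTagName("a").getLength()); // 3
        // getChildNodes() only sees direct children: 1 <a> element
        List<Element> direct = childElements(file, "a");
        System.out.println(direct.size() + " " + direct.get(0).getTextContent()); // 1 top-a
    }
}
```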

If you are not limited to plain DOM traversal, you can also use XPath to select the appropriate nodes, e.g. //file/a.
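A sketch of the absolute-path approach, evaluating the expression against the whole Document with XPathConstants.NODESET (hypothetical class AbsoluteXPath, placeholder text values in place of the elided data). //file/a selects <a> elements whose parent is a <file> element anywhere in the document; if <file> elements could ever be nested, the root-anchored /file/a is the stricter choice. The second expression is one way to collect a, b and c in a single pass.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AbsoluteXPath {
    // Placeholder text values stand in for the "...." in the question's XML
    static final String XML =
        "<file>"
        + "<sub-file><a>sub-a</a><b>sub-b</b><c>sub-c</c></sub-file>"
        + "<a>top-a</a><b>top-b</b><c>top-c</c>"
        + "<image><a>img-a</a><b>img-b</b><c>img-c</c></image>"
        + "</file>";

    /** Returns the text of every node matched by expression, in document order. */
    static List<String> texts(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // <a> elements whose parent is a <file> element
        System.out.println(texts(XML, "//file/a")); // [top-a]
        // a, b and c children of the root <file> element
        System.out.println(texts(XML, "/file/*[self::a or self::b or self::c]")); // [top-a, top-b, top-c]
    }
}
```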

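Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.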

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM