
Java: navigating an XML document where a child tag has the same name as its parent

The problem is that I have to work with an XML file that my company's suppliers sent me.
This would not be a problem if the XML were well constructed, but it is not at all.

<catalog>
    <product>
        <ref>4780</ref>
             .
             .
             .
        <arrivals>
            <product>
                <image title="AMARILLO">AMA</image>
                <size>S/T </size>
            </product>
            <product>
                <image title="AZUL">AZUL</image>
                <size>S/T </size>
            </product>
        </arrivals>
    </product>
</catalog>

As you can see, the <product> tag holds all the information about the product, but there are more tags named <product> nested inside it to distinguish the different colors.
This is the code I use to move through the XML.

doc = db.parse("filename.xml");
Element esproducte = (Element)doc.getElementsByTagName("product").item(0);

NodeList nArrv = esproducte.getElementsByTagName("arrivals");
Element eArrv = (Element) nArrv.item(0);
NodeList eProds = eArrv.getElementsByTagName("product");//THIS THING

for (int l = 0; l < eProds.getLength(); l++)
{
    Node ln = eProds.item(l);
    if (ln.getNodeType() == Node.ELEMENT_NODE)
    {
        Element le = (Element) ln;

        //COLORS / IMAGES / CONFIGS
        NodeList nimgcol = le.getElementsByTagName("image");
        Element eimgcol = (Element) nimgcol.item(0);
        System.out.println("Name of the color " + eimgcol.getTextContent());
    }
}

What happens is that the print is repeated more times than it should be, and I think it's because of the parent <product>. I thought it shouldn't happen, because where I wrote //THIS THING I take into account the fact that <product> is inside <arrivals>. But it is not working.
What should I modify in the code so that the for loop runs only 2 times and not 3, which is what happens in this case?

Solution: change

NodeList eProds = eArrv.getElementsByTagName("product");//THIS THING

to

NodeList eProds = eArrv.getChildNodes();//THIS THING

And the rest stays exactly the same. Works perfectly.

It is perfectly valid to have tags inside different parent elements that are named the same but have different content/meaning, as is the case in your example.

An element whose path is /catalog/product is entirely different from an element whose path is /catalog/product/arrivals/product. As an example, both XPath and XML Schema will consider them distinct.
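To make the distinction concrete, here is a minimal sketch that counts how many elements each path selects in a trimmed copy of the question's XML. The class name PathCount and the helper count are made up for this illustration; the XPath calls are the standard javax.xml.xpath ones.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PathCount {

    // Trimmed copy of the XML from the question (the elided "." lines are left out).
    static final String XML =
            "<catalog><product><ref>4780</ref><arrivals>"
          + "<product><image title=\"AMARILLO\">AMA</image><size>S/T </size></product>"
          + "<product><image title=\"AZUL\">AZUL</image><size>S/T </size></product>"
          + "</arrivals></product></catalog>";

    // Hypothetical helper: number of nodes selected by the given XPath expression.
    static int count(String xml, String path) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            // The XPath count() function yields a number, returned as a Double.
            Double n = (Double) xPath.evaluate("count(" + path + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + path, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(XML, "/catalog/product"));                  // 1
        System.out.println(count(XML, "/catalog/product/arrivals/product")); // 2
        System.out.println(count(XML, "//product"));                         // 3
    }
}
```

Only the last expression, which ignores the path, lumps the outer product together with the two color variants.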

It is only lazily written code that cannot tell the difference, e.g. by using getElementsByTagName, which locates elements anywhere ("all descendants") regardless of their location (path).

When processing the DOM tree, do it in a structured fashion:

  • Iterate all child elements (not all descendants) of the root (catalog).
  • Depending on strictness, fail if elements are not named product.
  • For each element named product:
    • Iterate all child elements of the product element.
    • Process each element by name, e.g. ref, arrivals.
    • If strict, fail if the element name is unknown.
    • If the element name is arrivals:
      • Iterate all child elements of the arrivals element.
      • If strict, fail if elements are not named product.
      • For each element named product:
        • Process each element by name, e.g. image, size.
        • If strict, fail if the element name is unknown.
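The steps above could be sketched roughly as follows. This is only an illustration, not a complete parser: the names StructuredWalk, childElements, expectName and colors are invented here, only the image element is actually processed, and the strict check is applied just to the two product levels.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StructuredWalk {

    // Trimmed copy of the XML from the question.
    static final String XML =
            "<catalog><product><ref>4780</ref><arrivals>"
          + "<product><image title=\"AMARILLO\">AMA</image><size>S/T </size></product>"
          + "<product><image title=\"AZUL\">AZUL</image><size>S/T </size></product>"
          + "</arrivals></product></catalog>";

    // Direct child elements only; text and comment nodes are skipped.
    static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                out.add((Element) n);
            }
        }
        return out;
    }

    // Strict mode: fail when an element does not have the expected name.
    static void expectName(Element e, String name) {
        if (!name.equals(e.getTagName())) {
            throw new IllegalStateException("unexpected element <" + e.getTagName() + ">");
        }
    }

    // Collects the text of catalog/product/arrivals/product/image, level by level.
    static List<String> colors(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        List<String> colors = new ArrayList<>();
        for (Element product : childElements(doc.getDocumentElement())) {
            expectName(product, "product");
            for (Element field : childElements(product)) {
                if (!"arrivals".equals(field.getTagName())) {
                    continue; // ref and the other fields would be handled here
                }
                for (Element variant : childElements(field)) {
                    expectName(variant, "product");
                    for (Element detail : childElements(variant)) {
                        if ("image".equals(detail.getTagName())) {
                            colors.add(detail.getTextContent());
                        }
                    }
                }
            }
        }
        return colors;
    }

    public static void main(String[] args) {
        System.out.println(colors(XML)); // [AMA, AZUL]
    }
}
```

Because each loop only looks at direct children, the outer product and the color variants are never confused, whatever their tag names.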

As you can see, the place in your code that handles an element named product inside an element named catalog is different from the code that handles an element named product inside an element named arrivals.

getElementsByTagName gives you all tags named "product" that are inside that tag, including the "product" tags for colors. Try using getChildNodes and checking the names of the nodes instead.
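A small sketch of that suggestion, assuming a trimmed copy of the question's XML: the helper childrenNamed (a made-up name, as is the class ChildrenByName) filters getChildNodes by node type and name, and main compares its result with getElementsByTagName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildrenByName {

    // Trimmed copy of the XML from the question.
    static final String XML =
            "<catalog><product><ref>4780</ref><arrivals>"
          + "<product><image title=\"AMARILLO\">AMA</image><size>S/T </size></product>"
          + "<product><image title=\"AZUL\">AZUL</image><size>S/T </size></product>"
          + "</arrivals></product></catalog>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Direct children of parent that are elements with the given tag name.
    static List<Element> childrenNamed(Element parent, String name) {
        List<Element> out = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                out.add((Element) n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element catalog = doc.getDocumentElement();

        // Every product in the document, at any depth.
        System.out.println(doc.getElementsByTagName("product").getLength()); // 3

        // Only the product directly under catalog.
        List<Element> outer = childrenNamed(catalog, "product");
        System.out.println(outer.size()); // 1

        // Only the products directly under arrivals.
        Element arrivals = childrenNamed(outer.get(0), "arrivals").get(0);
        System.out.println(childrenNamed(arrivals, "product").size()); // 2
    }
}
```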

As Andreas mentioned, there is nothing invalid about the document; the problem is using getElementsByTagName, which simply scans every descendant of the node it is called on for elements with that tag name, regardless of structure.

You can use XPath to simplify the traversal of specific elements.

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.*;
import java.io.IOException;
import java.io.StringReader;

public class XMLParsing {

    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException, XPathExpressionException {
        String xml = "<catalog>\n" +
                "    <product>\n" +
                "        <ref>4780</ref>\n" +
                "             .\n" +
                "             .\n" +
                "             .\n" +
                "        <arrivals>\n" +
                "            <product>\n" +
                "                <image title=\"AMARILLO\">AMA</image>\n" +
                "                <size>S/T </size>\n" +
                "            </product>\n" +
                "            <product>\n" +
                "                <image title=\"AZUL\">AZUL</image>\n" +
                "                <size>S/T </size>\n" +
                "            </product>\n" +
                "        </arrivals>\n" +
                "    </product>\n" +
                "</catalog>\n";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        Document document = builder.parse(new InputSource(new StringReader(xml)));
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xPath = xPathFactory.newXPath();

        // get the product elements that are direct children of "arrivals"
        XPathExpression expression = xPath.compile("/catalog/product/arrivals/product");

        NodeList nodes = (NodeList) expression.evaluate(document, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            Node product = nodes.item(i);
            NodeList productChildren = product.getChildNodes();
            for (int j = 0; j < productChildren.getLength(); j++) {
                Node item = productChildren.item(j);
                if (item instanceof Element) {
                    Element element = (Element) item;
                    switch (element.getTagName()) {
                        case "image":
                            System.out.println("product image title : " + element.getAttribute("title"));
                            break;
                        case "size":
                            System.out.println("product size : " + element.getTextContent());
                            break;
                        default:
                            break;
                    }
                }
            }
        }
    }
}
