
Java: traversing an XML document in which child elements have the same tag name as their parent

My problem is that I have to work with an XML file sent to me by the suppliers of the company I work for.
That would not be a problem if the XML were well structured, but it is not at all.

<catalog>
    <product>
        <ref>4780</ref>
             .
             .
             .
        <arrivals>
            <product>
                <image title="AMARILLO">AMA</image>
                <size>S/T </size>
            </product>
            <product>
                <image title="AZUL">AZUL</image>
                <size>S/T </size>
            </product>
        </arrivals>
    </product>
</catalog>

As you can see, the outer <product> tag holds all the information about the product, but there are nested tags also named <product> that distinguish its different colors.
This is the code I use to traverse the XML.

doc = db.parse("filename.xml");
Element esproducte = (Element)doc.getElementsByTagName("product").item(0);

NodeList nArrv = esproducte.getElementsByTagName("arrivals");
Element eArrv = (Element) nArrv.item(0);
NodeList eProds = eArrv.getElementsByTagName("product");//THIS THING

for (int l = 0; l < eProds.getLength(); l++)
{
    Node ln = eProds.item(l);
    if (ln.getNodeType() == Node.ELEMENT_NODE)
    {
        Element le = (Element) ln;

        //COLORS / IMAGES / CONFIGS
        NodeList nimgcol = le.getElementsByTagName("image");
        Element eimgcol = (Element) nimgcol.item(0);
        System.out.println("Name of the color " + eimgcol.getTextContent());
    }
}

What happens is that the print is repeated more times than it should be, and I think it is because of the parent <product>. I thought that shouldn't happen, because where I wrote //THIS THING I take into account the fact that <product> sits inside <arrivals>. But it is not working.
What should I modify in the code so that the for loop iterates only 2 times and not 3, which is what happens in this case?

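Change

NodeList eProds = eArrv.getElementsByTagName("product");//THIS THING

to

NodeList eProds = eArrv.getChildNodes();//THIS THING

and leave the rest exactly the same. It works perfectly. The existing getNodeType() check skips the whitespace text nodes that getChildNodes() also returns.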

It is perfectly valid to have tags inside different parent elements that are named the same, but have different content/meaning, as is the case in your example.

An element whose path is /catalog/product is entirely different from an element whose path is /catalog/product/arrivals/product. For example, both XPath and XML Schema treat them as distinct.

Only lazily written code fails to tell them apart, e.g. by using getElementsByTagName, which matches all descendants of the node it is called on, regardless of their location (path).
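To make the difference concrete, here is a minimal sketch that counts product elements both ways on a trimmed copy of the catalog from the question. The class name DescendantsVsChildren and the helpers parse, countDescendants and countChildren are made up for this illustration; only the DOM calls themselves are standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DescendantsVsChildren {

    // Trimmed copy of the catalog from the question (assumption: extra fields omitted).
    static final String XML =
            "<catalog><product><ref>4780</ref><arrivals>"
            + "<product><image title=\"AMARILLO\">AMA</image></product>"
            + "<product><image title=\"AZUL\">AZUL</image></product>"
            + "</arrivals></product></catalog>";

    // Hypothetical helper: parses a string and wraps checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Counts every element called name anywhere below parent.
    static int countDescendants(Element parent, String name) {
        return parent.getElementsByTagName(name).getLength();
    }

    // Counts only the elements called name that are direct children of parent.
    static int countChildren(Element parent, String name) {
        int count = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Element catalog = parse(XML).getDocumentElement();
        System.out.println("descendants: " + countDescendants(catalog, "product")); // descendants: 3
        System.out.println("children: " + countChildren(catalog, "product"));       // children: 1
    }
}
```

Called on the root, getElementsByTagName sees the outer product and both nested ones, while the child-only count sees just the outer product.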

When processing the DOM tree, do it in a structured fashion:

  • Iterate all child elements (not all descendants) of the root (catalog).
  • Depending on strictness, fail if elements are not named product.
  • For each element named product:
    • Iterate all child elements of the product element.
    • Process each element by name, e.g. ref, arrivals.
    • If strict, fail if the element name is unknown.
    • If the element name is arrivals:
      • Iterate all child elements of the arrivals element.
      • If strict, fail if elements are not named product.
      • For each element named product:
        • Process each element by name, e.g. image, size.
        • If strict, fail if the element name is unknown.

As you can see, the place in your code that handles an element named product inside an element named catalog is different from the code that handles an element named product inside an element named arrivals.
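The outline above could be sketched roughly as follows. This is only one possible shape, not a definitive implementation: the class StructuredCatalogWalk, its method names and the "key=value" output strings are invented for the example, and it is strict about product elements but lenient about unknown fields.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StructuredCatalogWalk {

    // Sample input modelled on the question (assumption: the elided fields are left out).
    static final String SAMPLE =
            "<catalog>\n  <product>\n    <ref>4780</ref>\n    <arrivals>\n"
            + "      <product><image title=\"AMARILLO\">AMA</image><size>S/T </size></product>\n"
            + "      <product><image title=\"AZUL\">AZUL</image><size>S/T </size></product>\n"
            + "    </arrivals>\n  </product>\n</catalog>";

    // Direct child elements only; text and comment nodes are skipped.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Handles /catalog: strict, every child must be a top-level product.
    static List<String> readCatalog(Element catalog) {
        List<String> out = new ArrayList<>();
        for (Element child : childElements(catalog)) {
            requireName(child, "product");
            readProduct(child, out);
        }
        return out;
    }

    // Handles /catalog/product.
    static void readProduct(Element product, List<String> out) {
        for (Element child : childElements(product)) {
            switch (child.getTagName()) {
                case "ref":
                    out.add("ref=" + child.getTextContent().trim());
                    break;
                case "arrivals":
                    readArrivals(child, out);
                    break;
                default:
                    break; // lenient: ignore fields this sketch does not know
            }
        }
    }

    // Handles /catalog/product/arrivals and the nested product (color) elements.
    static void readArrivals(Element arrivals, List<String> out) {
        for (Element variant : childElements(arrivals)) {
            requireName(variant, "product");
            for (Element field : childElements(variant)) {
                switch (field.getTagName()) {
                    case "image":
                        out.add("color=" + field.getAttribute("title"));
                        break;
                    case "size":
                        out.add("size=" + field.getTextContent().trim());
                        break;
                    default:
                        break; // lenient: ignore unknown fields
                }
            }
        }
    }

    static void requireName(Element element, String expected) {
        if (!expected.equals(element.getTagName())) {
            throw new IllegalArgumentException("Unexpected element: " + element.getTagName());
        }
    }

    // Convenience entry point that parses a string first.
    static List<String> readCatalogXml(String xml) {
        try {
            return readCatalog(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement());
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        readCatalogXml(SAMPLE).forEach(System.out::println);
    }
}
```

Because each level has its own method, the outer product and the nested product never share code, so neither can be mistaken for the other.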

getElementsByTagName gives you all tags named "product" that are inside that tag, including the "product" tags for colors. Try using getChildNodes and checking the names of the nodes instead.
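Applied to the loop from the question, that suggestion might look like the sketch below. The class name ArrivalColors, the method colorNames and the inline XML string are assumptions made for the example; the real code would keep parsing the file as before.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ArrivalColors {

    // Sample input modelled on the question (assumption: other fields omitted).
    static final String XML =
            "<catalog>\n  <product>\n    <ref>4780</ref>\n    <arrivals>\n"
            + "      <product><image title=\"AMARILLO\">AMA</image></product>\n"
            + "      <product><image title=\"AZUL\">AZUL</image></product>\n"
            + "    </arrivals>\n  </product>\n</catalog>";

    // Returns the text of the first <image> of each <product> directly under <arrivals>.
    static List<String> colorNames(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }

        // Same navigation as in the question: first product, then its arrivals.
        Element firstProduct = (Element) doc.getElementsByTagName("product").item(0);
        Element arrivals = (Element) firstProduct.getElementsByTagName("arrivals").item(0);

        List<String> colors = new ArrayList<>();
        // getChildNodes() returns direct children only, including whitespace text nodes.
        NodeList children = arrivals.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Keep only element nodes that are actually named "product".
            if (child.getNodeType() != Node.ELEMENT_NODE
                    || !"product".equals(child.getNodeName())) {
                continue;
            }
            Element image = (Element) ((Element) child).getElementsByTagName("image").item(0);
            if (image != null) {
                colors.add(image.getTextContent());
            }
        }
        return colors;
    }

    public static void main(String[] args) {
        System.out.println(colorNames(XML)); // [AMA, AZUL]
    }
}
```

The name check matters as soon as arrivals contains anything other than product elements; the null check guards against a product without an image.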

As Andreas mentioned, there is nothing invalid about the document; the problem is the use of getElementsByTagName, which simply returns every descendant element with that tag name (the entire document when called on the Document), regardless of structure.

You can use XPath to simplify the traversal of specific elements.

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.*;
import java.io.IOException;
import java.io.StringReader;

public class XMLParsing {

    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException, XPathExpressionException {
        String xml = "<catalog>\n" +
                "    <product>\n" +
                "        <ref>4780</ref>\n" +
                "             .\n" +
                "             .\n" +
                "             .\n" +
                "        <arrivals>\n" +
                "            <product>\n" +
                "                <image title=\"AMARILLO\">AMA</image>\n" +
                "                <size>S/T </size>\n" +
                "            </product>\n" +
                "            <product>\n" +
                "                <image title=\"AZUL\">AZUL</image>\n" +
                "                <size>S/T </size>\n" +
                "            </product>\n" +
                "        </arrivals>\n" +
                "    </product>\n" +
                "</catalog>\n";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        Document document = builder.parse(new InputSource(new StringReader(xml)));
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xPath = xPathFactory.newXPath();

        // get only the products that are direct children of "arrivals"
        XPathExpression expression = xPath.compile("/catalog/product/arrivals/product");

        NodeList nodes = (NodeList) expression.evaluate(document, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            Node product = nodes.item(i);
            NodeList productChildren = product.getChildNodes();
            for (int j = 0; j < productChildren.getLength(); j++) {
                Node item = productChildren.item(j);
                if (item instanceof Element) {
                    Element element = (Element) item;
                    switch (element.getTagName()) {
                        case "image":
                            System.out.println("product image title : " + element.getAttribute("title"));
                            break;
                        case "size":
                            System.out.println("product size : " + element.getTextContent());
                            break;
                        default:
                            break;
                    }
                }
            }
        }
    }
}
