Java: JAXP unexpected parse value (parsing XML to List<List<String>>)

I have an XML file like this:

<?xml version="1.0" encoding="ISO-8859-2"?>
<some some1="string" some2="string">
<value1>string</value1>
<value2>string</value2>
<position1>
  <someval1>string</someval1>
  <someval2>string</someval2>
  <someval3>string</someval3>
  <someval4>string</someval4>
</position1>
<position2>
  <someval1>string</someval1>
  <someval2>string</someval2>
  <someval3>string</someval3>
  <someval4>string</someval4>
</position2>
</some>

And I wrote the following code:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(Vars.LOCAL_PATH + fileName);
XPath xPath =  XPathFactory.newInstance().newXPath();
Element root = doc.getDocumentElement();
NodeList nl = root.getChildNodes();
ArrayList<String> tempData = new ArrayList<String>();

for (int i = 0; i < nl.getLength(); i++) {
    Node n = nl.item(i);
    if (n.getNodeType() == Node.ELEMENT_NODE) {
        NodeList current = n.getChildNodes();
        for (int j = 0; j < current.getLength(); j++) {
            tempData.add(current.item(j).getTextContent().trim());
            System.out.println(current.item(j).getTextContent().trim() + " - str to note every output line");
        }
        xmlData.add(tempData);
        tempData.clear();
    }
}

But the result is:

000/F/ZZZ/2001 - str to note every output line
2001-01-01 - str to note every output line
 - str to note every output line
USD - str to note every output line
 - str to note every output line
1 - str to note every output line
 - str to note every output line
EUR - str to note every output line
 - str to note every output line

Why are there blank lines? What's wrong with my code? Also, System.out.println(current.getLength()) prints 9, but why 9? I expected 4. Thanks.

In the second for loop you are looping through every child node without checking whether it is an element node. You get 9 nodes because you count the 4 element nodes plus the 5 text nodes (containing whitespace: tabs, spaces and newlines) before and after each <someval> element.
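To see where the 9 comes from, here is a minimal sketch that counts the direct children of a <position1>-like element by node type. The class name ChildNodeCount, the helper countChildren, and the inline XML string are made up for illustration; only the JAXP/DOM calls come from the standard library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildNodeCount {
    // Hypothetical helper: returns {elementCount, textCount} for the
    // direct children of the document's root element.
    static int[] countChildren(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList children = doc.getDocumentElement().getChildNodes();
            int elements = 0;
            int texts = 0;
            for (int i = 0; i < children.getLength(); i++) {
                short type = children.item(i).getNodeType();
                if (type == Node.ELEMENT_NODE) {
                    elements++;
                } else if (type == Node.TEXT_NODE) {
                    texts++; // whitespace between tags is kept as a text node
                }
            }
            return new int[] {elements, texts};
        } catch (Exception e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same shape as one <position> block from the question (values invented).
        String xml = "<position1>\n"
                   + "  <someval1>USD</someval1>\n"
                   + "  <someval2>1</someval2>\n"
                   + "  <someval3>EUR</someval3>\n"
                   + "  <someval4>x</someval4>\n"
                   + "</position1>";
        int[] counts = countChildren(xml);
        System.out.println(counts[0] + " element nodes, " + counts[1] + " text nodes");
    }
}
```

The whitespace-only text nodes are the ones that show up as blank lines once trim() is applied to them.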

If you want to filter only the element nodes, then you need to test the type of current node in that loop as you did in the previous one:

for (int j = 0; j < current.getLength(); j++) {
    if (current.item(j).getNodeType() == Node.ELEMENT_NODE) { // add this!
        tempData.add(current.item(j).getTextContent().trim());
        System.out.println(current.item(j).getTextContent().trim() + " - str to note every output line");
    }
}

Now it will no longer print blank lines, and the loop body will run four times for each <position> element.
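Putting it together, the following is one possible sketch of the whole parse into a List<List<String>>, not a definitive version. The class name PositionParser, the method parse, and taking the XML as a String are assumptions made for illustration. It also changes two things beyond the filter: it creates a fresh list for each parent element, because calling tempData.clear() right after xmlData.add(tempData) empties the very list that was just stored (both names refer to the same object), and it skips parents such as <value1> that have no child elements, which is a design choice you may not want.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionParser {
    // Hypothetical helper: one inner list per child of the root element,
    // holding the trimmed text of that child's own child elements.
    static List<List<String>> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            List<List<String>> rows = new ArrayList<>();
            NodeList groups = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < groups.getLength(); i++) {
                Node group = groups.item(i);
                if (group.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes between groups
                }
                List<String> row = new ArrayList<>(); // fresh list, never cleared
                NodeList fields = group.getChildNodes();
                for (int j = 0; j < fields.getLength(); j++) {
                    Node field = fields.item(j);
                    if (field.getNodeType() == Node.ELEMENT_NODE) {
                        row.add(field.getTextContent().trim());
                    }
                }
                if (!row.isEmpty()) {
                    rows.add(row); // assumption: text-only parents are skipped
                }
            }
            return rows;
        } catch (Exception e) {
            // Rethrown unchecked only to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<some>\n<value1>v</value1>\n"
                   + "<position1>\n  <someval1>USD</someval1>\n  <someval2>1</someval2>\n</position1>\n"
                   + "</some>";
        System.out.println(parse(xml));
    }
}
```

If the text of <value1> and <value2> is needed as well, it can be read with getTextContent() on those elements directly, since their only child is a text node.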
