Read a complex XML file in Java

I am able to read many types of XML files in Java, but today I got an XML file whose details I am not able to read.

<ENVELOPE>
    <BILLFIXED>
        <BILLDATE>1-Jul-2017</BILLDATE>
        <BILLREF>1</BILLREF>
        <BILLPARTY>Party1</BILLPARTY>
    </BILLFIXED>
    <BILLCL>-10800.00</BILLCL>
    <BILLPDC/>
    <BILLFINAL>-10800.00</BILLFINAL>
    <BILLDUE>1-Jul-2017</BILLDUE>
    <BILLOVERDUE>30</BILLOVERDUE>
    <BILLFIXED>
        <BILLDATE>1-Jul-2017</BILLDATE>
        <BILLREF>2</BILLREF>
        <BILLPARTY>Party2</BILLPARTY>
    </BILLFIXED>
    <BILLCL>-2000.00</BILLCL>
    <BILLPDC/>
    <BILLFINAL>-2000.00</BILLFINAL>
    <BILLDUE>1-Jul-2017</BILLDUE>
    <BILLOVERDUE>30</BILLOVERDUE>
    <BILLFIXED>
        <BILLDATE>1-Jul-2017</BILLDATE>
        <BILLREF>3</BILLREF>
        <BILLPARTY>Party3</BILLPARTY>
    </BILLFIXED>
    <BILLCL>-1416.00</BILLCL>
    <BILLPDC/>
    <BILLFINAL>-1416.00</BILLFINAL>
    <BILLDUE>31-Jul-2017</BILLDUE>
    <BILLOVERDUE>0</BILLOVERDUE>
</ENVELOPE>

I am using this code to read the XML file. I am able to read the data inside the <BILLFIXED> tag, but not the data outside of it, such as <BILLFINAL>, <BILLDUE>, etc.

try {
    File fXmlFile = new File("filepath");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(fXmlFile);

    doc.getDocumentElement().normalize();
    NodeList billNodeList = doc.getElementsByTagName("ENVELOPE");
    for (int i = 0; i < billNodeList.getLength(); i++) {
        Node voucherNode = billNodeList.item(i);
        Element voucherElement = (Element) voucherNode;
        NodeList nList = voucherElement.getElementsByTagName("BILLFIXED");

        for (int temp = 0; temp < nList.getLength(); temp++) {
            Node insideNode = nList.item(temp);
            Element voucherElements = (Element) insideNode;
            System.out.println(voucherElements.getElementsByTagName("BILLDATE").item(0).getTextContent());
            System.out.println(voucherElements.getElementsByTagName("BILLREF").item(0).getTextContent());
            System.out.println(voucherElements.getElementsByTagName("BILLPARTY").item(0).getTextContent());
            System.out.println(voucherElements.getElementsByTagName("BILLFINAL").item(0).getTextContent());
            System.out.println(voucherElements.getElementsByTagName("BILLOVERDUE").item(0).getTextContent());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
}

I have tried every approach I know, but so far I have not found a solution. If anyone has one, please share it with me.

One way to do it is to "fix" the XML so that it is better structured, e.g. like this:

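// Fix the XML: wrap each <BILLFIXED> and the nodes that follow it in a new <BILL> element.
// Needs java.util.ArrayList and java.util.List in addition to the org.w3c.dom imports.
Element envelopeElem = doc.getDocumentElement();

// Snapshot the children first, because the loop below moves nodes around.
List<Node> children = new ArrayList<>();
for (Node child = envelopeElem.getFirstChild(); child != null; child = child.getNextSibling()) {
    children.add(child);
}

Element billElem = null;
for (Node child : children) {
    // Each <BILLFIXED> starts a new bill.
    if (child.getNodeType() == Node.ELEMENT_NODE && "BILLFIXED".equals(child.getNodeName())) {
        billElem = doc.createElement("BILL");
        envelopeElem.insertBefore(billElem, child);
    }
    // appendChild moves the node out of <ENVELOPE> and into the current <BILL>.
    if (billElem != null) {
        billElem.appendChild(child);
    }
}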

The code basically creates a new <BILL> element as a child of <ENVELOPE> whenever it encounters a <BILLFIXED> element, then moves that <BILLFIXED> and every node after it, up to the next <BILLFIXED>, into the new <BILL> element.

The result is that the XML in the DOM tree looks like this (1), which should be easier for you to process:

<ENVELOPE>
    <BILL>
        <BILLFIXED>
            <BILLDATE>1-Jul-2017</BILLDATE>
            <BILLREF>1</BILLREF>
            <BILLPARTY>Party1</BILLPARTY>
        </BILLFIXED>
        <BILLCL>-10800.00</BILLCL>
        <BILLPDC/>
        <BILLFINAL>-10800.00</BILLFINAL>
        <BILLDUE>1-Jul-2017</BILLDUE>
        <BILLOVERDUE>30</BILLOVERDUE>
    </BILL>
    <BILL>
        <BILLFIXED>
            <BILLDATE>1-Jul-2017</BILLDATE>
            <BILLREF>2</BILLREF>
            <BILLPARTY>Party2</BILLPARTY>
        </BILLFIXED>
        <BILLCL>-2000.00</BILLCL>
        <BILLPDC/>
        <BILLFINAL>-2000.00</BILLFINAL>
        <BILLDUE>1-Jul-2017</BILLDUE>
        <BILLOVERDUE>30</BILLOVERDUE>
    </BILL>
    <BILL>
        <BILLFIXED>
            <BILLDATE>1-Jul-2017</BILLDATE>
            <BILLREF>3</BILLREF>
            <BILLPARTY>Party3</BILLPARTY>
        </BILLFIXED>
        <BILLCL>-1416.00</BILLCL>
        <BILLPDC/>
        <BILLFINAL>-1416.00</BILLFINAL>
        <BILLDUE>31-Jul-2017</BILLDUE>
        <BILLOVERDUE>0</BILLOVERDUE>
    </BILL>
</ENVELOPE>

1) The XML has been reformatted for human readability, i.e. it has been re-indented.
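
Once the DOM has this shape, each bill can be read from its own <BILL> element with the same getElementsByTagName calls used in the question. The following is only a minimal sketch, assuming the document already contains the <BILL> wrappers shown above. The class name RestructuredBillReader and the helpers text, summarize and parse are made up for illustration and are not part of the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RestructuredBillReader {

    // Hypothetical helper: text of the first descendant with this tag, or "" if there is none.
    static String text(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    // Builds one summary line per <BILL>; assumes the <BILL> wrappers already exist.
    static List<String> summarize(Document doc) {
        List<String> lines = new ArrayList<>();
        NodeList bills = doc.getElementsByTagName("BILL");
        for (int i = 0; i < bills.getLength(); i++) {
            Element bill = (Element) bills.item(i);
            lines.add(text(bill, "BILLREF") + " | "
                    + text(bill, "BILLPARTY") + " | "
                    + text(bill, "BILLFINAL") + " | "
                    + text(bill, "BILLDUE") + " | "
                    + text(bill, "BILLOVERDUE"));
        }
        return lines;
    }

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ENVELOPE><BILL>"
                + "<BILLFIXED><BILLDATE>1-Jul-2017</BILLDATE><BILLREF>1</BILLREF>"
                + "<BILLPARTY>Party1</BILLPARTY></BILLFIXED>"
                + "<BILLCL>-10800.00</BILLCL><BILLPDC/><BILLFINAL>-10800.00</BILLFINAL>"
                + "<BILLDUE>1-Jul-2017</BILLDUE><BILLOVERDUE>30</BILLOVERDUE>"
                + "</BILL></ENVELOPE>";
        summarize(parse(xml)).forEach(System.out::println);
    }
}
```

Because every lookup is scoped to one <BILL> element, BILLFINAL and BILLOVERDUE are now descendants of the element being searched, which is what the code in the question was missing.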

It isn't well-structured XML. Inside your <ENVELOPE> tags there is nothing to indicate the start of each set of six elements that constitute a 'bill'. You'd normally expect each set to be wrapped in its own <BILL> and </BILL> tags. The parser still accepts the file because it is well-formed, but the code that reads it has to work out the grouping itself.
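
If you would rather leave the DOM untouched, the grouping can also be done while walking the direct children of <ENVELOPE>: start a new record at every <BILLFIXED> and attach each following sibling element to the current record. This is only a sketch of that idea under the assumption that every bill begins with <BILLFIXED>, as in the sample file. The class name SiblingBillReader and the methods readBills and parse are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingBillReader {

    // Groups the flat children of the root element into one map per bill.
    // Assumption: each bill starts with a <BILLFIXED> element.
    static List<Map<String, String>> readBills(Document doc) {
        List<Map<String, String>> bills = new ArrayList<>();
        Map<String, String> current = null;
        Node root = doc.getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes between elements
            }
            if ("BILLFIXED".equals(n.getNodeName())) {
                // A new bill begins: copy the fields nested inside <BILLFIXED>.
                current = new LinkedHashMap<>();
                bills.add(current);
                for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c.getNodeType() == Node.ELEMENT_NODE) {
                        current.put(c.getNodeName(), c.getTextContent().trim());
                    }
                }
            } else if (current != null) {
                // A sibling such as <BILLFINAL> belongs to the most recent <BILLFIXED>.
                current.put(n.getNodeName(), n.getTextContent().trim());
            }
        }
        return bills;
    }

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ENVELOPE>"
                + "<BILLFIXED><BILLREF>1</BILLREF><BILLPARTY>Party1</BILLPARTY></BILLFIXED>"
                + "<BILLFINAL>-10800.00</BILLFINAL><BILLOVERDUE>30</BILLOVERDUE>"
                + "<BILLFIXED><BILLREF>2</BILLREF><BILLPARTY>Party2</BILLPARTY></BILLFIXED>"
                + "<BILLFINAL>-2000.00</BILLFINAL><BILLOVERDUE>30</BILLOVERDUE>"
                + "</ENVELOPE>";
        for (Map<String, String> bill : readBills(parse(xml))) {
            System.out.println(bill);
        }
    }
}
```

Each map keeps the fields in document order, so a caller can look up bill.get("BILLFINAL") or bill.get("BILLDUE") next to bill.get("BILLPARTY") without touching the XML structure.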

According to the sample XML, it holds data for 3 records, but nothing separates one record from the next. It looks like each field was simply written to the file as its own XML tag.

There are two options I would suggest:

  1. Java based: As Andreas suggested, read the file content and add a wrapper element around each record, which gives the XML a definite structure that is easier to handle. Performance may suffer when the input file is large.
  2. Transformation based: Try an STX transformation, which would convert the structure to the required format, either XML or even a flat file. Processing then becomes simpler.
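
To illustrate the transformation-based option without an extra library, here is a sketch that uses XSLT 1.0 through the JDK's built-in javax.xml.transform API instead of STX. That is a substitution on my part: STX is a streaming approach, whereas this XSLT version loads the whole document, so it does not address the large-file concern. The class name BillGroupingTransform, the method group and the stylesheet itself are assumptions made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BillGroupingTransform {

    // For each <BILLFIXED>, emit a <BILL> containing it plus the following siblings
    // whose nearest preceding <BILLFIXED> is that same element.
    static final String XSLT = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/ENVELOPE'>",
        "    <ENVELOPE>",
        "      <xsl:for-each select='BILLFIXED'>",
        "        <xsl:variable name='id' select='generate-id()'/>",
        "        <BILL>",
        "          <xsl:copy-of select='.'/>",
        "          <xsl:copy-of select='following-sibling::*[not(self::BILLFIXED)]"
            + "[generate-id(preceding-sibling::BILLFIXED[1]) = $id]'/>",
        "        </BILL>",
        "      </xsl:for-each>",
        "    </ENVELOPE>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to the flat XML and returns the grouped XML as a string.
    static String group(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ENVELOPE>"
                + "<BILLFIXED><BILLREF>1</BILLREF></BILLFIXED><BILLFINAL>-10800.00</BILLFINAL>"
                + "<BILLFIXED><BILLREF>2</BILLREF></BILLFIXED><BILLFINAL>-2000.00</BILLFINAL>"
                + "</ENVELOPE>";
        System.out.println(group(xml));
    }
}
```

The grouped output can then be parsed with the usual DOM code, reading each record from its own <BILL> element.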
