
How to traverse all the XML files in a directory and its subdirectories to read a particular element using Java?

  • I have a directory that contains a number of subdirectories.
  • It contains many XML files, and files in different directories may share the same file name.
  • I want to read all of the XML files, extract a particular XML element from each, and store the values in an ArrayList.
  • While parsing an XML file, however, it throws java.io.FileNotFoundException: \\BDOPS-4\ORDERS\CreateCLELE\APRIL-2016\28-04-2016\8449066_1\ItemFile\1461809102571_4\ftp\content-providers\ewh-e\data\incoming\OBI00000000001818A\OBI00000000001818\00012092\v103i5\si540.dtd (The system cannot find the file specified)
  • The directory does not contain the file it is looking for (si540.dtd). Can anyone help me solve this issue? My code and the stack trace are below.

Thanks in advance

package Read_XML;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.TrueFileFilter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import DB_INFO.DOI;
import DB_INFO.Insert_Missing_DOI;
public class Read_DOI {
public static void read_XML_for_DOI(String root_path){

    System.out.println("Received Incoming path : "+root_path);

    File f = null;
    try {
        String root = root_path;
        f = new File(root);
        // accept all files in the directory and its subdirectories
        List<File> files = (List<File>) FileUtils.listFiles(f, TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE);
        ArrayList<String> issn_valueLst = new ArrayList<>();
        for (File fXmlFile : files) {
            // process only files whose name ends with .xml
            if(accept(fXmlFile.getName(), ".xml")){
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            System.out.println("XML Name::"+fXmlFile.getName());
            Document doc = dBuilder.parse(fXmlFile);
            doc.getDocumentElement().normalize();
            System.out.println("Traversing File : "+fXmlFile.getName());
            System.out.println("Traversing path : " + fXmlFile.getAbsolutePath());
            NodeList nList2=doc.getElementsByTagName("ce:doi");
            //NodeList nList2=doc.getElementsByTagName("DOI");

            if(nList2.getLength()>=1)
            {
                 for (int temp2 = 0; temp2 < nList2.getLength(); temp2++) {
                     Node nNode4 = nList2.item(temp2);

                     if (nNode4.getNodeType() == Node.ELEMENT_NODE) 
                     {
                        Element eElement1 = (Element) nNode4;
                        issn_valueLst.add(eElement1.getTextContent());
                        //issn_valueLst.add(System.lineSeparator());
                   } 
                           }
              }
            }
        }

        System.out.println("DOI IN DB : "+DOI.DOI_values.toString());

        System.out.println("The DOI Values in INPUT XML : " + issn_valueLst.toString());
        System.out.println("Total number of DOI in Input XML : " + issn_valueLst.size());

        //secondList.removeAll(firstList);

        issn_valueLst.removeAll(DOI.DOI_values);  


        if(issn_valueLst.size()>0)
        {
            System.out.println("\nThe Missing new DOI in the Input XML : " + issn_valueLst.toString());
            Insert_Missing_DOI.insert_DOI(issn_valueLst.toString());
        }
        else
        {
            System.out.println("ALL DOI are available in input xml");
        }
        System.out.println();



       // }
    } 
    catch(FileNotFoundException fe)
    {

        System.out.println("File not found");
        fe.printStackTrace();
    }

    catch (Exception e) {
        // if any error occurs
        e.printStackTrace();
    }


}



  public static boolean accept( String name, String str) {
    return name.toLowerCase().endsWith(str.toLowerCase());
  }
 }

Stack trace:

java.io.FileNotFoundException: \\BDOPS-4\ORDERS\CreateCLELE\APRIL-2016\28-04-2016\8449066_1\ItemFile\1461809102571_4\ftp\content-providers\ewh-e\data\incoming\OBI00000000001818A\OBI00000000001818\00012092\v103i5\si540.dtd (The system cannot find the file specified)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
    at Read_XML.Read_DOI.read_XML_for_DOI(Read_DOI.java:49)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:29)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.findDirectory(find_incoming_dir.java:32)
    at Incoming_folder_path.find_incoming_dir.find(find_incoming_dir.java:15)
    at Extraction.ZipExtraction.extract(ZipExtraction.java:41)
    at execution_Point.Exact_orderpath.find_exact_path(Exact_orderpath.java:24)
    at execution_Point.Get_orderpath.getorderpath_from_orderinfo(Get_orderpath.java:53)
    at execution_Point.Get_order_from_marker.starter_pub(Get_order_from_marker.java:270)
    at execution_Point.Cl_Execute.main(Cl_Execute.java:47)

Most likely, the exception indicates you don't have permission to read the file. From the FileNotFoundException documentation :

This exception … will also be thrown … if the file does exist but for some reason is inaccessible….

The exception name doesn't make much sense for that type of error, does it? java.io.File is a very old class that hails from Java 1.0. If you want more useful feedback, use the modern replacement for File, the Path class:

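import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path f = Paths.get(root);
// Note: newDirectoryStream lists only the entries directly inside f;
// it does not descend into subdirectories.
try (DirectoryStream<Path> dir = Files.newDirectoryStream(f, "*.xml")) {
    for (Path fXmlFile : dir) {
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        System.out.println("XML Name::" + fXmlFile.getFileName());
        Document doc;
        // Files.newInputStream typically reports a more specific exception
        // (such as NoSuchFileException or AccessDeniedException) than java.io.File does
        try (InputStream stream = Files.newInputStream(fXmlFile)) {
            doc = dBuilder.parse(stream);
        }
        // etc.
    }
}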
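
Since the question also asks about subdirectories, here is a minimal sketch of a recursive listing using Files.walk. The class name XmlFileFinder, the method name findXmlFiles, and the temporary directory layout in main are made up for illustration; none of them come from the original code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original code.
final class XmlFileFinder {

    /** Returns every regular file under root, at any depth, whose name ends with ".xml" (case-insensitive). */
    static List<Path> findXmlFiles(Path root) throws IOException {
        // Files.walk returns a lazily populated stream that must be closed
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".xml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree so the sketch can be run as-is
        Path root = Files.createTempDirectory("xml-demo");
        Path sub = Files.createDirectories(root.resolve("a").resolve("b"));
        Files.write(root.resolve("top.xml"), "<r/>".getBytes());
        Files.write(sub.resolve("nested.XML"), "<r/>".getBytes());
        Files.write(sub.resolve("notes.txt"), "x".getBytes());

        for (Path p : findXmlFiles(root)) {
            System.out.println(root.relativize(p));
        }
    }
}
```

Each returned Path can then be opened with Files.newInputStream and handed to DocumentBuilder.parse, as in the loop above.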

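This is an entity-resolution issue. The stack trace shows the failure inside the parser's DTD scanner: the XML file references an external DTD (si540.dtd), and the parser tries to load it even though the file is not present in the directory.

The code below works around the problem by skipping entity resolution, so every external entity (including the DTD) resolves to an empty source. Set the resolver on dBuilder before calling dBuilder.parse(...).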

// requires: import org.xml.sax.EntityResolver;
dBuilder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // it might be a good idea to add trace logging here
        // to record that publicId/systemId is being ignored
        return new InputSource(new StringReader("")); // returns a valid, empty dummy source
    }
});
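
To see the effect in isolation, here is a minimal self-contained sketch that parses an in-memory document whose DOCTYPE points at a DTD that does not exist. The class name IgnoreDtdDemo, the method name parseIgnoringDtd, the sample XML, the namespace URI, and the DOI value are all invented for illustration. Only the element name ce:doi and the DTD name si540.dtd come from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original code.
final class IgnoreDtdDemo {

    /** Parses XML text, resolving every external entity (including the DTD) to an empty source. */
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // EntityResolver has a single method, so a lambda can stand in for the anonymous class
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample document; si540.dtd is deliberately absent from the working directory
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE article SYSTEM \"si540.dtd\">\n"
                + "<article xmlns:ce=\"http://example.org/ce\">"
                + "<ce:doi>10.1000/example</ce:doi></article>";

        Document doc = parseIgnoringDtd(xml);
        NodeList dois = doc.getElementsByTagName("ce:doi");
        for (int i = 0; i < dois.getLength(); i++) {
            System.out.println(dois.item(i).getTextContent());
        }
    }
}
```

One trade-off to keep in mind: because the DTD is never read, anything it would have supplied (entity definitions, default attribute values) is not available to the parser. For simply pulling out element text, as in the question, that is usually acceptable.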
