
Finding the list of files in a directory which has subdirectories in a faster manner

I have a list of about 27,000 file names. The goal is to look for each of them in a directory structure (which has multiple levels of subdirectories) and print the ones that are missing. My code uses a recursive function to check whether a file is present. It seems to work, but it is very slow in this scenario, where the number of files to search for is very high. Is there any way to improve the performance of this code?

Code snippet is below:

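public static boolean walk(String path, String fileName) {

    File root = new File(path);
    File[] list = root.listFiles();

    // listFiles() returns null if the path is not a directory or cannot be read
    if (list == null)
        return false;

    for (File f : list) {
        if (f.isDirectory()) {
            // propagate a match from deeper in the tree so the search stops early
            if (walk(f.getAbsolutePath(), fileName))
                return true;
        } else if (f.getName().equalsIgnoreCase(fileName)) {
            // record the searched name, not the on-disk name, so that the
            // case-sensitive removeAll() in main() actually removes it
            presentFiles.add(fileName);
            return true;
        }
    }
    return false;
}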



public static void main(String[] args) {

    int i = 0;

    for (String fileName : attrSet) {//attrSet is HashSet of all the files which are being searched.
        try {
            boolean isFileFound = walk(source, fileName);
        } catch (Exception e) {
            System.out.println(e.getMessage() + i++);
        }
    }

    attrSet.removeAll(presentFiles); //presentFiles is HashSet of all files present in the directory

    for (String fileNm : attrSet) {
        System.out.println("FileName : " + fileNm);
    }

}

As already mentioned in a comment, turn the process around:

  1. Put the file names from the list into a hash set.
  2. Recursively traverse the directory structure once, removing every file you find from the hash set.
  3. When the traversal finishes, the hash set contains only the missing files.

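This should take approximately the same time your current code needs to test a single file that is not present, since that already requires one full traversal (if we don't take disk caching into account). In the worst case, where most of the files are missing, the speedup is therefore almost a factor of 27000.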
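
The three steps above could be sketched roughly as follows. This is only an illustration, not the asker's code: the class name MissingFileFinder and the method findMissing are made up for the example, and it uses java.nio.file.Files.walk rather than the hand-written recursion from the question. To approximate the equalsIgnoreCase comparison in the question, names are lower-cased before lookup. That is an assumption about what "same file" should mean here and may differ from equalsIgnoreCase for some unusual characters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical helper class; the name is not from the original post.
final class MissingFileFinder {

    /**
     * Returns the names from wanted that do not occur anywhere under root.
     * The directory tree is traversed exactly once.
     */
    static Set<String> findMissing(Path root, Collection<String> wanted) throws IOException {
        // Step 1: hash-based lookup structure.
        // Key: lower-cased name for matching. Value: name as given, for reporting.
        Map<String, String> missing = new HashMap<>();
        for (String name : wanted) {
            missing.put(name.toLowerCase(Locale.ROOT), name);
        }

        // Step 2: one pass over the tree, removing every file name that is found.
        // The stream holds open directory handles, so it is closed via try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .map(p -> p.getFileName().toString().toLowerCase(Locale.ROOT))
                 .forEach(missing::remove);
        }

        // Step 3: whatever is left was never seen on disk.
        return new TreeSet<>(missing.values());
    }
}
```

With the variables from the question, a call might look like MissingFileFinder.findMissing(Paths.get(source), attrSet), followed by printing each returned name. Note that Files.walk can throw an UncheckedIOException part-way through if a directory is unreadable. If that matters, Files.walkFileTree with a visitor that skips failed directories may be a better fit.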


 