More efficient way to compare two huge directories and replace same files

F:/original_images , E:/resized_images

I have two HDDs containing a very large number of directories and image (JPG) files. The totals are about 1.5 TB (originals) and 400 GB (resized).

Both sets use the same file names, but the files differ in size because one set is resized. I need to replace the resized files with the originals. Unfortunately, the directory hierarchies are completely different.

I got this working, but it takes a very long time; I expect it to need a few days to complete. It uses two nested loops ( Files.walkFileTree() ) that simply search for the matching file from A to Z. Not smart at all.

public static void main(String[] args) throws IOException {
        FileWriter ostream = new FileWriter("result.txt");
        BufferedWriter out = new BufferedWriter(ostream);

        String fromDir = "F:/original_images";
        String toDir = "E:/resized_images";
        final Path source = Paths.get(fromDir);
        final Path target = Paths.get(toDir);

        Files.walkFileTree(source, EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE,
                            new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path sourceFile,
                    BasicFileAttributes attrs) throws IOException {
                // only handle .jpg files (there are no .jpeg files)
                if(sourceFile.toString().toLowerCase().endsWith(".jpg")) {

                    // search for the matching file                 
                    // start ** inner of [Files.walkFileTree()]
                    Files.walkFileTree(target, EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE,
                            new SimpleFileVisitor<Path>() {
                        @Override
                        public FileVisitResult visitFile(Path Targetfile, BasicFileAttributes attrs) throws IOException {

                            if(sourceFile.getFileName().equals(Targetfile.getFileName())) {
                                out.write("replace : [" + sourceFile + "] -> [" + Targetfile + "]");
                                out.newLine(); // one log entry per line
                                try {
                                    // overwrite the resized file with the original
                                    Files.copy(sourceFile, Targetfile, REPLACE_EXISTING);
                                }catch(Exception e) {
                                    out.write(e.toString());
                                    out.newLine();
                                }
                                // stop searching for this file.
                                return FileVisitResult.TERMINATE;
                            }else
                                return FileVisitResult.CONTINUE;
                        }
                    });
                    // end ** inner of [Files.walkFileTree()]
                }
                return FileVisitResult.CONTINUE;
            }
        });
        out.write("[completed folder] " + fromDir);
        out.close();
    }

I believe there must be a smarter way.

(My guess is to store the file names in an indexed array, because that would be much faster to compare.)

How would you do this?

update (solved)

By combining the ideas from the two answers, I finally got it working.

The source code is too long to show, but in short:

  1. Loop over 'resized_images' and store the file info in a hashmap (key: file_name, value: full_Path).

  2. Loop over 'original_images' and store the file info in a hashmap (key: file_name, value: full_Path). I made a separate hashmap for each sub-directory for efficiency.

  3. Compare each 'resized' and 'original' hashmap and replace the matching files.

The result is much, much faster than before. Most of the execution time is now spent copying files; everything else takes less than 10 minutes.

The way I look at it, there are two sub-problems:

  1. Create a Map keyed on a common criterion, namely the file name, e.g. "a.jpg"
  2. Based on the file name, replace the resized file in the other directory

In the approach you have listed above, you iterate recursively over your source directory; let's call that the outer loop. Then, for each file in the source directory, you iterate recursively over the target directory; let's call that the inner loop. That's an O(n²) approach (read: Big O of n squared).
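To make the difference concrete, here is a small sketch that only counts work on in-memory lists of names; it does not touch the disk. The class and method names (LookupCost, nestedScanComparisons, hashLookups) are made up for illustration. In the real program every "comparison" of the nested version is a file visited during a directory walk, so the gap is larger in practice.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LookupCost {
    // Nested scan: for every source name, scan the target list until a match is found.
    static long nestedScanComparisons(List<String> source, List<String> target) {
        long comparisons = 0;
        for (String s : source) {
            for (String t : target) {
                comparisons++;
                if (s.equals(t)) {
                    break; // plays the role of FileVisitResult.TERMINATE in the inner walk
                }
            }
        }
        return comparisons;
    }

    // Hash lookup: build the index once, then do one lookup per source name.
    static long hashLookups(List<String> source, List<String> target) {
        Set<String> index = new HashSet<>(target);
        long lookups = 0;
        for (String s : source) {
            lookups++;
            index.contains(s); // expected constant time per lookup
        }
        return lookups;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            names.add("img" + i + ".jpg");
        }
        System.out.println(nestedScanComparisons(names, names)); // prints 500500
        System.out.println(hashLookups(names, names));           // prints 1000
    }
}
```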

Another simple approach is to create two maps (HashMap) keyed by file name. You walk the two directories recursively but separately, i.e. in separate loops.

Then iterate over the smaller hashmap and replace the resized images.

That would be an O(n) approach. As n grows, you should see a significant improvement in the time taken.
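A minimal sketch of this two-map approach could look like the following. The names ReplaceByName, indexByName and replaceResized are hypothetical, and the code assumes each file name occurs only once per tree (on a clash it keeps the first path it sees). Keys are compared as exact strings, so names that differ only in case will not match.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

class ReplaceByName {
    // Walk the tree once and map each .jpg file name to its full path.
    // Assumption: names are unique within a tree; the first path seen wins.
    static Map<String, Path> indexByName(Path root) throws IOException {
        Map<String, Path> index = new HashMap<>();
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString()
                               .toLowerCase(Locale.ROOT).endsWith(".jpg"))
                 .forEach(p -> index.putIfAbsent(p.getFileName().toString(), p));
        }
        return index;
    }

    // Overwrite every resized file that has an original with the same name.
    // Returns the number of files replaced.
    static int replaceResized(Path originalRoot, Path resizedRoot) throws IOException {
        Map<String, Path> originals = indexByName(originalRoot);
        Map<String, Path> resized = indexByName(resizedRoot);
        // Iterate over the smaller map, as suggested above.
        Map<String, Path> smaller =
                originals.size() <= resized.size() ? originals : resized;
        int replaced = 0;
        for (String name : smaller.keySet()) {
            Path from = originals.get(name);
            Path to = resized.get(name);
            if (from != null && to != null) {
                Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) throws IOException {
        // Directories taken from the question.
        int n = replaceResized(Paths.get("F:/original_images"),
                               Paths.get("E:/resized_images"));
        System.out.println("replaced " + n + " files");
    }
}
```

Each directory is walked exactly once, so the remaining cost is dominated by the copying itself.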

As Sanket Naik mentioned, create a Map for the original images. I'm not sure how good your implementation is, but you can easily adapt this code in mkyong.

In the Map, store image_name.jpg as the key and its directory as the value. For example, if img1.jpg is under F:/original_images/dir1/dir2/dir3/ , the corresponding entry should be (img1.jpg, /dir1/dir2/dir3/).

Then,

for each entry in resized image directory {
    value = map.get(entry);            // directory of the matching original
    if (value != null)                 // skip files that have no original
        replaceImage(path/to/entry/ + entry, value + entry);
}
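The name-to-directory map described above might be built roughly like this. RelativeDirIndex, build and locate are illustrative names, not part of any library, and the sketch assumes that a file name appears only once under the root (a later duplicate would overwrite the earlier entry).

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

class RelativeDirIndex {
    // Maps file name -> directory relative to root,
    // e.g. "img1.jpg" -> dir1/dir2/dir3 for root/dir1/dir2/dir3/img1.jpg.
    static Map<String, Path> build(Path root) throws IOException {
        Map<String, Path> dirs = new HashMap<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                dirs.put(file.getFileName().toString(),
                         root.relativize(file.getParent()));
                return FileVisitResult.CONTINUE;
            }
        });
        return dirs;
    }

    // Rebuilds the full path of an indexed file from its name,
    // or returns null when the name is not in the map.
    static Path locate(Path root, Map<String, Path> dirs, String name) {
        Path dir = dirs.get(name);
        return dir == null ? null : root.resolve(dir).resolve(name);
    }
}
```

Storing the directory relative to the root keeps the values short; storing the full path instead, as in the asker's update, avoids the extra resolve step.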
