More efficient way to compare two huge directories and replace same files

F:/original_images, E:/resized_images

I have two HDDs containing a great many directories and image (jpg) files. The total size is about 1.5 TB for the originals and 400 GB for the resized copies.

The files have the same names on both drives but different sizes (one set is resized). I have to replace the resized files with the originals. Unfortunately, the two directory hierarchies are completely different.

I managed to do the job, but it takes a really long time; I expect it to need a few days to complete. It uses two nested loops (Files.walkFileTree()) that, for every original file, search the whole resized tree from A to Z for a match. Not smart at all.

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.EnumSet;
import static java.nio.file.StandardCopyOption.REPLACE_EXISTING;

public class ReplaceResized {
    public static void main(String[] args) throws IOException {
        String fromDir = "F:/original_images";
        String toDir = "E:/resized_images";
        final Path source = Paths.get(fromDir);
        final Path target = Paths.get(toDir);

        // try-with-resources closes (and flushes) the log even if the walk throws
        try (BufferedWriter out = new BufferedWriter(new FileWriter("result.txt"))) {
            // outer walk: every file under the originals tree
            Files.walkFileTree(source, EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE,
                    new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path sourceFile, BasicFileAttributes attrs) throws IOException {
                    // only .jpg files (there are no .jpeg files)
                    if (!sourceFile.toString().toLowerCase().endsWith(".jpg")) {
                        return FileVisitResult.CONTINUE;
                    }
                    // inner walk: rescan the WHOLE resized tree for a file with the same name.
                    // This runs once per original file, which is what makes the program so slow.
                    Files.walkFileTree(target, EnumSet.of(FileVisitOption.FOLLOW_LINKS), Integer.MAX_VALUE,
                            new SimpleFileVisitor<Path>() {
                        @Override
                        public FileVisitResult visitFile(Path targetFile, BasicFileAttributes targetAttrs) throws IOException {
                            if (!sourceFile.getFileName().equals(targetFile.getFileName())) {
                                return FileVisitResult.CONTINUE;
                            }
                            out.write("replace : [" + sourceFile + "] -> [" + targetFile + "]");
                            out.newLine();
                            try {
                                Files.copy(sourceFile, targetFile, REPLACE_EXISTING);
                            } catch (IOException e) {
                                out.write(e.toString());
                                out.newLine();
                            }
                            // stop the inner walk for this file; the outer walk continues
                            return FileVisitResult.TERMINATE;
                        }
                    });
                    return FileVisitResult.CONTINUE;
                }
            });
            out.write("[completed folder] " + fromDir);
            out.newLine();
        }
    }
}

I believe there must be a smarter way.

(My guess is to store the file names in an indexed array, because lookups would be much faster than comparing against every file.)

How would you do this?

Update (solved)

By combining the ideas from two of the answers, I finally got it done.

The source code is too long to post, but in short:

  1. Walk 'resized_images' and store the file info in a HashMap (key: file name, value: full path).

  2. Walk 'original_images' and store the file info in a HashMap (key: file name, value: full path). For efficiency, I made a separate HashMap for each subdirectory.

  3. Compare the 'resized' and 'original' HashMaps and replace each matching file.
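A minimal sketch of steps 1 and 2, assuming Java 8+ and a single map per tree (the per-subdirectory maps described above are not reproduced). The class FileNameIndex and its build method are hypothetical names. Because the two hierarchies are unrelated, the same file name might occur in more than one directory; as an assumption about what you would want, this sketch records such collisions instead of silently overwriting them.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical helper: one recursive walk that maps "name.jpg" -> full path.
final class FileNameIndex {
    final Map<String, Path> byName = new HashMap<>();
    final List<Path> duplicates = new ArrayList<>(); // paths whose name was already indexed

    static FileNameIndex build(Path root) throws IOException {
        FileNameIndex index = new FileNameIndex();
        // Files.walk is lazy and holds directory handles, so close it with try-with-resources
        try (Stream<Path> files = Files.walk(root, FileVisitOption.FOLLOW_LINKS)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".jpg"))
                 .forEach(p -> {
                     String name = p.getFileName().toString();
                     // putIfAbsent keeps the first path seen and returns it when the name clashes
                     if (index.byName.putIfAbsent(name, p) != null) {
                         index.duplicates.add(p);
                     }
                 });
        }
        return index;
    }
}
```

Building both indexes costs one walk per drive; step 3 then becomes one HashMap lookup per file. Checking that `duplicates` is empty before copying anything is a cheap way to make sure a name-only match is safe.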

The result is much faster than before. Most of the execution time is now spent copying files; everything else takes less than 10 minutes.

The way I look at it, there are two sub-problems:

  1. Create a Map keyed on a common criterion, namely the file name, e.g. "a.jpg".
  2. Using the file name, replace the resized file in the other directory.

In the approach you listed above, you recursively iterate over your source directory; call that the outer loop. Then, for each file in the source directory, you recursively iterate over the target directory; call that the inner loop. That's an O(n²) (read "Big-O of n squared") approach.
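In symbols (n and m are introduced here for illustration, not taken from the question): with n .jpg files in the original tree and m files in the resized tree, the nested walk re-lists the resized tree for every original until it hits a match, while the map approach walks each tree once and does an expected O(1) HashMap lookup per file. Roughly:

```latex
T_{\text{nested}} = O(n \cdot m) \approx O(n^2) \text{ when } m \approx n,
\qquad
T_{\text{maps}} = O(n + m)
```

Stopping at the first match (FileVisitResult.TERMINATE) only shortens each inner walk (by about half on average, if matches are spread evenly through the tree), so the cost still grows as n·m, and every inner walk also repeats the directory listing on disk.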

Another simple approach is to create two maps (HashMap) keyed by file name. You then walk the two directories recursively, each in its own separate loop.

Then iterate over the smaller map, look up each file name in the other map, and replace the resized images.

That would be an O(n) approach. As n grows, you should see a significant improvement in the time taken.
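A minimal sketch of this two-map approach, assuming Java 8+, that only .jpg files matter, and that a later file with a duplicate name may simply overwrite an earlier one in the map (a simplification). The class TwoMapReplace and its methods are hypothetical. Whichever map is iterated, the copy always goes from the original to the resized file.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

final class TwoMapReplace {
    // One recursive walk per tree: file name -> full path (duplicate names overwrite earlier entries).
    static Map<String, Path> index(Path root) throws IOException {
        Map<String, Path> map = new HashMap<>();
        try (Stream<Path> s = Files.walk(root, FileVisitOption.FOLLOW_LINKS)) {
            s.filter(Files::isRegularFile)
             .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".jpg"))
             .forEach(p -> map.put(p.getFileName().toString(), p));
        }
        return map;
    }

    // Copies each original over the resized file with the same name; returns the number replaced.
    static int replaceAll(Path originalsRoot, Path resizedRoot) throws IOException {
        Map<String, Path> originals = index(originalsRoot);
        Map<String, Path> resized = index(resizedRoot);
        // Iterate over the smaller map and probe the larger one (expected O(1) per lookup).
        boolean resizedIsSmaller = resized.size() <= originals.size();
        Map<String, Path> small = resizedIsSmaller ? resized : originals;
        Map<String, Path> large = resizedIsSmaller ? originals : resized;
        int replaced = 0;
        for (Map.Entry<String, Path> e : small.entrySet()) {
            Path match = large.get(e.getKey());
            if (match == null) {
                continue; // no counterpart in the other tree
            }
            Path from = resizedIsSmaller ? match : e.getValue(); // always the original
            Path to = resizedIsSmaller ? e.getValue() : match;   // always the resized copy
            Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
            replaced++;
        }
        return replaced;
    }
}
```

Usage would be something like `TwoMapReplace.replaceAll(Paths.get("F:/original_images"), Paths.get("E:/resized_images"))`. Iterating the smaller map does not change the O(n) bound; it just avoids lookups that cannot succeed.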

As Sanket Naik mentioned, create a Map for the original images. I'm not sure how good your implementation is, but you can easily adapt the example code from mkyong.

In the Map, store image_name.jpg as the key and its directory as the value. For example, if img1.jpg is under F:/original_images/dir1/dir2/dir3/, the corresponding entry is img1.jpg -> /dir1/dir2/dir3/.

Then,

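for each file in resized image directory {
    dir = map.get(file.name);          // directory of the original with the same name
    if (dir != null)                   // skip resized files that have no original
        replaceImage(path/to/file, F:/original_images + dir + file.name);
}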
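A runnable version of the loop above, as a sketch assuming Java 8+. The map value is stored as a Path relative to the originals root (e.g. dir1/dir2/dir3) rather than a string with a leading slash, because resolving "/dir1/..." against F:/original_images on Windows would drop the original_images part. The class RelativeDirReplace and its methods are hypothetical, and every regular file is indexed (no .jpg filter).

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RelativeDirReplace {
    // Map "img1.jpg" -> directory relative to originalsRoot, e.g. dir1/dir2/dir3.
    static Map<String, Path> indexOriginals(Path originalsRoot) throws IOException {
        Map<String, Path> map = new HashMap<>();
        try (Stream<Path> s = Files.walk(originalsRoot, FileVisitOption.FOLLOW_LINKS)) {
            s.filter(Files::isRegularFile)
             .forEach(p -> map.put(p.getFileName().toString(), originalsRoot.relativize(p.getParent())));
        }
        return map;
    }

    // For each file in the resized tree, look up its original and copy it over; returns the count.
    static int replace(Path originalsRoot, Path resizedRoot) throws IOException {
        Map<String, Path> map = indexOriginals(originalsRoot);
        List<Path> resizedFiles;
        // Collect first: Files.copy throws IOException, which a forEach lambda cannot propagate.
        try (Stream<Path> s = Files.walk(resizedRoot, FileVisitOption.FOLLOW_LINKS)) {
            resizedFiles = s.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int replaced = 0;
        for (Path resized : resizedFiles) {
            Path name = resized.getFileName();
            Path dir = map.get(name.toString());
            if (dir == null) {
                continue; // no original with this name
            }
            Path original = originalsRoot.resolve(dir).resolve(name);
            Files.copy(original, resized, StandardCopyOption.REPLACE_EXISTING);
            replaced++;
        }
        return replaced;
    }
}
```

A file sitting directly in the originals root maps to an empty relative path, and resolving an empty path returns the root itself, so that case needs no special handling.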
