Compare 2 Folders and Find Files with Differing Byte Counts

Using Gnome in Linux Mint 12, I copied a folder of about 9.7 GB (containing a complex tree of subfolders) from one NTFS flash drive to another NTFS flash drive. According to Gnome the file counts match, but according to du (and other programs) the byte counts don't match. (I've had the same problem copying folders in other Linux distros and Windows XP.)

I only want to know which files don't have matching byte counts. (I don't want to compare the contents of each file, because that would take way too long.) What's the best, easiest and fastest way to find the byte-count-mismatched files?

I would adapt the answer by @user1464130, as it has trouble handling spaces in file names.

# Each listing runs in a subshell, so the working directory is not changed
(cd dir1 && find . -type f -printf "%p %s\n" | sort) > ~/dir1.txt
(cd dir2 && find . -type f -printf "%p %s\n" | sort) > ~/dir2.txt
diff ~/dir1.txt ~/dir2.txt
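The same comparison can be wrapped in two small functions so that it works for any pair of paths. This is only a sketch, assuming GNU find (for -printf) and mktemp; the names list_sizes and compare_sizes are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Print "relative/path<TAB>size" for every regular file under $1, sorted.
list_sizes() {
    ( cd "$1" && find . -type f -printf '%p\t%s\n' | LC_ALL=C sort )
}

# Diff the two listings. Only files that are missing on one side or whose
# sizes differ show up in the output.
compare_sizes() {
    a=$(mktemp) && b=$(mktemp) || return 2
    list_sizes "$1" > "$a"
    list_sizes "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Called as compare_sizes /path/to/dir1 /path/to/dir2 (placeholder paths), it returns diff's exit status: 0 when the listings are identical, 1 when at least one file differs. A tab separates path and size so that spaces in names stay unambiguous.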

If you want to run a command on each file and use the result in the report, you can use a Bash while loop. This example uses md5sum to compute a checksum for each file.

find . -maxdepth 1 -type f -printf "%p %s\n" | while read -r path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size" ; done

Each $() is executed separately, which lets us compute the checksum for each file. tr squeezes every run of consecutive spaces into a single space, and cut extracts the n-th field, here the first. Without this, the file name would appear twice, because md5sum prints it on stdout after the checksum.

Here is an example without the comparison (no diff). Note that I've used a dash (-) to separate the three pieces of data we output for each file, but that could be a problem if you want to feed the output to another program.

$ find . -maxdepth 1 -name "*.c" -type f -printf "%p %s\n" | while read -r path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size" ; done
./thread.c - 5f2b7b12c7cd12fcb9e9796078e5d15b - 584
./utils.c - d61bc1dbc72768e622a04f03e3b8f7a2 - 3413

EDIT: To handle spaces in filenames and still get the checksum and the size, you can use the following code.

$ find . -maxdepth 1 -name "*.c" -type f -print0 | xargs -0 -n 1 md5sum | while read -r checksum path; do echo "$path $(stat --printf="%s" "$path") $checksum" ; done
./ini tia li za tion.c 84 31626123e9056bac2e96b472bd62f309
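Passing paths through a text pipeline can still go wrong for names with leading spaces, backslashes or newlines. One way around that, sketched here under the assumption of GNU find, stat and md5sum, is to let find hand the paths to a small inline script as arguments. The function name report_files is hypothetical.

```shell
# Hypothetical helper: print "path size checksum" for each regular file
# directly inside directory $1 (no recursion, like -maxdepth 1 above).
report_files() {
    find "$1" -maxdepth 1 -type f -exec sh -c '
        for path in "$@"; do
            size=$(stat -c %s "$path")
            # Reading from stdin keeps the file name out of md5sum output.
            sum=$(md5sum < "$path" | cut -d " " -f 1)
            printf "%s %s %s\n" "$path" "$size" "$sum"
        done
    ' sh {} +
}
```

Because each path arrives as a separate argument ("$@"), no splitting on whitespace ever happens, whatever characters the name contains.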

Did you check whether both partitions have the same attributes (block size, size, reserved space for deletions or bad blocks, etc.)?
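Block size matters because du, by default, reports the disk space allocated to files rather than their lengths, so identical files can add up to different totals on partitions with different cluster sizes. A minimal sketch for checking this, assuming GNU find and du; total_bytes and disk_usage are hypothetical helper names.

```shell
# Hypothetical helpers; the function names are illustrative only.

# Sum of the lengths of all regular files under $1, in bytes.
# This does not depend on the block size of the filesystem.
total_bytes() {
    find "$1" -type f -printf '%s\n' | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Disk space allocated to the tree under $1, in bytes, as du counts it.
# This does depend on the block size and includes directory overhead.
disk_usage() {
    du -s -B1 "$1" | cut -f 1
}
```

If total_bytes gives the same number for both folders while disk_usage does not, the copy is probably fine and the difference comes from the partitions, not from the files.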

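For your specific case, I would recommend rsync with the option -n (or --dry-run). It will tell you which files are different without copying anything. That is:

$ rsync -rni --size-only /source/ /target/

The option -r recurses into subfolders, -i itemizes each file that would be transferred, and --size-only makes rsync skip files whose sizes match, so only size mismatches (and files missing from the target) are listed. Do not use -I (--ignore-times) for this: it turns off rsync's quick check, so every file is reported. You can use the same command without -n to make both directories equivalent; add -a to also carry over timestamps, permissions, etc.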

Check the rsync manual or try the --help option for more options and usage examples. It is very powerful.

Assuming you need to compare dir1 and dir2, here are the console commands:

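# long-iso fixes the date format, so the size is field 5 and the path is field 8
(cd dir1 && find . -type f -print0 | sort -z | xargs -0 ls -l --time-style=long-iso | awk '{print $5,$8}') > ~/dir1.txt
(cd dir2 && find . -type f -print0 | sort -z | xargs -0 ls -l --time-style=long-iso | awk '{print $5,$8}') > ~/dir2.txt
diff ~/dir1.txt ~/dir2.txt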

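You may need to adjust the awk field numbers so that the file length and path are printed correctly: the size is field 5, while the path is field 8 or 9 depending on the date format that ls uses. File names containing spaces are not handled correctly by this approach.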
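Parsing ls output is fragile, so an alternative is to ask stat for the size and the name directly. This is a sketch assuming GNU stat (the -c option); size_list is a hypothetical helper name.

```shell
# Hypothetical helper: print "size ./relative/path" for every regular file
# under $1, sorted by path. No ls parsing, so spaces in names are preserved.
size_list() {
    ( cd "$1" && find . -type f -exec stat -c '%s %n' {} + | LC_ALL=C sort -k 2 )
}
```

The two listings can then be saved and compared as before, for example size_list dir1 > ~/dir1.txt, size_list dir2 > ~/dir2.txt, followed by diff ~/dir1.txt ~/dir2.txt.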
