Checksums comparison on Linux
I have two large files, one with 200k lines and one with 100k lines. Each file has two columns: a checksum and the path of the file it was computed from. Half of the paths in the first file also appear in the second file; the other half do not. My goal is to compare the checksums of files that have the same path.
I tried diff, but it does not work correctly on all lines. Then I wrote a script that compares the file paths first and, if the paths match, compares the checksums. With this many lines, though, the script takes an incredibly long time to complete.
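#!/bin/bash
# Usage: script FILE1 FILE2 (each line: checksum, space, path)
del=' '
while IFS= read -r LineG
do
    while IFS= read -r LineA
    do
        # Same path (everything after the first space)?
        if [ "${LineG#*"$del"}" = "${LineA#*"$del"}" ]
        then
            # Different checksum (everything before the first space)?
            if [ "${LineG%%"$del"*}" != "${LineA%%"$del"*}" ]; then
                printf '%s\n%s\n\n' "$LineG" "$LineA" >> "./diff.txt"
            fi
            break
        fi
    done < "$2"
done < "$1"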
How can I solve this problem? How can the process be optimized to run faster?
I would:
# Join on filenames
join -j2 -o 1.1,2.1,1.2 <(sort -k2 file1) <(sort -k2 file2) |
# Print filenames whose checksums do not match.
awk '$1 != $2{ print $3 }'
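If sorting both files is not desirable, or the paths may contain spaces (which the field-based join above would split), a single awk pass with a lookup table is another option. The script in the question is slow because it re-reads the second file for every line of the first; the sketch below reads each file once instead. It is only an illustration under some assumptions: the function name find_mismatches is made up here, each line is assumed to be a checksum followed by spaces or tabs and then the path, and the second file is assumed to fit in memory.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): find_mismatches FILE1 FILE2
# For every path present in both files with differing checksums, prints
# the line from FILE1, the line from FILE2, and a blank line.
find_mismatches() {
    awk '
        {
            # First field is the checksum; everything after the
            # separating blanks is the path (it may contain spaces).
            sum = $1 ""            # "" forces string comparison later
            path = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", path)
        }
        # First input (FILE2): remember checksum and full line per path,
        # keeping the first occurrence if a path is repeated.
        NR == FNR {
            if (!(path in sums)) { sums[path] = sum; lines[path] = $0 }
            next
        }
        # Second input (FILE1): report shared paths whose checksums differ.
        (path in sums) && sums[path] != sum {
            printf "%s\n%s\n\n", $0, lines[path]
        }
    ' "$2" "$1"
}
```

It could then be called as find_mismatches file1 file2 > diff.txt, which writes the same pairs-of-lines layout as the script in the question. The second file is loaded into the table because it is the smaller one in this case (100k lines), and the first file is only streamed.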