
Checksums comparison on Linux

I have two large files with 200k and 100k lines. Each file contains two columns: a checksum and the path of the file it was computed from. Half of the paths in the first file also appear in the second file and half do not. My goal is to compare the checksums of the files that share a path.

I tried to use diff, but it doesn't work correctly on all lines. Then I wrote a script that compares the file paths first and, if the paths match, compares the checksums. But with such a large number of lines, the script takes an incredibly long time to complete.

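#!/bin/bash

IFS=$'\n'
del=$' '

# For every line of the first file, rescan the second file from the start.
while read -r LineG
do
        while read -r LineA
        do
                # Same path (everything after the first space)?
                if [ "${LineG#*$del}" = "${LineA#*$del}" ]
                then
                        # Different checksum (everything before the first space)?
                        if [ "${LineG%%$del*}" != "${LineA%%$del*}" ]; then
                                printf "%s\n%s\n\n" "$LineG" "$LineA" >> "./diff.txt"
                        fi
                        break
                fi
        done < "$2"
done < "$1"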

How can I solve this problem? How can the process be optimized to run faster?

I would:

# Join on filenames, then print the filenames whose checksums differ.
# join emits "checksum1 checksum2 path"; this assumes paths contain no whitespace.
join -j2 -o 1.1,2.1,1.2 <(sort -k2 file1) <(sort -k2 file2) |
    awk '$1 != $2 { print $3 }'
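
If sorting both files is not wanted, a single awk pass with an associative array is another option. This is only a sketch and not the join approach above: it assumes each line looks like "<checksum> <path>", that paths contain no whitespace, and that the first file is non-empty. The function name mismatched_paths and the sample data are made up for illustration.

```shell
# Sketch: print paths found in both lists whose checksums differ.
# Assumes lines look like "<checksum> <path>" and paths contain no whitespace.
mismatched_paths() {
    awk '
        # First file: remember the checksum recorded for each path.
        NR == FNR { sum[$2] = $1; next }
        # Second file: report paths also seen in the first file
        # whose checksum is different.
        ($2 in sum) && sum[$2] != $1 { print $2 }
    ' "$1" "$2"
}

# Tiny made-up inputs, for illustration only.
tmp=$(mktemp -d)
printf '%s\n' 'aaa /etc/hosts' 'bbb /etc/passwd' 'ccc /only/in/first' > "$tmp/file1"
printf '%s\n' 'aaa /etc/hosts' 'zzz /etc/passwd' > "$tmp/file2"

mismatched_paths "$tmp/file1" "$tmp/file2"   # prints /etc/passwd
rm -rf "$tmp"
```

Each file is read once, so the work grows with the total number of lines rather than with the product of the two line counts, at the cost of holding the first file's paths and checksums in memory.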
