Checksum comparison on Linux

Question

I have two large files with 200k and 100k lines. Each file has two columns: a checksum and the path of the file it was computed from. About half of the paths in the first file also appear in the second file; the other half do not. My goal is to compare the checksums of the files that share the same path.

I tried to use diff, but it doesn't work correctly on all lines. Then I wrote a script that compares the file paths first and, if the paths match, compares the checksums. With such a large number of lines, however, the script takes an incredibly long time to complete.

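#!/bin/bash

IFS=$'\n'
del=$' '

# Outer loop: every "checksum path" line of the first file.
while read -r LineG
do
        # Inner loop: rescan the second file for each outer line (this is the slow part).
        while read -r LineA
        do
                # Same path (everything after the first space)?
                if [ "${LineG#*"$del"}" = "${LineA#*"$del"}" ]
                then
                        # Different checksum (everything before the first space)?
                        if [ "${LineG%%"$del"*}" != "${LineA%%"$del"*}" ]; then
                                printf "%s\n%s\n\n" "$LineG" "$LineA" >> "./diff.txt"
                        fi
                        break
                fi
        done < "$2"
done < "$1"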

How can I solve this problem? How can the process be optimized to run faster?

Answer 1

I would sort both files by path, join them on the path, and print the paths whose checksums differ:

# Join on filenames, then print the filenames whose checksums do not match.
# join emits "checksum1 checksum2 path" for every path present in both files.
join -j2 -o 1.1,2.1,1.2 <(sort -k2 file1) <(sort -k2 file2) |
    awk '$1 != $2 { print $3 }'
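
One caveat with the join approach: join splits fields on blanks by default, so it only sees the first word of a path that contains spaces. If that matters for your data, a single awk pass with an associative array is one possible alternative. The following is only a sketch, not a drop-in replacement: the function name compare_checksums is made up for illustration, and it assumes every line has the form "checksum, blanks, path" and that the first file is not empty.

```shell
#!/bin/bash
# Hypothetical helper (the name is an assumption, not from the original answer):
# print every path that occurs in both files with different checksums.
# Assumes each line looks like "<checksum><blanks><path>".
compare_checksums() {
    awk '
        {
            sum = $1
            path = $0
            # Drop the checksum and the blanks after it; the rest is the path,
            # including any spaces it contains.
            sub(/^[^ \t]+[ \t]+/, "", path)
        }
        # First file: remember the checksum recorded for each path.
        NR == FNR { seen[path] = sum; next }
        # Second file: report paths known from the first file whose checksum differs.
        (path in seen) && seen[path] != sum { print path }
    ' "$1" "$2"
}
```

It would be called as compare_checksums file1 file2 > diff.txt. Each file is read once and no sorting is needed; the trade-off is that the checksums of the whole first file are held in memory, which should be fine for a few hundred thousand lines.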

Answer 1 was accepted on 2022-08-15 10:02:12.
