Compare two files using a bash script and print a detailed difference report

Question

I have two large files on a Unix system, each with thousands of rows and about 80 columns. I have sorted both files on a group of unique keys so that the same rows are always compared. For ease of understanding, I am showing only 3 rows and 7 columns here.

File 1:

d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC123" contract_id="ABC123" src_system_id="PRA" entity_cd="U0525"     
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC124" contract_id="ABC124" src_system_id="PRA" entity_cd="U0526"     
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC125" contract_id="ABC125" src_system_id="PRA" entity_cd="U0527"

File 2:

d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC123" contract_id="ABC123" src_system_id="PRA" entity_cd="U0525"     
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC124" contract_id="ABC124" src_system_id="PRB" entity_cd="V0528"    
d_report_ref_date="2021-03-31" system_id="VTX" contract_id="1130" credit_line_cd="ABC125" contract_id="ABC125" src_system_id="PRA" entity_cd="U0530"

Expected output:

Mismatch in row 2 : file1.src_system_id=PRA file2.src_system_id=PRB, file1.entity_cd=U0526 file2.entity_cd=V0528 

Mismatch in row 3 : file1.entity_cd=U0527 file2.entity_cd=U0530

Is it possible to achieve this with bash scripting? I tried awk, but it isn't giving me the desired output:

paste -d' ' file1 file2| 
  awk -F' ' '{w=NF/2; 
              for(i=1;i<=w;i++) 
                 if($i!=$(i+w)) printf "%d %d %s %s", NR,i,$i,$(i+w); 
              print ""}'

Thanks in advance!

Answer 1

Using any awk in any shell on every Unix box:

$ cat tst.awk
BEGIN { FS="[= ]" }
NR==FNR {
    for (i=1; i<NF; i+=2) {
        file1[NR,i] = $(i+1)
    }
    next
}
{
    msg = sep = ""
    for (i=1; i<NF; i+=2) {
        if ( $(i+1) != file1[FNR,i] ) {
            msg = msg sep " " ARGV[1] "." $i "=" file1[FNR,i] " " FILENAME "." $i "=" $(i+1)
            sep = ","
        }
    }
    if ( msg != "" ) {
        print "Mismatch in row " FNR " :" msg ORS
    }
}

$ awk -f tst.awk file1 file2
Mismatch in row 2 : file1.src_system_id="PRA" file2.src_system_id="PRB", file1.entity_cd="U0526" file2.entity_cd="V0528"

Mismatch in row 3 : file1.entity_cd="U0527" file2.entity_cd="U0530"

The above assumes:

Your quoted strings cannot contain = or blanks.
Every tag present in a row of file1 is also present in the same row of file2.
The tags always appear in the same order in a given row.
A given row may contain duplicate tags.
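
The expected output in the question shows the values without their surrounding double quotes, while the script above prints them quoted. A minimal sketch of one way to strip them, under the same assumptions as listed above: compare_rows and unquote are hypothetical names introduced here for illustration, and the labels file1 and file2 are hard-coded rather than taken from ARGV[1] and FILENAME.

```shell
#!/bin/sh
# Sketch only: same positional comparison as tst.awk, but values are
# printed without their surrounding double quotes.
# compare_rows is a hypothetical wrapper name, not part of the answer.
compare_rows() {
    awk '
    BEGIN { FS = "[= ]" }

    # Hypothetical helper: drop one leading and one trailing double quote.
    function unquote(s) { gsub(/^"|"$/, "", s); return s }

    # First file: remember every value by row number and field position.
    NR == FNR {
        for (i = 1; i < NF; i += 2)
            file1[FNR, i] = $(i + 1)
        next
    }

    # Second file: compare each value with the one stored for the same
    # row and position, collecting mismatches into a single message.
    {
        msg = sep = ""
        for (i = 1; i < NF; i += 2) {
            if ($(i + 1) != file1[FNR, i]) {
                msg = msg sep " file1." $i "=" unquote(file1[FNR, i])
                msg = msg " file2." $i "=" unquote($(i + 1))
                sep = ","
            }
        }
        if (msg != "")
            print "Mismatch in row " FNR " :" msg
    }' "$1" "$2"
}
```

After sourcing the function, it would be called as compare_rows file1 file2. The comparison itself still uses the quoted values; only the printed text changes.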

Answer 2

Take a look at wdiff; something like this might work:

$ wdiff -w$'\e[31m' -x $'\e[0m' -y $'\e[32m' -z $'\e[0m' file1 file2

The options -w, -x, -y, and -z define the prefix and suffix for deletions and insertions respectively. In this case we make a naive attempt to color deletions red and insertions green.

Answer 3

A bit late to the party, but you can do a kind of "nested diff": the first diff captures the rows that differ and puts each of their columns on a line of its own. Then a second diff captures exactly which columns differ. It is pure bash, and uses only bash loops, grep, sed and diff.

$ for f in $(diff file1.txt file2.txt | grep -e "<"); do echo $f; done > left && for f in $(diff file1.txt file2.txt | grep -e ">"); do echo $f; done > right && diff -y left right | grep "|" | sed "s/\t>/-----/g"
<                                 |-----
src_system_id="PRA"                       | src_system_id="PRB"
entity_cd="U0526"                         | entity_cd="V0528"
<                                 |-----
entity_cd="U0527"                         | entity_cd="U0530"
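
The one-liner above can also be written as a small function, which may be easier to read and cleans up after itself. This is only a sketch of the same nested-diff idea, not the original command: nested_diff is a hypothetical name, tr replaces the bash loops for splitting columns, mktemp is assumed to be available, and diff -y is assumed to be GNU diff.

```shell
#!/bin/sh
# Sketch only: nested diff with temporary files removed afterwards.
# nested_diff is a hypothetical helper name. The body runs in a
# subshell, so the variables below do not leak into the caller.
nested_diff() (
    left=$(mktemp) || exit 1
    right=$(mktemp) || exit 1

    # First diff: rows only in file 1 start with "<", rows only in
    # file 2 start with ">". tr puts the marker and every
    # blank-separated column on a line of its own.
    diff "$1" "$2" | grep '^<' | tr -s ' \t' '\n' > "$left"
    diff "$1" "$2" | grep '^>' | tr -s ' \t' '\n' > "$right"

    # Second diff: side by side, keeping only the changed lines.
    # The "<" versus ">" marker line becomes a row separator.
    diff -y "$left" "$right" | grep '|' | sed 's/^<.*$/-----/'

    rm -f "$left" "$right"
)
```

It would be called as nested_diff file1.txt file2.txt. Like the original, it relies on values containing no blanks, and side-by-side output can truncate long values at the default diff -y width.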

Answer 1: score 4, accepted, 2022-01-07 12:43:49

Answer 2: score 1, 2022-01-07 12:36:24

Answer 3: score 0, 2022-01-07 13:16:32
