
Characters comparison between two files

Suppose we have two text files in column format, with whitespace as the separator. The first file:

Col1 Col2  Col3
1    1     1
Two  2     2
3    3     3
4    4     4

The second file:

Col1 Col2  Col3
1    1     One
2    2     2
Test 3     Test
4    4     4

Let's compare:

  • Row 1. Values in Col3 are different

  • Row 2. Values in Col1 are different

  • Row 3. Values in Col1, Col3 are different

  • Row 4. Values are equal

The question is how to get a list of all columns that differ (Col1 and Col3 in this particular case). Is it possible to do this using only standard Linux tools such as diff?

I have found the wdiff tool, which compares files word by word, but I don't know how to use it for this task.

wdiff file1.txt file2.txt

1    1 [-1
Two-]     {+One+}
2    2     {+2
Test+} 3 [-3 3-]     {+Test+}
4    4     4

Perl to the rescue!

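paste file1 file2 \
| perl -lane '
    # Line 1 is the header: remember the column names.
    @cols = @F if 1 == $.;
    # Compare each field of the left half with its counterpart on the right.
    $F[$_] eq $F[$_ + @F/2] or ++$h{$_} for 0 .. $#F/2;
    # Sort the indices numerically so the names come out in column order.
    END {print join " ", @cols[sort { $a <=> $b } keys %h]}'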

paste prints the files side by side (the header line is omitted here):

1    1     1    1    1     One
Two  2     2    2    2     2
3    3     3    Test 3     Test
4    4     4    4    4     4

Perl then reads the lines, compares the first column with the fourth, the second with the fifth, and so on, and remembers which columns were different. At the end, it prints the names of the columns in which a difference was found.

  • -l removes newlines from input, adds them to print
  • -n runs the code for each line of input
  • -a splits input on whitespace into the @F array
  • $. is the input line number. The first line populates the column names array @cols.
  • $#F is the last index of the array @F. The loop therefore runs over the indices of the first half of @F; for each one it compares the field with its counterpart in the second half and, if they differ, records the index in the %h hash.
  • END runs at the end. keys returns the indices of the differing columns, which are then mapped to their names stored in the @cols array.
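
The column-by-column comparison described above can also be used to report which columns differ on each row, as in the question's own list. The following is only a minimal sketch, with awk standing in for Perl. It assumes both files have the same number of whitespace-separated columns and no empty fields; the function name report_row_diffs is hypothetical. Appending "" to each field forces a string comparison, since awk would otherwise compare two numeric-looking fields as numbers.

```shell
# Hypothetical helper: list, per data row, the columns that differ.
# Assumes equal column counts in both files and no empty fields.
report_row_diffs() {
    paste "$1" "$2" | awk '
        NR == 1 {                       # header row: remember the column names
            half = NF / 2
            for (i = 1; i <= half; i++) name[i] = $i
            next
        }
        {
            half = NF / 2
            out = ""
            for (i = 1; i <= half; i++)
                # left-file field vs right-file field, compared as strings
                if (($i "") != ($(i + half) ""))
                    out = out (out == "" ? "" : ", ") name[i]
            if (out != "")
                printf "Row %d: %s\n", NR - 1, out
        }'
}
```

Called as report_row_diffs file1 file2, it prints one line per row that has at least one differing column and nothing for rows that are equal.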

A somewhat verbose, longhand awk version. It prints the numbers of the differing columns rather than their names:

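awk '
FNR==1 || !NF {next}          # skip the header row and blank lines
FNR==NR{                      # first file: store every field by row and column
  for(i=1;i<=NF;i++)
     f1[FNR,i]=$i
  next
}
{                             # second file: compare against the stored fields
  for(i=1;i<=NF;i++)
    if ($i != f1[FNR,i])
     diff[i]                  # referencing the element is enough to create it
}
END {
  for (i in diff)             # column numbers, in no particular order
    print i
}
' file1 file2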
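
Since the question asks for column names, the same two-file idea can be extended to read the names from the header. This is a rough sketch rather than a definitive version: it assumes the first file is not empty and that both files share the first file's header, and the function name diff_column_names is hypothetical. Iterating from 1 to the highest column number gives the names in column order.

```shell
# Hypothetical helper: print the names of all differing columns, in column order.
# Assumes a non-empty first file whose header also applies to the second file.
diff_column_names() {
    awk '
        FNR == 1 {                      # header row of either file
            if (FNR == NR)              # only the first header is remembered
                for (i = 1; i <= NF; i++) name[i] = $i
            next
        }
        !NF { next }                    # skip blank lines
        FNR == NR {                     # first file: store every field
            for (i = 1; i <= NF; i++) f1[FNR, i] = $i
            next
        }
        {                               # second file: compare as strings
            for (i = 1; i <= NF; i++)
                if (($i "") != (f1[FNR, i] "")) differs[i] = 1
            if (NF > max) max = NF
        }
        END {
            sep = ""
            for (i = 1; i <= max; i++)  # ascending index gives column order
                if (i in differs) {
                    label = (i in name) ? name[i] : i
                    printf "%s%s", sep, label
                    sep = " "
                }
            if (sep != "") print ""
        }' "$1" "$2"
}
```

With the sample files from the question, diff_column_names file1 file2 prints Col1 Col3.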

Similar to the Perl approach by @choroba:

paste file1 file2 | \
   awk '
     {
         # fields 1..NF/2 come from file1, the rest from file2
         for(i=1;i<=NF/2;i++)
           if ($i != $(i+NF/2) )
             diff[i]
     }
     END {
       # column numbers, in no particular order
       for (i in diff)
         print i
     }
   '
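
The order in which awk's for (i in array) loop visits indices is unspecified, so the column numbers may be printed in any order. One way to make the output predictable is to pipe it through sort -n, as in this minimal sketch. The function name diff_column_numbers is hypothetical, and the sketch assumes both files have the same number of columns and no empty fields.

```shell
# Hypothetical helper: print the numbers of all differing columns, ascending.
# Assumes equal column counts in both files and no empty fields.
diff_column_numbers() {
    paste "$1" "$2" |
        awk '
            {
                half = NF / 2
                for (i = 1; i <= half; i++)
                    # compare as strings: left-file field vs right-file field
                    if (($i "") != ($(i + half) ""))
                        differs[i]
            }
            END { for (i in differs) print i }
        ' | sort -n
}
```

With the sample files from the question, diff_column_numbers file1 file2 prints 1 and 3 on separate lines.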
