Suppose we have two text files in column format (space-separated). The first file:
Col1 Col2 Col3
1 1 1
Two 2 2
3 3 3
4 4 4
The second file:
Col1 Col2 Col3
1 1 One
2 2 2
Test 3 Test
4 4 4
Let's compare:
Row 1. Values in Col3 are different
Row 2. Values in Col1 are different
Row 3. Values in Col1, Col3 are different
Row 4. Values are equal
The question is how to get a list of all columns that differ (Col1 and Col3 in this particular case). Is it possible to do this using only standard Linux tools such as diff?
I have found the wdiff tool, which compares files word by word, but I don't know how to use it for this task.
wdiff file1.txt file2.txt
1 1 [-1
Two-] {+One+}
2 2 {+2
Test+} 3 [-3 3-] {+Test+}
4 4 4
Perl to the rescue!
paste file1 file2 \
| perl -lane '
@cols = @F if 1 == $.;
$F[$_] eq $F[$_ + @F/2] or ++$h{$_} for 0 .. $#F/2;
END {print "@cols[keys %h]"}'
paste prints the files side by side (header row not shown here):
1 1 1 1 1 One
Two 2 2 2 2 2
3 3 3 Test 3 Test
4 4 4 4 4 4
Perl then reads the lines, compares the first column with the fourth and so on, and remembers which columns were different. At the end, it prints the names of the columns that differed.
-l removes newlines from input and adds them to print.
-n runs the code for each line of input.
-a splits the input on whitespace into the @F array.
$. is the input line number. The first line populates the column names array @cols.
$#F is the last index of the array @F. So we go over the indices of @F from 0 to the halfway point, compare each column with its counterpart in the second half, and if they differ, store the column's index in the %h hash.
END runs at the end. keys returns the indices of the differing columns, so we map them to their names stored in the @cols array.

Somewhat verbose and long-handed...
awk '
    FNR==1 || !NF {next}
    FNR==NR {
        for(i=1;i<=NF;i++)
            f1[FNR,i]=$i
        next
    }
    {
        for(i=1;i<=NF;i++)
            if ($i != f1[FNR,i])
                diff[i]
    }
    END {
        for (i in diff)
            print i
    }
' file1 file2
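The awk script above prints column numbers, and "for (i in diff)" visits them in no guaranteed order. If you want the header names in column order instead, a minimal sketch could look like this. The function name diff_cols is made up for illustration, and it assumes both files have the same header row and the same number of rows and columns.

```shell
# diff_cols FILE1 FILE2: print the names of the columns whose values differ.
# Hypothetical helper; assumes identical headers and matching row/column counts.
diff_cols() {
    awk '
        FNR == 1 {                       # header row: remember the column names
            for (i = 1; i <= NF; i++) name[i] = $i
            next
        }
        !NF { next }                     # skip empty lines
        FNR == NR {                      # first file: store every cell
            for (i = 1; i <= NF; i++) f1[FNR, i] = $i
            next
        }
        {                                # second file: compare cell by cell
            for (i = 1; i <= NF; i++)
                if ($i != f1[FNR, i]) diff[i] = 1
            if (NF > max) max = NF
        }
        END {                            # walk the indices in order
            for (i = 1; i <= max; i++)
                if (i in diff) print name[i]
        }
    ' "$1" "$2"
}
```

Called as diff_cols file1 file2 on the sample files from the question, it should print Col1 and Col3, one per line.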
Similar to the Perl approach by @choroba:
paste file1 file2 | \
awk '
    {
        for(i=1;i<=NF/2;i++)
            if ($i != $(i+NF/2) )
                diff[i]
    }
    END {
        for (i in diff)
            print i
    }
'
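The paste-based variant can also report names rather than numbers, because the first pasted line carries the header twice. A rough sketch under the same assumptions (identical headers, equal column counts, no embedded spaces in values); diff_cols_paste is a hypothetical name:

```shell
# diff_cols_paste FILE1 FILE2: paste the files side by side, then compare
# field i with field i + NF/2 on every line. Hypothetical helper.
diff_cols_paste() {
    paste -d ' ' "$1" "$2" | awk '
        { half = NF / 2 }                # left half is file 1, right half is file 2
        NR == 1 {                        # header line: remember the names
            for (i = 1; i <= half; i++) name[i] = $i
            ncols = half
            next
        }
        {
            for (i = 1; i <= half; i++)
                if ($i != $(i + half)) diff[i] = 1
        }
        END {                            # print the names in column order
            for (i = 1; i <= ncols; i++)
                if (i in diff) print name[i]
        }
    '
}
```

This keeps nothing in memory except the set of differing columns, which may matter for large files, but it silently gives wrong pairings if a row has a different number of fields in the two files.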