
A shell script problem: counting the lines that differ between two files [Linux]

My program generates two data files, a.txt and b.txt. Taking a.txt as an example, its content looks like this:

0,0
0,1
1,0
-3,1
1,-2
1,3
......

b.txt is similar to a.txt.

Now I want to count the lines that differ. For example, if b.txt looks like this:

0,0
1,1
1,2
-3,1
1,-2
1,3
......

a shell script should output 2, because the 2nd and the 3rd lines each differ by one number. How can I do this?

I tried the diff command, but I could not get what I want.

Any help is appreciated. Thanks.

Addition: each file has about 10,000 to 100,000 rows. The two files always have the same number of rows.

diff a.txt b.txt | grep -c '^<'
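
To see what this pipeline counts, here is a minimal sketch that runs it on the six sample lines from the question. The temporary directory and the variable names are assumptions made for illustration. Note that it counts the a.txt lines that diff reports as changed or deleted, so the result matches a line-by-line count only when diff does not realign the files.

```shell
# Sample data copied from the question; the temp directory is an assumption.
tmpdir=$(mktemp -d)
printf '%s\n' 0,0 0,1 1,0 -3,1 1,-2 1,3 > "$tmpdir/a.txt"
printf '%s\n' 0,0 1,1 1,2 -3,1 1,-2 1,3 > "$tmpdir/b.txt"

# diff exits with status 1 when the files differ, which is expected here.
# grep -c '^<' counts only the lines that diff marks as coming from a.txt.
diff_count=$(diff "$tmpdir/a.txt" "$tmpdir/b.txt" | grep -c '^<' || true)
echo "$diff_count"
```

On this sample the pipeline prints 2, since diff reports lines 2 and 3 as one changed hunk.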

I faced the same problem a while back. What you need is diffstat, a small utility (usually packaged separately from GNU diffutils) that summarizes diff results:

SYNOPSIS

diffstat reads the output of diff and displays a histogram of the insertions, deletions, and modifications per-file. It is useful for reviewing large, complex patch files.

You can also process the output of diffstat to get summarized results:

diff -u FileA.txt FileB.txt | diffstat -f0 | grep -v files | awk '{ print $3 }'

Here -u is obligatory. See the diffstat documentation for the other options.

diff seems to be exactly what you want.

#> diff a.txt b.txt
2,3c2,3
< 0,1
< 1,0
---
> 1,1
> 1,2

Is there something more specific you were looking for?

diff may move chunks within a file, which I think is not what you want. Here's an alternative:

join -t'\0' -v2 <(cat -n a.txt) <(cat -n b.txt) | wc -l
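
The same positional idea (compare line N of one file only with line N of the other) can also be sketched with awk alone. This is an illustration under the question's assumption that both files have the same number of rows, not the answerer's method; the sample data, temporary directory, and variable names are made up for the example.

```shell
# Sample data copied from the question; the temp directory is an assumption.
tmpdir=$(mktemp -d)
printf '%s\n' 0,0 0,1 1,0 -3,1 1,-2 1,3 > "$tmpdir/a.txt"
printf '%s\n' 0,0 1,1 1,2 -3,1 1,-2 1,3 > "$tmpdir/b.txt"

# First pass (NR == FNR): remember every line of a.txt by its line number.
# Second pass: count the lines of b.txt that differ from the same-numbered
# line of a.txt. Appending "" forces a string comparison.
count=$(awk 'NR == FNR { a[FNR] = $0; next }
             ($0 "") != (a[FNR] "") { n++ }
             END { print n + 0 }' "$tmpdir/a.txt" "$tmpdir/b.txt")
echo "$count"
```

For the sample data this prints 2. Because lines are matched strictly by position, nothing is realigned the way diff might do it, and the whole of a.txt is held in memory, which should be fine for 100,000 short rows.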
