
Compare two text files line by line, finding differences but ignoring differences in numerical values

I'm working on a bash script that compares two similar text files line by line and finds any differences between corresponding lines. It should point out each difference and say on which line it occurs, but it should ignore numerical values in the comparison.
Example:

Process is running; process found : 12603 process is listening on port 1200
Process is running; process found : 43023 process is listening on port 1200

In the example above, the script shouldn't report any difference, since only the process ID differs and it changes all the time.
Otherwise, though, I want it to notify me of the differences between the lines.
Example:

Process is running; process found : 12603 process is listening on port 1200
Process is not running; process found : 43023 process is not listening on port 1200

I already have a working script to find the differences, and I've used the following function to find them while ignoring numerical values, but it doesn't work correctly. Any suggestions?

COMPARE_FILES()
{
    awk 'NR==FNR{a[FNR]=$0;next}$0!~a[FNR]{print $0}' $1 $2
}

Here $1 and $2 are the two files to compare.
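A minimal reproduction of why this function still reports the first example, assuming a POSIX shell with mktemp (the sample files are hypothetical): in $0!~a[FNR] the stored line from the first file is used as a regular expression, so its digits have to match literally, and a changed PID makes the match fail.

```shell
# Copy of the function from the question, with the file arguments quoted.
COMPARE_FILES() {
    awk 'NR==FNR{a[FNR]=$0;next}$0!~a[FNR]{print $0}' "$1" "$2"
}

# Hypothetical sample files holding the first example from the question.
f1=$(mktemp)
f2=$(mktemp)
printf '%s\n' 'Process is running; process found : 12603 process is listening on port 1200' > "$f1"
printf '%s\n' 'Process is running; process found : 43023 process is listening on port 1200' > "$f2"

# a[FNR] is treated as a regex, so "12603" must appear literally in the
# second line; it does not, so the second line is printed as a difference.
COMPARE_FILES "$f1" "$f2"
rm -f "$f1" "$f2"
```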

Would you please try the following:

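COMPARE_FILES() {
    awk '
    NR==FNR {a[FNR]=$0; next}            # first file: store each line by line number
    {
        b=$0; gsub(/[0-9]+/,"",b)        # current line of the second file, digits removed
        c=a[FNR]; gsub(/[0-9]+/,"",c)    # same line of the first file, digits removed
        if (b != c) {printf "< %s\n> %s\n", $0, a[FNR]}   # "<" second file, ">" first file
    }' "$1" "$2"
}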
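The question also asks for the line number of each difference, which the function above does not print. A minimal sketch of one way to add it, using the same digit-stripping comparison (the name compare_files_numbered and the diff-like layout, with < for the first file and > for the second, are hypothetical and not from the answer):

```shell
# Hypothetical variant: digit-insensitive comparison that also reports line numbers.
compare_files_numbered() {
    awk '
    # First file: remember each line by its line number.
    NR == FNR { a[FNR] = $0; next }

    # Second file: compare with the same line of the first file, digits removed.
    {
        b = $0;     gsub(/[0-9]+/, "", b)
        c = a[FNR]; gsub(/[0-9]+/, "", c)
        if (b != c)
            printf "line %d:\n< %s\n> %s\n", FNR, a[FNR], $0
    }' "$1" "$2"
}

# Demo with the second example from the question (hypothetical temp files).
f1=$(mktemp)
f2=$(mktemp)
printf '%s\n' 'Process is running; process found : 12603 process is listening on port 1200' > "$f1"
printf '%s\n' 'Process is not running; process found : 43023 process is not listening on port 1200' > "$f2"
compare_files_numbered "$f1" "$f2"
rm -f "$f1" "$f2"
```

Lines that exist only in the first file are never visited by the second pass, so they are not reported; if that matters, compare the line counts separately.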


Delete the digits before making the comparison. I would improve your code by replacing

NR==FNR{a[FNR]=$0;next}$0!~a[FNR]{print $0}

with

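NR==FNR{a[FNR]=$0;next}gensub(/[[:digit:]]/,"","g",$0)!=gensub(/[[:digit:]]/,"","g",a[FNR]){print $0}

Explanation: I use the gensub string function because it returns a new string, whereas gsub modifies the variable it is given. I replace every [[:digit:]] character with the empty string (i.e. delete it), globally (the "g" argument). Note that gensub is a GNU awk (gawk) extension. The two stripped lines are compared with != rather than !~, so the line from the first file is compared as plain text instead of being interpreted as a regular expression.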
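Since gensub exists only in GNU awk, here is a sketch of one portable way to get the same "return a new string" behavior: awk passes scalar arguments to user-defined functions by value, so calling gsub on the parameter changes only the local copy. The names compare_files_portable and strip are hypothetical, and [0-9] stands in for [[:digit:]] for the widest compatibility with older awk implementations. Like the one-liner, it compares with != and prints the differing line of the second file.

```shell
# Hypothetical portable variant of the gawk one-liner.
compare_files_portable() {
    awk '
    # strip(s): return a copy of s with all digits removed.
    # s is a scalar parameter, so gsub() only changes the local copy.
    function strip(s) { gsub(/[0-9]/, "", s); return s }

    # First file: remember each line by its line number.
    NR == FNR { a[FNR] = $0; next }

    # Second file: print the line if it differs once digits are removed.
    strip($0) != strip(a[FNR]) { print $0 }
    ' "$1" "$2"
}
```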

Using any awk:

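compare_files() {
    # Replace every integer or decimal number with RS (a newline by default),
    # store the resulting keys of the first file, then print each line of the
    # second file whose key differs from the key of the same line in the first.
    awk '{key=$0; gsub(/[0-9]+(\.[0-9]+)?/,RS,key)} NR==FNR{a[FNR]=key; next} key!=a[FNR]' "${@}"
}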

The above doesn't just remove the digits: it replaces every number, whether an integer like 17 or a decimal like 17.31, with the contents of RS (a newline by default) to avoid false matches like:
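To see what the number pattern matches, this sketch substitutes a visible # instead of RS so the result prints on one line (the helper names mark_numbers and mark_numbers_loose are hypothetical). It also shows why the dot between the integer and fractional parts should be escaped: an unescaped . matches any character, so two numbers separated by a single space get merged into one.

```shell
# Hypothetical helper: replace each integer or decimal with "#".
mark_numbers() {
    printf '%s\n' "$1" | awk '{ gsub(/[0-9]+(\.[0-9]+)?/, "#"); print }'
}

# Same, but with the dot unescaped, so it matches any character.
mark_numbers_loose() {
    printf '%s\n' "$1" | awk '{ gsub(/[0-9]+(.[0-9]+)?/, "#"); print }'
}

mark_numbers 'took 17.31s on port 1200'   # took #s on port #
mark_numbers 'ports 80 443'               # ports # #
mark_numbers_loose 'ports 80 443'         # ports #
```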

file1: foo1234bar
file2: foobar

If you just remove the digits, those two lines incorrectly become identical:

file1: foobar
file2: foobar

whereas if you replace the digits with a newline, they correctly remain different:

file1: foo
bar
file2: foobar
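The false-match argument can be checked directly. same_after is a hypothetical helper that applies the same number pattern to two strings, either deleting each number or replacing it with RS, and reports whether the results are equal. The first pair has no spaces around the number, so deleting the digits really does make the two strings identical.

```shell
# Hypothetical helper: same_after MODE STRING1 STRING2
# MODE "delete" removes numbers; any other MODE replaces them with RS.
# (Backslashes in the strings would be interpreted as escapes by -v.)
same_after() {
    awk -v mode="$1" -v x="$2" -v y="$3" 'BEGIN {
        r = (mode == "delete") ? "" : RS
        gsub(/[0-9]+(\.[0-9]+)?/, r, x)
        gsub(/[0-9]+(\.[0-9]+)?/, r, y)
        print ((x == y) ? "same" : "different")
    }'
}

same_after delete  'foo1234bar' 'foobar'            # same (false match)
same_after replace 'foo1234bar' 'foobar'            # different
same_after replace 'foo 12603 bar' 'foo 43023 bar'  # same
```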
