Bash script for comparing numbers in columns

I have a problem writing a bash script and hope that someone can help me with this. I have written a few smaller scripts in bash before, so I'm not totally new, but there's still lots of room for improvement.

So, I have a file that contains only two columns of decimal numbers, like:

0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
...

What I want to do is compare every number in the first column with every number in the second column, check whether any two numbers are equal, and then print that number to the screen or to a file.

I found an answer on how to do this in an Excel spreadsheet, but I would be really interested in how to do it in bash, or maybe with awk.

The first problem for me is that I don't even know how to compare the first number with every number in the second column. I guess I would have to do this with arrays: I could read the two numbers with a 'while read var_1 var_2' loop, somehow append var_1 from each line to array_1 and var_2 to array_2, and then somehow compare all the elements of one array with all the elements of the other.

But I don't know how to do that. I hope someone can help me.

Using awk:

awk 'FNR==NR {a[$1]++;next} ($2 in a) {print $2}' file file
4.08
1.38

The file is passed twice. While awk reads the first copy (FNR==NR), it stores column 1 as keys of array a; while it reads the second copy, it prints column 2 whenever that value is a key in a.

cat file
0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
4.08    1.38
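The two-pass idiom above tests membership with '$2 in a', which compares the numbers as strings, so 2.3 and 2.30 would not match. A minimal sketch, assuming numerically equal values should count as equal (the function name common_values is hypothetical), normalizes both sides with +0 before using them as array keys:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each column-2 value whose numeric value also
# occurs in column 1. "$1" is passed twice so awk reads the file two times.
# Assumption: awk turns the number back into a string key via CONVFMT
# ("%.6g" by default), so only about 6 significant digits are compared.
common_values() {
    awk '
        # Skip blank lines in both passes
        !NF { next }
        # First pass (FNR == NR only while reading the first copy):
        # store column 1, normalized with +0, as array keys.
        FNR == NR { seen[$1 + 0]; next }
        # Second pass: print column 2 if its normalized value was stored.
        ($2 + 0) in seen { print $2 }
    ' "$1" "$1"
}
```

For the sample file above, common_values file prints 4.08 and then 1.38, in the order they occur in column 2.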

This line should work:

 awk '{a[$1]=1;b[$2]}END{for(x in b){a[x]++;if(a[x]>1)print x}}' file

Note that the loop and the check in the END block are there to avoid reporting numbers that are merely duplicated within the same column. If each column contains only unique numbers, that part could be simplified.

With fedorqui's example (the file above with the extra line 4.08 1.38), the output is:

4.08
1.38
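The remark that the END loop could be simplified can be made concrete. A minimal sketch, assuming every value is unique within its own column (common_unique is a hypothetical name): count each occurrence in either column, and anything counted more than once must then appear in both columns. As with the one-liner above, the order produced by 'for (x in count)' is unspecified, so pipe the result through sort if you need a stable order.

```shell
#!/usr/bin/env bash
# Hypothetical helper; only correct when no value repeats inside one column,
# because then a total count above 1 means "present in both columns".
common_unique() {
    awk 'NF { count[$1]++; count[$2]++ }
         END { for (x in count) if (count[x] > 1) print x }' "$1"
}
```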


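A one-liner: put both columns into a single column, sort it, and use uniq -d to print only the duplicates. Each column is passed through sort|uniq first, so a number repeated within one column is not reported: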

(awk '{print $1}' test_input|sort|uniq  ; awk '{print $2}' test_input|sort|uniq)|sort|uniq -d
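The same idea can be written as a set intersection with comm -12, which prints only the lines common to two sorted inputs. A sketch, assuming bash (for the <( ) process substitution); common_comm is a hypothetical name. Extracting the columns with awk rather than cut copes with the runs of spaces and the leading blank on lines such as ' 2.3    3.4':

```shell
#!/usr/bin/env bash
# Hypothetical helper: intersect column 1 and column 2 as sets of strings.
# sort -u sorts and deduplicates each column; comm needs sorted input,
# and -12 suppresses lines unique to either side, leaving the common ones.
common_comm() {
    comm -12 <(awk 'NF { print $1 }' "$1" | sort -u) \
             <(awk 'NF { print $2 }' "$1" | sort -u)
}
```

Both sort and comm inherit the same locale here, which matters because comm expects its input in the collation order it uses itself.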

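A bash solution that works the way you described:

#!/bin/bash

# Read the two columns into two arrays (-r keeps backslashes literal)
c1a=()
c2a=()
while read -r c1 c2; do
    c1a+=("$c1")
    c2a+=("$c2")
done < numbers.txt

# Compare every element of column 1 with every element of column 2.
# Quoting the right-hand side makes == a plain string comparison, not a glob match.
for c1 in "${c1a[@]}"; do
    for c2 in "${c2a[@]}"; do
        [[ $c1 == "$c2" ]] && echo "$c1"
    done
done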
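The nested loops compare every pair of values. A sketch of an alternative, assuming bash 4 or newer for associative arrays (common_assoc is a hypothetical name): store column 1 as keys, then check each column-2 value with a single lookup. Like the loop above, this compares the numbers as text.

```shell
#!/usr/bin/env bash
# Hypothetical helper; requires bash >= 4 (local -A).
common_assoc() {
    local -A in_col1=()   # set of column-1 values (only the keys matter)
    local -a col2=()      # column-2 values in file order
    local c1 c2 v
    # "|| [[ -n $c1 ]]" also processes a last line that lacks a newline
    while read -r c1 c2 || [[ -n $c1 ]]; do
        [[ -n $c1 ]] && in_col1[$c1]=1
        [[ -n $c2 ]] && col2+=("$c2")
    done < "$1"
    # One lookup per column-2 value instead of a loop over column 1
    for v in "${col2[@]}"; do
        if [[ -n ${in_col1[$v]:-} ]]; then
            printf '%s\n' "$v"
        fi
    done
}
```

Called as common_assoc numbers.txt, it prints the matches in the order they appear in column 2.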

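Using awk without reading the file twice:

awk '{a[$1];b[$2];for (i in b) if (i in a) {print i;delete a[i];delete b[i]}}' file

Alternatively, collect both columns first and compare the values numerically in the END block, so that, for example, 2.3 and 2.30 are treated as equal:

awk '{ a[$1]; b[$2] }
END{
    for (x in a) {
        for (y in b) {
            # +0 on both sides forces a numeric comparison
            if (x+0 == y+0) {
                print x
                break
            }
        }
    }
}' file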
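The first one-liner above rescans the whole of b on every input line. A sketch of a single-pass alternative that needs only one lookup per value (common_single_pass is a hypothetical name): each value is recorded the first time it appears in its column, and it is printed at the moment it has been seen in both columns, so every common value is printed exactly once. Like that one-liner, it compares the values as strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one pass over the file, one array lookup per value.
common_single_pass() {
    awk '
        NF {
            # First time this value appears in column 1?
            if (!($1 in seen1)) {
                seen1[$1]
                if ($1 in seen2) print $1   # already seen in column 2
            }
            # First time this value appears in column 2?
            if (!($2 in seen2)) {
                seen2[$2]
                if ($2 in seen1) print $2   # already seen in column 1
            }
        }
    ' "$1"
}
```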

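Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.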

 
© 2020-2024 STACKOOM.COM