
Bash script for comparing numbers in columns

I have a problem writing a bash script and hope that someone can help me with it. I have written a few smaller bash scripts before, so I'm not totally new, but there's still lots of room for improvement.

So, I have a file that only contains two columns of decimal numbers, like:

0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
...

What I want to do is compare every number in the first column with every number in the second column, check whether any two numbers are equal, and print each such number to the screen or to a file.

I found an answer for how to do this in an Excel table, but I would be really interested in how to do it in bash or maybe with awk.

The first problem for me is that I don't even know how I would compare the first number to all the others in the second column. I guess I would have to do this via arrays: I could read the two numbers with a 'while read var_1 var_2' loop, add var_1 of each line to an array_1 and var_2 to another array_2, and then somehow compare all the elements with each other.

But I don't know how to. I hope someone can help me.

Using awk

awk 'FNR==NR {a[$1]++;next} ($2 in a) {print $2}' file file
4.08
1.38

This reads the file twice: the first pass (FNR==NR) stores column #1 as keys of array a, and the second pass prints each column #2 value that is also a key of a.

cat file
0.46    0.68
0.92    1.36
1.38    2.04
1.84    2.72
 2.3    3.4
2.76    4.08
3.22    4.76
3.68    5.44
4.14    6.12
4.08    1.38
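
The array lookup above compares the fields as strings, so 2.3 and 2.30 would not be treated as equal. If the file may contain such variants, one possible tweak is to normalise each field with +0 before using it as a key. This is only a sketch: the function name common_numeric and the sample data are made up for illustration. Also note that awk converts non-integer numeric subscripts with CONVFMT (default %.6g), so values with more than six significant digits could collide.

```shell
#!/bin/bash
# Hypothetical helper: print second-column values that also occur in the
# first column, using numerically normalised keys.
common_numeric() {
    awk 'FNR==NR { seen[$1+0]; next }      # pass 1: remember column 1
         ($2+0) in seen { print $2 }       # pass 2: test column 2
    ' "$1" "$1"
}

# Made-up sample data, only for demonstration.
tmp=$(mktemp)
printf '%s\n' '1.38 2.04' '2.30 4.08' '4.08 2.3' > "$tmp"
result=$(common_numeric "$tmp")
echo "$result"
rm -f "$tmp"
```

With this sample, 2.3 in the second column matches 2.30 in the first, which the plain string lookup would miss.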

This line should work:

 awk '{a[$1]=1;b[$2]}END{for(x in b){a[x]++;if(a[x]>1)print x}}' file

Note that the loop and check in END are there to exclude numbers duplicated within the same column. If each column has unique numbers, that part could be simplified.

With fedorqui's example, the output is:

4.08
1.38
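
One way the simplification mentioned above might look, assuming no value is repeated within a single column: count every field in one array and print whatever was seen more than once. The function name common_unique and the sample data are assumptions for illustration, and the output is piped through sort because awk does not guarantee the order of a for-in loop.

```shell
#!/bin/bash
# Hypothetical helper: one counter array for both columns. Only valid when
# no value is repeated within a single column.
common_unique() {
    awk '{ n[$1]++; n[$2]++ }
         END { for (x in n) if (n[x] > 1) print x }' "$1"
}

# Made-up sample data, only for demonstration.
tmp=$(mktemp)
printf '%s\n' '0.46 0.68' '1.38 2.04' '2.76 4.08' '4.08 1.38' > "$tmp"
result=$(common_unique "$tmp" | sort)
echo "$result"
rm -f "$tmp"
```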



A one-liner: convert each column to a single column, sort, and use uniq to print only the duplicates:

(awk '{print $1}' test_input|sort|uniq  ; awk '{print $2}' test_input|sort|uniq)|sort|uniq -d
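
The same sort-based idea can also be written with comm instead of uniq -d, which is a different tool from the one used above: comm -12 prints only the lines common to two sorted inputs. A minimal sketch, assuming bash (for process substitution); the function name common_comm and the sample data are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: intersect the two columns with comm.
# sort -u sorts each column and drops duplicates within it.
common_comm() {
    comm -12 <(awk '{print $1}' "$1" | sort -u) \
             <(awk '{print $2}' "$1" | sort -u)
}

# Made-up sample data, only for demonstration.
tmp=$(mktemp)
printf '%s\n' '0.46 0.68' '1.38 2.04' '2.76 4.08' '4.08 1.38' > "$tmp"
result=$(common_comm "$tmp")
echo "$result"
rm -f "$tmp"
```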

A bash solution that works the way you described:

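#!/bin/bash

# Read both columns into arrays.
c1a=() c2a=()
while read -r c1 c2; do
    c1a+=("$c1")
    c2a+=("$c2")
done < numbers.txt

# Compare every first-column value with every second-column value.
# Quoting "$c2" makes == a literal string comparison, not a pattern match.
for c1 in "${c1a[@]}"; do
    for c2 in "${c2a[@]}"; do
        [[ $c1 == "$c2" ]] && echo "$c1"
    done
done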
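
The nested loops compare every pair, so the work grows with the square of the number of lines, and a value repeated in a column is printed more than once. A possible alternative is an associative array used as a lookup table, which needs bash 4 or later. This is a sketch only: the function name common_bash and the sample data are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper; needs bash 4 or later for associative arrays.
common_bash() {
    local -A seen=()
    local c1 c2
    # Pass 1: record every first-column value as a key.
    while read -r c1 c2; do
        [[ -n $c1 ]] && seen[$c1]=1
    done < "$1"
    # Pass 2: print each second-column value that was recorded, once.
    while read -r c1 c2; do
        if [[ -n $c2 && -n ${seen[$c2]:-} ]]; then
            echo "$c2"
            seen[$c2]=''    # clear the flag so repeats are not printed again
        fi
    done < "$1"
}

# Made-up sample data, only for demonstration.
tmp=$(mktemp)
printf '%s\n' '1.38 2.04' '2.76 4.08' '4.08 1.38' '9.99 4.08' > "$tmp"
result=$(common_bash "$tmp")
echo "$result"
rm -f "$tmp"
```

Like the loop version, this compares the values as strings.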

Using awk without reading the file twice:

awk '{a[$1];b[$2];for (i in b) if (i in a) {print i;delete a[i];delete b[i]}}' file

awk '{ a[$1]; b[$2] }
END{
    for (x in a) {
        for (y in b) {
            # Array keys are strings; +0 on both sides forces a numeric comparison.
            if (x+0 == y+0) {
                print x
                break
            }
        }
    }
}' file
