Sum columns based on 2 different fields using awk

I am trying to collapse lines that share the same name by summing a particular field, but only when a second field also holds the same ID. For example, my file looks like this:

F1  F2  F3  F4  F5
1   A_1 1   B_1 4
2   A_1 2   B_1 5
3   A_2 4   B_1 2
4   A_3 3   B_2 4
5   A_3 2   B_2 2
6   A_3 1   B_2 1
7   A_4 2   B_2 2

I want to group the rows on both F2 and F4, and sum F3 and F5 within each group, as follows:

1   A_1 3   B_1 9
3   A_2 4   B_1 2
6   A_3 6   B_2 7
7   A_4 2   B_2 2

So far, I've tried this:

awk 'BEGIN{OFS=FS="\t"}FNR==NR{a[$4]+=$5;next}; {print $0,a[$4]}' \
dummy.txt dummy.txt |sort -k 4,4 -u

which gives me:

1       A_1     1       B_1     4       11
4       A_3     3       B_2     4       9

How can I modify this so that it also takes F2 into account before merging? I would prefer awk, but other solutions are welcome too!

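You can use this GNU awk command. It groups rows on the combined F2 and F4 key, keeps F1 from the first row of each group, and sums F3 and F5: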

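awk 'BEGIN {
   FS=OFS="\t"
   PROCINFO["sorted_in"] = "@ind_str_asc"   # gawk only: walk keys in string order
}
{
   k=$2 SUBSEP $4                           # group key: F2 and F4 together
}
!(k in c1) {                                # first row seen for this key
   c1[k]=$1
   c2[k]=$2
   c4[k]=$4
}
{
   s3[k]+=$3                                # running sum of F3
   s5[k]+=$5                                # running sum of F5
}
END {
   for (i in s3)
      print c1[i], c2[i], s3[i], c4[i], s5[i]
}' file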

Output, with F1 taken from the first row of each group:

1   A_1 3   B_1 9
3   A_2 4   B_1 2
4   A_3 6   B_2 7
7   A_4 2   B_2 2
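
If GNU awk is not available, the same grouping can be sketched in portable awk by recording the order in which each F2/F4 pair first appears, instead of relying on PROCINFO["sorted_in"]. This is only a sketch under a few assumptions: the input is tab-separated, it has no header line, and the names `order`, `n` and `out` are made up for illustration. The sample rows from the question are fed in through printf so that the snippet runs on its own.

```shell
# Feed the sample rows from the question (tab-separated, no header row)
# into awk and capture the result in the hypothetical variable "out".
out=$(printf '%s\t%s\t%s\t%s\t%s\n' \
  1 A_1 1 B_1 4 \
  2 A_1 2 B_1 5 \
  3 A_2 4 B_1 2 \
  4 A_3 3 B_2 4 \
  5 A_3 2 B_2 2 \
  6 A_3 1 B_2 1 \
  7 A_4 2 B_2 2 |
awk 'BEGIN { FS = OFS = "\t" }
{
   k = $2 SUBSEP $4              # group key: F2 and F4 together
   if (!(k in c1)) {             # first row of this group
      order[++n] = k             # remember first-seen order
      c1[k] = $1; c2[k] = $2; c4[k] = $4
   }
   s3[k] += $3                   # running sum of F3
   s5[k] += $5                   # running sum of F5
}
END {
   # Print the groups in the order they first appeared in the input.
   for (i = 1; i <= n; i++) {
      k = order[i]
      print c1[k], c2[k], s3[k], c4[k], s5[k]
   }
}')

printf '%s\n' "$out"
```

On this sample it prints the same four lines as above. The difference is that groups come out in input order rather than sorted by key, which may or may not be what you want for unsorted input.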
