Calculate sum of column using reference of other column in awk

I have a file that contains two columns. The first column contains a keyword and the second contains its size. Keywords can be repeated, as below:

data1 5
data2 7
data3 4
data2 6
data1 3
data2 8

I want to calculate the sum of the sizes associated with each keyword.

For example, the output for the data above should be:

data1 8
data2 21
data3 4

Is this possible with awk?

If so, please show me how.

You can do this in awk with an associative array:

awk '{a[$1]+=$2} END {for (i in a) print i,a[i]}' file
data1 8
data2 21
data3 4

How it works:
a[$1] refers to the element of array a indexed by field 1 (the keyword); awk creates it on first use.
a[$1]+=$2 is the same as a[$1]=a[$1]+$2: it adds the value of field 2 to the running total for that keyword.
for (i in a) loops over every index (keyword) stored in a; the order of this traversal is not guaranteed.
print i,a[i] prints the keyword i followed by its total a[i].
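Because the order in which for (i in a) visits keys is left to the awk implementation, the result may not always come out sorted as shown. A minimal sketch, assuming a sorted listing is acceptable: it recreates the sample data in a file (the name sizes.txt and the variable names sum, key and result are only illustrative) and pipes the totals through sort.

```shell
# Recreate the sample input from the question (sizes.txt is a hypothetical file name).
printf '%s\n' 'data1 5' 'data2 7' 'data3 4' 'data2 6' 'data1 3' 'data2 8' > sizes.txt

# Sum field 2 per keyword; sort makes the unspecified for-in order deterministic.
result=$(awk '{ sum[$1] += $2 } END { for (key in sum) print key, sum[key] }' sizes.txt | sort)
printf '%s\n' "$result"
```

This sorts the lines lexically, which is enough for keywords like the ones above.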

If you want to keep the output in the same order as the input, use this slightly longer awk command:

awk '$1 in a{a[$1]+=$2; next} {b[++k]=$1; a[$1]=$2}
             END{for(i=1; i<=k; i++) print b[i], a[b[i]]}' file
data1 8
data2 21
data3 4
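The sample data happens to arrive in sorted order, so it does not make the order-preserving behaviour visible. Here is a commented sketch of the same idea, run on made-up input (the keywords beta and alpha and the names sum, order, n and ordered are illustrative, not from the question) where first-seen order differs from sorted order:

```shell
ordered=$(printf '%s\n' 'beta 2' 'alpha 1' 'beta 3' |
  awk '
    !($1 in sum) { order[++n] = $1 }   # first sighting: remember the position
    { sum[$1] += $2 }                  # every line: accumulate the size
    END { for (i = 1; i <= n; i++) print order[i], sum[order[i]] }
  ')
printf '%s\n' "$ordered"
```

The keyword beta is printed first because it appears first in the input, even though alpha sorts before it. The test $1 in sum only checks for membership and does not create the element, which is why it can safely run before the accumulation rule.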
