Calculate the sum of a column grouped by another column in awk

Question

I have a file that contains 2 columns. The first column contains a keyword and the second contains its size. Keywords can be repeated, like below:

data1 5
data2 7
data3 4
data2 6
data1 3
data2 8

I want to calculate the sum of the sizes that belong to the same keyword.

For example, the output for the above data would be:

data1 8
data2 21
data3 4

Is this possible with awk?

If so, please show me how.

Answer 1

You can do this in awk with an associative array:

awk '{a[$1]+=$2} END {for (i in a) print i,a[i]}' file
data1 8
data2 21
data3 4

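How it works:
a[$1] creates (on first use) an element of the array named a, using field #1 as the key.
a[$1]+=$2 is the same as a[$1]=a[$1]+$2: it adds the value of field #2 to a[$1]. An element that has not been set yet counts as 0, so no initialization is needed.
for (i in a) loops through all keys of array a. The iteration order is unspecified, so the lines are not guaranteed to come out in input order or sorted order.
print i,a[i] prints the key i and its sum a[i], separated by a space.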
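Because for (i in a) does not guarantee any particular order, one simple way to get predictable output is to pipe the result through sort. The sketch below wraps Answer 1's awk program in a shell function; the name sum_by_key is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sum column 2 for each key in column 1,
# then sort the lines by key so the order is stable.
# Reads the files given as arguments, or stdin if there are none.
sum_by_key() {
  awk '{ a[$1] += $2 }                      # accumulate size per keyword
       END { for (i in a) print i, a[i] }   # one line per keyword
  ' "$@" | sort
}
```

Usage: sum_by_key file prints data1 8, data2 21 and data3 4 on separate lines for the sample data. Sorting is done outside awk so the same script works in any POSIX awk.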

Answer 2

If you want to keep the output in the same order in which the keywords first appear in the input, use this slightly longer awk:

awk '$1 in a{a[$1]+=$2; next} {b[++k]=$1; a[$1]=$2}
             END{for(i=1; i<=k; i++) print b[i], a[b[i]]}' file
data1 8
data2 21
data3 4
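The same order-preserving idea can be written a little more compactly: remember each keyword the first time it is seen, and always add field #2 to its sum. This is a variant sketch, not the answer's exact script; the function name sum_by_key_ordered is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sum column 2 per key in column 1,
# printing keys in the order they first appear in the input.
sum_by_key_ordered() {
  awk '
    !($1 in a) { order[++k] = $1 }   # new keyword: record its position ("in" does not create a[$1])
    { a[$1] += $2 }                  # add field #2 to the running sum for field #1
    END { for (i = 1; i <= k; i++) print order[i], a[order[i]] }
  ' "$@"
}
```

Usage: sum_by_key_ordered file. Testing membership with ($1 in a) before the element is touched avoids needing a separate next rule, because the "in" operator does not create the array element.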

2 answers
Answer 1: score 4, accepted, 2014-06-23 14:41:14
Answer 2: score 1, 2014-06-23 14:44:48
