
Adding columns for specified rows and dividing by the number of rows using awk

I'm really new to Linux and shell scripting, so any help would be appreciated! I have a file of 1050 rows and 8 columns. Example:

anger 1 0 5 101 13 2 somesentenceofwords
anger 2 0 5 101 23 3 somesentenceofwords
anger 3 0 3 101 35 3 somesentenceofwords
anger 4 0 2 101 23 3 somesentenceofwords
arch 5 0 3 101 34 12 somesentenceofwords
arch 6 0 2 101 45 23 somesentenceofwords
arch 7 0 2 101 23 12 somesentenceofwords
hand 8 9 0 101 32 21 somesentenceofwords
hand 9 0 2 101 23 12 somesentenceofwords

What I want to do is this: for each run of rows that share the same value in the first column, output the sum of the 6th column over those rows divided by the number of rows (essentially an average).

So in the example, since the first 4 rows are all anger, I want the average of column 6 over all rows with anger in column 1: (13 + 23 + 35 + 23) / 4 = 23.5. It would then do the same for arch, then hand, and so on.

Example output:

anger 23.5
arch 34
hand 27.5

I tried the following just to see if I could do it for one value at a time, matching the first column against a specific string, but I couldn't even get that to work.

$ awk '{if($1="anger"){sum+=$6} {print sum}}' filename

Is this possible?
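
A note on the attempt above: in awk, `$1="anger"` is an assignment, so it overwrites the first field and is always true, and `print sum` inside the main block prints a running total for every row. A minimal sketch of the single-key case, assuming the goal is the mean of column 6 for one key; the function name `mean_for_key` is made up for illustration:

```shell
# Hypothetical helper: mean of column 6 over rows whose column 1 equals the given key.
# Usage: mean_for_key KEY [FILE...]   (reads standard input when no file is given)
mean_for_key() {
    key=$1; shift
    awk -v key="$key" '
        $1 == key { sum += $6; n++ }     # "==" compares; "=" would assign to $1
        END { if (n) print sum / n }     # print once, after the last row
    ' "$@"
}

# Demo on a few rows from the question: prints 23.5
mean_for_key anger <<'EOF'
anger 1 0 5 101 13 2 somesentenceofwords
anger 2 0 5 101 23 3 somesentenceofwords
anger 3 0 3 101 35 3 somesentenceofwords
anger 4 0 2 101 23 3 somesentenceofwords
arch 5 0 3 101 34 12 somesentenceofwords
EOF
```

Called as `mean_for_key anger filename`, this would print one number for that key; handling every key at once needs per-key bookkeeping.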

This is pretty straightforward with awk:

$ awk '{a[$1]+=$6;b[$1]++}END{for (i in a) print i,a[i]/b[i]}' file
hand 27.5
arch 34
anger 23.5

How does this work?

The block {a[$1]+=$6;b[$1]++} is executed for every line that is read. It maintains two maps keyed by the first column: a stores the running sum of column 6 for each key, and b stores the number of rows seen for each key.

The block END{for (i in a) print i,a[i]/b[i]} is executed after all lines have been read. It iterates over the keys of the first map and prints each key followed by its sum divided by its count (i.e. the mean).
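
The order in which `for (i in a)` visits keys is unspecified in awk, which is why the output above does not follow the input order. A minimal sketch, assuming an alphabetically sorted listing is acceptable, that wraps the same logic in a shell function and pipes it through `sort`; the function name `group_mean` and the longer array names are made up for illustration:

```shell
# Hypothetical helper: mean of column 6 per distinct value of column 1, sorted by key.
# Usage: group_mean [FILE...]   (reads standard input when no file is given)
group_mean() {
    awk '{ sum[$1] += $6; count[$1]++ }
         END { for (key in sum) print key, sum[key] / count[key] }' "$@" | sort
}

# Demo on the sample rows from the question.
group_mean <<'EOF'
anger 1 0 5 101 13 2 somesentenceofwords
anger 2 0 5 101 23 3 somesentenceofwords
anger 3 0 3 101 35 3 somesentenceofwords
anger 4 0 2 101 23 3 somesentenceofwords
arch 5 0 3 101 34 12 somesentenceofwords
arch 6 0 2 101 45 23 somesentenceofwords
arch 7 0 2 101 23 12 somesentenceofwords
hand 8 9 0 101 32 21 somesentenceofwords
hand 9 0 2 101 23 12 somesentenceofwords
EOF
```

On the sample rows this should list anger, arch and hand in that order, with the same averages as above.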

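Using awk, while preserving the order in which the keys first appear in the input:

$ awk '!($1 in s) {b[++i] = $1; s[$1] = 0}   # record each new key in first-seen order
       {c[$1]++; s[$1] += $6}                # per-key row count and running sum of column 6
       END {for (k = 1; k <= i; k++) printf "%s %.1f\n", b[k], s[b[k]] / c[b[k]]}' file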
anger 23.5
arch 34.0
hand 27.5
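
Since the rows in the example are already grouped by the first column, the averages can also be produced in input order without arrays. A minimal sketch, assuming rows with the same key are always adjacent (this is a different technique from the answers above, and it would report a key more than once if its rows were scattered); the function name `grouped_mean` is made up for illustration:

```shell
# Hypothetical helper: assumes rows sharing a key are adjacent in the input.
# Usage: grouped_mean [FILE...]   (reads standard input when no file is given)
grouped_mean() {
    awk '
        # Key changed: emit the finished group, then reset the accumulators.
        NR > 1 && $1 != key { printf "%s %.1f\n", key, sum / n; sum = 0; n = 0 }
        # Every row: remember the current key and accumulate column 6.
        { key = $1; sum += $6; n++ }
        # Flush the last group, if any rows were read at all.
        END { if (n) printf "%s %.1f\n", key, sum / n }
    ' "$@"
}
```

Called as `grouped_mean file`, this keeps only one running sum and count in memory, which may matter for very large inputs.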

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.
