
Summing a column for specified rows and dividing by the number of rows using awk

I'm really new to Linux and shell scripting, so help would really be appreciated! I have a file with 1050 rows and 8 columns. Example:

anger 1 0 5 101 13 2 somesentenceofwords
anger 2 0 5 101 23 3 somesentenceofwords
anger 3 0 3 101 35 3 somesentenceofwords
anger 4 0 2 101 23 3 somesentenceofwords
arch 5 0 3 101 34 12 somesentenceofwords
arch 6 0 2 101 45 23 somesentenceofwords
arch 7 0 2 101 23 12 somesentenceofwords
hand 8 9 0 101 32 21 somesentenceofwords
hand 9 0 2 101 23 12 somesentenceofwords

What I want to do is this: whenever the first column is the same for some number of rows, output the sum of the 6th column for those rows divided by the number of rows (essentially an average).

So in the example, since the first 4 rows are all anger, I want the average of column 6 over all rows with anger in column 1. It would compute (13 + 23 + 35 + 23) / 4. It would then do the same for arch, then hand, and so on.

Example output:

anger 23.5
arch 34
hand 27.5

I tried this just to see whether I could do it for one value at a time, where the first column equals a specific string, but I couldn't even get that to work.

$ awk '{if($1="anger"){sum+=$6} {print sum}}' filename

Is this possible?
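A note on why the attempt above misbehaves: in awk, `$1="anger"` is an assignment, not a comparison. It overwrites the first field and evaluates to a non-empty string, which is always true, so every row's 6th column gets added. The `{print sum}` block also runs on every line, so it prints a running total rather than one result. A minimal sketch of a corrected single-key version follows; the variable names `data`, `n`, and `anger_mean` are just for illustration, and the sample rows are copied from the question.

```shell
# Sample rows from the question, held in a variable so the sketch is self-contained.
data='anger 1 0 5 101 13 2 somesentenceofwords
anger 2 0 5 101 23 3 somesentenceofwords
anger 3 0 3 101 35 3 somesentenceofwords
anger 4 0 2 101 23 3 somesentenceofwords
arch 5 0 3 101 34 12 somesentenceofwords
arch 6 0 2 101 45 23 somesentenceofwords
arch 7 0 2 101 23 12 somesentenceofwords
hand 8 9 0 101 32 21 somesentenceofwords
hand 9 0 2 101 23 12 somesentenceofwords'

# Use == to compare, count the matching rows, and print once in END.
anger_mean=$(printf '%s\n' "$data" |
    awk '$1 == "anger" { sum += $6; n++ } END { if (n > 0) print sum / n }')

echo "$anger_mean"
```

With the sample rows this prints 23.5, matching the figure the question asks for.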

Pretty straightforward with awk:

$ awk '{a[$1]+=$6;b[$1]++}END{for (i in a) print i,a[i]/b[i]}' file
hand 27.5
arch 34
anger 23.5

How does this work?

The block {a[$1]+=$6;b[$1]++} is executed for every line that is read. We create two maps, both keyed by the first column: one storing the sum of column 6 for each key, and one storing the number of rows for each key.

The block END{for (i in a) print i,a[i]/b[i]} is executed after all lines are read. We iterate over the keys of the first map and print each key followed by its sum divided by its count (i.e. the mean).
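One detail worth knowing: the order in which `for (i in a)` visits keys is unspecified in awk, which is why the output above lists hand before anger. As a rough sketch, the same logic can be spread over several lines with descriptive names and piped through `sort` when a stable order is wanted. The names `data`, `sum`, `count`, and `result` are illustrative only, and the rows are the sample from the question.

```shell
# Sample rows from the question, held in a variable so the sketch is self-contained.
data='anger 1 0 5 101 13 2 somesentenceofwords
anger 2 0 5 101 23 3 somesentenceofwords
anger 3 0 3 101 35 3 somesentenceofwords
anger 4 0 2 101 23 3 somesentenceofwords
arch 5 0 3 101 34 12 somesentenceofwords
arch 6 0 2 101 45 23 somesentenceofwords
arch 7 0 2 101 23 12 somesentenceofwords
hand 8 9 0 101 32 21 somesentenceofwords
hand 9 0 2 101 23 12 somesentenceofwords'

result=$(printf '%s\n' "$data" | awk '
    # Runs for every input line: accumulate column 6 and count rows per key.
    { sum[$1] += $6; count[$1]++ }

    # Runs once after the last line: print each key with its mean.
    END { for (key in sum) print key, sum[key] / count[key] }
' | sort)   # sort makes the otherwise unspecified key order deterministic

printf '%s\n' "$result"
```

Sorting alphabetically happens to match the input order for this sample, but in general it will not.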

Using awk, keeping the keys in the order they first appear in the file:

awk '!($1 in s){b[++i]=$1; s[$1]=0} {c[$1]++; s[$1]+=$6} 
        END{for (k=1; k<=i; k++) printf "%s %.1f\n", b[k], s[b[k]]/c[b[k]]}' file
anger 23.5
arch 34.0
hand 27.5
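This second command comes without an explanation, so here is a commented sketch of what it appears to do. The first time a key is seen, it is appended to an indexed array, and the END block walks that array by index so the output follows input order. `printf` with `%.1f` fixes the output at one decimal place, which is why arch shows as 34.0. The names `data`, `order`, `sum`, `count`, and `result` are illustrative and not from the original answer.

```shell
# Sample rows from the question, held in a variable so the sketch is self-contained.
data='anger 1 0 5 101 13 2 somesentenceofwords
anger 2 0 5 101 23 3 somesentenceofwords
anger 3 0 3 101 35 3 somesentenceofwords
anger 4 0 2 101 23 3 somesentenceofwords
arch 5 0 3 101 34 12 somesentenceofwords
arch 6 0 2 101 45 23 somesentenceofwords
arch 7 0 2 101 23 12 somesentenceofwords
hand 8 9 0 101 32 21 somesentenceofwords
hand 9 0 2 101 23 12 somesentenceofwords'

result=$(printf '%s\n' "$data" | awk '
    # Key not seen before: record its position so output keeps input order.
    !($1 in sum) { order[++n] = $1 }

    # Every line: accumulate column 6 and count rows per key.
    { sum[$1] += $6; count[$1]++ }

    # After the last line: walk keys in first-seen order, one decimal place.
    END {
        for (k = 1; k <= n; k++)
            printf "%s %.1f\n", order[k], sum[order[k]] / count[order[k]]
    }
')

printf '%s\n' "$result"
```

Unlike piping through `sort`, this keeps the grouping order of the file itself, which may matter when the keys are not alphabetical.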
