Calculate average expression of a letter in a file using AWK
I have a problem with a file. An example of the input file is the following:
A 0.234
B 0.345
A 0.43
B 0.323
A 0.78
B 0.45
F 0.89
L 0.34
F 0.21
L 0.3
F 0.1
I need to calculate the average expression of each letter, so the output will be similar to this:
A 0.4813
B 0.3727
F 0.4
L 0.32
I already designed my code so that it finds the lines in the file that contain only one letter and also calculates the proportion of each letter in the file. My question is whether I can keep using that approach to calculate the average expression, because I don't know how to count each letter and add up its corresponding values.
This is my code, which someone on Stack Overflow helped me with:
awk 'length($0)==1{a[$0]++} END{for(c in a) print c, a[c]/NR}'
You may use this awk:
awk '{sum[$1] += $2; ++freq[$1]} END {for (i in freq) printf "%s\t%.4f\n", i, sum[i]/freq[i]}' file
A 0.4813
B 0.3727
F 0.4000
L 0.3200
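For readers who want to see what the one-liner is doing, here is a commented sketch of the same idea: keep a running sum and a count per letter, then divide in the END block. The file name expr_sample.txt, the array name count, and the NF >= 2 guard are my own assumptions for illustration, not part of the answer above. Since awk visits keys in a for (k in array) loop in an unspecified order, the sketch pipes the result through sort to get a stable listing.

```shell
# Recreate the sample input from the question (the file name is an assumption).
cat > expr_sample.txt <<'EOF'
A 0.234
B 0.345
A 0.43
B 0.323
A 0.78
B 0.45
F 0.89
L 0.34
F 0.21
L 0.3
F 0.1
EOF

# Accumulate a per-letter sum and count, then print mean = sum / count.
# The for-in loop order is unspecified in awk, so the output is sorted afterwards.
result=$(awk '
  NF >= 2 {                 # skip blank or one-field lines
    sum[$1]   += $2         # running total of the values for this letter
    count[$1] += 1          # how many values were seen for this letter
  }
  END {
    for (k in count)
      printf "%s\t%.4f\n", k, sum[k] / count[k]
  }
' expr_sample.txt | sort)

printf '%s\n' "$result"
```

With the sample data this should print the same four tab-separated lines shown above, in alphabetical order. Drop the sort if the order of the letters does not matter to you.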
Even though you asked for awk specifically, why reinvent the wheel when there are already plenty of tools designed for jobs like this, for instance GNU datamash:
datamash -Wsg 1 mean 2 < yourFile
prints
A 0.48133333333333
B 0.37266666666667
F 0.4
L 0.32