
Calculate average expression of a letter in a file using AWK

I have a problem with a file. An example of the input file is the following:

A   0.234
B   0.345
A   0.43
B   0.323
A   0.78
B   0.45
F   0.89
L   0.34
F   0.21
L   0.3
F   0.1

I need to calculate the average expression of each letter, so the output would look similar to this:

A   0.4813
B   0.3727
F   0.4
L   0.32

I already wrote code that finds the lines in the file that contain only one letter and calculates the proportion of each letter in the file. My question is whether I can keep using that approach to calculate the average expression, because I don't know how to count each letter and add up its corresponding values.

This is my code, which someone on Stack Overflow helped me with:

awk 'length($0)==1{a[$0]++} END{for(c in a) print c, a[c]/NR}'

You may use this awk:

awk '{sum[$1] += $2; ++freq[$1]} END {for (i in freq) printf "%s\t%.4f\n", i, sum[i]/freq[i]}' file

A   0.4813
B   0.3727
F   0.4000
L   0.3200
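For readers who want to see the steps spelled out, here is a commented sketch of the same per-letter averaging. It assumes the layout shown in the question (a letter in the first field, a number in the second). The function name `avg_by_key`, the array name `count`, and the `NF >= 2` guard are illustrative additions, not part of the answer above. The trailing `sort` is there because awk does not guarantee any particular order for `for (k in array)`.

```shell
# avg_by_key: print the mean of column 2 for each distinct value in column 1.
# Reads the files given as arguments, or stdin when none are given.
avg_by_key() {
  awk '
    # Skip blank or one-field lines (an assumption about how to treat them).
    NF >= 2 {
      sum[$1] += $2    # running total of the values seen for this letter
      count[$1]++      # how many values were seen for this letter
    }
    END {
      for (k in count)
        printf "%s\t%.4f\n", k, sum[k] / count[k]
    }
  ' "$@" | sort        # iteration order in awk is unspecified, so sort the result
}

# Usage with the sample data from the question, fed on stdin.
printf '%s\n' \
  'A   0.234' 'B   0.345' 'A   0.43' 'B   0.323' 'A   0.78' 'B   0.45' \
  'F   0.89' 'L   0.34' 'F   0.21' 'L   0.3' 'F   0.1' | avg_by_key
```

With a real file, the call would be `avg_by_key file`. Keeping two arrays keyed by the letter means the input does not need to be sorted or grouped beforehand.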

Even though you asked for awk specifically, why reinvent the wheel when there are already plenty of tools designed for jobs like this, for instance GNU datamash:

datamash -Wsg 1 mean 2 < yourFile 

prints:

A       0.48133333333333
B       0.37266666666667
F       0.4
L       0.32

