
How to use awk to aggregate rows of data

I have a set of data in rows, where some of the rows belong to the same group.

For example:

Apple 0.4 0.5 0.6
Orange 0.2 0.3 0.2
Apple 0.4 0.3 0.4
Orange 0.4 0.5 0.8

The question is how I can automatically aggregate the columns per group using awk. In the past, I would simply write an awk command like the following by hand for each file:

awk '{col2[$1]+=$2; col3[$1]+=$3; col4[$1]+=$4} END {for(i in col2){printf("%s\t%.2f\t%.2f\t%.2f\n",i,col2[i]/2,col3[i]/2,col4[i]/2)}}' myfile

But this time I am dealing with several files with different NF (number of fields), so I am trying to find a single command that calculates the average of each group automatically. Eventually, we should have:

Apple 0.4 0.4 0.5
Orange 0.3 0.4 0.5

Please advise. Thanks.

Here's something for a start.

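awk '
{
    fruits[$1]++                      # number of rows seen for this group
    if (NF > maxnf) maxnf = NF        # widest row so far; safer than relying on NF inside END
    for(o=2;o<=NF;o++){
        fruit[$1 SUBSEP o]+=$o        # running sum for each (group, column) pair
    }
}
END{
    # turn each (group, column) sum into an average
    for(combined in fruit){
        split(combined, sep, SUBSEP)
        f[sep[1],sep[2]]=fruit[combined]/fruits[sep[1]]
    }
    # print one line per group; the order of for-in is unspecified
    for(fr in fruits) {
        printf "%s ",fr
        for(i=2;i<=maxnf;i++){
            printf "%s ",f[fr,i]
        }
        print ""
    }
}' file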

Output:

$ ./shell.sh
Orange 0.3 0.4 0.5
Apple 0.4 0.4 0.5
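
Since the question mentions several files with different field counts, here is a rough sketch of one way to wrap the same idea in a shell function and run it once per file. It assumes POSIX sh, awk and sort are available. The name group_avg, the two-decimal tab-separated output, and the sort step (added only to make the line order predictable) are my own choices for illustration, not something from the original answer.

```shell
# Hypothetical helper (name is illustrative): group_avg FILE...
# For each file, print one line per group (first field) followed by the
# average of every remaining column, formatted to two decimals.
group_avg() {
    for f in "$@"; do
        awk '
        {
            count[$1]++                       # rows seen per group
            if (NF > maxnf) maxnf = NF        # widest row in this file
            for (c = 2; c <= NF; c++)
                sum[$1, c] += $c              # running sum per (group, column)
        }
        END {
            for (g in count) {
                line = g
                for (c = 2; c <= maxnf; c++)
                    line = line sprintf("\t%.2f", sum[g, c] / count[g])
                print line
            }
        }' "$f" | sort                        # for-in order is unspecified, so sort
    done
}

# Usage (file names are placeholders):
#   group_avg file1 file2
```

Because awk is started fresh for each file, the sums, counts and column width never carry over from one file to the next. Note that a group with fewer columns than the widest row in its file will show 0.00 for the missing columns, and each average divides by the number of rows in the group, so this sketch assumes every row of a given file has the same number of fields.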

Reference to gawk is here.


 