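Using awk, calculate the average of rows based on the strings in the second and fifth columns and the values in the third column, and append the result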

Question

This is a variant of "Using awk, how to convert dates to week and quarter?"

Input data.txt:

a;2016-04-25;10;2016-w17;2016-q2
b;2016-04-25;20;2016-w17;2016-q2
c;2016-04-25;30;2016-w17;2016-q2
d;2016-04-26;40;2016-w17;2016-q2
e;2016-07-25;50;2016-w30;2016-q3
f;2016-07-25;60;2016-w30;2016-q3
g;2016-07-25;70;2016-w30;2016-q3

Wanted output.txt:

a;2016-04-25;10;2016-w17;2016-q2;50
b;2016-04-25;20;2016-w17;2016-q2;50
c;2016-04-25;30;2016-w17;2016-q2;50
d;2016-04-26;40;2016-w17;2016-q2;50
e;2016-07-25;50;2016-w30;2016-q3;180
f;2016-07-25;60;2016-w30;2016-q3;180
g;2016-07-25;70;2016-w30;2016-q3;180

That is, calculate the quarterly average per day that has data (the sum of column 3 divided by the number of distinct dates in column 2 for that quarter), and append the result.

For 2016-q2 the average is calculated as follows:

(10+20+30+40)/2 = 50     ("2" is the number_of_unique_dates for that quarter)

For 2016-q3 the average is:

(50+60+70)/1 = 180
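The rule used in the two examples above can be written as a formula. The symbols are introduced here for illustration only: for a quarter q, L_q is the set of lines whose column 5 equals q, v_l is the value in column 3 of line l, and d_l is its date in column 2.

```latex
\bar{v}_q = \frac{\sum_{l \in L_q} v_l}{\left|\{\, d_l : l \in L_q \,\}\right|},
\qquad
\bar{v}_{\text{2016-q2}} = \frac{10+20+30+40}{2} = 50,
\qquad
\bar{v}_{\text{2016-q3}} = \frac{50+60+70}{1} = 180.
```

Note that this is not the plain mean of column 3 over lines, which for 2016-q2 would be 100/4 = 25.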

Here is my work in progress, which seems quite close to a final solution, but I'm not sure how to get the "number of unique dates" (column 2) and use it as the divisor:

awk '
BEGIN { FS=OFS=";" }
NR==FNR { s[$5]+=$3; next }                                  # 1st pass: sum column 3 per quarter
{ print $0,s[$5] / need_num_of_unique_dates_here }           # 2nd pass: append average (divisor still missing)
 ' data.txt data.txt

Any idea how to get the "number of unique dates" per quarter?
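One way the two-pass approach above could be completed (a minimal sketch, not one of the posted answers): during the first pass, also count each (quarter, date) pair the first time it is seen, and divide by that count in the second pass. The function name quarter_avg and the array names seen and ndates are illustrative. Only POSIX awk features are used, and the input does not need to be sorted by quarter.

```shell
# quarter_avg: hypothetical helper name; reads the same file twice (two passes).
quarter_avg() {
    awk '
    BEGIN { FS = OFS = ";" }
    # 1st pass: sum column 3 per quarter and count distinct dates per quarter
    NR == FNR {
        sum[$5] += $3
        if (!seen[$5, $2]++) ndates[$5]++   # first time this (quarter, date) pair appears
        next
    }
    # 2nd pass: append the quarterly sum divided by the number of distinct dates
    { print $0, sum[$5] / ndates[$5] }
    ' "$1" "$1"
}
```

Usage: quarter_avg data.txt > output.txt. Because the file is read twice, it must be a regular file rather than a pipe.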

Answer 1

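$ cat tst.awk
BEGIN { FS=OFS=";" }
# Assumes all lines of a quarter ($5) are contiguous in the input.
$5 != p5 { prt(); p5=$5 }                        # quarter changed: print the previous group
{ lines[++numLines]=$0; dates[$2]; sum+=$3 }     # buffer the line, record its date, add its value
END { prt() }                                    # print the last group
function prt(   lineNr) {                        # lineNr is a local variable
    for (lineNr=1; lineNr<=numLines; lineNr++) {
        print lines[lineNr], sum/length(dates)   # length(dates) = distinct dates in this group
    }
    delete dates
    numLines = sum = 0
}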

$ awk -f tst.awk file
a;2016-04-25;10;2016-w17;2016-q2;50
b;2016-04-25;20;2016-w17;2016-q2;50
c;2016-04-25;30;2016-w17;2016-q2;50
d;2016-04-26;40;2016-w17;2016-q2;50
e;2016-07-25;50;2016-w30;2016-q3;125
f;2016-07-25;60;2016-w30;2016-q3;125
g;2016-07-25;70;2016-w30;2016-q3;125
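h;2016-04-01;70;2016-w30;2016-q3;125

The file used for this run is data.txt with one extra line, h;2016-04-01;70;2016-w30;2016-q3, appended, which gives 2016-q3 two distinct dates: (50+60+70+70)/2 = 125. On data.txt alone, the q3 lines get 180, as wanted.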
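tst.awk prints a group whenever column 5 changes, so it assumes each quarter's lines are contiguous; if quarters are interleaved, one quarter would be averaged in separate pieces. A minimal sketch of one way around this (not part of the original answer): stable-sort on column 5 first with sort -s (available in GNU and BSD sort), then apply the same grouping logic. The helper name by_quarter is hypothetical, and a counter (ndates) replaces length(dates) so the sketch does not rely on length() accepting an array.

```shell
# by_quarter: hypothetical helper name. Stable-sorts on column 5 so that all
# lines of a quarter are adjacent, then applies the grouping logic of tst.awk.
by_quarter() {
    sort -t';' -k5,5 -s "$@" |
    awk '
    BEGIN { FS = OFS = ";" }
    $5 != p5 { prt(); p5 = $5 }                          # quarter changed: flush previous group
    {
        lines[++numLines] = $0
        if (!($2 in dates)) { dates[$2]; ndates++ }      # count each date once per group
        sum += $3
    }
    END { prt() }
    function prt(   lineNr) {
        for (lineNr = 1; lineNr <= numLines; lineNr++)
            print lines[lineNr], sum / ndates
        delete dates
        numLines = sum = ndates = 0
    }'
}
```

Usage: by_quarter data.txt, or some_command | by_quarter. The output comes out grouped by quarter rather than in the original line order.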

Answer 2

Another gawk solution:

gawk -F';' '{ a[$5][$2]+=$3; r[NR]=$0; q[NR]=$5 }
     END { 
           for (i in a) { s=0; len=length(a[i]); 
               for (j in a[i]) { s += a[i][j] } 
               a[i]["avg"] = s/len 
           } 
           for (n=1;n<=NR;n++) { print r[n],a[q[n]]["avg"] }
     }' OFS=";" file

The output:

a;2016-04-25;10;2016-w17;2016-q2;50
b;2016-04-25;20;2016-w17;2016-q2;50
c;2016-04-25;30;2016-w17;2016-q2;50
d;2016-04-26;40;2016-w17;2016-q2;50
e;2016-07-25;50;2016-w30;2016-q3;180
f;2016-07-25;60;2016-w30;2016-q3;180
g;2016-07-25;70;2016-w30;2016-q3;180

a[$5][$2]+=$3 - multidimensional array (gawk array of arrays), summing up the values for each unique date within a quarter
len=length(a[i]) - determining the number of unique dates within a quarter
for(j in a[i]){ s+=a[i][j] } - summing up the values for all dates within a quarter
a[i]["avg"]=s/len - calculating the average value (stored after len is taken, so the "avg" key is not counted as a date)
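Arrays of arrays such as a[$5][$2] are a gawk feature (gawk 4.0 or later). A minimal sketch of the same bookkeeping in POSIX awk, using comma (SUBSEP) keys and a per-quarter counter of distinct dates; it reads its input once and buffers the lines, so it also works on a pipe. The helper name quarter_avg_posix and the arrays seen and ndates are illustrative, not taken from either answer.

```shell
# quarter_avg_posix: hypothetical helper name. Reads stdin or the given files
# once, buffering every line, with SUBSEP keys instead of gawk arrays of arrays.
quarter_avg_posix() {
    awk '
    BEGIN { FS = OFS = ";" }
    {
        r[NR] = $0; q[NR] = $5                   # keep the line and its quarter
        sum[$5] += $3                            # total of column 3 per quarter
        if (!(($5, $2) in seen)) {               # first time this (quarter, date) pair appears
            seen[$5, $2]
            ndates[$5]++                         # distinct dates per quarter
        }
    }
    END {
        for (n = 1; n <= NR; n++) print r[n], sum[q[n]] / ndates[q[n]]
    }' "$@"
}
```

Usage: quarter_avg_posix data.txt, or cat data.txt | quarter_avg_posix.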

2 answers

Answer 1: score 1, accepted, 2017-05-28 16:14:16

Answer 2: score 1, 2017-05-28 16:32:50