
Moving average using awk in all columns

I have data like this:

2       2
3       3
4       3
2       2
1       1
56      4
3       2
4       1
2       2
4       2
5       5
3       3
5       6
6       4

I would like to print the moving average of every 5 consecutive values, for all columns.

Desired output:

2.4     2.2
13.2    2.6
13.2    2.4
13.2    2
13.2    2
13.8    2.2
3.6     2.4
3.6     2.6
3.8     3.6
4.6     4
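To see where the desired numbers come from: each output row is the mean of 5 consecutive input rows. Writing $x_j$ for the column-1 value on row $j$ and $N$ for the number of rows, the first two rows work out as follows. The outlier 56 enters with the second window, which is why column 1 stays above 13 for five rows.

```latex
\begin{aligned}
m_k &= \frac{1}{5}\sum_{j=k}^{k+4} x_j, \qquad k = 1, \dots, N-4 \\
m_1 &= \frac{2+3+4+2+1}{5} = \frac{12}{5} = 2.4 \\
m_2 &= \frac{3+4+2+1+56}{5} = \frac{66}{5} = 13.2
\end{aligned}
```

With N = 14 this gives the 10 rows above; column 2 is computed the same way, e.g. (2+3+3+2+1)/5 = 2.2.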

You can do it with three rules, using a "sliding window" over values stored in two arrays, a[] and b[]. You simply fill each element using a counter n as the index; once n >= 5, you output the averages of the current window, delete the values at a[n-4] and b[n-4] (optional) and keep going. Your first rule does just that (with the addition of a loop that sums the last 5 values in each array to compute the average).

Your second rule simply checks that the line has at least 2 fields and fills the a[] and b[] arrays. (You can add tests to ensure both field 1 and field 2 are numeric values; that is left to you.)

Your third rule is the END rule, which computes and outputs the final average, e.g.:

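awk '
    NF >= 2 && n >= 5 {             # window of 5 is full: print its averages
        suma=sumb=0
        for (i = n-4; i <= n; i++) {
            suma+=a[i]
            sumb+=b[i]
        }
        print suma/5"\t"sumb/5
        delete a[n-4]               # oldest value is no longer needed
        delete b[n-4]
    }
    NF >= 2 {                       # store the current row
        a[++n] = $1
        b[n] = $2
    }
    END {                           # last window, only if 5 rows were read
        if (n >= 5) {
            suma=sumb=0
            for (i = n-4; i <= n; i++) {
                suma+=a[i]
                sumb+=b[i]
            }
            print suma/5"\t"sumb/5
        }
    }
' data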

(Instead of looping to compute the sums, you can keep running sums and subtract the values you delete from the arrays; up to you.)

Example Use/Output

You can simply open an xterm, change to the directory where your data file is stored (change the name as needed), then select-copy the awk script above and middle-mouse-paste it into the xterm. You will receive:

2.4     2.2
13.2    2.6
13.2    2.4
13.2    2
13.2    2
13.8    2.2
3.6     2.4
3.6     2.6
3.8     3.6
4.6     4
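If you would rather not paste into an xterm, the same run can be scripted. This is a minimal sketch, assuming `mktemp -d` is available for a scratch directory. The awk program is the three-rule script with two small guards added: only lines with at least 2 fields trigger output, and END prints only if at least 5 rows were read, so a trailing blank line or a short file cannot produce a spurious average.

```shell
#!/bin/sh
# Non-interactive version of "paste it into an xterm": write the question's
# sample data to a scratch file (mktemp location is an assumption) and run
# the three-rule sliding-window script on it.
dir=$(mktemp -d) || exit 1

cat > "$dir/data" <<'EOF'
2 2
3 3
4 3
2 2
1 1
56 4
3 2
4 1
2 2
4 2
5 5
3 3
5 6
6 4
EOF

out=$(awk '
    NF >= 2 && n >= 5 {              # window of 5 is full: print its averages
        suma = sumb = 0
        for (i = n-4; i <= n; i++) {
            suma += a[i]
            sumb += b[i]
        }
        print suma/5 "\t" sumb/5
        delete a[n-4]                # oldest value is no longer needed
        delete b[n-4]
    }
    NF >= 2 {                        # store the current row
        a[++n] = $1
        b[n] = $2
    }
    END {                            # last window, only if 5 rows were read
        if (n >= 5) {
            suma = sumb = 0
            for (i = n-4; i <= n; i++) {
                suma += a[i]
                sumb += b[i]
            }
            print suma/5 "\t" sumb/5
        }
    }
' "$dir/data")

rm -rf "$dir"
printf '%s\n' "$out"
```

The script prints the same 10 tab-separated rows as the interactive run.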

Keeping Running Sums

If you did want to keep running sums (suma and sumb) and subtract the values at n-4 instead of looping (which would be slightly more efficient), you could do:

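awk '
    NF >= 2 && n >= 5 {             # print the full window, then drop its oldest row
        print suma/5"\t"sumb/5
        suma -= a[n-4]
        sumb -= b[n-4]
        delete a[n-4]
        delete b[n-4]
    }
    NF >= 2 {                       # add the current row to the window
        a[++n] = $1
        b[n] = $2
        suma += a[n]
        sumb += b[n]
    }
    END {                           # last window, only if 5 rows were read
        if (n >= 5) print suma/5"\t"sumb/5
    }
' data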

The output is the same.
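The second rule leaves the numeric check to you. One possible sketch, assuming that "numeric" means an optionally signed plain decimal such as 3, -2 or 4.5 (adjust the regex if your data uses exponents such as 1e3), skips any line that fails the check so it never enters the window. The shell function `movavg5` and the awk function `isnum` are hypothetical names.

```shell
# Running-sum moving average over windows of 5 rows that skips lines whose
# first two fields are not plain decimal numbers (regex is an assumption).
movavg5() {
    awk '
        # Hypothetical helper: true when s looks like a plain decimal number.
        function isnum(s) { return s ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)$/ }

        NF >= 2 && isnum($1) && isnum($2) {
            if (n >= 5) {                   # window full: print, then drop the oldest row
                print suma/5 "\t" sumb/5
                suma -= a[n-4]; sumb -= b[n-4]
                delete a[n-4]; delete b[n-4]
            }
            a[++n] = $1; b[n] = $2          # add the current row to the window
            suma += a[n]; sumb += b[n]
        }

        END { if (n >= 5) print suma/5 "\t" sumb/5 }   # last full window
    ' "$@"
}
```

Keeping the print step inside the validated rule matters: if it were a separate rule firing on every line, a rejected line would still print a window and subtract a value without adding one. For example, `printf '1 1\n2 2\nfoo 3\n\n3 3\n4 4\n5 5\n6 6\n' | movavg5` ignores the `foo` line and the blank line and averages the six valid rows.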

Here is another awk solution, using 2 passes over the file:

awk -v OFS='\t' 'FNR == NR {
   a[FNR] = $1
   b[FNR] = $2
   for (i=FNR-4; FNR>= 5 && i<=FNR; i++) {
      sum1[FNR-4] += a[i]
      sum2[FNR-4] += b[i]
   }
   tr = FNR
   next
}
FNR <= tr-4 {
   printf "%.2f%s%.2f\n", sum1[FNR]/5, OFS, sum2[FNR]/5
}' file file
2.40    2.20
13.20   2.60
13.20   2.40
13.20   2.00
13.20   2.00
13.80   2.20
3.60    2.40
3.60    2.60
3.80    3.60
4.60    4.00

Could you please try the following, which adds one more way of doing this. Written and tested with the shown samples in GNU awk.

awk '
FNR==NR{
  a[FNR]=$1
  b[FNR]=$2
  lines++
  next
}
FNR<=(lines-4){
  ++count
  for(i=count;i<=(4+count);i++){
    sum1+=a[i]
    sum2+=b[i]
  }
  print sum1/5,sum2/5
  sum1=sum2=""
}
' Input_file  Input_file | column -t
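Both two-pass answers read the input twice by naming the same file twice on the command line, so they cannot consume a pipe directly. A minimal sketch, assuming `mktemp` is available, buffers standard input in a temporary file first; the wrapper name `movavg_twopass` is hypothetical, and the awk body follows the structure of the answer above.

```shell
# Hypothetical wrapper: buffer stdin in a temp file so a two-pass awk
# program (which needs the same file named twice) can also read from a pipe.
movavg_twopass() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    awk -v OFS='\t' '
        FNR == NR { a[FNR] = $1; b[FNR] = $2; lines++; next }  # pass 1: load all rows
        FNR <= lines - 4 {                                    # pass 2: one window per row
            s1 = s2 = 0
            for (i = FNR; i <= FNR + 4; i++) { s1 += a[i]; s2 += b[i] }
            print s1/5, s2/5
        }
    ' "$tmp" "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `printf '1 1\n2 2\n3 3\n4 4\n5 5\n6 6\n' | movavg_twopass` prints the two window averages, tab-separated.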

All the solutions presented are very memory intensive, as they load the entire file into memory. While some delete the allocated memory, it is easier to just use a modular index. On top of that, you don't really need to constantly recompute the sums (with floats I would argue differently if you have a high-precision demand, but with integers it is not needed):

This solution assumes the same number of columns on every line and a sliding window of size n:

awk -v n=5 '{for(i=1;i<=NF;++i) {s[i] = s[i] - a[FNR%n,i] + $i; a[FNR%n,i]=$i } }
            (FNR >= n)  { for(i=1;i<=NF;++i) printf "%s" (i==NF?ORS:OFS), s[i]/n }' file
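Because the window size is a variable and the loops run over NF, the same idea works for any window and any number of columns. The sketch below wraps it in a shell function (the name `movavg_mod` is hypothetical) and, like the answer, assumes every line has the same number of columns. Only n rows per column are kept, in slots indexed by `FNR % n`.

```shell
# Hypothetical wrapper around the modular-index one-liner: $1 is the window
# size, remaining arguments are input files (stdin if none).
movavg_mod() {
    w=$1; shift
    awk -v n="$w" '
        {
            k = FNR % n                     # slot of the value leaving the window (empty, so 0, for the first n-1 rows)
            for (i = 1; i <= NF; ++i) {
                s[i] += $i - a[k, i]        # add the new value, subtract the oldest one
                a[k, i] = $i                # reuse the slot for the new value
            }
        }
        FNR >= n {
            for (i = 1; i <= NF; ++i)
                printf "%s%s", s[i]/n, (i == NF ? ORS : OFS)
        }
    ' "$@"
}
```

For example, `printf '1 10 100\n2 20 200\n3 30 300\n4 40 400\n' | movavg_mod 3` prints the averages of rows 1 to 3 and rows 2 to 4 for all three columns, and `movavg_mod 5 data` reproduces the question's output with space-separated columns.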

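Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM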