Moving average over all columns with awk

Question

I have data like this:

I would like to print the moving average over the past 5 values for all columns.

Desired output:

2.4     2.2
13.2    2.6
13.2    2.4
13.2    2
13.2    2
13.8    2.2
3.6     2.4
3.6     2.6
3.8     3.6
4.6     4
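Judging from the answers below, which print their first average only once 5 lines have been read, the requested value for line k of a column x appears to be the trailing mean of lines k-4 through k. Here N (the number of input lines) is notation introduced for illustration:

$$
\bar{x}_k = \frac{1}{5} \sum_{j=k-4}^{k} x_j, \qquad k = 5, 6, \dots, N
$$

Under that reading, N input lines yield N-4 output rows, so the 10 desired rows would correspond to 14 input lines.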

Answer 1

You can do it with three rules, using a "sliding window" over values stored in two arrays, a[] and b[]. You simply fill each element using a counter n as the index; then, when n >= 5, output the averages, delete the values at a[n-4] and b[n-4] (optional), and keep going. Your first rule does just that (with the addition of a loop that sums the last 5 values of each array for the average).

Your second rule simply validates that you have at least 2 fields and fills the a[] and b[] arrays. (You can add tests to ensure both field1 and field2 are numeric values; that is left to you.)

Your third rule is the END rule, which computes and outputs the final average, e.g.

awk '
    n >= 5 {
        suma=sumb=0
        for (i = n-4; i <= n; i++) {
            suma+=a[i]
            sumb+=b[i]
        }
        print suma/5"\t"sumb/5
        delete a[n-4]
        delete b[n-4]
    }
    NF >= 2 {
        a[++n] = $1
        b[n] = $2
    }
    END {
        suma=sumb=0
        for (i = n-4; i <= n; i++) {
            suma+=a[i]
            sumb+=b[i]
        }
        print suma/5"\t"sumb/5
    }
' data

(Instead of looping to compute the sums, you can keep running sums and subtract the values you delete from the arrays; that is up to you.)

Example Use/Output

You can simply open an xterm, change to the directory where your data file is stored (change the name as needed), select-copy the awk script above, and middle-mouse-paste it into the xterm. You will receive:

2.4     2.2
13.2    2.6
13.2    2.4
13.2    2
13.2    2
13.8    2.2
3.6     2.4
3.6     2.6
3.8     3.6
4.6     4
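The numeric test left to you in the second rule could look like the sketch below. It is an illustration under assumptions, not part of the original answer: the helper name moving_avg_checked, the regular expression (plain decimals with an optional sign, no exponent notation), the choice to silently skip invalid lines, and the sample input are all hypothetical. The skip rule is placed first so that an invalid line neither triggers output from the first rule nor enters the window.

```shell
#!/bin/sh
# moving_avg_checked is a hypothetical helper name; it reads
# two-column data on stdin and applies the answer's three rules,
# plus an extra rule that skips non-numeric lines.
moving_avg_checked() {
    awk '
        # Assumed numeric test: optional sign, digits, optional
        # fraction (exponent notation such as 1e3 is not accepted).
        function isnum(v) { return v ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)$/ }

        # Skip any line whose first two fields are not both numeric,
        # so it neither triggers output nor enters the window.
        NF < 2 || !isnum($1) || !isnum($2) { next }

        # First rule: once 5 values are stored, print their averages
        # and drop the oldest pair.
        n >= 5 {
            suma = sumb = 0
            for (i = n-4; i <= n; i++) {
                suma += a[i]
                sumb += b[i]
            }
            print suma/5 "\t" sumb/5
            delete a[n-4]
            delete b[n-4]
        }

        # Second rule: store the validated values.
        {
            a[++n] = $1
            b[n] = $2
        }

        # Third rule: print the last window.
        END {
            suma = sumb = 0
            for (i = n-4; i <= n; i++) {
                suma += a[i]
                sumb += b[i]
            }
            print suma/5 "\t" sumb/5
        }
    '
}

# Usage with hypothetical input: a header line, then rows "k 2k".
printf '%s\n' 'x y' '1 2' '2 4' '3 6' '4 8' '5 10' '6 12' '7 14' | moving_avg_checked
```

With that sample the header line is skipped and the sketch prints the tab-separated means of rows 1-5, 2-6 and 3-7: 3 and 6, 4 and 8, 5 and 10.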

Keeping Running Sums

If you did want to keep running sums (suma and sumb) and subtract the values at n-4 instead of looping (which would be slightly more efficient), you could do:

awk '
    n >= 5 {
        print suma/5"\t"sumb/5
        suma -= a[n-4]
        sumb -= b[n-4]
    }
    NF >= 2 {
        a[++n] = $1
        b[n] = $2
        suma += a[n]
        sumb += b[n]
    }
    END {
        print suma/5"\t"sumb/5
    }
' data

The output is the same.
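The running-sum version never deletes anything, so a[] and b[] still grow with the input. A minimal sketch that also adds back the delete from the first version, so each array holds at most 5 entries at any time, might look like this. The wrapper function name moving_avg_running and the sample input are hypothetical; the awk rules are otherwise the answer's own.

```shell
#!/bin/sh
# moving_avg_running is a hypothetical helper name; it reads
# two-column data on stdin.
moving_avg_running() {
    awk '
        n >= 5 {
            print suma/5 "\t" sumb/5
            # Subtract the oldest pair from the running sums, then
            # delete it so each array never holds more than 5 entries.
            suma -= a[n-4]
            sumb -= b[n-4]
            delete a[n-4]
            delete b[n-4]
        }
        NF >= 2 {
            a[++n] = $1
            b[n] = $2
            suma += a[n]
            sumb += b[n]
        }
        END {
            print suma/5 "\t" sumb/5
        }
    '
}

# Usage with hypothetical input rows "k 2k" for k = 1..7.
printf '%s\n' '1 2' '2 4' '3 6' '4 8' '5 10' '6 12' '7 14' | moving_avg_running
```

On that sample it prints the means of rows 1-5, 2-6 and 3-7: 3 and 6, 4 and 8, 5 and 10.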

Answer 2

Here is another awk solution using two passes over the file:

awk -v OFS='\t' 'FNR == NR {
   a[FNR] = $1
   b[FNR] = $2
   for (i=FNR-4; FNR>= 5 && i<=FNR; i++) {
      sum1[FNR-4] += a[i]
      sum2[FNR-4] += b[i]
   }
   tr = FNR
   next
}
FNR <= tr-4 {
   printf "%.2f%s%.2f\n", sum1[FNR]/5, OFS, sum2[FNR]/5
}' file file

2.40    2.20
13.20   2.60
13.20   2.40
13.20   2.00
13.20   2.00
13.80   2.20
3.60    2.40
3.60    2.60
3.80    3.60
4.60    4.00

Answer 3

Could you please try the following, which adds one more way of doing this. Written and tested with the shown samples in GNU awk.

awk '
FNR==NR{
  a[FNR]=$1
  b[FNR]=$2
  lines++
  next
}
FNR<=(lines-4){
  ++count
  for(i=count;i<=(4+count);i++){
    sum1+=a[i]
    sum2+=b[i]
  }
  print sum1/5,sum2/5
  sum1=sum2=""
}
' Input_file  Input_file | column -t

Answer 4

All the presented solutions are very memory intensive, as they load the entire file into memory. While some delete the allocated entries, it is easier to just use a modular index. On top of that, you don't really need to constantly recompute the sums (with floats I would argue differently if you have a high precision demand, but with integers it is not needed):

This solution assumes every line has the same number of columns and uses a sliding window of size n:

awk -v n=5 '{for(i=1;i<=NF;++i) {s[i] = s[i] - a[FNR%n,i] + $i; a[FNR%n,i]=$i } }
            (FNR >= n)  { for(i=1;i<=NF;++i) printf "%s" (i==NF?ORS:OFS), s[i]/n }' file
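Reading the first block of this script: slot a[FNR%n,i] was last written n lines earlier, so just before it is overwritten it still holds the value column i had on line FNR-n (or nothing, which awk treats as 0, during the first n lines). Writing x_{k,i} for column i on line k (taken as 0 for lines before the first) and s_{k,i} for s[i] after line k, each update is the running-sum recurrence below, and the value printed once FNR >= n is exactly the trailing mean:

$$
s_{k,i} = s_{k-1,i} - x_{k-n,i} + x_{k,i},
\qquad
\frac{s_{k,i}}{n} = \frac{1}{n} \sum_{j=k-n+1}^{k} x_{j,i} \quad (k \ge n)
$$

Each line therefore costs O(NF) work, and the script keeps only n × NF stored values however long the input is.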

4 answers

Answer 1: score 3, accepted, 2020-06-22 05:31:15

Answer 2: score 2, 2020-06-22 05:42:50

Answer 3: score 2, 2020-06-22 06:06:41

Answer 4: score 1, 2020-06-22 07:02:36
