Print a certain parameter in a column with awk

I stumbled over a little problem that I am not able to solve with awk in a bash script.

I have the following data file:

 33   1000   1.108932e-01   2.825803e+00  -9.955642e-05    0.0000e+00       0.0000e+00    8.012180e-02 4.081916e-02

 0.0000e+00   7.8557e-01   6.1128e+01   4.0468e+00  -9.9558e-05   3.8526e-02   3.1874e-03   5.1303e-01   0.0000e+00

 1.6667e-02   7.8530e-01   6.0977e+01   4.0552e+00   1.0627e-01   7.8951e-02   6.2521e-03   5.0750e-01   0.0000e+00

...

which has a header line with 10 elements, followed by an array with 33 rows and 9 columns.

I would like to use the data in this file to print out the fourth parameter from the header line, followed by the average of column 3 (i.e. sum+=$3 / {Number of lines}). At the moment, I try to do it like this:

gawk '{time=FNR==1{$4};if(NR>1)sum+=$3}; time = FNR == 1{$4} END {sum=sum/(NR-1); print time " " sum}' $tmpn.data >> $tmpn.vrms

It works fine for the average; however, the time parameter is not correct and I only get a 0 in return. Maybe I am missing only a small thing, but unfortunately I couldn't find anything online. What would be the best way to solve this issue?

Thanks for the help.

Cheers.

Try:

awk 'NR==1 {time=$4;next} {sum+=$3} END {print time, (sum/(NR-1))}' $tmpn.data >>$tmpn.vrms
  • NR==1 {time=$4;next} is a pattern-action pair:

    • The pattern (condition) NR==1 is true only for the first input line.
    • Thus, the action {time=$4;next} is executed only for the first line: it stores the header's 4th field in the variable time, then proceeds to the next record (line) via next.
  • {sum+=$3}, which is processed for all remaining records (i.e., the data records), accumulates the values of the 3rd field in the variable sum.

  • END {print time, (sum/(NR-1))}:

    • The END block is executed after all input records have been processed.
    • {print time, (sum/(NR-1))} prints the header field and the average of the 3rd-field values, separated by the default output field separator (OFS), which is a space. Note that inside the END block NR contains the total number of input records, so NR-1 is the number of data records.
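To see the command in action, here is a minimal sketch run against a made-up file with one header line and three data rows. The temporary file and all of its values are assumptions for illustration only; they are not the asker's data.

```shell
# Hypothetical sample input: a header line followed by three data rows.
sample=$(mktemp)
cat > "$sample" <<'EOF'
33 1000 0.11 2.5 -1 0 0 0.08 0.04
0.0 0.7 10 4.0 0 0 0 0.5 0
0.1 0.7 20 4.0 0 0 0 0.5 0
0.2 0.7 30 4.0 0 0 0 0.5 0
EOF

# Line 1: remember the header's 4th field. All other lines: add up field 3.
# END: NR is the total line count, so NR-1 is the number of data rows.
result=$(awk 'NR==1 {time=$4; next} {sum+=$3} END {print time, (sum/(NR-1))}' "$sample")
echo "$result"

rm -f "$sample"
```

With these values the command prints 2.5 20, i.e. the header's 4th field followed by the mean of 10, 20 and 30.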

A note on your solution attempt and awk's philosophy:

  • As (currently) stated, your command breaks, because you've enclosed the entire script in {...}.

  • Generally, awk's terse elegance comes from a sequence of carefully crafted pattern-action pairs.

    • A pattern is a condition (Boolean expression); the associated action (a sequence of statements) is executed only if the condition is true.
    • Think of the pattern as the conditional part of an if statement with the "syntactic noise" removed, and the action as the body of that if statement:
      <pattern> { <action-cmd1>; ... } is (conceptually) short for if (<pattern>) { <action-cmd1>; ... }
  • In a given pair, you may omit either the action or the pattern:

    • If you omit the pattern, the action is executed unconditionally (though it may still not get to execute if a previous pattern-action pair skipped further processing, such as with next or exit).

    • If you omit the action, the default action is { print }, i.e., to print the (potentially modified) current record.

      • This behavior enables the common shorthand 1 to simply print the current record: 1 is a pattern that, in the Boolean context in which patterns are evaluated, is always true, and, in the absence of an associated action, the current record is printed by default.
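The three forms described above (pattern only, action only, and the 1 shorthand) can be sketched on a throwaway three-line input. The input a, b, c and the variable names are made up for illustration.

```shell
# Pattern only: the default action { print } prints each matching record.
only_second=$(printf 'a\nb\nc\n' | awk 'NR==2')

# Action only: with no pattern, the action runs for every record.
numbered=$(printf 'a\nb\nc\n' | awk '{ print NR ": " $0 }')

# next in the first pair skips record 1, so the always-true pattern 1
# (with its default print action) only sees records 2 and 3.
skip_first=$(printf 'a\nb\nc\n' | awk 'NR==1 { next } 1')

printf '%s\n---\n%s\n---\n%s\n' "$only_second" "$numbered" "$skip_first"
```

The first command prints only b, the second prints each line prefixed with its record number, and the third prints b and c because next prevents the later pair from ever seeing the first record.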

Another version in awk, using getline in a while loop to read until the end of the file and then output the buffered header field b and the average:

$ awk 'NR==1{b=$4; while(getline==1){s+=$3;c++} print b,s/c}' data
4th 40.7386

It expects the data file to have a header line. Explained:

NR==1 {                  # read in the first line and ...
    b=$4                 # ... buffer the 4th field of the header 
    while(getline==1) {  # then read while there are records to read
        s+=$3            # sum up the values in the 3rd field
        c++              # count the number of values, add if($3!="") if needed
    } 
    print b, s/c         # after while output header and average
}
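The comment on c++ above hints at an optional if($3!="") check. A hedged sketch of that variant follows; it also avoids dividing by zero when no rows are counted. The blank-line-separated input is a made-up example, and the (getline) > 0 spelling is a stylistic choice rather than part of the original answer.

```shell
# Hypothetical input: header line plus two data rows, separated by blank lines.
sparse=$(mktemp)
cat > "$sparse" <<'EOF'
33 1000 0.11 2.5 -1 0 0 0.08 0.04

0.0 0.7 10 4.0 0 0 0 0.5 0

0.1 0.7 30 4.0 0 0 0 0.5 0
EOF

avg=$(awk 'NR==1 {
    b = $4                               # buffer the 4th header field
    while ((getline) > 0) {              # 1 per record read, 0 at end of file, -1 on error
        if ($3 != "") { s += $3; c++ }   # count only records that have a 3rd field
    }
    if (c > 0) print b, s/c              # skip the output if nothing was counted
}' "$sparse")
echo "$avg"

rm -f "$sparse"
```

For this input the result is 2.5 20: the two blank records are skipped, so the average is taken over 10 and 30 only.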
