Printing a search pattern in awk

Question

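I want to print the lines that match a search pattern and then calculate an average over those rows. An example explains it best:

Input file: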

chr17   41275978    41276294    BRCA1_ex02_01   278 
chr17   41275978    41276294    BRCA1_ex02_01   279 
chr17   41275978    41276294    BRCA1_ex02_01   280 
chr17   41275978    41276294    BRCA1_ex02_02   281 
chr17   41275978    41276294    BRCA1_ex02_02   282 
chr17   41275978    41276294    BRCA1_ex02_03   283 
chr17   41275978    41276294    BRCA1_ex02_03   284 
chr17   41275978    41276294    BRCA1_ex02_03   285 
chr17   41275978    41276294    BRCA1_ex02_04   286 
chr17   41275978    41276294    BRCA1_ex02_04   287 
chr17   41275978    41276294    BRCA1_ex02_04   288

In a bash loop, I would like to extract the rows that share the same fourth column:

Output 1:

chr17   41275978    41276294    BRCA1_ex02_01   278 
chr17   41275978    41276294    BRCA1_ex02_01   279 
chr17   41275978    41276294    BRCA1_ex02_01   280

Output 2:

chr17   41275978    41276294    BRCA1_ex02_02   281 
chr17   41275978    41276294    BRCA1_ex02_02   282

Output 3:

chr17   41275978    41276294    BRCA1_ex02_03   283 
chr17   41275978    41276294    BRCA1_ex02_03   284 
chr17   41275978    41276294    BRCA1_ex02_03   285

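And so on. Calculating the average of column 5 for each piece is then easy:

awk '{ sum += $5 } END { if (NR > 0) print sum / NR }' in_file.txt

In my case there are thousands of BRCA1_exXX_XX rows, so does anyone have an idea how to split them?

Paul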

Answer 1

I think this does what you want.

awk '{
    # Keep a running sum of the fifth column, keyed by the fourth column.
    v[$4] += $5
    # Count the lines seen for each fourth-column value.
    n[$4]++
}

END {
    # For each fourth-column value seen, print it with the average of its fifth column.
    # The iteration order of "for (val in n)" is unspecified.
    for (val in n) {
        print val ": " v[val] / n[val]
    }
}' "$file"
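
As a quick check, the same approach can be run against a few rows of the question's data. This is only a sketch: the temporary file and the `result` variable are made up for illustration, and the output is piped through `sort` because awk does not guarantee the order in which it walks the array.

```shell
# Hypothetical sample file: the first five rows of the input from the question.
tmpdir=$(mktemp -d)
file="$tmpdir/in_file.txt"
printf '%s\n' \
  'chr17 41275978 41276294 BRCA1_ex02_01 278' \
  'chr17 41275978 41276294 BRCA1_ex02_01 279' \
  'chr17 41275978 41276294 BRCA1_ex02_01 280' \
  'chr17 41275978 41276294 BRCA1_ex02_02 281' \
  'chr17 41275978 41276294 BRCA1_ex02_02 282' > "$file"

# Per-group sum and count keyed by column 4; sort only makes the order stable.
result=$(awk '
  { v[$4] += $5; n[$4]++ }
  END { for (val in n) print val ": " v[val] / n[val] }
' "$file" | sort)

printf '%s\n' "$result"
# BRCA1_ex02_01: 279
# BRCA1_ex02_02: 281.5
```

Because everything is held in two arrays, this version does not need the input to be sorted by the fourth column.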

Answer 2

Assuming the entries are sorted by the fourth column, as they are in the given data, you can do it like this:

awk '

  $4 != prev {              # if the 4th column differs from that of the previous line
    if (cnt > 0)            # if count of lines is greater than 0
      print prev, sum / cnt #   print the average
    prev = $4               # save previous 4th column
    sum = $5                # initialize sum to column 5
    cnt = 1                 # initialize count to 1
    next                    # go to next line
  }

  {
    sum += $5               # accumulate total of 5th column
    ++cnt                   # increment count of lines
  }

  END {
    if (cnt > 0)             # if count > 0 (avoid divide by 0 on empty file)
      print prev, sum / cnt  #   print the average for the last group
  }

' file
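
Both answers compute the averages in a single pass and never create the separate per-exon outputs shown in the question. If those pieces are really wanted, one possible sketch is to let awk append each row to a file named after its fourth column and then loop over the files in bash. The temporary directory, the `.txt` naming, and the variable names below are assumptions made for illustration, not something taken from the thread.

```shell
# Hypothetical working directory and sample input (first five rows of the question).
outdir=$(mktemp -d)
infile="$outdir/in_file.txt"
printf '%s\n' \
  'chr17 41275978 41276294 BRCA1_ex02_01 278' \
  'chr17 41275978 41276294 BRCA1_ex02_01 279' \
  'chr17 41275978 41276294 BRCA1_ex02_01 280' \
  'chr17 41275978 41276294 BRCA1_ex02_02 281' \
  'chr17 41275978 41276294 BRCA1_ex02_02 282' > "$infile"

# Append each row to a file named after column 4. The previous file is closed
# whenever the name changes, so thousands of groups do not exhaust open file
# handles. Appending means stale files from an earlier run should be removed first.
awk -v dir="$outdir" '
  {
    out = dir "/" $4 ".txt"
    if (out != last) {
      if (last != "") close(last)
      last = out
    }
    print >> out
  }
' "$infile"

# Bash loop over the pieces, printing the column-5 average of each one.
averages=$(
  for f in "$outdir"/BRCA1_*.txt; do
    awk -v name="$(basename "$f" .txt)" \
      '{ sum += $5 } END { if (NR > 0) print name ": " sum / NR }' "$f"
  done
)
printf '%s\n' "$averages"
```

For averages alone the single-pass answers are simpler and faster; splitting is mainly useful when the per-exon files are needed for something else as well.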

Solution 1
2 2014-07-07 14:51:35

Solution 2
1 accepted 2014-07-07 14:44:31
