
Print the Search pattern in awk

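I would like to print the lines that match a search pattern and then calculate the average over those rows. An example shows it best: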

Input file:

chr17   41275978    41276294    BRCA1_ex02_01   278 
chr17   41275978    41276294    BRCA1_ex02_01   279 
chr17   41275978    41276294    BRCA1_ex02_01   280 
chr17   41275978    41276294    BRCA1_ex02_02   281 
chr17   41275978    41276294    BRCA1_ex02_02   282 
chr17   41275978    41276294    BRCA1_ex02_03   283 
chr17   41275978    41276294    BRCA1_ex02_03   284 
chr17   41275978    41276294    BRCA1_ex02_03   285 
chr17   41275978    41276294    BRCA1_ex02_04   286 
chr17   41275978    41276294    BRCA1_ex02_04   287 
chr17   41275978    41276294    BRCA1_ex02_04   288 

I would like to extract the lines that share the same fourth column, in a bash loop:

Output 1:

chr17   41275978    41276294    BRCA1_ex02_01   278 
chr17   41275978    41276294    BRCA1_ex02_01   279 
chr17   41275978    41276294    BRCA1_ex02_01   280 

Output 2:

chr17   41275978    41276294    BRCA1_ex02_02   281 
chr17   41275978    41276294    BRCA1_ex02_02   282 

Output 3:

chr17   41275978    41276294    BRCA1_ex02_03   283 
chr17   41275978    41276294    BRCA1_ex02_03   284 
chr17   41275978    41276294    BRCA1_ex02_03   285 

And so on. Calculating the average of column 5 is then easy:

awk '{sum += $5} END {print sum / NR}' in_file.txt
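For a single group, the extraction and the average can also be done in one awk pass. A minimal sketch, assuming whitespace-separated columns as in the sample; `group_avg` is a hypothetical helper name, not something from the question:

```shell
# group_avg ID FILE
# Hypothetical helper: prints the mean of column 5 over the lines whose
# column 4 equals ID. Prints nothing if no line matches.
group_avg() {
  awk -v id="$1" '
    $4 == id { sum += $5; n++ }        # equality test on column 4, not a regex
    END { if (n > 0) print sum / n }   # guard against division by zero
  ' "$2"
}
```

For the sample data, `group_avg BRCA1_ex02_01 in_file.txt` prints 279. Testing `$4 == id` instead of a regex such as `/BRCA1_ex02_01/` avoids also matching longer IDs that merely contain the search text.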

In my case there are thousands of BRCA1_exXX_XX lines, so does anyone have an idea how to split them?

Paul
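If the goal is literally one file per group for a bash loop to iterate over, awk can write each line to a file named after its column 4. A minimal sketch, assuming the input is grouped by column 4 as in the sample; `split_by_col4` and the output directory are hypothetical names:

```shell
# split_by_col4 DIR FILE
# Hypothetical helper: writes every line of FILE to DIR/<column 4>.txt.
# Assumes lines with the same column 4 are adjacent, so each output
# file is opened once and closed as soon as its group ends.
split_by_col4() {
  mkdir -p "$1" || return 1
  awk -v dir="$1" '
    NF < 5 { next }                 # skip blank or short lines
    $4 != prev {                    # a new group starts on this line
      if (out != "") close(out)     # close the file of the previous group
      out = dir "/" $4 ".txt"
      prev = $4
    }
    { print > out }
  ' "$2"
}
```

Closing each file when its group ends keeps the number of open files at one, which matters with thousands of groups. The grouping assumption is essential: if a group reappeared later, `>` would truncate its file on reopening and the earlier lines would be lost. Each resulting file can then be averaged inside a `for f in "$dir"/*.txt` loop with a one-liner such as `awk '{ s += $5 } END { print FILENAME, s / NR }' "$f"`.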

I think this will do what you want.

awk '{
    # Keep a running sum of the fifth column for each fourth-column value.
    v[$4]+=$5;
    # Count the lines that have each fourth-column value.
    n[$4]++
}

END {
    # Loop over all the values we saw and print each fourth-column value with the average of its fifth column.
    # Note: a for-in loop visits the array keys in an unspecified order.
    for (val in n) {
        print val ": " v[val] / n[val]
    }
}' "$file"
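Because `for (val in n)` visits keys in an unspecified order, the groups may be printed shuffled. A minimal sketch that keeps the same single-pass hash approach and sorts the result afterwards; the name `avg_by_col4` is hypothetical, and `LC_ALL=C` is chosen so the ordering is plain byte order:

```shell
# avg_by_col4 FILE
# Hypothetical wrapper around the answer above: one pass over FILE,
# then the per-group averages sorted by group name.
avg_by_col4() {
  awk '
    NF >= 5 { v[$4] += $5; n[$4]++ }   # sum and count per column-4 value
    END { for (val in n) print val ": " v[val] / n[val] }
  ' "$1" | LC_ALL=C sort
}
```

On the sample input this should print `BRCA1_ex02_01: 279`, `BRCA1_ex02_02: 281.5`, `BRCA1_ex02_03: 284` and `BRCA1_ex02_04: 287`, one per line. Memory grows with the number of distinct groups, not the number of lines, so thousands of groups are not a problem.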

Assuming the entries are sorted by the fourth column, as in the given data, you can do it this way:

awk '

  $4 != prev {              # if this line has a different 4th column than the previous line
    if (cnt > 0)            # if count of lines is greater than 0
      print prev, sum / cnt #   print the average of the previous group
    prev = $4               # remember this 4th column for the next comparison
    sum = $5                # initialize sum to column 5
    cnt = 1                 # initialize count to 1
    next                    # go to next line
  }

  {
    sum += $5               # accumulate total of 5th column
    ++cnt                   # increment count of lines
  }

  END {
    if (cnt > 0)             # if count > 0 (avoid divide by 0 on empty file)
      print prev, sum / cnt  #   print the average for the last group
  }

' file
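The streaming script only produces one line per group when equal column-4 values are adjacent. When that is not guaranteed, one option is to sort on column 4 first. A minimal sketch, where `avg_sorted` is a hypothetical name, `-b -k4,4` restricts the sort key to the fourth blank-separated field, and `LC_ALL=C` makes the ordering plain byte order:

```shell
# avg_sorted FILE
# Hypothetical helper: sort FILE on column 4, then stream it through
# the grouped-average logic from the answer above.
avg_sorted() {
  LC_ALL=C sort -b -k4,4 "$1" | awk '
    NF < 5 { next }                       # skip blank or short lines
    $4 != prev {                          # first line of a new group
      if (cnt > 0) print prev, sum / cnt  # report the group just finished
      prev = $4; sum = $5; cnt = 1
      next
    }
    { sum += $5; ++cnt }                  # same group: accumulate
    END { if (cnt > 0) print prev, sum / cnt }
  '
}
```

The trade-off against the hash-based answer is an O(n log n) sort in exchange for awk holding only one group in memory at a time.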


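Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.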

 
© 2020-2024 STACKOOM.COM