
Print the Search pattern in awk

I would like to print the lines matching a search pattern and then calculate the average of a column. An example explains it best:

input file:

chr17   41275978    41276294    BRCA1_ex02_01   278 
chr17   41275978    41276294    BRCA1_ex02_01   279 
chr17   41275978    41276294    BRCA1_ex02_01   280 
chr17   41275978    41276294    BRCA1_ex02_02   281 
chr17   41275978    41276294    BRCA1_ex02_02   282 
chr17   41275978    41276294    BRCA1_ex02_03   283 
chr17   41275978    41276294    BRCA1_ex02_03   284 
chr17   41275978    41276294    BRCA1_ex02_03   285 
chr17   41275978    41276294    BRCA1_ex02_04   286 
chr17   41275978    41276294    BRCA1_ex02_04   287 
chr17   41275978    41276294    BRCA1_ex02_04   288 

I want to extract (in a bash loop, for example) the lines that share the same 4th column:

output1:

chr17   41275978    41276294    BRCA1_ex02_01   278 
chr17   41275978    41276294    BRCA1_ex02_01   279 
chr17   41275978    41276294    BRCA1_ex02_01   280 

output2:

chr17   41275978    41276294    BRCA1_ex02_02   281 
chr17   41275978    41276294    BRCA1_ex02_02   282 

output3:

chr17   41275978    41276294    BRCA1_ex02_03   283 
chr17   41275978    41276294    BRCA1_ex02_03   284 
chr17   41275978    41276294    BRCA1_ex02_03   285 

and so on. Calculating the average of the 5th column is then very easy:

awk '{sum+=$5} END{print sum/NR}' in_file.txt
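Applying that average once per group, in the bash loop the question mentions, could look roughly like the sketch below. It assumes whitespace-separated input as shown above; `avg_by_group` is a hypothetical helper name, not something from the question.

```shell
# avg_by_group FILE (hypothetical helper): print "<4th column> <mean of 5th column>"
# for each distinct 4th-column value, one awk pass per group.
avg_by_group() {
    local file=$1
    # Collect the distinct group names from the 4th column ...
    awk '{ print $4 }' "$file" | sort -u | while read -r id; do
        # ... then rescan the file for each group and average column 5.
        # -v passes the shell variable into awk without quoting problems.
        awk -v id="$id" '$4 == id { sum += $5; n++ } END { if (n > 0) print id, sum / n }' "$file"
    done
}
```

Because the file is rescanned once per group, this gets slow with thousands of groups; a single awk pass that keeps per-group sums avoids that cost.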

In my case there are thousands of BRCA1_exXX_XX groups, so does anyone have an idea how to split them?

Paul.
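For the splitting step itself (output1, output2, ...), one possible sketch is to let awk write each group to its own file. The helper name `split_by_group` and the `<4th column>.txt` file naming are hypothetical choices, and the sketch assumes the lines are already grouped by the 4th column as in the sample.

```shell
# split_by_group FILE OUTDIR (hypothetical helper): write every line to
# OUTDIR/<4th column>.txt, so each group ends up in its own file.
split_by_group() {
    local file=$1 outdir=$2
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        {
            out = dir "/" $4 ".txt"
            # Close the previous file when the group changes so thousands of
            # groups do not exhaust the open-file limit. Assumes input is grouped
            # by $4; otherwise ">" would truncate a reopened file.
            if (out != prev && prev != "") close(prev)
            print > out
            prev = out
        }' "$file"
}
```

If the input is not grouped, sorting it first (for example with `sort -b -k4,4`) keeps each group contiguous.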

I think this will do what you want.

awk '{
    # Keep a running sum of the fifth column for each value of the fourth column.
    v[$4]+=$5;
    # Count the lines that share each fourth-column value.
    n[$4]++
}

END {
    # Loop over all the values we saw and print each fourth-column value with the
    # average of its fifth column. The "for (... in ...)" order is unspecified.
    for (val in n) {
        print val ": " v[val] / n[val]
    }
}' "$file"
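Since awk's `for (val in n)` visits keys in no particular order, a usage sketch might pipe the result through `sort` to get stable output. `group_means` is a hypothetical wrapper name around the same program.

```shell
# group_means FILE (hypothetical wrapper): per-group averages of column 5,
# keyed by column 4, sorted because awk's for-in order is unspecified.
group_means() {
    awk '{ v[$4] += $5; n[$4]++ }
         END { for (val in n) print val ": " v[val] / n[val] }' "$1" | sort
}
```

On the first five sample lines this prints `BRCA1_ex02_01: 279` followed by `BRCA1_ex02_02: 281.5`.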

Assuming the entries are sorted (grouped) by the 4th column as in your sample data, you could do it like this:

awk '

  $4 != prev {              # if this line's 4th column is different from the previous line
    if (cnt > 0)            # if count of lines is greater than 0
      print prev, sum / cnt #   print the average
    prev = $4               # remember this 4th column for the next comparison
    sum = $5                # initialize sum to column 5
    cnt = 1                 # initialize count to 1
    next                    # go to next line
  }

  {
    sum += $5               # accumulate total of 5th column
    ++cnt                   # increment count of lines
  }

  END {
    if (cnt > 0)             # if count > 0 (avoid divide by 0 on empty file)
      print prev, sum / cnt  #   print the average for the last group
  }

' file
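If the input is not already grouped, a sketch that sorts by the 4th column before running the same adjacent-lines logic might look like this. `group_means_sorted` is a hypothetical name; `-b` makes `sort` ignore the leading blanks of the key so identical IDs end up adjacent.

```shell
# group_means_sorted FILE (hypothetical wrapper): sort by column 4 first so
# the "compare with previous line" approach also works on unsorted input.
group_means_sorted() {
    sort -b -k4,4 "$1" | awk '
        $4 != prev {                        # a new group starts
            if (cnt > 0) print prev, sum / cnt
            prev = $4; sum = $5; cnt = 1
            next
        }
        { sum += $5; ++cnt }                # same group: accumulate
        END { if (cnt > 0) print prev, sum / cnt }'
}
```

Sorting costs O(n log n) but keeps awk's memory constant, whereas the array-based answer above needs one array entry per group and no sort.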
