I would like to print the rows that match a search pattern and then calculate the average over those rows. An example explains it best:
input file:
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_01 280
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_02 282
chr17 41275978 41276294 BRCA1_ex02_03 283
chr17 41275978 41276294 BRCA1_ex02_03 284
chr17 41275978 41276294 BRCA1_ex02_03 285
chr17 41275978 41276294 BRCA1_ex02_04 286
chr17 41275978 41276294 BRCA1_ex02_04 287
chr17 41275978 41276294 BRCA1_ex02_04 288
I want to extract (in a bash loop, for example) the lines that share the same 4th column:
output1:
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_01 280
output2:
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_02 282
output3:
chr17 41275978 41276294 BRCA1_ex02_03 283
chr17 41275978 41276294 BRCA1_ex02_03 284
chr17 41275978 41276294 BRCA1_ex02_03 285
and so on. Calculating the average of the 5th column is then very easy:
awk '{sum+=$5} END{print sum/NR}' in_file.txt
In my case there are thousands of BRCA1_exXX_XX lines, so any idea how to split the file?
Paul.
I think this will do what you want.
awk '{
    # Keep a running sum of the fifth column, keyed by the value of the fourth column.
    v[$4] += $5
    # Keep a count of the lines seen for each fourth column value.
    n[$4]++
}
END {
    # Loop over all the fourth column values we saw and print each one with the average of its fifth column.
    for (val in n) {
        print val ": " v[val] / n[val]
    }
}' "$file"
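One thing to be aware of: `for (val in n)` visits the keys in an unspecified order, so the groups will not necessarily come out in the order they appear in the file. A minimal sketch of one way around that, assuming plain alphabetical order on the 4th column is acceptable, is to pipe the result through `sort`. The temporary file and the `result` variable are only there to make the example self-contained; they are not part of your setup.

```shell
# Hypothetical sample input: the first five lines from the question.
file=$(mktemp)
cat > "$file" <<'EOF'
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_01 280
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_02 282
EOF

# Same per-group average as above, then sorted by group name,
# because awk does not guarantee the iteration order of "for (val in n)".
result=$(awk '
    { v[$4] += $5; n[$4]++ }
    END { for (val in n) print val ": " v[val] / n[val] }
' "$file" | sort)

printf '%s\n' "$result"
```

If the original file order matters more than alphabetical order, the sorted-input approach in the next answer keeps it without an extra `sort`.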
Assuming the entries are sorted by the 4th column as in your given data, you could do it like this:
awk '
$4 != prev {                    # if this line has a different 4th column from the previous line
    if (cnt > 0)                # if we have already counted lines for a previous group
        print prev, sum / cnt   # print the average for that group
    prev = $4                   # remember the 4th column of the new group
    sum = $5                    # initialize sum to column 5
    cnt = 1                     # initialize count to 1
    next                        # go to next line
}
{
    sum += $5                   # accumulate total of 5th column
    ++cnt                       # increment count of lines
}
END {
    if (cnt > 0)                # if count > 0 (avoid divide by 0 on empty file)
        print prev, sum / cnt   # print the average for the last group
}
' file
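If you do need the separate per-group files shown in the question (output1, output2, ...), here is a rough sketch of one way to do it. It assumes the 4th column values are safe to use as file names; the `workdir` and `out` directories and the `<group>.txt` naming are made up for the example. Since there may be thousands of groups, the previous output file is closed whenever the group changes, so awk does not run out of open file handles.

```shell
set -eu

# Hypothetical sample input: the first five lines from the question.
workdir=$(mktemp -d)
mkdir "$workdir/out"
cat > "$workdir/in_file.txt" <<'EOF'
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_01 280
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_02 282
EOF

# Write each line to a file named after its 4th column.
# ">>" appends, so a group that shows up again later (unsorted input)
# still ends up in the same file. Start with an empty output directory.
awk -v dir="$workdir/out" '
{
    out = dir "/" $4 ".txt"
    if (out != prev) {
        if (prev != "") close(prev)   # keep only one output file open at a time
        prev = out
    }
    print >> out
}' "$workdir/in_file.txt"

# The bash loop from the question: one average per split file.
for f in "$workdir"/out/*.txt; do
    name=$(basename "$f" .txt)
    awk -v name="$name" '{ sum += $5 } END { if (NR > 0) print name, sum / NR }' "$f"
done
```

On the sample above the loop prints `BRCA1_ex02_01 279` and `BRCA1_ex02_02 281.5`. If you only need the averages and not the files themselves, the single-pass awk commands above are simpler and avoid creating thousands of small files.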