I have a dataset with two row identifier columns. For each combination of the two identifiers, I would like to compute the ratio between the sums of two other columns and write the result to a separate file.
For example:
Input
Avpr1a CG 1 30
Avpr1a CHG 2 15
Avpr1a CHH 1 15
Avpr1a CG 2 25
Avpr1a CHG 5 15
Avpr1a CHH 8 15
BDNF CG 1 15
BDNF CHG 2 15
BDNF CHH 3 10
BDNF CG 8 20
What I want is, for each combination of $1 and $2, the ratio of the sum of $3 to the sum of $4, giving the following output (for example, Avpr1a CG: 3/55 ≈ 0.05):
Output
Avpr1a CG 0.05
Avpr1a CHG 0.233
Avpr1a CHH 0.3
BDNF CG 0.xxx
BDNF CHG 0.xxx
BDNF CHH 0.xx
You get the idea.
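Spelled out, the value wanted for each pair (a, b) of column-1 and column-2 values is a ratio of sums, not an average of per-row ratios. Here x_r and y_r are notation introduced for columns 3 and 4 of row r, and G(a, b) is the set of rows whose first two columns are a and b:

```latex
\text{ratio}(a, b) = \frac{\sum_{r \in G(a,b)} x_r}{\sum_{r \in G(a,b)} y_r}

\text{ratio}(\text{Avpr1a}, \text{CG}) = \frac{1 + 2}{30 + 25} = \frac{3}{55} \approx 0.0545
\quad\text{vs.}\quad
\frac{1}{2}\left(\frac{1}{30} + \frac{2}{25}\right) \approx 0.0567
```

The ratio of sums reproduces the 3/55 example; averaging the two per-row ratios would give a slightly different number.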
I am currently doing it in a clumsy way, by summing each column separately, merging the results, and dividing:
awk '{a[$1,$2]+=$3}END{for(i in a){print i, a[i]}}'
awk '{a[$1,$2]+=$4}END{for(i in a){print i, a[i]}}'
then merge the two intermediate files
and finally use awk to print $3/$4 from the merged file
Is it possible to achieve what I want to do in a single awk command?
Thank you!
Yes, it is even fairly easy:
awk '{s1[$1,$2] = $1; s2[$1,$2] = $2; s3[$1,$2] += $3; s4[$1,$2] += $4}
END { for (i in s3) print s1[i], s2[i], s3[i]/s4[i] }' data
Output:
Avpr1a CG 0.0545455
BDNF CHG 0.133333
BDNF CHH 0.3
Avpr1a CHG 0.233333
BDNF CG 0.257143
Avpr1a CHH 0.3
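The lines above do not come out in input order, because awk's for (i in array) loop visits keys in an unspecified order. If input order matters, one option (a sketch assuming a POSIX awk, not part of the answer above) is to record each key the first time it is seen and loop over that list in END. It recreates the sample input as data, the file name the command above reads; order, name, ctx and ratios.txt are illustrative names, and the redirect covers the "separate file" part of the question.

```shell
# Recreate the sample input from the question as "data"
cat > data <<'EOF'
Avpr1a CG 1 30
Avpr1a CHG 2 15
Avpr1a CHH 1 15
Avpr1a CG 2 25
Avpr1a CHG 5 15
Avpr1a CHH 8 15
BDNF CG 1 15
BDNF CHG 2 15
BDNF CHH 3 10
BDNF CG 8 20
EOF

# Rule 1 runs before rule 2 on each line, so a key not yet in s3 is new:
# remember it (and its two parts) in first-seen order.
# Rule 2 accumulates the sums of columns 3 and 4 per key.
# END walks the remembered keys in order; ratios.txt is a hypothetical name.
awk '!(($1, $2) in s3) { order[++n] = $1 SUBSEP $2; name[n] = $1; ctx[n] = $2 }
     { s3[$1, $2] += $3; s4[$1, $2] += $4 }
     END { for (j = 1; j <= n; j++) print name[j], ctx[j], s3[order[j]] / s4[order[j]] }' data > ratios.txt
```

Piping the original command through sort is the other simple way to get a stable order.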
If you don't capture the separate items in s1 and s2 but print i instead, you get output with the \034 character (the default value of SUBSEP) separating the two name fields. You can fix that with tr, for example, but it is simpler not to need to do so.
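Another way around the \034 separator, without the extra s1 and s2 arrays, is to split each combined subscript back into its parts on SUBSEP in the END block. A sketch assuming a POSIX awk; it also uses printf with %.3f to get roughly the precision shown in the question, and k and ratios.txt are illustrative names:

```shell
# Recreate the sample input from the question as "data"
cat > data <<'EOF'
Avpr1a CG 1 30
Avpr1a CHG 2 15
Avpr1a CHH 1 15
Avpr1a CG 2 25
Avpr1a CHG 5 15
Avpr1a CHH 8 15
BDNF CG 1 15
BDNF CHG 2 15
BDNF CHH 3 10
BDNF CG 8 20
EOF

# awk joins the parts of s3[$1, $2] with SUBSEP; split() undoes that,
# so k[1] is the column-1 value and k[2] the column-2 value.
# ratios.txt is a hypothetical output file name.
awk '{ s3[$1, $2] += $3; s4[$1, $2] += $4 }
     END { for (i in s3) { split(i, k, SUBSEP); printf "%s %s %.3f\n", k[1], k[2], s3[i] / s4[i] } }' data > ratios.txt
```

With %.3f, 3/55 is written as 0.055 and 0.3 as 0.300; the line order is still unspecified.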