Dividing two columns in a file and printing the output in a new column of the same file, for multiple files
I have a number of files in VCF format. This is what a line looks like:
1 127573 rs7 G A 79.78 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.231;ClippingRankSum=-0.358;DB;DP=5;FS=3.979;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.358;QD=15.96;ReadPosRankSum=1.231 GT:AD:DP:GQ:PL 0/1:2,3:5:27:108,0,27
I need to take two values from the last (10th) column, divide one by the other, and print the result in a new column. In the example above the values are 3 and 5 (from 0/1:2,3:5:27:108,0,27), so the output should look like this, with 0.6 (i.e. 3/5) as the last column:
1 127573 rs7 G A 79.78 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.231;ClippingRankSum=-0.358;DB;DP=5;FS=3.979;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.358;QD=15.96;ReadPosRankSum=1.231 GT:AD:DP:GQ:PL 0/1:2,3:5:27:108,0,27 0.6
To achieve this I used awk on Unix, as follows:
cat result_1 |cut -f10 | sed 's/:/\t/g' >sample
cat sample | cut -f2 | sed 's/,/\t/g' | awk '$2!=0 || $3!=0{print $1"\t"$2"\t"$2/$3}' >result_1
But it complains:
awk: (FILENAME=- FNR=1) fatal: division by zero attempted
Alternative solutions in Python or Perl would be great too.
awk '{split($NF, a, /[,:]/); $(++NF) = a[3]/a[4]; print}' file
OK, to guard against division by zero:
awk '{split($NF, a, /[,:]/); $(++NF) = (a[4]==0 ? "Inf" : a[3]/a[4]); print}' file
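Since the question also asks for Python, here is a rough equivalent of the awk command above, as a sketch rather than a drop-in replacement. The function name `append_ratio` and the `zero_value` parameter are made up for illustration. It assumes the last column always has at least four pieces when split on `,` and `:` and that those pieces are numeric. Unlike the awk version, which rebuilds the line with single spaces, it leaves the original line untouched and appends the new value after a tab.

```python
import re

def append_ratio(line, zero_value="Inf"):
    """Append (3rd piece / 4th piece) of the last column to the line.

    The last column is split on ',' or ':' just like split($NF, a, /[,:]/),
    so parts[2] and parts[3] correspond to awk's a[3] and a[4].
    """
    line = line.rstrip("\n")
    parts = re.split(r"[,:]", line.split()[-1])
    num, den = float(parts[2]), float(parts[3])
    # "g" gives up to 6 significant digits and drops trailing zeros.
    ratio = zero_value if den == 0 else format(num / den, "g")
    return f"{line}\t{ratio}"

if __name__ == "__main__":
    print(append_ratio("1 127573 rs7 G A 79.78 . DP=5 GT:AD:DP:GQ:PL 0/1:2,3:5:27:108,0,27"))
```

Lines whose last column holds missing values (for example `.`) would raise a `ValueError` here, so real input may need an extra check.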
Here's one Perl way of doing it:
perl -ne 'chomp;if(/\t[^, ]+,(\d+):0*([1-9]\d*)[\S ]*$/){$n=$1;$d=$2;print("$_\t",$n/$d,"\n")}else{print("$_\t\n")}' < result_1 > result_1.new
This will do it. It ensures a non-zero positive denominator in the match ([1-9]\d*) and allows leading zeros via the '0*' in front of it.
The chomp removes the trailing newline ("\n"), so it is added back in the print.
The pattern makes sure you are parsing the last column, from the last tab to the end of the string, and it allows spaces.
The -n wraps the code in while(<>){...}.
It adds a tab even when there would have been a division by zero, but in that case it leaves the last column empty.
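For comparison, the same regular expression can be reused almost verbatim in Python. This is only a sketch of the behaviour described above: the names `PATTERN` and `add_ratio_column` are invented here, and the `.15g` format is an attempt to approximate how Perl prints numbers, not an exact match for every value.

```python
import re

# Same pattern as the Perl one-liner: the numerator follows the comma,
# the denominator follows the colon, and 0* skips any leading zeros.
PATTERN = re.compile(r"\t[^, ]+,(\d+):0*([1-9]\d*)[\S ]*$")

def add_ratio_column(line):
    """Return the line with numerator/denominator appended after a tab.

    If the pattern does not match (for example a zero denominator),
    only the tab is appended, leaving the new column empty.
    """
    line = line.rstrip("\n")  # plays the role of chomp
    m = PATTERN.search(line)
    if m:
        ratio = int(m.group(1)) / int(m.group(2))
        return f"{line}\t{ratio:.15g}"
    return f"{line}\t"
```

Because the pattern starts with a literal tab, this only works on tab-separated input, as with the Perl version.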
With the Perl command above, you can mv result_1.new over the original afterward if you want to overwrite it, but I prefer to keep precursors as a backup.
There is probably a more succinct or readable way of doing it in Perl or in another language, but this suffices.
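The question asks for the new column to be written back to the same file, for several files. One possible way to do that in Python is the standard library's `fileinput` module with `inplace=True`, which also keeps a backup copy of each original. The sketch below makes several assumptions: the helper names `ratio_field` and `annotate_in_place` are hypothetical, lines starting with `#` are treated as VCF header lines and copied unchanged, and an undefined ratio (zero denominator or unparsable values) yields an empty column, as in the Perl answer.

```python
import fileinput
import re
import sys

def ratio_field(line):
    """Return 3rd/4th piece of the last column as text, or "" if undefined."""
    try:
        parts = re.split(r"[,:]", line.split()[-1])
        return format(int(parts[2]) / int(parts[3]), "g")
    except (IndexError, ValueError, ZeroDivisionError):
        return ""

def annotate_in_place(paths, backup=".bak"):
    """Rewrite each file in place; the original is kept as <name><backup>."""
    # With inplace=True, print() writes into the file currently being read.
    with fileinput.input(files=paths, inplace=True, backup=backup) as stream:
        for line in stream:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):  # blank or header line
                print(line)
            else:
                print(f"{line}\t{ratio_field(line)}")

if __name__ == "__main__" and len(sys.argv) > 1:
    annotate_in_place(sys.argv[1:])
```

Saved under a name of your choosing (say `add_ratio.py`), it could be run as `python add_ratio.py result_1 result_2`, leaving `result_1.bak` and `result_2.bak` next to the rewritten files.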