
Dividing two columns in a file and printing the output in a new column to the same file, for multiple files

I have a number of files in VCF format. A line looks like this:

1   127573  rs7 G   A   79.78   .   AC=1;AF=0.500;AN=2;BaseQRankSum=1.231;ClippingRankSum=-0.358;DB;DP=5;FS=3.979;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.358;QD=15.96;ReadPosRankSum=1.231  GT:AD:DP:GQ:PL  0/1:2,3:5:27:108,0,27

I need to divide two values taken from the last column and print the result in a new column. In the example above the values are 3 and 5 (from the 10th column, 0/1:2,3:5:27:108,0,27), so the output should look like this, with 0.6 (i.e. 3/5) as the last column:

 1  127573  rs7 G   A   79.78   .   AC=1;AF=0.500;AN=2;BaseQRankSum=1.231;ClippingRankSum=-0.358;DB;DP=5;FS=3.979;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.358;QD=15.96;ReadPosRankSum=1.231  GT:AD:DP:GQ:PL  0/1:2,3:5:27:108,0,27 0.6

To achieve this I used awk on Unix, as follows:

cat result_1 |cut -f10 | sed 's/:/\t/g' >sample
cat sample | cut -f2 | sed 's/,/\t/g' | awk '$2!=0 || $3!=0{print $1"\t"$2"\t"$2/$3}' >result_1 

But it fails with:

awk: (FILENAME=- FNR=1) fatal: division by zero attempted

Any alternative solutions in Python or Perl would be great!

awk '{split($NF, a, /[,:]/); $(++NF) = a[3]/a[4]; print}' file

OK, guarding against division by zero:

awk '{split($NF, a, /[,:]/); $(++NF) = (a[4]==0 ? "Inf" : a[3]/a[4]); print}' file
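Since the question also asks for Python, here is a rough equivalent of the awk one-liner above. It is only a sketch: the function name `append_ratio` is made up for illustration, and skipping blank lines and `#` header lines is an assumption about the input rather than something the awk version does.

```python
import re


def append_ratio(line: str) -> str:
    """Append (3rd piece / 4th piece) of the last column as a new tab-separated field."""
    line = line.rstrip("\n")
    # Assumption: blank lines and VCF header lines (starting with '#') pass through unchanged.
    if not line.strip() or line.startswith("#"):
        return line
    # Last whitespace-separated field, e.g. "0/1:2,3:5:27:108,0,27".
    sample = line.split()[-1]
    # Split on ',' or ':' like the awk split(); awk's a[3] and a[4] are parts[2] and parts[3].
    parts = re.split(r"[,:]", sample)
    numerator, denominator = float(parts[2]), float(parts[3])
    ratio = "Inf" if denominator == 0 else format(numerator / denominator, "g")
    return f"{line}\t{ratio}"
```

Unlike assigning to `$(++NF)` in awk, which rebuilds the record with the output field separator, this leaves the existing separators in the line alone and only appends one tab plus the ratio.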

Here's one Perl way of doing it:

perl -ne 'chomp;if(/\t[^, ]+,(\d+):0*([1-9]\d*)[\S ]*$/){$n=$1;$d=$2;print("$_\t",$n/$d,"\n")}else{print("$_\t\n")}' < result_1 > result_1.new

This will do it. It ensures a non-zero positive value for the denominator in the match ([1-9]\d*) and allows leading zeros via the 0* in front of it.

The chomp removes the trailing newline ("\n"), so it's tacked back on in the print.

It ensures you're parsing the last column, from the last tab to the end of the string, and it allows spaces.

The -n wraps the code in while (<>) {...}.

It adds a tab even if there would have been a division by zero, but in that case it leaves the last column empty.
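For comparison, the same matching logic can be sketched in Python. The regular expression is copied from the Perl one-liner; the names `PATTERN` and `annotate` are hypothetical, and the sketch assumes tab-separated input just as the Perl version does.

```python
import re

# Pattern taken from the Perl one-liner: capture the number after the comma,
# then a non-zero denominator (leading zeros allowed), with no tab before end of line.
PATTERN = re.compile(r"\t[^, ]+,(\d+):0*([1-9]\d*)[\S ]*$")


def annotate(line: str) -> str:
    """Return the line with numerator/denominator appended after a tab.

    If the pattern does not match (for example when the denominator is 0),
    only the tab is appended, leaving the new last column empty.
    """
    line = line.rstrip("\n")
    match = PATTERN.search(line)
    if match is None:
        return line + "\t"
    return f"{line}\t{int(match.group(1)) / int(match.group(2))}"
```

Because a zero denominator never matches `[1-9]\d*`, the division is only attempted when it is safe, which is the same design choice as in the Perl version.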

You can mv the file afterward if you want to overwrite the original, but I prefer to keep precursors as a backup.
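The question mentions several files, so here is one possible way to apply the same idea to all of them in Python while keeping each original as a backup. Treat it as a sketch under assumptions: the glob pattern `result_*` is guessed from the single file name `result_1` in the question, the function names are invented, and it assumes the last column is laid out as GT:AD:DP:... as in the example line.

```python
from pathlib import Path


def ratio(sample: str) -> str:
    """Second AD value divided by DP for a GT:AD:DP:... column; '' if it cannot be computed."""
    try:
        fields = sample.split(":")
        alt_depth = int(fields[1].split(",")[1])
        return format(alt_depth / int(fields[2]), "g")
    except (IndexError, ValueError, ZeroDivisionError):
        return ""


def annotate_file(src: Path) -> Path:
    """Write '<src>.new' with the ratio appended to each data line; src is left untouched."""
    dst = src.with_name(src.name + ".new")
    with src.open() as fin, dst.open("w") as fout:
        for raw in fin:
            line = raw.rstrip("\n")
            # Assumption: blank lines and '#' header lines are copied unchanged.
            if line.strip() and not line.startswith("#"):
                line = f"{line}\t{ratio(line.split()[-1])}"
            fout.write(line + "\n")
    return dst


def annotate_all(directory: Path, pattern: str = "result_*"):
    """Annotate every matching file, skipping '.new' outputs from earlier runs."""
    sources = sorted(
        p for p in directory.glob(pattern) if p.is_file() and p.suffix != ".new"
    )
    return [annotate_file(p) for p in sources]
```

Calling `annotate_all(Path("."))` would then produce `result_1.new` next to `result_1`, and so on for each matching file; renaming the outputs over the originals is left as a separate, deliberate step.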

There is probably a more succinct or readable way of doing it in Perl or in another language, but this suffices.


