
How to modify individual codes in a VCF file

I have genotypes of over 20k individuals in a VCF file obtained after imputation. Here is an example of what this VCF file looks like, with only 7 samples:

#CHROM   POS       ID            REF   ALT    QUAL    FILTER     FORMAT      INFO    0_0_473294.CEL      0_0_347293_v2.CEL       0_0_9588393_RS.CEL        0_0_999444_rp.CEL       0_0_26:9494949.CEL     0_0_237485_RS_rp.CEL    0_0_27:484848.CEL
16       11781     rs549521730    G     C       .       PASS    IMPUTED       GP                  

So the individuals' genotypes start at column 10. Now I need to modify the individual codes in this VCF file so that it looks like this:

#CHROM   POS       ID            REF   ALT    QUAL    FILTER     FORMAT      INFO    473294     347293       9588393        999444       9494949     237485     484848
16       11781     rs549521730    G     C       .       PASS    IMPUTED       GP

Therefore, I need only the serial numbers, without the flanking parts such as .CEL, _RS, 26:, and so on.

Do you know of a tool, like bcftools, that can rename the sample codes in a VCF file? Or is it possible to do it in bash? Thank you!

If I'm reading your question correctly, it looks like you just want to change the column names?

The sample column names come in a lot of different formats. How you convert them to just the number you want will depend on the specifics, but it will probably involve a regex. I'm not sure your example has enough info to answer that part.
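As a rough sketch of the regex part, going only by the seven names in the question: each one seems to be 0_0_, then sometimes a short numeric prefix ending in a colon (26:, 27:), then the serial number, then some suffix. That pattern is an assumption drawn from the example and may not hold for all 20k samples. The function name clean_sample_name is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a name such as 0_0_26:9494949.CEL to 9494949.
# Assumed pattern (from the question's example only):
#   0_0_ [<digits>:] <serial digits> <any suffix>
clean_sample_name() {
    printf '%s\n' "$1" | sed -E 's/^0_0_([0-9]+:)?([0-9]+).*$/\2/'
}

# Show the mapping for a few of the names from the question.
for name in 0_0_473294.CEL 0_0_347293_v2.CEL 0_0_26:9494949.CEL 0_0_237485_RS_rp.CEL; do
    printf '%s -> %s\n' "$name" "$(clean_sample_name "$name")"
done
```

It is worth checking the cleaned names for duplicates before using them. If bcftools is installed, a file of new names, one per line in the same order as the header, is the kind of input that bcftools reheader -s accepts.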

I'd recommend making a single-line header text file (header.txt) that holds the renamed column header, starting a new VCF file from it (output.vcf), and then appending everything except the header line of the input VCF file (input.vcf) to the new file.

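# Assumes the #CHROM header is the first line of input.vcf (no ## meta lines before it).
cp header.txt output.vcf
# Append every line of input.vcf from line 2 onward.
tail -n +2 input.vcf >> output.vcf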
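If the file does have ## meta lines above the #CHROM line, or you would rather not build the header by hand, here is a sketch that rewrites the sample names in place with awk. It assumes a tab-delimited VCF and the same name pattern as in the question's example (0_0_, an optional numeric prefix ending in a colon, the serial number, then a suffix). rename_vcf_samples is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite only the sample columns (10 onward) of the
# #CHROM line. Meta lines (##) and data records are printed unchanged.
# Assumes tab-delimited input and names like 0_0_[<digits>:]<serial><suffix>.
rename_vcf_samples() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#CHROM/ {
             for (i = 10; i <= NF; i++) {
                 sub(/^0_0_/, "", $i)        # drop the leading 0_0_
                 sub(/^[0-9]+:/, "", $i)     # drop a prefix such as 26:
                 sub(/[^0-9].*$/, "", $i)    # drop everything after the serial
             }
         }
         { print }' "$1"
}

# Usage: rename_vcf_samples input.vcf > output.vcf
```

Only the header line is modified, so the genotype records are copied through byte for byte.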


 