
How can I find mutations in a reverse-oriented gene (such as pncA) in a TB sequencing FASTA file using the Biopython library in Python 3?

To find a mutation such as S104R (in the region from 2288681 to 2289241, relevant for pyrazinamide), we first have to remove the '-' characters (to strip any insertions/deletions present in the FASTA file), then take the reverse complement, and then look at the codon with the given number (here 104). I have found the answer using basic string functions, but I would like something cleaner and simpler if that is possible with the Biopython library.

The following code works fine for me:

from Bio import SeqIO

# SeqIO.parse returns an iterator, so convert it to a list before indexing.
# The file holds two records: the reference and the patient sequence.
sample_file = list(SeqIO.parse('fasta_file_location', 'fasta'))

# Remove gaps, slice the region, complement, reverse, then pick codon 104.
ref=str(sample_file[0].seq).replace('-','')[2288681:2289241].replace('A', 't').replace('T', 'a').replace('C', 'g').replace('G', 'c')[::-1].upper()[(104-1)*3:(104-1)*3+3]
pat=str(sample_file[1].seq).replace('-','')[2288681:2289241].replace('A', 't').replace('T', 'a').replace('C', 'g').replace('G', 'c')[::-1].upper()[(104-1)*3:(104-1)*3+3]

print("ref: ", ref, "pat: ", pat)  # output-> ref: AGC, pat: CGG

But the code below does not work for me:

ref=sample_file[0].seq.strip("-")[2288681:2289241].reverse_complement()[(104-1)*3:(104-1)*3+3]
pat=sample_file[1].seq.strip("-")[2288681:2289241].reverse_complement()[(104-1)*3:(104-1)*3+3]

I would prefer the second version, since it uses Biopython functions and is more convenient, so please help if you know how to make it work or how to improve it.
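One likely reason the Biopython version behaves differently is that `strip("-")` only trims gap characters from the two ends of a sequence, whereas the string version removes every gap with `replace('-', '')`. In addition, `SeqIO.parse` returns an iterator, which has to be turned into a list before it can be indexed. The following is a minimal sketch of the same steps with `Seq` objects. The helper name `reverse_strand_codon` and the toy two-record FASTA are hypothetical, chosen only so the example runs without the real file, and `start`/`end` are assumed to be used as a plain Python slice, as in the question.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq


def reverse_strand_codon(record, start, end, codon_number):
    """Return (codon, amino_acid) for a codon of a reverse-strand gene.

    start and end are applied as a Python slice to the ungapped sequence,
    in the same way as [2288681:2289241] in the question.
    """
    # str.replace removes every gap; strip("-") would only trim the ends.
    ungapped = Seq(str(record.seq).replace("-", ""))
    gene = ungapped[start:end].reverse_complement()
    offset = (codon_number - 1) * 3
    codon = gene[offset:offset + 3]
    return str(codon), str(codon.translate())


# Toy alignment standing in for the real file (hypothetical data).
toy_fasta = StringIO(">ref\nAA-GCTCCGAT\n>pat\nAA-CCGCCGAT\n")
# SeqIO.parse returns an iterator, so wrap it in list() before unpacking.
ref_record, pat_record = list(SeqIO.parse(toy_fasta, "fasta"))

ref_codon, ref_aa = reverse_strand_codon(ref_record, 2, 8, 2)
pat_codon, pat_aa = reverse_strand_codon(pat_record, 2, 8, 2)
print(f"ref: {ref_codon} ({ref_aa})  pat: {pat_codon} ({pat_aa})")
# prints: ref: AGC (S)  pat: CGG (R)
```

With the real file, the equivalent call would be `reverse_strand_codon(ref_record, 2288681, 2289241, 104)`, and likewise for the patient record. Translating the codon also gives the amino acid directly, so a change such as S to R can be read off without a separate codon table.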

