Python / Biopython: get an enumerated list of the sequences that match a word after parsing a file of protein sequences

Question

In Python/Biopython, I am trying to get an enumerated list of protein sequences that match the string "Human adenovirus". The problem with the code below is that the enumeration counts every sequence that is parsed, not only those that pass the if filter.

EDITED CODE with proper syntax:

from Bio import SeqIO
import sys  
sys.stdout = open("out_file.txt","w")

for index, seq_record in enumerate(SeqIO.parse("in_file.txt", "fasta")):
    if "Human adenovirus" in seq_record.description:

        print "%i]" % index, str(seq_record.description) 
        print str(seq_record.seq) + "\n"

This is a piece of the input file:

>gi|927348286|gb|ALE15299.1| penton [Bottlenose dolphin adenovirus 1]
MQRPQQTPPPPYESVVEPLYVPSRYLAPSEGRNSIRYSQLPPLYD

>gi|15485528|emb|CAC67483.1| penton [Human adenovirus 2]
MQRAAMYEEGPPPSYESVVSAAPVAAALGSPFDAPLDPPFVPPRYLRPTGGRNSIRYSELAPLFDTTRVY
LVDNKSTDVASLNYQNDHSNFLTTVIQNNDY

>gi|1194445857|dbj|BAX56610.1| fiber, partial [Human mastadenovirus C]
FNPVYPYDTETGPPTVPFLTPPFVSPNG

The output file I get looks like this:

2] gi|15485528|emb|CAC67483.1| penton [Human adenovirus 2]
MQRAAMYEEGPPPSYESVVSAAPVAAALGSPFDAPLDPPFVPPRYLRPTGGRNSIRYSELAPLFDTTRVY
LVDNKSTDVASLNYQNDHSNFLTTVIQNNDY

I would like the first sequence that passes the filter to be numbered 1], not 2] as shown above. I know I need to add a counter somewhere after the if statement, but I have tried many alternatives and none gives the desired output. This should not be difficult; I know how to do it in Perl, but not in Python/Biopython.

Answer 1

The issue is that you only want to increment the index if the description contains "Human adenovirus", but you are enumerating everything.

If we modify your code sample to only increment the index when a match is found, we get this:

from Bio import SeqIO

index = 0  # counts only the records that match
with open("out_file.txt", "w") as f:
    for seq_record in SeqIO.parse("in_file.txt", "fasta"):
        if "Human adenovirus" in seq_record.description:
            index += 1
            # Write to the opened file rather than printing to stdout
            f.write("%i] %s\n" % (index, seq_record.description))
            f.write(str(seq_record.seq) + "\n\n")

Btw, why are you opening a file for writing, but never writing to it?

Answer 1: score 2, accepted, 2017-08-31 15:29:53

Python / Biopython。 用蛋白质序列解析文件后，获取匹配单词的序列枚举列表

问题描述

1 个解决方案

解决方案1 2 已采纳 2017-08-31 15:29:53

Python / Biopython。用蛋白质序列解析文件后，获取匹配单词的序列枚举列表

解决方案1
2 已采纳 2017-08-31 15:29:53