
Python: ignore lines with multiple pattern matches

I have a list as below:

Index1_list=['ATTACTCG','TCCGGAGA','CGCTCATT','GAGATTCC','ATTCAGAA']

What I want to do is save a record only if its sequence contains exactly one of the list elements (not two or three different ones). Given this input:

>seq1
NNNNNNNNNNNNNNNNATTACTCGNNNNNNNNNNNGAGATTCCNNNNN
>seq2
NNNNNNNNNNNNNATTACTCGNNNNNNNNNN
>seq3
NNNNNNNNNNNNNGAGATTCCNNNNNNNNNNN

The output should be:

>seq2
NNNNNNNNNNNNNATTACTCGNNNNNNNNNN
>seq3
NNNNNNNNNNNNNGAGATTCCNNNNNNNNNNN

I used the script below, but I have not been able to filter out reads with two different matches. It prints a record once for every index that matches, so seq1 is printed twice instead of being skipped.

from Bio import SeqIO

Index1_list=['ATTACTCG','TCCGGAGA','CGCTCATT','GAGATTCC','ATTCAGAA']


with open('All.fastq','r') as R1:
    for record in SeqIO.parse(R1,'fasta'):
        for i in Index1_list:
            if i in record.seq:
                sequences = record.format('fasta')
                print(sequences)

Thank you.

You should be able to do what you want by counting how many elements from your list occur in each sequence and keeping a record only when that count is exactly one, like this:

from Bio import SeqIO

Index1_list=['ATTACTCG','TCCGGAGA','CGCTCATT','GAGATTCC','ATTCAGAA']


with open('All.fastq','r') as R1:
    for record in SeqIO.parse(R1,'fasta'):
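        # Count how many different index sequences occur in this record.
        count = 0

        for i in Index1_list:
            if i in record.seq:
                count += 1

        # Keep the record only if exactly one index matched.
        if count == 1:
            sequences = record.format('fasta')
            print(sequences)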
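
The same counting idea can be checked without Biopython or an input file. The following is only a minimal sketch: the helper names count_distinct_indexes and keep_single_index are hypothetical, and the records are the three sample sequences from the question, hard-coded as (header, sequence) pairs. It assumes that an index occurring more than once in the same read still counts as a single match, since the question only rules out reads with two or more different indexes.

```python
INDEX1_LIST = ['ATTACTCG', 'TCCGGAGA', 'CGCTCATT', 'GAGATTCC', 'ATTCAGAA']


def count_distinct_indexes(sequence, indexes):
    """Return how many different index sequences occur in `sequence`."""
    return sum(1 for index in indexes if index in sequence)


def keep_single_index(records, indexes):
    """Yield (header, sequence) pairs that contain exactly one distinct index."""
    for header, sequence in records:
        if count_distinct_indexes(sequence, indexes) == 1:
            yield header, sequence


# Sample records taken from the question (hard-coded for illustration).
records = [
    ('seq1', 'NNNNNNNNNNNNNNNNATTACTCGNNNNNNNNNNNGAGATTCCNNNNN'),
    ('seq2', 'NNNNNNNNNNNNNATTACTCGNNNNNNNNNN'),
    ('seq3', 'NNNNNNNNNNNNNGAGATTCCNNNNNNNNNNN'),
]

kept = list(keep_single_index(records, INDEX1_LIST))
for header, sequence in kept:
    print(f'>{header}')
    print(sequence)
```

With the sample data, seq1 is dropped because it contains two different indexes, while seq2 and seq3 are kept.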


 