How to isolate protein codons from a DNA strand in Python

I'm working with DNA strands, and this code is meant to find the initiation codon (codaoi) and one of the three stop codons (codaof1, codaof2 or codaof3), then slice the initial DNA strand between those positions.

So, an example: XXXATGYYYYYYTAGXXX

With correct code I would get YYYYYY. But I'm always getting the else answer, "No protein".

def isolarprot(seqDNA):
    codaof1=("TAG")
    codaof2=("TAA")
    codaof3=("TGA")
    codaoi=("ATG")
    i=0
    f=0
    for i in range(0,len(seqDNA),3):
        pi=seqDNA.find(codaoi)
    for f in range(0,len(seqDNA),3):
        if codaof1 in seqDNA[i:(i+3)] and codaoi in seqDNA[i:(i+3)]:
            pf1=seqDNA.find(codaof1)
            prote=slice(pi,pf1+3)
            return seqDNA[prote]
        elif codaof2 in seqDNA[i:(i+3)] and codaoi in seqDNA[i:(i+3)]:
            pf2=seqDNA.find(codaof2)
            prote=slice(pi,pf2+3)
            return seqDNA[prote]
        elif codaof3 in seqDNA[i:(i+3)] and codaoi in seqDNA[i:(i+3)]:
            pf3=seqDNA.find(codaof3)
            prote=slice(pi,pf3+3)
            return seqDNA[prote]
        else:
            return "No protein"

Below is a regular expression pattern that catches multiple occurrences of the DNA section being searched for. It uses a positive lookbehind and a positive lookahead together with the lazy quantifier *?, so each match ends at the nearest stop codon and multiple occurrences can be found:

import re

seqDNA = "XXXATGYYYYYYTAGXXX XXXATGyyyyyTAAXXX ATGvvvvTGA ATGxxxxGTA"
# Lookbehind for ATG, lazy body, lookahead for any of the three stop codons
regex = r"(?<=ATG)(.*?)(?=TAG|TAA|TGA)"
# or:
#    regex = r"ATG(.*?)(?:TAG|TAA|TGA)"
DNAsliceList = re.findall(regex, seqDNA)
print(DNAsliceList)

gives:

['YYYYYY', 'yyyyy', 'vvvv']
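
Note that the pattern above stops at the first stop codon it sees at any offset, not necessarily one that lines up with ATG in steps of three bases. If the reading frame matters, a minimal sketch of one possible variant is shown here. The name find_in_frame is made up for illustration, and the pattern assumes the sequence contains no line breaks:

```python
import re

# Assumption: the body between ATG and the stop codon must be a whole
# number of 3-character codons, so the stop codon is in frame with ATG.
FRAME_REGEX = re.compile(r"ATG((?:.{3})*?)(?:TAG|TAA|TGA)")

def find_in_frame(seq_dna):
    """Return the bodies between ATG and the first in-frame stop codon."""
    return FRAME_REGEX.findall(seq_dna)

# The question's example: two whole codons, then TAG.
print(find_in_frame("XXXATGYYYYYYTAGXXX"))
# Here a TGA appears out of frame two bases after ATG; it is skipped
# and the match runs to the in-frame TAG instead.
print(find_in_frame("CCATGAATGACTAGCC"))
```

With the second input, the unanchored pattern ATG(.*?)(?:TAG|TAA|TGA) would stop at the out-of-frame TGA and return only the two bases before it.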

Python's re module provides a way to search for substrings within complicated strings. Online regex testing pages can help when experimenting with a pattern.

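import re

def isolarprot(seqDNA):
    # Lazy .*? stops at the nearest stop codon; a greedy .* would run to the
    # last stop codon in the string and merge several sequences into one.
    re_pattern = r'ATG(.*?)(TAG|TAA|TGA)'
    matches = re.findall(re_pattern, seqDNA)
    # Each match is a tuple (body, stop codon); keep only the body.
    return [match[0] for match in matches]

dna_str = 'XXXATGYYYYYYTGAXXX'
print(isolarprot(dna_str))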

In the sample code above, re_pattern is what you are searching for. Anything in the pattern that is enclosed in parentheses () is captured when it matches. In this case you want the first capture group, which matches everything between the initiation codon and the stop codon (the stop codon itself is captured in the second group).
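
The loop in the question stepped through the strand three bases at a time. A minimal sketch of that idea without regular expressions might look like the following. The name isolarprot_frame and the STOP_CODONS set are made up for illustration, and the sketch assumes that only the first ATG matters and that the stop codon must be in frame with it:

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}

def isolarprot_frame(seq_dna, start_codon="ATG"):
    """Return the bases between the first start codon and the
    first in-frame stop codon, or "No protein" if there is none."""
    start = seq_dna.find(start_codon)
    if start == -1:
        return "No protein"
    body_start = start + len(start_codon)
    # Walk the strand one codon (three bases) at a time, starting
    # right after the initiation codon.
    for i in range(body_start, len(seq_dna) - 2, 3):
        if seq_dna[i:i + 3] in STOP_CODONS:
            return seq_dna[body_start:i]
    # Only give up after every codon has been checked.
    return "No protein"

print(isolarprot_frame("XXXATGYYYYYYTAGXXX"))
```

Unlike the code in the question, the "No protein" result is returned only after the whole loop has finished, not from an else branch on the first iteration.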
