Find longest palindrome substring of a piece of DNA

I have to write a function that prints the longest palindromic substring of a piece of DNA. I have already written a function that checks whether a piece of DNA is itself a palindrome. See the function below.

def make_complement_strand(DNA):
    complement=[]
    rules_for_complement={"A":"T","T":"A","C":"G","G":"C"}
    for letter in DNA:
        complement.append(rules_for_complement[letter])
    return(complement)

def is_this_a_palindrome(DNA): 
        DNA=list(DNA)
        if DNA!=(make_complement_strand(DNA)[::-1]):     
            print("false")                  
            return False
        else:                             
            print("true")
            return True

is_this_a_palindrome("GGGCCC") 

But now: how do I write a function that prints the longest palindromic substring of a DNA string?

The meaning of palindrome in the context of genetics is slightly different from the definition used for words and sentences. Since a double helix is formed by two paired strands of nucleotides that run in opposite directions in the 5'-to-3' sense, and the nucleotides always pair in the same way (Adenine (A) with Thymine (T) for DNA, with Uracil (U) for RNA; Cytosine (C) with Guanine (G)), a (single-stranded) nucleotide sequence is said to be a palindrome if it is equal to its reverse complement. For example, the DNA sequence ACCTAGGT is palindromic because its nucleotide-by-nucleotide complement is TGGATCCA, and reversing the order of the nucleotides in the complement gives the original sequence.
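As a quick check of that definition, here is a minimal sketch. The names `COMPLEMENT`, `reverse_complement` and `is_reverse_palindrome` are illustrative only and are not part of the code in the question; the sketch assumes the input contains only the letters A, T, C and G.

```python
# Complement rules for DNA, as described in the definition above.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(dna):
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def is_reverse_palindrome(dna):
    """True if the sequence equals its own reverse complement."""
    return dna == reverse_complement(dna)

print(reverse_complement("ACCTAGGT"))     # ACCTAGGT
print(is_reverse_palindrome("ACCTAGGT"))  # True
print(is_reverse_palindrome("ACCTAGGA"))  # False
```

Working on strings rather than lists keeps the comparison a plain `==` and lets slices of the original DNA be passed in directly.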

Here, this should be a decent starting point for finding the longest palindromic substring.

def make_complement_strand(DNA):
    complement=[]
    rules_for_complement={"A":"T","T":"A","C":"G","G":"C"}
    for letter in DNA:
        complement.append(rules_for_complement[letter])
    return(complement)

def is_this_a_palindrome(DNA): 
        DNA=list(DNA)
        if DNA!=(make_complement_strand(DNA)[::-1]):     
            #print("false")                  
            return False
        else:                             
            #print("true")
            return True


def longest_palindrome_ss(org_dna, palindrome_func):
    '''
    Naive implementation.

    We use two indices.
    i marks the start of the current substring and j runs from i+1 to the end,
    so every substring of length 2 or more is tested.
    i is incremented after each pass of the inner loop.

    Uses the palindrome function provided by the caller.

    Possible improvement:
    1. Start with the longest substring instead of the shortest, i.e. start
       with i=0 and j=last_i-1 and decrement, so the search can stop at the first match.
    '''
    longest_palin=""
    i=j=0
    last_i=len(org_dna)
    while i < last_i:
        j=i+1
        while j < last_i:
            # org_dna[i:j+1] includes the character at index j
            current_substring = org_dna[i:j+1]
            if palindrome_func(current_substring):
                # keep only strictly longer matches, so the leftmost longest one wins
                if len(current_substring)>len(longest_palin):
                    longest_palin=current_substring
            j+=1
        i+=1
    print(org_dna, longest_palin)
    return longest_palin


longest_palindrome_ss("GGGCCC", is_this_a_palindrome)
longest_palindrome_ss("GAGCTT", is_this_a_palindrome)
longest_palindrome_ss("GGAATTCGA", is_this_a_palindrome)

Here is the output of a sample run:

mahorir@mahorir-Vostro-3446:~/Desktop$ python3 dna_paln.py 
GGGCCC GGGCCC
GAGCTT AGCT
GGAATTCGA GAATTC
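The "start with the longest" improvement mentioned in the docstring could be sketched roughly as follows. This is only one possible way to do it, not a tested replacement: `longest_palindrome_longest_first` and `is_reverse_palindrome` are hypothetical names, and the checker is a string-based stand-in for `is_this_a_palindrome`.

```python
def is_reverse_palindrome(dna):
    """True if dna equals its reverse complement (string-based stand-in)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return dna == "".join(complement[base] for base in reversed(dna))

def longest_palindrome_longest_first(dna, palindrome_func=is_reverse_palindrome):
    """Try window lengths from longest to shortest and return the first match."""
    n = len(dna)
    for length in range(n, 1, -1):            # lengths n, n-1, ..., 2
        for start in range(n - length + 1):   # slide the window left to right
            candidate = dna[start:start + length]
            if palindrome_func(candidate):
                return candidate              # first match is a longest one
    return ""                                 # no palindrome of length >= 2

for seq in ("GGGCCC", "GAGCTT", "GGAATTCGA"):
    print(seq, longest_palindrome_longest_first(seq))
```

For the three test strings this gives the same results as the run above. Because no base is its own complement, a reverse-complement palindrome always has even length, so with this particular checker the odd window lengths could be skipped as a further shortcut.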
