
How to validate a DNA sequence in Python?

Here is the question.

Write a script that gets a DNA sequence as input from the user and validates that the input is a DNA sequence. If the input is a DNA sequence:

  1. Calculate the length of the sequence.
  2. Calculate the percentage of each nucleotide.
  3. Calculate the GC percentage of the DNA.
  4. Get the RNA sequence.
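The four calculations above map naturally onto small functions, which would also address the "function concept" requirement. A minimal sketch, assuming the sequence has already been validated and lower-cased; the function names are hypothetical, and the RNA step uses the same t → u replacement as the scripts on this page:

```python
def sequence_length(seq: str) -> int:
    """Number of nucleotides in the sequence."""
    return len(seq)


def nucleotide_percentages(seq: str) -> dict:
    """Percentage of each of a, c, g, t, rounded to 2 decimals."""
    length = len(seq)
    return {base: round(seq.count(base) / length * 100, 2) for base in "acgt"}


def gc_percent(seq: str) -> float:
    """Share of g and c in the sequence, as a percentage."""
    return round((seq.count("g") + seq.count("c")) / len(seq) * 100, 2)


def to_rna(seq: str) -> str:
    """Transcribe DNA to RNA by replacing thymine (t) with uracil (u)."""
    return seq.replace("t", "u")


seq = "atgcgc"
print(sequence_length(seq))         # 6
print(nucleotide_percentages(seq))  # {'a': 16.67, 'c': 33.33, 'g': 33.33, 't': 16.67}
print(gc_percent(seq))              # 66.67
print(to_rna(seq))                  # augcgc
```

Each function does one job and returns a value instead of writing to a file, so the same results can be printed, tested, or written out later.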

This output must be written to an external file.
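A sketch of writing the results to a file, assuming a hypothetical `write_report` helper and a validated, lower-case sequence. Using `with open(...)` closes the file automatically instead of relying on an explicit `close()`; the temporary directory in the usage lines only keeps the example self-contained:

```python
import os
import tempfile


def write_report(seq: str, path: str) -> None:
    """Write the sequence statistics to a text file at `path` (hypothetical helper)."""
    length = len(seq)
    # 'with' closes the file automatically, even if an exception is raised
    with open(path, "w") as fo:
        fo.write(f"Sequence: {seq}\n")
        fo.write(f"Length of sequence: {length}\n")
        fo.write("Percentage of nucleotides:-\n")
        for base in "acgt":
            fo.write(f"{base}={round(seq.count(base) / length * 100, 2)}%\n")
        gc = round((seq.count("g") + seq.count("c")) / length * 100, 2)
        fo.write(f"GC percentage: {gc}%\n")
        fo.write(f"The rna sequence: {seq.replace('t', 'u')}\n")


# Usage: write the report into a temporary directory and print it back
path = os.path.join(tempfile.mkdtemp(), "aa.txt")
write_report("atgc", path)
with open(path) as f:
    print(f.read())
```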

If the input is NOT a DNA sequence, an error message is displayed and the user is asked whether to enter a correct DNA sequence or terminate the program.
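Validation itself can be a single function that returns `True` or `False`, which keeps it separate from the error-handling logic. A sketch, assuming upper- and lower-case letters are both acceptable and an empty input counts as invalid (`is_dna` is a hypothetical name):

```python
def is_dna(seq: str) -> bool:
    """Return True if seq is non-empty and contains only a, t, g, c (any case)."""
    seq = seq.strip().casefold()
    # Every character must belong to the DNA alphabet; an empty string is rejected
    return bool(seq) and set(seq) <= set("atgc")


print(is_dna("ATGCgc"))  # True
print(is_dna("atgx"))    # False
print(is_dna(""))        # False
```

The set comparison `set(seq) <= set("atgc")` is true exactly when every distinct character of `seq` is one of the four bases.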

Besides that, the script is required to demonstrate the concept of functions.


Below is my script:

def main():
    seq = input("enter dna sequence: ").casefold()

    for letter in seq:
        if letter not in "atgc":
            answer = input("This is not dna.\nEnter 'Y' to enter a dna sequence or 'N' to terminate the program: ")

        if answer == "Y":
            answer = input("Please enter: ")

        elif answer == "N":
            print("Program is terminated")
            break
        else:
            print("Please enter Y or N")
    else:
        if letter in "acgt":
            caps = seq.casefold()
            length = len(seq)
            fo = open("aa.txt", "w+")
            fo.write("Sequence: " + seq)
            fo.write("\nLength of sequence: " + str(length))
            fo.write("\nPercentage of nucleotides:- " + "\n")

            accepted_bases = ('a', 'c', 'g', 't')

            for bases in accepted_bases:
                count = caps.count(bases)
                content = round(((count / length) * 100), 2)
                fo.write(str(bases) + "=" + str(content) + "%" + "\n")

            GC_Count = 0

            for letter in seq:
                if (letter == 'g' or letter == 'c'):
                    GC_Count += 1

            GC = round((float(str(GC_Count)) / float(str(length)) * 100), 2)
            fo.write("GC percentage: " + str(GC) + "%" + "\n")

            RNA = seq.replace('t', 'u')
            fo.write("The rna sequence: " + RNA)
            fo.close()

if __name__ == "__main__":
    main()

Basically, I want the script to check whether the user's input is a DNA sequence. If it is, the script writes the output file. If it isn't, the script tells the user that the input sequence is not correct and asks them to enter 'Y' to enter a DNA sequence (if that one is a DNA sequence, the output file is written) or 'N' to terminate. If the user enters anything other than 'Y' or 'N', it prints "Please enter Y or N".

My problem is that Python says my input is not DNA even when the sequence I enter the second time is correct. I never tell Python to stop the program, but it stops on its own after I enter the DNA the second time, and also after I enter a letter other than Y or N.
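Both symptoms come from the script above: after 'Y', the re-entered sequence is stored in `answer` rather than `seq`, so the `for` loop keeps checking the letters of the first input; the "Please enter Y or N" branch only prints a message without asking again; and since nothing loops back to the start, the program simply ends once the `for` loop runs out of letters. A minimal sketch of a prompt loop that avoids these problems; `get_dna_sequence` and `is_dna` are hypothetical names, and the `read` parameter (defaulting to the built-in `input`) exists only so the loop can be driven by scripted answers:

```python
def is_dna(seq: str) -> bool:
    """True if seq is non-empty and contains only a, t, g, c."""
    return bool(seq) and set(seq) <= set("atgc")


def get_dna_sequence(read=input):
    """Ask until a valid DNA sequence is entered (returned) or the user quits (None)."""
    seq = read("Enter DNA sequence: ").strip().casefold()
    while not is_dna(seq):
        answer = read("This is not DNA.\nEnter 'Y' to enter a DNA sequence "
                      "or 'N' to terminate the program: ").strip().upper()
        if answer == "Y":
            # Store the new input in seq, so the while condition re-checks it
            seq = read("Please enter: ").strip().casefold()
        elif answer == "N":
            print("Program is terminated")
            return None
        else:
            # Anything else: print the hint, then the loop asks Y/N again
            print("Please enter Y or N")
    return seq


# Usage with scripted answers instead of the keyboard
answers = iter(["hello", "x", "Y", "ATGC"])
print(get_dna_sequence(lambda prompt: next(answers)))  # prints "Please enter Y or N", then atgc
```

In a real script, `main()` would call `get_dna_sequence()` with no arguments and only compute and write the statistics when the result is not `None`.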

Does anyone know what's wrong here? I look forward to your reply. Thank you.

I don't know anything about DNA, but I think this might work:

# dna = "gcacgctcccagcgatgctctctcagccctcacgggtcatctgaaataatcatattaccccacacaactggcctttgttctgatacatgcatttcgtcttaagcttagtaatcgtcgtattgacgaggaacgaaagttttaagtttttagatcgtattgtaacacgtccatgtgctaaagaacactgtgcgtttcccggatgactcgtgcaccgacattgagtccagctcgaatgacccccgacgctcctggatttcgcgttctcactcgattcccgctgatgaccgacgcgggaaaccattgtctcacgcagaagtccgatcccatatagagcgaaagtctctcagtctcatgactgagcaacattggcggcgaggaccgttggcccttctcgtgtacatcagacgcgcaacttccaatcttgtgcttccaatacatcgaagaaagtctatgatatagcagagaactggcctgtttgtcacttgcgcagaagggggcgtcaaactggaatgtcaacataacgccagtatctctaattttactcgacttcggtaacgcatcatgctacaggatcagttcatcctggagaaagctgtgacaatattcttactagcgcgcggaaggggggggtaactgacaggctgggtatgctgacgggggcgatcccaaatcgaaaactgcccttcccctcgcaacatgagaacaaaaattttgtaagtgaaaagccccctgaaacgtttcgccttgactctcttgagccccggggttttaatacataccccatctgattcgttctagtgctcaccaacactgctacatgatcataggttatatgtggtgcgcccttcgccaatgggcaccaagaaacctactgcgtaaaccaaccttggccgtcggcgaagcttctaagcactgtgtctcgcgaaagagagtaggacgccacctcggcatcaatgtagtacttatgtcggcacccgcatgcgtggtggtcgccctatcg"


def main():
    while True:
        # Normalize the input so "ATGC" and " atgc " are accepted too
        seq = input("Enter DNA sequence: ").strip().casefold()

        # Valid only if non-empty and every character is a, t, g or c
        if seq and set(seq) <= set("atgc"):
            length = len(seq)
            # "with" closes the file even if a write fails
            with open("aa.txt", "w") as fo:
                fo.write("Sequence: " + seq)
                fo.write("\nLength of sequence: " + str(length))
                fo.write("\nPercentage of nucleotides:- " + "\n")

                accepted_bases = ('a', 'c', 'g', 't')

                for base in accepted_bases:
                    count = seq.count(base)
                    content = round((count / length) * 100, 2)
                    fo.write(base + "=" + str(content) + "%" + "\n")

                # g and c together give the GC content
                gc_count = seq.count('g') + seq.count('c')
                gc = round(gc_count / length * 100, 2)
                fo.write("GC percentage: " + str(gc) + "%" + "\n")

                # Transcribe to RNA by replacing thymine (t) with uracil (u)
                rna = seq.replace('t', 'u')
                fo.write("The rna sequence: " + rna)
            print("[+] File created!")

        else:
            print("[-] This is not a DNA sequence.")

        # strip() first, then upper(), so " y" is still recognized as "Y"
        again = input("Try another sequence? Y/N: ").strip().upper()
        if again == 'Y':
            continue
        else:
            break


if __name__ == "__main__":
    main()
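A caution on the validation step: for strings, `in` is a substring test, so a check such as `'atgc' in seq` is true only when those four letters appear consecutively and in that order. That rejects valid DNA such as "gattaca" and accepts invalid input such as "xxatgcxx". A short comparison with a per-character check (both function names are hypothetical):

```python
def contains_substring(seq: str) -> bool:
    # Substring test: does "atgc" occur somewhere in seq, in that exact order?
    return "atgc" in seq


def only_dna_letters(seq: str) -> bool:
    # Per-character test: seq is non-empty and every character is a, t, g or c
    return len(seq) > 0 and all(ch in "atgc" for ch in seq)


for s in ["atgc", "gattaca", "xxatgcxx"]:
    print(s, contains_substring(s), only_dna_letters(s))
# atgc True True
# gattaca False True
# xxatgcxx True False
```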
