
How to validate a DNA sequence in Python?

Here is the question.

Write a script that gets a DNA sequence from the user and validates that the input is a DNA sequence. If the input is a DNA sequence:

  1. calculate the length of the sequence
  2. calculate the percentage of each nucleotide
  3. calculate the GC percentage of the DNA
  4. get the RNA sequence

The output must be written to an external file.

If the input is NOT a DNA sequence, display an error message and ask the user whether to enter a correct DNA sequence or terminate the program.

In addition, the script must demonstrate the use of functions.


Below is my script:

def main():
    seq = input("enter dna sequence: ").casefold()

    for letter in seq:
        if letter not in "atgc":
            answer = input("This is not dna.\nEnter 'Y' to enter a dna sequence or 'N' to terminate the program: ")

            if answer == "Y":
                answer = input("Please enter: ")

            elif answer == "N":
                print("Program is terminated")
                break
            else:
                print("Please enter Y or N")
    else:
        if letter in "acgt":
            caps = seq.casefold()
            length = len(seq)
            fo = open("aa.txt", "w+")
            fo.write("Sequence: " + seq)
            fo.write("\nLength of sequence: " + str(length))
            fo.write("\nPercentage of nucleotides:- " + "\n")

            accepted_bases = ('a', 'c', 'g', 't')

            for bases in accepted_bases:
                count = caps.count(bases)
                content = round(((count / length) * 100), 2)
                fo.write(str(bases) + "=" + str(content) + "%" + "\n")

            GC_Count = 0

            for letter in seq:
                if (letter == 'g' or letter == 'c'):
                    GC_Count += 1

            GC = round((float(str(GC_Count)) / float(str(length)) * 100), 2)
            fo.write("GC percentage: " + str(GC) + "%" + "\n")

            RNA = seq.replace('t', 'u')
            fo.write("The rna sequence: " + RNA)
            fo.close()

if __name__ == "__main__":
    main()

I basically want the script to check whether the user's input is a DNA sequence. If it is, the script writes the output file. If it is not, the script tells the user that the input sequence is incorrect and asks them to enter 'Y' to enter a DNA sequence (if that one is valid, the output file is written) or 'N' to terminate. If the user enters anything other than 'Y' or 'N', it prints "Please enter Y or N".

My problem is that Python still reports that my input is not DNA on the second attempt, even though the DNA sequence I enter is correct. I also never told the program to stop, but it stops by itself after I enter the DNA sequence the second time, and also when I enter a letter other than Y or N.

Does anyone know what's wrong here? I look forward to your reply. Thank you.

I don't know anything about DNA, but I think this might work.
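
The numbers the task asks for (length, percentage of each base, GC percentage, RNA sequence) can be checked separately from the file writing. Here is a minimal sketch of that part as one function, assuming the sequence is non-empty, lowercase and already validated. The name `sequence_stats` and the dictionary keys are just my own choices for illustration.

```python
def sequence_stats(seq):
    """Return length, per-base percentages, GC percent and RNA for a DNA string.

    Assumes seq is non-empty, lowercase and contains only a, c, g, t.
    """
    length = len(seq)
    # Share of each base, as a percentage rounded to two decimals.
    percentages = {base: round(seq.count(base) / length * 100, 2) for base in "acgt"}
    # GC percent is the combined share of g and c.
    gc_percent = round((seq.count("g") + seq.count("c")) / length * 100, 2)
    return {
        "length": length,
        "percentages": percentages,
        "gc_percent": gc_percent,
        # RNA uses uracil (u) where DNA has thymine (t).
        "rna": seq.replace("t", "u"),
    }


stats = sequence_stats("atgc")
print(stats["gc_percent"])  # 50.0
```

Keeping the calculation in its own function also covers the "function concept" requirement, and the result can be written to the file in a second step. The full script is below.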

# dna = "gcacgctcccagcgatgctctctcagccctcacgggtcatctgaaataatcatattaccccacacaactggcctttgttctgatacatgcatttcgtcttaagcttagtaatcgtcgtattgacgaggaacgaaagttttaagtttttagatcgtattgtaacacgtccatgtgctaaagaacactgtgcgtttcccggatgactcgtgcaccgacattgagtccagctcgaatgacccccgacgctcctggatttcgcgttctcactcgattcccgctgatgaccgacgcgggaaaccattgtctcacgcagaagtccgatcccatatagagcgaaagtctctcagtctcatgactgagcaacattggcggcgaggaccgttggcccttctcgtgtacatcagacgcgcaacttccaatcttgtgcttccaatacatcgaagaaagtctatgatatagcagagaactggcctgtttgtcacttgcgcagaagggggcgtcaaactggaatgtcaacataacgccagtatctctaattttactcgacttcggtaacgcatcatgctacaggatcagttcatcctggagaaagctgtgacaatattcttactagcgcgcggaaggggggggtaactgacaggctgggtatgctgacgggggcgatcccaaatcgaaaactgcccttcccctcgcaacatgagaacaaaaattttgtaagtgaaaagccccctgaaacgtttcgccttgactctcttgagccccggggttttaatacataccccatctgattcgttctagtgctcaccaacactgctacatgatcataggttatatgtggtgcgcccttcgccaatgggcaccaagaaacctactgcgtaaaccaaccttggccgtcggcgaagcttctaagcactgtgtctcgcgaaagagagtaggacgccacctcggcatcaatgtagtacttatgtcggcacccgcatgcgtggtggtcgccctatcg"


def is_dna(seq):
    """Return True if seq is non-empty and contains only a, c, g, t."""
    # A substring test such as `'atgc' in seq` only finds that exact run of
    # letters; comparing character sets checks every base.
    return bool(seq) and set(seq) <= set("acgt")


def write_report(seq, path="aa.txt"):
    """Write length, base percentages, GC percent and RNA sequence to path."""
    length = len(seq)
    # The with-block closes the file even if a write fails.
    with open(path, "w") as fo:
        fo.write("Sequence: " + seq)
        fo.write("\nLength of sequence: " + str(length))
        fo.write("\nPercentage of nucleotides:-\n")

        for base in "acgt":
            content = round(seq.count(base) / length * 100, 2)
            fo.write(base + "=" + str(content) + "%\n")

        gc = round((seq.count("g") + seq.count("c")) / length * 100, 2)
        fo.write("GC percentage: " + str(gc) + "%\n")

        # RNA uses uracil (u) in place of thymine (t).
        fo.write("The rna sequence: " + seq.replace("t", "u"))


def main():
    while True:
        # Normalise case and surrounding whitespace before validating.
        seq = input("Enter DNA sequence: ").strip().casefold()

        if is_dna(seq):
            write_report(seq)
            print("[+] File created!")
        else:
            print("[-] This is not a DNA sequence.")

        again = input("Try another sequence? Y/N: ").strip().upper()
        if again != "Y":
            break


if __name__ == "__main__":
    main()
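
In the question's script the replacement sequence is stored in `answer`, so `seq` is never updated or checked again. If you want the exact Y/N behaviour described in the question, including the "Please enter Y or N" message, the prompting could be sketched like this. It is only a sketch: `read_dna` and its `ask` parameter are names I made up, and `ask` exists only so the function can be tried without typing at a keyboard.

```python
def read_dna(ask=input):
    """Prompt until a valid DNA sequence is entered; return None if the user quits.

    `ask` is a hypothetical hook that defaults to the built-in input().
    """
    while True:
        seq = ask("Enter DNA sequence: ").strip().casefold()
        # Valid only if non-empty and every character is one of a, c, g, t.
        if seq and set(seq) <= set("acgt"):
            return seq

        print("This is not DNA.")
        while True:
            answer = ask("Enter 'Y' to try again or 'N' to terminate: ").strip().upper()
            if answer == "Y":
                break  # back to the outer loop, which re-reads and re-checks
            if answer == "N":
                print("Program is terminated")
                return None
            print("Please enter Y or N")
```

In `main()` you would then call `seq = read_dna()` and stop when it returns `None`. The new sequence always goes back through the same check, which is the step the original loop skips.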

