
Python: finding the longest ORF

Can someone show me a straightforward way to find the longest open reading frame (ORF) longer than 30 bp in a DNA sequence, without using Biopython? ATG is the start codon (i.e., the beginning of an ORF), and TAG, TGA, and TAA are stop codons (i.e., the end of an ORF).
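The ORF definition in the question can be written down as a small check on a single candidate string. This is a sketch, not part of the original question: the name is_orf and its min_length parameter are hypothetical, and it assumes the length counts the start and stop codons and that "> 30 bp" means at least 31 bases.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_orf(candidate: str, min_length: int = 31) -> bool:
    """Return True if candidate is exactly one ORF under the definition above.

    Assumption: the length includes the start and stop codons.
    """
    candidate = candidate.upper()
    # Too short, or not a whole number of codons: cannot be an ORF.
    if len(candidate) < min_length or len(candidate) % 3 != 0:
        return False
    codons = [candidate[i:i + 3] for i in range(0, len(candidate), 3)]
    # It must open with ATG and close with a stop codon...
    if codons[0] != START_CODON or codons[-1] not in STOP_CODONS:
        return False
    # ...and contain no stop codon in between (the first in-frame stop ends it).
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


print(is_orf("ATG" + "GCC" * 9 + "TAA"))          # True  (33 bp)
print(is_orf("ATG" + "GCC" * 8 + "TAA"))          # False (30 bp, not > 30)
print(is_orf("ATG" + "TAA" + "GCC" * 8 + "TGA"))  # False (internal stop codon)
```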

This regex might be able to do the job:

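ATG(?:(?!TAA|TAG|TGA)...){9,}(?:TAA|TAG|TGA)

ATG matches the start codon. Each (?:(?!TAA|TAG|TGA)...) matches one three-letter codon that is not a stop codon, and {9,} requires at least nine of them; counting the start and stop codons, the shortest possible match is 3 + 27 + 3 = 33 bp, i.e. longer than 30 bp. Because none of the codons in between may be a stop codon, the match ends at the first in-frame TAG, TGA, or TAA.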
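A sketch of applying this with Python's re module. It assumes a pattern that forbids stop codons between ATG and the final stop and requires at least nine internal codons (33 bp or more in total), wrapped in a lookahead (?=(...)) so that every start position is tried and overlapping ORFs are all reported. The helper name find_orfs is hypothetical.

```python
import re
from typing import List, Tuple

# ATG, then at least nine non-stop codons, then the first in-frame stop codon.
# The outer (?=( ... )) lookahead makes each match zero-width, so finditer
# tries every start position and also reports ORFs that overlap one another.
ORF_PATTERN = re.compile(r"(?=(ATG(?:(?!TAA|TAG|TGA)...){9,}(?:TAA|TAG|TGA)))")


def find_orfs(seq: str) -> List[Tuple[int, str]]:
    """Return (start_index, orf) pairs for every ORF longer than 30 bp in seq."""
    return [(m.start(), m.group(1)) for m in ORF_PATTERN.finditer(seq.upper())]


# Hypothetical example: a second, in-frame ATG yields a nested, shorter ORF.
example = "CC" + "ATG" + "ATG" + "GCC" * 9 + "TAA" + "GG"
for start, orf in find_orfs(example):
    print(start, len(orf))  # prints "2 36", then "5 33"
```

Without the lookahead, finditer would resume searching after the end of each match and would report only the ORF starting at index 2 here.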

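Run this regex over the sequence with re.finditer to collect the ORFs; picking the longest is then trivial. Note that re.finditer only returns non-overlapping matches, so to also catch ORFs that overlap an earlier match (for example, one in a different reading frame), wrap the pattern in a lookahead, (?=(...)), and read each ORF from group 1.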
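Once the ORFs are collected, taking the longest is a single max() call. A sketch under the same assumptions as above (lookahead-wrapped pattern, at least nine internal codons); the helper name longest_orf is hypothetical, and it returns None when there is no ORF longer than 30 bp.

```python
import re
from typing import Optional

# ORF pattern: ATG, nine or more non-stop codons, then the first in-frame stop,
# wrapped in a lookahead so overlapping ORFs are found as well.
ORF_PATTERN = re.compile(r"(?=(ATG(?:(?!TAA|TAG|TGA)...){9,}(?:TAA|TAG|TGA)))")


def longest_orf(seq: str) -> Optional[str]:
    """Return the longest ORF (> 30 bp) in seq, or None if there is none."""
    orfs = (m.group(1) for m in ORF_PATTERN.finditer(seq.upper()))
    # max() keeps the first of several equally long ORFs; default covers "no ORF".
    return max(orfs, key=len, default=None)


short = "ATG" + "GCC" * 9 + "TAA"   # 33 bp
long_ = "ATG" + "GCC" * 12 + "TGA"  # 42 bp
print(len(longest_orf(short + long_)))  # 42
```

Like the pattern it relies on, this scans only the sequence as written, in its three forward reading frames.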

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM