Finding ATG in an mRNA sequence using Python

I know nothing about Python, and I've tried to piece together information from various threads to complete an assignment, but I still can't crack it.

Here is the assignment:

Instructions a) Download the sequence for RAI1 mRNA NM_030665 , and use Python to count the number of ATG subsequences, using:

countATG = seq.count('ATG'). 

For example, for SREBF1 NM_001005291.2 , the answer is 45 .

I am NOT looking for the answer to the question. I genuinely want to learn more about Python and would REALLY appreciate it if someone could tell me how to go about completing this problem. I have the sequence saved to my desktop as a .txt file, but I don't know how to make seq hold the contents of that file (if that makes sense). Yes, I could Ctrl+F the sequence on NCBI, but I want to learn how to use Python.

Thank you!!

Here you go:

filepath = '/path/to/file.txt'

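with open(filepath) as infile:
    seq = infile.read()

# This reads the whole file into one string. If the sequence is split across
# multiple lines (for example, wrapped every 50 bp), you'll want to piece it
# back together so you don't miss any ATGs that span a line break.
# split() with no arguments breaks the string on any whitespace, including
# newlines, and join() glues the pieces back into one continuous string.

seq = ''.join(seq.split())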

ATG_count = seq.count('ATG')

print(ATG_count)
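
If the file was saved in FASTA format, its first line is a header starting with '>' and the bases may be in lowercase, and either of those can throw the count off. Here is a minimal sketch that handles both, assuming the file is either plain sequence text or FASTA. The helper name clean_sequence and the demo string are made up for illustration; they are not part of the assignment.

```python
def clean_sequence(text):
    """Return the sequence in `text` as one uppercase string.

    Assumes plain sequence text or FASTA: lines starting with '>' are
    treated as headers and skipped, and all whitespace is removed.
    """
    parts = []
    for line in text.splitlines():
        if line.startswith('>'):
            continue  # FASTA header line, not part of the sequence
        parts.append(''.join(line.split()))  # drop spaces and tabs inside the line
    return ''.join(parts).upper()


# Made-up demo text: a header line, mixed case, and one ATG split across lines.
demo = ">example header\nccATGa\naAT\nGtt\n"
seq = clean_sequence(demo)
print(seq)               # CCATGAAATGTT
print(seq.count('ATG'))  # 2

# With a real file, read it first and pass the text in:
# with open('/path/to/file.txt') as infile:
#     seq = clean_sequence(infile.read())
```

str.count only counts non-overlapping matches, which is fine here because 'ATG' cannot overlap with itself.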
