I'm currently working on a study project where I have to write a Python program that takes a DNA sequence, generates all the possible reading frames from it, and then finds any open reading frame. I can't use Biopython, as we are supposed to do this ourselves.
The code I have written so far gives output in this style: ["TGC", "ATG", "ATA", "TGG", "AGG", "AGG", "CCG", "TAA", "TAG", "TGA"]
What I want to do now is define the start codon as "ATG" and get its index, and define the stop codons as ["TAA", "TAG", "TGA"]. If any of these three is found, the index of the first one found should be reported and the rest ignored. If no stop codon is found, the function should return some string instead.
In addition to this, I want to be able to compare the lengths of up to 6 different inputs in the style mentioned above and choose the one that is the longest.
This is my first time posting here, so apologies if the question is not well phrased, and thanks for any help!
I am not completely sure if this is what you want, but to find the first occurrence of several short strings in a longer string you can do something like this:
s = "This is a long string. This is the second sentence."
short_strings = ["his", "is", "sec", "dummy"]
first_occurrence = [s.find(short) for short in short_strings]
print(first_occurrence)
which will produce the output
[1, 2, 35, -1]
Note that you get -1 for things that don't match.
If you want to find the index of the first occurrence in one list of every element of another list, you can do the following:
a_list = ["ACA", "ATG", "CGC", "ATA", "TAT", "TAA", "TAG", "TGA", "ATG"]
b_list = ["ATG", "TAA", "AAA"]
x = {
    b: next(a_index for a_index, a in enumerate(a_list) if a == b)
    for b in b_list
    if b in a_list
}
print(x)
which produces the output
{'ATG': 1, 'TAA': 5}
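As a possible variation (a sketch, not part of the original approach), the built-in next accepts a default value, so the separate "if b in a_list" filter can be dropped and codons that never occur are reported as None instead of being left out. The name first_index is just an illustrative choice.

```python
a_list = ["ACA", "ATG", "CGC", "ATA", "TAT", "TAA", "TAG", "TGA", "ATG"]
b_list = ["ATG", "TAA", "AAA"]

# next(iterator, default) returns the default when the iterator is empty,
# so a codon that is absent from a_list maps to None.
first_index = {
    b: next((i for i, a in enumerate(a_list) if a == b), None)
    for b in b_list
}
print(first_index)  # {'ATG': 1, 'TAA': 5, 'AAA': None}
```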
If you want a solution that goes through a_list fewer times, you could rely on a generator expression, as in the following example:
gen = ((a_index, a) for a_index, a in enumerate(a_list) if a in b_list)
for elem in gen:
    # Remove the matched element so later duplicates are not reported again
    b_list.remove(elem[1])
    print(elem)
This reports matches as it finds them, regardless of which element of b_list it encounters first. You can replace the print statement with whatever functionality you want, but you must keep the remove call, since otherwise the generator will report more than one match for the same element. Note that this empties matched entries out of b_list as it goes.
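Putting these pieces together for the codon case, here is a minimal sketch of what the question describes. It assumes each reading frame is already a list of three-letter codons, that the stop codon must come after the first "ATG", and that length is measured in codons from start to stop. The function names find_orf and longest_orf and the returned message strings are made up for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orf(codons):
    """Return (start_index, stop_index) of the first ORF, or a message string."""
    if START not in codons:
        return "no start codon found"
    start = codons.index(START)  # index of the first start codon
    # Scan only the codons after the start; the first stop codon wins.
    for i in range(start + 1, len(codons)):
        if codons[i] in STOPS:
            return start, i
    return "no stop codon found"

def longest_orf(frames):
    """Return (frame_number, start, stop, length) of the longest ORF, or None."""
    best = None
    for frame_number, codons in enumerate(frames):
        result = find_orf(codons)
        if isinstance(result, str):  # this frame has no complete ORF
            continue
        start, stop = result
        length = stop - start
        if best is None or length > best[3]:
            best = (frame_number, start, stop, length)
    return best

frame = ["TGC", "ATG", "ATA", "TGG", "AGG", "AGG", "CCG", "TAA", "TAG", "TGA"]
print(find_orf(frame))  # (1, 7)
print(longest_orf([frame, ["ATG", "TAA"], ["CCC", "GGG"]]))  # (0, 1, 7, 6)
```

Returning a tuple on success and a string on failure follows the wording of the question. Returning None or raising an exception would be an alternative if the mixed return type becomes awkward.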