
Finding the first occurrence of several different strings in a longer string

I'm currently working on a study project in which I have to write a Python program that takes a DNA sequence, derives all possible reading frames from it, and then finds any open reading frames. I can't use Biopython, as we are expected to implement this ourselves.

The code I have written so far produces output in this style: ["TGC", "ATG", "ATA", "TGG", "AGG", "AGG", "CCG", "TAA", "TAG", "TGA"]

What I want to do now is define the start codon as "ATG" and get its index, and define the stop codons as ["TAA", "TAG", "TGA"]. If any of these three is found, the index of the first one found should be reported and the rest ignored. If no stop codon is found, some string should be returned instead.

In addition, I want to be able to compare the lengths of up to 6 different inputs in the style shown above and choose the longest one.

This is my first time posting here, so apologies if the question is not well phrased, and thanks for any help!

Strings in longer string

I am not completely sure if this is what you want, but to find the first occurrence of each of several strings in a longer string, you can do something like this:

s = "This is a long string. This is the second sentence."
short_strings = ["his", "is", "sec", "dummy"]
first_occurrence = [s.find(short) for short in short_strings]

print(first_occurrence)

which will produce the output

[1, 2, 35, -1]

Note that str.find returns -1 for substrings that do not occur in s.
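If you only need the earliest match among several substrings, you can filter out the -1 results before taking the minimum. This is a minimal sketch of that idea; the helper name first_of is made up for illustration and is not part of the standard library.

```python
def first_of(text, candidates):
    """Return (index, candidate) for the earliest match, or None if nothing matches."""
    hits = [(text.find(c), c) for c in candidates]
    # str.find returns -1 for a miss, so drop those before comparing positions
    hits = [(i, c) for i, c in hits if i != -1]
    return min(hits) if hits else None


s = "This is a long string. This is the second sentence."
print(first_of(s, ["is", "sec", "dummy"]))  # (2, 'is')
print(first_of(s, ["dummy"]))               # None
```

Returning None for "no match" is just one choice here; a message string would work equally well if that is what the caller expects.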

Strings in list of strings

If you want to find the first occurrence in a list for every element of another list, you can do the following

a_list = ["ACA", "ATG", "CGC", "ATA", "TAT", "TAA", "TAG", "TGA", "ATG"] 
b_list = ["ATG", "TAA", "AAA"]

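x = {
    # index of the first element of a_list equal to b
    b: next(a_index for a_index, a in enumerate(a_list) if a == b)
    for b in b_list
    if b in a_list  # skip elements that never occur, so next() cannot raise StopIteration
}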

print(x)

which produces the output

{'ATG': 1, 'TAA': 5}
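Applied to the codon lists from the question, the same idea might look like the sketch below. It assumes that only stop codons after the start codon count, and that "length" means the distance between the start and stop indices; the question does not spell out either point. The names find_orf and longest_orf and the message strings are made up for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orf(codons):
    """Return (start_index, stop_index), or a message string if either is missing."""
    if START not in codons:
        return "no start codon found"
    start = codons.index(START)
    # Assumption: only the first stop codon after the start matters; later ones are ignored
    stop = next(
        (i for i in range(start + 1, len(codons)) if codons[i] in STOPS),
        None,
    )
    if stop is None:
        return "no stop codon found"
    return start, stop


def longest_orf(frames):
    """Return (frame_number, start, stop) for the longest ORF, or None if no frame has one."""
    best = None
    for n, codons in enumerate(frames):
        result = find_orf(codons)
        if isinstance(result, str):  # this frame has no complete ORF
            continue
        start, stop = result
        if best is None or stop - start > best[2] - best[1]:
            best = (n, start, stop)
    return best


frame = ["TGC", "ATG", "ATA", "TGG", "AGG", "AGG", "CCG", "TAA", "TAG", "TGA"]
print(find_orf(frame))                             # (1, 7)
print(longest_orf([frame, ["ATG", "TAA"], ["AAA"]]))  # (0, 1, 7)
```

longest_orf accepts any number of frames, so passing the up to 6 reading frames as one list should cover the comparison step.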

Alternative

If you want a solution that goes through a_list fewer times, you could rely on a generator expression, as in the following example

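# Lazily yield (index, element) pairs for elements of a_list that are still in b_list
gen = ((a_index, a) for a_index, a in enumerate(a_list) if a in b_list)

for elem in gen:
    # Remove the matched element so that later duplicates are not reported again
    b_list.remove(elem[1])
    print(elem)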

This reports each match as it is found, regardless of which element of b_list turns up first. You can replace the print statement with whatever functionality you want, but you must keep the remove statement; otherwise the generator will report more than one match for the same element. Note that the loop modifies b_list: every matched element is removed from it.
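If you would rather leave b_list untouched, a plain loop over a_list with a separate set of remaining targets gives the same single pass. This is only a sketch of one way to do it; the function name first_indices is made up for illustration.

```python
a_list = ["ACA", "ATG", "CGC", "ATA", "TAT", "TAA", "TAG", "TGA", "ATG"]
b_list = ["ATG", "TAA", "AAA"]


def first_indices(items, targets):
    """Map each target found in items to the index of its first occurrence."""
    remaining = set(targets)  # a copy, so the caller's list is not modified
    found = {}
    for index, item in enumerate(items):
        if item in remaining:
            found[item] = index
            remaining.discard(item)
            if not remaining:  # every target has been seen; stop early
                break
    return found


print(first_indices(a_list, b_list))  # {'ATG': 1, 'TAA': 5}
print(b_list)                         # ['ATG', 'TAA', 'AAA']
```

Using a set also makes each membership check constant time on average, which may matter for long sequences.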


 