
How to find positions of the last occurrence of a pattern in a string, and use these to extract a substring from another string

I need help with a specific problem that I have not been able to find on this site. I have a result that looks something like this:

result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"

This is a transmembrane prediction. For this string, I have another string of the same length that holds the amino acid sequence, for example:

amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"
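Because the two strings are aligned position by position, an index found in `result` can be used directly on `amino_acid_code`. A minimal sketch of that correspondence, assuming both values are plain Python strings (`membrane_residues` is a hypothetical name used only for illustration):

```python
result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"

# The prediction and the sequence describe the same positions,
# so they must have the same length.
assert len(result) == len(amino_acid_code)

# Pair each topology label with its residue: index i in one string
# describes index i in the other. Keep only residues labelled 'M'.
membrane_residues = "".join(
    residue for label, residue in zip(result, amino_acid_code) if label == "M"
)
print(membrane_residues)
```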

I want to study the last "M" region. Its length can vary, as can the number of "o" characters after it. In this case I need to extract "PLAFLVFK" from the amino acid string, because that is the part that corresponds to the last "M" region.

I already have something like this, but I cannot figure out how to obtain the start position. I also suspect that a simpler (or computationally more efficient) solution exists.

end = result.rfind('M')
start = ?
region_I_need = amino_acid_code[start:end]

Thanks in advance

To find the start position as well, call rfind again, this time searching for the last 'o' in the part of result that ends at the last 'M':

result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"

# rfind returns the index of the last 'M'; add 1 because slice ends are exclusive
end = result.rfind('M') + 1
# the region starts just after the last 'o' that precedes it
start = result[:end].rfind('o') + 1
region_I_need = amino_acid_code[start:end]

print(start, end)       # 62 70
print(region_I_need)    # PLAFLVFK
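The answer above relies on 'o' being the only label other than 'M'. If the prediction can also contain other labels, for example an 'i' for inside segments (an assumption; the example string only uses 'o' and 'M'), `rfind('o')` can jump past those characters and return a start that is too far to the left. A sketch that avoids this by stripping the run of 'M' characters instead of searching for one specific other label, plus an equivalent using the `re` module. `last_region_bounds` and `last_region_bounds_regex` are hypothetical helper names, not part of any library:

```python
import re
from typing import Optional, Tuple


def last_region_bounds(topology: str, label: str = "M") -> Optional[Tuple[int, int]]:
    """Return (start, end) of the last run of `label` (end exclusive), or None."""
    # Assumption: any character other than `label` marks a position outside the region.
    end = topology.rfind(label) + 1  # one past the last label character
    if end == 0:
        return None  # the label does not occur at all
    # Removing the trailing run of labels leaves a prefix that ends right before the run.
    start = len(topology[:end].rstrip(label))
    return start, end


def last_region_bounds_regex(topology: str, label: str = "M") -> Optional[Tuple[int, int]]:
    """Same result via re: the span of the last match of one or more labels."""
    matches = list(re.finditer(re.escape(label) + "+", topology))
    return matches[-1].span() if matches else None


result = "ooooooooooooooooooooooMMMMMMooooooooooooooooooMMMMMMooooooooooMMMMMMMMoo"
amino_acid_code = "MSDENKSTPIVKASDITDKLKEDILTISKDALDKNTWHVIVGKNFGSYVTHEKGHFVYFYIGPLAFLVFKTA"

bounds = last_region_bounds(result)
if bounds is not None:
    start, end = bounds
    print(start, end)                  # 62 70
    print(amino_acid_code[start:end])  # PLAFLVFK
```

For a string such as "ooMMMiiMMii", `result[:end].rfind('o') + 1` gives 2, while `last_region_bounds` gives (7, 9), which is the last 'M' run. Both helpers run in linear time in the length of the string.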

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
