Using Python 2.7 and regex to find substring using beginning and end of substring (codons)

Question

(Python 2.7) I have an RNA sequence and I am trying to find all the non-overlapping substrings that start with 'AUG' and end in 'UAG', 'UGA', or 'UAA'. This is what I'm working with:

import re
sequence = "GAUGCAAAAUAAAUGAUGUAAUAA"
search = r"^(AUG(.)*(?:UAG|UAA|UGA))"
regions = re.findall(search, sequence)
print regions

The output should be "AUGCAAAA" and "AUGAUG". However, I am getting the entire region 'AUGCAAAAUAAAUGAUGUAAUAA' instead.

Answer 1

Looks like you need to use

AUG.*?(?=UAG|UAA|UGA)

See this regex demo

Details:

AUG - matches the literal start codon AUG
.*? - any 0+ chars other than line break chars, as few as possible, up to the first position where the lookahead succeeds
(?=UAG|UAA|UGA) - a positive lookahead requiring UAG, UAA, or UGA immediately to the right. Since a lookahead is a zero-width assertion, the stop codon is not part of the returned match.
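As a rough sketch of how the pattern above behaves on the sequence from the question, the snippet below runs it through re.findall and contrasts it with a greedy variant. The names pattern, greedy_pattern, regions and greedy_regions are only illustrative, and the sequence is assumed to be a quoted string literal. The single-argument print(...) form is used so the snippet should run on Python 2.7 as well as Python 3.

```python
import re

# Sequence from the question (assumed here to be a quoted string literal).
sequence = "GAUGCAAAAUAAAUGAUGUAAUAA"

# Lazy match: start at AUG and stop right before the nearest stop codon.
# The lookahead checks for the stop codon without consuming it.
pattern = re.compile(r"AUG.*?(?=UAG|UAA|UGA)")
regions = pattern.findall(sequence)
print(regions)  # ['AUGCAAAA', 'AUGAUG']

# Greedy variant for comparison: .* runs to the end of the string and
# backtracks only as far as the LAST stop codon, so one long match results.
greedy_pattern = re.compile(r"AUG.*(?:UAG|UAA|UGA)")
greedy_regions = greedy_pattern.findall(sequence)
print(greedy_regions)  # ['AUGCAAAAUAAAUGAUGUAAUAA']
```

The greedy result matches the "entire region" described in the question, which suggests that the greedy (.)* was the main cause there. Note also that re.findall returns the contents of capturing groups when the pattern has any, so leaving the groups out (or making them non-capturing) keeps the result a plain list of strings.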

solution1
2 2017-01-27 20:53:10