(Python 2.7) I have an RNA sequence and I am trying to find all the non-overlapping substrings that start with 'AUG' and end in 'UAG', 'UGA', or 'UAA'. This is what I'm working with:
import re
sequence = "GAUGCAAAAUAAAUGAUGUAAUAA"
search = r"^(AUG(.)*(?:UAG|UAA|UGA))"
regions = re.findall(search, sequence)
print regions
The output should be "AUGCAAAA" and "AUGAUG". However, I am getting the entire region 'AUGCAAAAUAAAUGAUGUAAUAA' instead.
Looks like you need to use
AUG.*?(?=UAG|UAA|UGA)
See this regex demo
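A minimal sketch of how that pattern could be dropped into the code from the question. The sequence is the one from the question (quoted as a string literal); the variable name `pattern` is just for illustration. Note that there is no `^` anchor and no capturing group, so `re.findall` returns the full matches as plain strings.

```python
import re

# Sequence from the question, written as a string literal.
sequence = "GAUGCAAAAUAAAUGAUGUAAUAA"

# Lazy match after AUG, stopping right before the first stop codon.
# The lookahead checks for the stop codon without consuming it.
pattern = re.compile(r"AUG.*?(?=UAG|UAA|UGA)")

regions = pattern.findall(sequence)
print(regions)  # ['AUGCAAAA', 'AUGAUG']
```

The single-argument `print(...)` form works in both Python 2.7 and Python 3.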
Details:
AUG - matches the literal AUG
.*? - matches any 0+ chars other than line break chars, as few as possible, up to the first...
(?=UAG|UAA|UGA) - ...position that is immediately followed by UAG or UAA or UGA (these are not part of the return value since the pattern is inside a positive lookahead, which is a zero-width assertion).
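To see what each piece of the pattern contributes, here is a small comparison on the sequence from the question. The two extra patterns (a greedy `.*` and a consuming `(?:...)` group in place of the lookahead) are variations made up for contrast, not something you need.

```python
import re

sequence = "GAUGCAAAAUAAAUGAUGUAAUAA"

# Suggested pattern: lazy quantifier plus lookahead.
lazy_lookahead = re.findall(r"AUG.*?(?=UAG|UAA|UGA)", sequence)

# Greedy .* runs to the LAST stop codon, so the two regions merge into one.
greedy_lookahead = re.findall(r"AUG.*(?=UAG|UAA|UGA)", sequence)

# A non-capturing group consumes the stop codon, so it ends up in the result.
lazy_consuming = re.findall(r"AUG.*?(?:UAG|UAA|UGA)", sequence)

print(lazy_lookahead)    # ['AUGCAAAA', 'AUGAUG']
print(greedy_lookahead)  # ['AUGCAAAAUAAAUGAUGUAA']
print(lazy_consuming)    # ['AUGCAAAAUAA', 'AUGAUGUAA']
```

So the lazy `.*?` is what splits the sequence into separate regions, and the lookahead is what keeps the stop codon out of each returned substring.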