Finding substrings by their start and end (codons) using Python 2.7 and regex

Question

(Python 2.7) I have an RNA sequence and I am trying to find all the non-overlapping substrings that start with 'AUG' and end in 'UAG', 'UGA', or 'UAA'. This is what I'm working with:

import re
sequence = "GAUGCAAAAUAAAUGAUGUAAUAA"
search = r"^(AUG(.)*(?:UAG|UAA|UGA))" 
regions = re.findall(search, sequence)
print regions

The output should be "AUGCAAAA" and "AUGAUG". However, I am getting the entire region 'AUGCAAAAUAAAUGAUGUAAUAA'.

Answer 1

Looks like you need to use

AUG.*?(?=UAG|UAA|UGA)

See this regex demo.

Details:

AUG - matches the literal AUG
.*? - any 0+ chars other than line break chars, as few as possible, up to the first...
(?=UAG|UAA|UGA) - UAG or UAA or UGA (these are not part of the returned value, since the pattern is inside a positive lookahead, which is a zero-width assertion).
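
As a rough sketch of how the pieces above behave, here are three variants run on the sequence from the question. The variable names (greedy, lazy, with_stop) are made up for illustration, and the third pattern is only an assumed variant for the case where the stop codon should be kept in the result. print(...) with a single argument works on both Python 2.7 and 3.

```python
import re

sequence = "GAUGCAAAAUAAAUGAUGUAAUAA"

# Greedy '.*' runs on to the last stop codon, so one long match comes back.
greedy = re.findall(r"AUG.*(?:UAG|UAA|UGA)", sequence)

# Lazy '.*?' with a lookahead stops just before the first stop codon.
lazy = re.findall(r"AUG.*?(?=UAG|UAA|UGA)", sequence)

# Assumed variant: a non-capturing group instead of the lookahead
# keeps the stop codon in each match.
with_stop = re.findall(r"AUG.*?(?:UAG|UAA|UGA)", sequence)

print(greedy)     # ['AUGCAAAAUAAAUGAUGUAAUAA']
print(lazy)       # ['AUGCAAAA', 'AUGAUG']
print(with_stop)  # ['AUGCAAAAUAA', 'AUGAUGUAA']
```

Note that none of these patterns track the reading frame; they only look for the nearest stop-codon text after each AUG, which is what the expected output in the question implies.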

Answer 1
2 2017-01-27 20:53:10
