Using Python 2.7 and regex to find substring using beginning and end of substring (codons)
(Python 2.7) I have an RNA sequence and I am trying to find all the non-overlapping substrings that start with 'AUG' and end in 'UAG', 'UGA', or 'UAA'. This is what I'm working with:
import re
sequence = "GAUGCAAAAUAAAUGAUGUAAUAA"
search = r"^(AUG(.)*(?:UAG|UAA|UGA))"
regions = re.findall(search, sequence)
print regions
The output should be "AUGCAAAA" and "AUGAUG". However, I am getting the entire region 'AUGCAAAAUAAAUGAUGUAAUAA'.
Looks like you need to use
AUG.*?(?=UAG|UAA|UGA)
See this regex demo.
Details:
- AUG - matches the literal AUG
- .*? - any 0+ chars other than line break chars, as few as possible, up to the first...
- (?=UAG|UAA|UGA) - UAG or UAA or UGA (these are not part of the returned value, since the pattern sits inside a positive lookahead, which is a zero-width assertion).
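As a quick check, here is a minimal sketch that applies the suggested pattern to the sequence from the question. The variable names (lazy, greedy, greedy_regions) are only illustrative, and the greedy variant is included purely for comparison; it is not part of the original answer. The code is written so that it should run on Python 2.7 as well as Python 3.

```python
import re

sequence = "GAUGCAAAAUAAAUGAUGUAAUAA"

# Lazy match: start at AUG and stop right before the first stop codon.
# The lookahead (?=...) checks for the stop codon without consuming it,
# so the stop codon is not included in the returned substring.
lazy = re.compile(r"AUG.*?(?=UAG|UAA|UGA)")
regions = lazy.findall(sequence)
print(regions)  # ['AUGCAAAA', 'AUGAUG']

# For comparison only: a greedy .* runs to the LAST stop codon it can
# reach, so the whole stretch comes back as a single match.
greedy = re.compile(r"AUG.*(?=UAG|UAA|UGA)")
greedy_regions = greedy.findall(sequence)
print(greedy_regions)  # ['AUGCAAAAUAAAUGAUGUAA']
```

Note that re.findall returns non-overlapping matches scanned left to right, which is what the question asks for.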