Using lookahead/lookbehind for a complex search in R/Perl regex

I can't figure out how to use lookaheads/lookbehinds in a regular expression to rule out a pattern that straddles two separate parts of the motif I'm searching for.

In a set of DNA strings, I need to match TGGA + one C or T + 0-4 A/C/T/G + at least 5 C/T, but I don't want a GT anywhere in the match. I've figured out how to exclude a GT at the start of the 0-4 A/C/T/G part (example 1), but I can't figure out how to handle cases where the G comes from [ACGT]{0,4} and the adjacent T comes from [CT]{5,}.
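Before testing any regex, it can help to pin the requirement down as a plain check with no regex at all, so candidate patterns can be compared against it. A minimal Python sketch, assuming the whole string has to be exactly one motif (no flanking sequence); the names `is_motif`, `DNA` and `PYR` are made up for illustration:

```python
# Reference check for the motif, written without regex so it can be used
# to test candidate patterns. Assumption: the whole string must be one motif.
DNA = set("ACGT")
PYR = set("CT")  # C or T


def is_motif(s: str) -> bool:
    """True if s = TGGA + [CT] + 0-4 x [ACGT] + 5+ x [CT], with no 'GT' anywhere."""
    if "GT" in s:
        return False
    if not s.startswith("TGGA") or len(s) < 5 or s[4] not in PYR:
        return False
    rest = s[5:]
    # Try every length k of the variable 0-4 part; the tail must be 5+ C/T.
    for k in range(0, 5):
        middle, tail = rest[:k], rest[k:]
        if (len(middle) == k and set(middle) <= DNA
                and len(tail) >= 5 and set(tail) <= PYR):
            return True
    return False
```

Trying every split length explicitly is what the regex engine does through backtracking; spelling it out makes it obvious that the "no GT" rule applies to the whole string, not to one segment.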

I've tried adding a lookbehind by expanding the last part to [CT](?<!GT)[CT]{4,}, and the lookahead in front of [ACGT]{0,4} doesn't pick up the split GT instance. Any tips/help would be appreciated!
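A quick Python check of the lookbehind idea (Python's `re` uses the same `(?!...)` and `(?<!...)` syntax as Perl; in R these need `perl = TRUE`). Note that `(?<!GT)` is the negative lookbehind, while `(?>...)` is an atomic group. The extra string TGGACAGTCCCCC is a hypothetical case, not one of the question's examples. Placed after the first tail character, the lookbehind does reject example 3, but a GT lying entirely inside the 0-4 part still gets through, because each lookaround only guards one fixed position:

```python
import re

# Current pattern (commas removed: inside [...] a comma is a literal character)
# with a negative lookbehind (?<!GT) added after the first C/T of the tail.
lookbehind = re.compile(r"TGGA[CT](?!GT)[ACGT]{0,4}[CT](?<!GT)[CT]{4,}")

for seq in ["TGGACGTGGTCCCCC",   # example 1
            "TGGACGCCCCC",       # example 2
            "TGGACGGGGTCCCCC",   # example 3 (trailing "..." dropped)
            "TGGACAGTCCCCC"]:    # hypothetical: GT inside the 0-4 part
    m = lookbehind.search(seq)
    print(seq, "->", m.group(0) if m else None)

# TGGACGTGGTCCCCC -> None
# TGGACGCCCCC -> TGGACGCCCCC
# TGGACGGGGTCCCCC -> None
# TGGACAGTCCCCC -> TGGACAGTCCCCC   (still contains GT)
```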

Current regex:

TGGA[CT](?!GT)[ACGT]{0,4}[CT]{5,}

Example set:
1) TGGACGTGGTCCCCC (bad, dealt with)
2) TGGACGCCCCC (good)
3) TGGACGGGGTCCCCC... (bad, how do I fix this?)
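Running the current pattern over the three examples shows where it breaks. This is a Python sketch (Python's `re` lookahead syntax matches Perl's); the commas are dropped from the character classes because inside `[...]` a comma is matched literally, and example 3 is shortened to the part shown before the "...". The `(?!GT)` only inspects the two characters right after the first C/T, so the G at the end of `[ACGT]{0,4}` and the T that starts `[CT]{5,}` in example 3 are never checked together:

```python
import re

# The question's current pattern, with the commas dropped from the
# character classes (inside [...] a comma is matched literally).
current = re.compile(r"TGGA[CT](?!GT)[ACGT]{0,4}[CT]{5,}")

examples = {
    "1": "TGGACGTGGTCCCCC",   # bad: rejected by (?!GT)
    "2": "TGGACGCCCCC",       # good: matched
    "3": "TGGACGGGGTCCCCC",   # bad, but matched: G from {0,4}, T from {5,}
}
for label, seq in examples.items():
    m = current.search(seq)
    print(label, m.group(0) if m else None)

# 1 None
# 2 TGGACGCCCCC
# 3 TGGACGGGGTCCCCC
```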

Put a negative lookahead after the relevant G character, so that a G in the variable part can never be followed by a T:

/TGGA[CT](?:[ACT]|G(?!T)){0,4}[CT]{5,}/
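A Python check of this pattern (drop the `/.../` delimiters; Python's `re` accepts the same `(?:...)` and `(?!...)` syntax, and in R it works with `grepl(pattern, x, perl = TRUE)`, since R's default regex engine does not support lookarounds). Because the lookahead is attached to every G in the 0-4 part, it also fires on the last one, whose next character is the first C/T of the tail; the other parts of the motif cannot contain a GT at all. Besides the three examples, the sketch generates random strings that already have the TGGA / C-T / 0-4 / 5+ shape (the generator and its length limits are my own test setup, not from the question) and asserts that the pattern accepts a string exactly when it contains no GT:

```python
import random
import re

# The answer's pattern: each G in the 0-4 part must not be followed by T.
# That includes a G at the end of the 0-4 part, whose next character is
# the first C/T of the tail, so the split GT of example 3 is rejected.
motif = re.compile(r"TGGA[CT](?:[ACT]|G(?!T)){0,4}[CT]{5,}")

for seq in ["TGGACGTGGTCCCCC", "TGGACGCCCCC", "TGGACGGGGTCCCCC"]:
    m = motif.search(seq)
    print(seq, "->", m.group(0) if m else None)

# Random strings with the motif's shape: TGGA + C/T + 0-4 any + 5-8 C/T.
rng = random.Random(0)
found = 0
for _ in range(5000):
    seq = ("TGGA" + rng.choice("CT")
           + "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 4)))
           + "".join(rng.choice("CT") for _ in range(rng.randint(5, 8))))
    m = motif.fullmatch(seq)
    # A well-shaped candidate should be accepted exactly when it has no "GT".
    assert (m is not None) == ("GT" not in seq), seq
    found += m is not None
print("accepted", found, "GT-free candidates")
```

The three examples print None, TGGACGCCCCC and None respectively. Replacing `[ACGT]` with the alternation `[ACT]|G(?!T)` moves the check onto the character that can start a GT, so the rule no longer depends on where the segment boundaries happen to fall.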
