I've got data of this type (repeated many times):
@@@FFDFFHHHHHJJFFHGIJJJJGI
@M00332:5:000000000-A0TVJ:1:1:13498:26189 2:N:0:1
ACCACAGCCGCTGCCCATTTGCATAA
+
Using a regexp, I'm trying to select all lines which contain a specific string: cagccgctgcccatttg. I'm a regex newbie, so I've tried this: \\w{3,}(cagccgctgcccatttg)\\w{3,}
Any help is much appreciated.
Cheers Simon
From what I understand, you want to find all sequences that contain a given sub-sequence. I don't know what environment you're using, but this pattern should match any such sequence in a very simple way:
([ACGT]{3,}CAGCCGCTGCCCATTTG[ACGT]{3,})
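The square brackets define a character class, which matches any single one of the characters listed inside it. You don't want \\w, which also matches digits, underscores and every other letter; you only want to match a character if it's one of the 4 bases you're looking for. The {3,} on each side requires at least three such bases before and after the sub-sequence. Note that the pattern is written in upper case to match your data, so the lowercase string from your attempt would not match unless your environment does case-insensitive matching. Also, you can wrap the whole regex in parens to capture the entire match.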
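Since the environment isn't stated, here is a minimal sketch in Python (an assumption, not necessarily what you're using) that applies the pattern above line by line to the sample from the question. The names `PATTERN`, `matching_lines` and `sample` are made up for illustration.

```python
import re

# Pattern from the answer: at least 3 bases, the sub-sequence, at least 3 bases.
# The outer parentheses capture the whole matched sequence as group 1.
PATTERN = re.compile(r"([ACGT]{3,}CAGCCGCTGCCCATTTG[ACGT]{3,})")


def matching_lines(lines):
    """Return only the lines in which the pattern is found (hypothetical helper)."""
    return [line for line in lines if PATTERN.search(line)]


# The four sample lines from the question.
sample = [
    "@@@FFDFFHHHHHJJFFHGIJJJJGI",
    "@M00332:5:000000000-A0TVJ:1:1:13498:26189 2:N:0:1",
    "ACCACAGCCGCTGCCCATTTGCATAA",
    "+",
]

for line in matching_lines(sample):
    print(line)
```

On the sample, only the sequence line is selected. If your data could also contain lowercase bases, passing `re.IGNORECASE` as the second argument to `re.compile` would make the match case-insensitive.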