
Regex - find part of string

I've got data of this type (repeated many times):

@@@FFDFFHHHHHJJFFHGIJJJJGI   
@M00332:5:000000000-A0TVJ:1:1:13498:26189 2:N:0:1   
ACCACAGCCGCTGCCCATTTGCATAA 
+

Using a regex, I'm trying to select all lines that contain a specific string, cagccgctgcccatttg. I'm a regex newbie, so I've tried this: \\w{3,}(cagccgctgcccatttg)\\w{3,}

Any help is much appreciated.

Cheers Simon

From what I understand, you want to gather all sequences that contain a given subsequence. I don't know what environment you're using, but this simple pattern should match any sequence line that contains it:

([ACGT]{3,}CAGCCGCTGCCCATTTG[ACGT]{3,})

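The square brackets define a character class, which matches any single character listed inside it. You don't want to match \\w, which also accepts digits, underscores and every other letter; you only want to match a character if it's one of the 4 bases you're looking for. Note that the pattern is uppercase because your sequence data is uppercase: regexes are case-sensitive by default, so your lowercase pattern will only match if you turn on a case-insensitive option. Also keep in mind that {3,} requires at least three bases on each side of the target. Finally, you can wrap the whole regex in parens to pick up the entire match as a capture group.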
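Since the environment wasn't stated, here is a minimal sketch of the same pattern in Python, purely as an illustration. The function name `matching_sequences` and the `sample` list are made up for this example, and the `re.IGNORECASE` flag is an assumption added so that a lowercase target would also match.

```python
import re

# Same pattern as above; IGNORECASE is an assumption so lowercase input also matches.
PATTERN = re.compile(r"([ACGT]{3,}CAGCCGCTGCCCATTTG[ACGT]{3,})", re.IGNORECASE)


def matching_sequences(lines):
    """Return the matched stretch from every line that contains the target."""
    hits = []
    for line in lines:
        match = PATTERN.search(line.strip())
        if match:
            # Group 1 is the whole match: flanking bases plus the target.
            hits.append(match.group(1))
    return hits


# Hypothetical input: the four lines of the record shown in the question.
sample = [
    "@@@FFDFFHHHHHJJFFHGIJJJJGI",
    "@M00332:5:000000000-A0TVJ:1:1:13498:26189 2:N:0:1",
    "ACCACAGCCGCTGCCCATTTGCATAA",
    "+",
]

print(matching_sequences(sample))  # ['ACCACAGCCGCTGCCCATTTGCATAA']
```

Only the sequence line is returned; the quality and header lines don't contain the target. A line where the target sits fewer than three bases from either end would be skipped because of the {3,} quantifiers.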

